Hubzilla RSS Bot

 Fri, 02 Feb 2024 07:21 UTC

Image: CC BY 4.0 by cybrkyd

A self-challenge: make a Python bot to post to Hubzilla. Having left Mastodon, I needed to re-enable my Al Jazeera RSS feed bot.

The concept is simple enough: (1) get the RSS feed, (2) look for new items, (3) post only the latest items, and (4) make note of the time when the last post was made.
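Steps (1) to (3) can be sketched with the standard library alone. The feed XML below is a stand-in for the real Al Jazeera feed, and `new_items` is a hypothetical helper name, not part of the final script:

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Stand-in for the real feed; the live bot fetches this over HTTP.
SAMPLE_RSS = """<rss><channel>
  <item><title>Old story</title><link>https://example.com/old</link>
        <pubDate>Thu, 01 Feb 2024 08:00:00 GMT</pubDate></item>
  <item><title>New story</title><link>https://example.com/new</link>
        <pubDate>Fri, 02 Feb 2024 06:45:00 GMT</pubDate></item>
</channel></rss>"""

def new_items(xml_text, last_run_dt):
    """Return feed items published after last_run_dt (steps 2 and 3)."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.findall(".//item"):
        pub = parsedate_to_datetime(item.find("pubDate").text)
        if pub > last_run_dt:
            items.append({"title": item.find("title").text,
                          "link": item.find("link").text})
    return items

last_run = parsedate_to_datetime("Thu, 01 Feb 2024 12:00:00 GMT")
print([i["title"] for i in new_items(SAMPLE_RSS, last_run)])  # ['New story']
```

Only the items newer than the stored timestamp survive the filter; everything else is skipped.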

Hubzilla is a powerhouse and it is possible to (1) follow RSS feeds and (2) mirror those as posts in a channel. However, there is a warning about RSS feeds in Hubzilla which reads “Heavy system resource usage”. Having previously used Hubzilla on a shared hosting platform, I can confirm that it does add some strain to the system. I’m now on a VPS so I could quite easily enable this and be done with it but where is the fun in that? Let’s keep the strain off the application and get Python to do the heavy lifting.

The challenge came with the formatting and how Hubzilla parses links. In Hubzilla, a link is rendered as a plain URL, with no expansion to retrieve the Open Graph data. So you post a bare link and that is exactly what ends up in the post.
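This means the post body has to be assembled by hand in Hubzilla's BBCode, with the link, description, and image all written out explicitly. A small sketch (the `build_body` helper name is my own, not part of the script below):

```python
def build_body(link, title, description, image_url):
    """Assemble a Hubzilla BBCode post body with an explicit image tag."""
    return (f"[url={link}] {title}[/url]<br>{description}"
            f"<br><br>[img]{image_url}[/img]")

body = build_body("https://example.com/story", "Headline",
                  "A short summary.", "https://example.com/pic.jpg")
print(body)
```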

When posting from the UI, Hubzilla retrieves the Open Graph data at the time of post creation, not after the post has been made, which is how I think this works in Mastodon and other ActivityPub applications. I therefore had to retrieve not just the new item in the RSS feed but also fetch the associated image of the article and explicitly include it in the post sent to Hubzilla.
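The script below leans on newspaper3k for this, but the underlying idea can be sketched with the standard library's HTML parser: scan the article page for its og:image meta tag. The HTML here is a stand-in, and `OGImageParser` is an illustrative class of my own:

```python
from html.parser import HTMLParser

class OGImageParser(HTMLParser):
    """Collect the content of the og:image meta tag, if present."""
    def __init__(self):
        super().__init__()
        self.image_url = ""

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("property") == "og:image":
            self.image_url = a.get("content", "")

SAMPLE_HTML = ('<html><head>'
               '<meta property="og:image" content="https://example.com/lead.jpg">'
               '</head><body></body></html>')
p = OGImageParser()
p.feed(SAMPLE_HTML)
print(p.image_url)  # https://example.com/lead.jpg
```

newspaper3k does the same job more robustly, falling back to other images when no og:image tag exists.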

The code

The full code is below. There are three variables to change: rss_url, api_url and auth, the last being the channel's username and password.

The text file rssbot_last_run.txt is used to store the date/time of the last post transmission. This file needs to be writable.
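A minimal sketch of that bookkeeping, using the same '%a, %d %b %Y %X' format as the script. The helper names are my own, and the trailing timezone token is stripped before parsing here (the full script hands the whole string to dateutil instead):

```python
import os
from datetime import datetime

LAST_RUN_PATH = "./rssbot_last_run.txt"
TIME_FORMAT = "%a, %d %b %Y %X"

def save_last_run(dt, tz_name="UTC"):
    """Record the date/time of the last post transmission."""
    with open(LAST_RUN_PATH, "w") as f:
        f.write("%s %s" % (dt.strftime(TIME_FORMAT), tz_name))

def load_last_run():
    """Read the stored timestamp back, dropping the timezone suffix."""
    with open(LAST_RUN_PATH) as f:
        data = f.read().strip()
    stamp = data.rsplit(" ", 1)[0]
    return datetime.strptime(stamp, TIME_FORMAT)

save_last_run(datetime(2024, 2, 2, 7, 21, 0))
print(load_last_run())  # 2024-02-02 07:21:00
os.remove(LAST_RUN_PATH)  # tidy up the demo file
```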

The script itself runs as a crontab job, once every 20 minutes. My cron looks like this (the script filename rssbot.py is a placeholder; substitute your own):

*/20 * * * * cd /var/www/html/hubzilla-bot && python3 rssbot.py >/dev/null 2>&1

Finally, if there is more than one post to make (there will be on the first run), there is a delay of 20 seconds between posts.

#!/usr/bin/env python3

import time
from datetime import datetime
from dateutil import parser
import requests
import xml.etree.ElementTree as ET
from newspaper import Article

# Setup variables
last_run_path = "./rssbot_last_run.txt"
time_format_code = '%a, %d %b %Y %X'
LOCAL_TIMEZONE = datetime.now().astimezone().tzname()
now_dt = datetime.now()
now_str = now_dt.strftime(time_format_code)
now_tim = time.mktime(now_dt.timetuple())
rss_url = ""  # Replace with your feed URL

# Fetch the image URL from the article using newspaper3k
def fetch_image_url(article_link):
    try:
        article = Article(article_link)
        article.download()
        article.parse()
        return article.top_image
    except Exception as e:
        print(f"Error fetching image URL: {e}")
        return ""

# Get last run date/time
try:
    with open(last_run_path, "r") as myfile:
        data = myfile.read().strip()
        if not data:
            # Set last run date on the first run if the file is empty
            with open(last_run_path, "w") as myfile:
                myfile.write("%s %s" % (now_str, LOCAL_TIMEZONE))
            print("Wrote %s" % (last_run_path))
            with open(last_run_path, "r") as myfile:
                data = myfile.read().strip()
except Exception as e:
    print(f"Error reading last run file: {e}")
    data = "%s %s" % (now_str, LOCAL_TIMEZONE)

lr_dt = parser.parse(data)
lr_tim = time.mktime(lr_dt.timetuple())

print("LAST RUN: %s" % (lr_dt))
lrgr_entry_count = 0

# Get RSS feed and new entries
new_entries = []
response = requests.get(rss_url)
if response.status_code == 200:
    xml_data = response.text
    root = ET.fromstring(xml_data)

    for item in root.findall(".//item"):
        link = item.find("link").text
        title = item.find("title").text
        description = item.find("description").text

        # Check if entry is new
        pub_date_str = item.find("pubDate").text
        pub_date = parser.parse(pub_date_str)
        pub_tim = time.mktime(pub_date.timetuple())
        if pub_tim > lr_tim:
            lrgr_entry_count += 1
            print("New Entry: %s" % (title))
                "title": title,
                "link": link,
                "description": description

# New entries found
if len(new_entries) > 0:
    toots_attempted_count = 0
    for entry in new_entries:
        link = entry["link"]
        title = entry["title"]
        description = entry["description"]

        # Fetch the image URL from the linked article using newspaper3k
        image_url = fetch_image_url(link)

        # Make POST request using requests library
        api_url = ""  # Replace with your URL path
        payload = {
            "body": f"[url={link}] {title}[/url]<br>{description}<br><br>[img]{image_url}[/img]",
        }
        auth = ("USER", "PASSWORD")  # Replace with your actual username and password
        headers = {"Content-Type": "application/x-www-form-urlencoded"}

        # Introduce a 20-second delay between posts
        if toots_attempted_count > 0:
            time.sleep(20)
        toots_attempted_count += 1

        response = requests.post(api_url, data=payload, auth=auth, headers=headers)
        if response.status_code == 200:
            print(f"Successfully posted: {title}")
        else:
            print(f"Error posting: {title} - Status Code: {response.status_code}")

# Save new last run if new entries
if lrgr_entry_count > 0:
    with open(last_run_path, "w") as myfile:
        myfile.write("%s %s" % (now_str, LOCAL_TIMEZONE))
else:
    print("No New Entries")