FetchRSS version 2
Someone got in touch recently asking for help getting FetchRSS working. That gave me the opportunity to review the code thoroughly and start thinking about a few improvements.
The original implementation worked well enough but it could be better. Having not used it since changing my site generator, I immediately noticed how slow it was to load in-browser, especially with 20+ feeds. It was quite painful to watch. Time for an upgrade.
Version next
Why not do the heavy lifting on the backend and serve a single filtered file to the browser? That keeps the page fast.
- Download the RSS feeds.
- Use Python to filter the feeds and create one combined feed file for the browser to consume.
- The browser can then load one ‘static’ feed file (which takes milliseconds).
So that’s what I did.
Here are step-by-step instructions to implement a simple HTML RSS feed reader and aggregator with JavaScript and Python.
The first step is to create a directory called fetch_rss which will serve as the working folder.
Download the RSS feeds
Create a file called rss.sh and add the following to it:
#!/bin/bash
wget -q -T 10 -t 2 -U "FetchRSS/2.0 (+https://cybrkyd.com/rss-reader)" -O cybrkyd.xml https://cybrkyd.com/index.xml
wget -q -T 10 -t 2 -U "FetchRSS/2.0 (+https://cybrkyd.com/rss-reader)" -O bbc.xml https://feeds.bbci.co.uk/news/technology/rss.xml
Looking at the BBC line above, the wget options are:
- -q: quiet mode. Suppresses most output; only errors are shown.
- -T 10: timeout of 10 seconds. If the server does not respond within 10 seconds, the connection is aborted.
- -t 2: try count of 2. wget makes at most two attempts in total before giving up.
- -U "FetchRSS/2.0 (+https://cybrkyd.com/rss-reader)": User-Agent string. Sends this custom identifier along with the request. Some servers treat requests differently based on the User-Agent.
- -O bbc.xml: output file name. Writes the downloaded content to bbc.xml rather than the default filename derived from the URL.
- https://feeds.bbci.co.uk/news/technology/rss.xml: target URL. The RSS feed to be fetched (BBC Technology News).
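For systems without wget, the same download behaviour can be sketched in pure Python with only the standard library. This is an illustrative alternative, not part of the original setup; the fetch() name and its parameters are my own, mirroring the wget flags above (timeout, a capped number of attempts, a custom User-Agent, and an explicit output filename):

```python
import urllib.request

def fetch(url, dest, timeout=10, attempts=2,
          user_agent="FetchRSS/2.0 (+https://cybrkyd.com/rss-reader)"):
    """Download url to dest, retrying up to `attempts` times in total."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                data = resp.read()
            with open(dest, "wb") as f:
                f.write(data)
            return True
        except OSError:
            # Network and URL errors both derive from OSError;
            # fall through and retry until attempts are exhausted.
            if attempt == attempts - 1:
                return False
    return False
```

Calling fetch("https://feeds.bbci.co.uk/news/technology/rss.xml", "bbc.xml") would then be roughly equivalent to the wget line above.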
Use Python to filter the feeds
Let’s use the awesome power of Python to filter and create one combined feed file for the browser to consume.
Create a file called rss_agg.py and add the following to it:
#!/usr/bin/env python3
import feedparser
import json
from datetime import datetime, timedelta
import time
# Configuration
FEEDS = [
    {'url': 'cybrkyd.xml', 'label': "Cybrkyd's RSS"},
    {'url': 'bbc.xml', 'label': "BBC Tech"}
]
ITEMS_PER_FEED = 4
MAX_AGE_DAYS = 10
OUTPUT_FILE = 'fetchrss.json'
def parse_feed(feed_url, feed_label):
    parsed = feedparser.parse(feed_url)
    items = []
    cutoff_date = datetime.now() - timedelta(days=MAX_AGE_DAYS)
    entries = parsed.entries[:ITEMS_PER_FEED]
    for entry in entries:
        pub_date = None
        # Feeds vary in which date field they provide; take the first available.
        for date_field in ['published_parsed', 'updated_parsed', 'created_parsed']:
            if hasattr(entry, date_field) and getattr(entry, date_field):
                pub_date = datetime.fromtimestamp(time.mktime(getattr(entry, date_field)))
                break
        if not pub_date or pub_date < cutoff_date:
            continue
        link = getattr(entry, 'link', '#')
        title = getattr(entry, 'title', '(No title)')
        items.append({
            'title': title,
            'link': link,
            'date': pub_date.isoformat(),
            'source': feed_label
        })
    return items

def main():
    all_items = []
    for feed in FEEDS:
        try:
            items = parse_feed(feed['url'], feed['label'])
            all_items.extend(items)
        except Exception as e:
            print(f"Error processing {feed['label']}: {e}")
    all_items.sort(key=lambda x: x['date'], reverse=True)
    with open(OUTPUT_FILE, 'w') as f:
        json.dump(all_items, f, indent=2)
    print("Completed")

if __name__ == "__main__":
    main()
The script parses the previously downloaded RSS feed files and writes the combined result to a JSON file.
It takes up to four of the newest entries from each feed, ignores any items older than ten days, and gathers each entry's title, link, ISO-formatted publication date, and the feed's label. It then merges all valid entries, sorts them by date in descending order, and writes the resulting array to fetchrss.json as pretty-printed JSON.
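The date handling deserves a quick illustration. feedparser exposes parsed dates as time.struct_time tuples, which the script converts to datetime objects via time.mktime(). A minimal sketch of that conversion (the input string here is made up for demonstration):

```python
import time
from datetime import datetime

# feedparser's *_parsed fields are struct_time tuples; simulate one here.
struct = time.strptime("2025-11-25 12:26:43", "%Y-%m-%d %H:%M:%S")

# Same conversion the script performs: struct_time -> epoch -> datetime.
pub_date = datetime.fromtimestamp(time.mktime(struct))
print(pub_date.isoformat())  # 2025-11-25T12:26:43
```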
The heavy lifting is complete. Here is a sample of the output JSON array from fetchrss.json:
[
  {
    "title": "Cryptology firm cancels elections after losing encryption key",
    "link": "https://www.bbc.com/news/articles/c62vl05rz0ko?at_medium=RSS&at_campaign=rss",
    "date": "2025-11-25T12:26:43",
    "source": "BBC Tech"
  },
  {
    "title": "Ofcom vows to name and shame platforms over online sexism",
    "link": "https://www.bbc.com/news/articles/cly52dd9lnmo?at_medium=RSS&at_campaign=rss",
    "date": "2025-11-25T11:51:18",
    "source": "BBC Tech"
  },
  {
    "title": "Anyone seen the hyphens?",
    "link": "https://cybrkyd.com/post/anyone-seen-the-hyphens/",
    "date": "2025-11-18T09:06:29",
    "source": "Cybrkyd's RSS"
  },
  {
    "title": "Debian 13 server",
    "link": "https://cybrkyd.com/post/debian-13-server/",
    "date": "2025-11-17T08:55:03",
    "source": "Cybrkyd's RSS"
  }
]
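A small design note: the script sorts on the date strings directly, without parsing them back into datetime objects. That works because ISO 8601 timestamps sort lexicographically in chronological order. A quick sketch using dates from the sample above:

```python
# ISO 8601 strings compare correctly as plain strings,
# so no datetime parsing is needed when sorting.
items = [
    {"date": "2025-11-17T08:55:03"},
    {"date": "2025-11-25T12:26:43"},
    {"date": "2025-11-18T09:06:29"},
]
items.sort(key=lambda x: x["date"], reverse=True)
print(items[0]["date"])  # newest first: 2025-11-25T12:26:43
```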
The browser and JS
The last mile is to load the combined RSS JSON into the page.
Create a new file called feed-loader.js and add the following to it:
document.addEventListener('DOMContentLoaded', async () => {
  try {
    const response = await fetch('fetchrss.json');
    const allItems = await response.json();

    const ul = document.createElement("ul");
    for (const item of allItems) {
      const li = document.createElement("li");

      const a = document.createElement("a");
      a.href = item.link;
      a.textContent = item.title;
      a.target = "_blank";
      a.rel = "noopener";

      const sourceSpan = document.createElement("span");
      sourceSpan.className = "source-label";
      sourceSpan.textContent = item.source;

      // Format date (YYYY-MM-DD)
      const dateObj = new Date(item.date);
      const formattedDate = dateObj.toISOString().split("T")[0];
      const dateSpan = document.createElement("span");
      dateSpan.className = "date-label";
      dateSpan.textContent = ` (${formattedDate})`;

      li.appendChild(a);
      li.appendChild(document.createElement("br"));
      li.appendChild(sourceSpan);
      li.appendChild(dateSpan);
      ul.appendChild(li);
    }

    const feedBox = document.getElementById("rss-feed-box");
    if (feedBox) {
      feedBox.innerHTML = "";
      feedBox.appendChild(ul);
    } else {
      console.error("Element with id 'rss-feed-box' not found");
    }
  } catch (err) {
    console.error("Error loading combined feed:", err);
    const feedBox = document.getElementById("rss-feed-box");
    if (feedBox) {
      feedBox.innerHTML = "<p>Error loading feed. Please try again later.</p>";
    }
  }
});
In a nutshell, the script above renders each entry from fetchrss.json as a list item with a clickable title, its source label and date, and injects the resulting list into the page element #rss-feed-box.
Finally, call feed-loader.js from the page where the RSS aggregated feed is to be displayed.
For example, in the same folder, make a file called index.html and add the following to it:
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>FetchRSS</title>
</head>
<body>
  <h1>FetchRSS</h1>
  <div id="rss-feed-box"></div>
  <script src="feed-loader.js"></script>
</body>
</html>
Cron jobs to run it automatically
The feed downloads and the Python script need to run on a timer so that the displayed FetchRSS output stays up to date. Here is an example of my own cron jobs to achieve this:
0 */6 * * * cd /path/to/fetch_rss && bash rss.sh >/dev/null 2>&1
3 */6 * * * cd /path/to/fetch_rss && /path/to/mypython_env/bin/python3 rss_agg.py >/dev/null 2>&1
- Every six hours at minute 0, rss.sh is executed to perform the external RSS downloads.
- Every six hours at minute 3, rss_agg.py is executed to process the downloaded RSS files and output the combined fetchrss.json file.
Since feed-loader.js never changes, it simply displays whatever it finds in the refreshed fetchrss.json on index.html.
And there we have it; version 2 of FetchRSS by Cybrkyd.
Enjoy.
