RSS Reader Using Hugo


Hey πŸ‘‹,

Inspired by this RSS Server Side Reader idea, I created my own within Hugo.

Find it at /reader!

Why?

For the last year, I have been trying to find interesting blogs to follow. Every time I found one, I added it to Feedly. That was nice, but if I didn’t check it for a while, content would pile up.

After reading the aforementioned post, I thought that was a nice and simple idea. And I love nice and simple ideas so… I decided to implement it for myself.

Generating the feed

With the help of Claude Code I created the following PHP script at scripts/generate-reader.php. I used PHP because it’s the language I am most comfortable with, but any similar script should work.

πŸ’‘ You can ask your favorite AI tool to rewrite it into your preferred language, e.g. Python, JavaScript, etc.

<?php

// Local feeds configuration - replace with your desired RSS feeds
const FEEDS = [
    [
        'rss_url' => 'https://aaron.com.es/index.xml',
    ],
    // Add more feeds here as needed
];

const OUTPUT_FILE = __DIR__ . '/../content/reader/index.html';
const CACHE_DIR = '/tmp/generate-reader-cache';
const CACHE_DURATION = 24 * 60 * 60; // 24 hours
const MAX_AGE_DAYS = 30; // Only show entries from last 30 days

function getCacheKey($url) {
    return CACHE_DIR . '/' . md5($url) . '.cache';
}

function fetchRSSFeed($url) {
    // Check cache first
    $cacheFile = getCacheKey($url);
    if (file_exists($cacheFile) && (time() - filemtime($cacheFile)) < CACHE_DURATION) {
        echo "Using cached data for: $url\n";
        $content = file_get_contents($cacheFile);
    } else {
        echo "Fetching fresh data for: $url\n";
        $context = stream_context_create([
            'http' => [
                'timeout' => 10,
                'user_agent' => 'Mozilla/5.0 (compatible; PHP RSS Reader)'
            ]
        ]);

        $content = file_get_contents($url, false, $context);
        if ($content === false) {
            return null;
        }

        // Ensure cache directory exists
        if (!is_dir(CACHE_DIR)) {
            mkdir(CACHE_DIR, 0755, true);
        }

        // Save to cache
        file_put_contents($cacheFile, $content);
    }

    libxml_use_internal_errors(true);
    $xml = simplexml_load_string($content);
    if ($xml === false) {
        return null;
    }

    return $xml;
}

function parseRSSFeed($xml) {
    $items = [];

    // Handle RSS format
    if (isset($xml->channel->item)) {
        foreach ($xml->channel->item as $item) {
            $pubDate = (string)$item->pubDate;
            $timestamp = $pubDate ? strtotime($pubDate) : 0;

            $items[] = [
                'title' => (string)$item->title,
                'link' => (string)$item->link,
                'date' => $timestamp ?: time(),
                'dateString' => $pubDate ?: date('Y-m-d')
            ];
        }
    }
    // Handle Atom format
    elseif (isset($xml->entry)) {
        foreach ($xml->entry as $entry) {
            $published = (string)$entry->published;
            $updated = (string)$entry->updated;
            $dateStr = $published ?: $updated;
            $timestamp = $dateStr ? strtotime($dateStr) : 0;

            $link = '';
            if (isset($entry->link)) {
                if (isset($entry->link->attributes()->href)) {
                    $link = (string)$entry->link->attributes()->href;
                } else {
                    $link = (string)$entry->link;
                }
            }

            $items[] = [
                'title' => (string)$entry->title,
                'link' => $link,
                'date' => $timestamp ?: time(),
                'dateString' => $dateStr ?: date('Y-m-d')
            ];
        }
    }

    // Sort by date descending and take first 3
    usort($items, fn($a, $b) => $b['date'] - $a['date']);
    return array_slice($items, 0, 3);
}

function generateHTML($allEntries) {
    $html = "---\ntitle: Reader\ntype: page\n---\n\n";
    $html .= "<link rel=\"stylesheet\" href=\"/css/reader.css\">\n\n";
    $html .= "<p>These are the most recent articles from the blogs/sites I follow. Updated daily.</p>\n\n";

    // Group entries by date
    $entriesByDate = [];
    foreach ($allEntries as $entry) {
        $date = date('Y-m-d', $entry['date']);
        if (!isset($entriesByDate[$date])) {
            $entriesByDate[$date] = [];
        }
        $entriesByDate[$date][] = $entry;
    }

    // Generate HTML grouped by date
    foreach ($entriesByDate as $date => $entries) {
        $html .= "<h3 class=\"reader-date-header\">{$date}</h3>\n";
        $html .= "<ul class=\"reader-list\">\n";

        foreach ($entries as $entry) {
            // Escape HTML for safety
            $articleTitle = htmlspecialchars($entry['title'], ENT_QUOTES);
            $articleUrl = htmlspecialchars($entry['link'], ENT_QUOTES);
            $blogTitle = htmlspecialchars($entry['blog_title'], ENT_QUOTES);

            $html .= "<li>\n";
            $html .= "  <a class=\"reader-link\" href=\"{$articleUrl}\" target=\"_blank\">{$articleTitle}</a> ";

            $html .= "<span class=\"reader-blog\">{$blogTitle}</span>";

            $html .= "\n</li>\n";
        }

        $html .= "</ul>\n\n";
    }

    $html .= "<p><em>Generated on " . date('Y-m-d H:i:s') . "</em></p>\n";

    return $html;
}

try {
    echo "Processing local feeds...\n";
    $feeds = FEEDS;

    $allEntries = [];
    $cutoffTimestamp = time() - (MAX_AGE_DAYS * 24 * 60 * 60);
    $feedsWithoutRecentEntries = [];

    foreach ($feeds as $feed) {
        if (empty($feed['rss_url'])) {
            echo "Skipping feed with empty rss_url: " . ($feed['url'] ?? 'unknown') . "\n";
            continue;
        }

        echo "Processing: {$feed['rss_url']}\n";

        $xml = fetchRSSFeed($feed['rss_url']);
        if ($xml === null) {
            echo "Failed to fetch or parse: {$feed['rss_url']}\n";
            continue;
        }

        $items = parseRSSFeed($xml);
        $recentItems = [];

        foreach ($items as $item) {
            if ($item['date'] >= $cutoffTimestamp) {
                $item['blog_title'] = parse_url($feed['rss_url'], PHP_URL_HOST);
                $recentItems[] = $item;
                $allEntries[] = $item;
            }
        }

        if (empty($recentItems)) {
            $feedsWithoutRecentEntries[] = [
                'url' => $feed['url'] ?? $feed['rss_url'],
                'title' => parse_url($feed['url'] ?? $feed['rss_url'], PHP_URL_HOST),
                // max() over the timestamps yields the feed's most recent entry
                'oldest_entry_date' => !empty($items) ? date('Y-m-d', max(array_column($items, 'date'))) : 'No entries found'
            ];
        }

        echo "Found " . count($items) . " total items, " . count($recentItems) . " within last " . MAX_AGE_DAYS . " days\n";
    }

    // Sort all entries chronologically (newest first)
    usort($allEntries, fn($a, $b) => $b['date'] - $a['date']);

    // Ensure output directory exists
    $outputDir = dirname(OUTPUT_FILE);
    if (!is_dir($outputDir)) {
        mkdir($outputDir, 0755, true);
    }

    // Generate and write HTML
    $html = generateHTML($allEntries);
    file_put_contents(OUTPUT_FILE, $html);

    echo "Generated " . count($allEntries) . " total entries in " . OUTPUT_FILE . "\n";

    if (!empty($feedsWithoutRecentEntries)) {
        echo "\n⚠️  NOTICE: The following sources have no entries within the last " . MAX_AGE_DAYS . " days:\n";
        foreach ($feedsWithoutRecentEntries as $feed) {
            echo "  - {$feed['title']} (latest: {$feed['oldest_entry_date']})\n";
        }
        echo "\n";
    }

    echo "Done!\n";

} catch (Exception $e) {
    echo "Error: " . $e->getMessage() . "\n";
    exit(1);
}

This script:

- Fetches each feed in FEEDS (RSS or Atom), caching the raw response in /tmp/generate-reader-cache for 24 hours.
- Keeps the 3 most recent entries per feed and discards anything older than 30 days.
- Groups the remaining entries by date and writes them as a Hugo page to content/reader/index.html.
- Warns about feeds that have published nothing in the last 30 days.

πŸ“ The script above is not exactly the same I am using. I am retrieving the list of feeds from my beloved API instead of the local FEEDS constant.

Note-to-self: I seriously need to talk about the features I add to my API.
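For anyone wanting to do the same, here is a minimal sketch of loading the feed list from a JSON API instead of the constant. The endpoint URL and the response shape below are made-up assumptions, not my actual API:

<?php

// Hypothetical sketch: fetch the feed list from a JSON endpoint
// instead of the local FEEDS constant.
function loadFeedsFromApi(string $endpoint): array
{
    $json = @file_get_contents($endpoint);
    if ($json === false) {
        return [];
    }

    // Expected shape: [{"rss_url": "https://example.com/index.xml"}, ...]
    $data = json_decode($json, true);

    return is_array($data) ? $data : [];
}

// Then, in the main block, replace `$feeds = FEEDS;` with:
$feeds = loadFeedsFromApi('https://api.example.com/feeds');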

Apart from the script, I have added a small stylesheet at /css/reader.css, which the generated page links to.

How to run it?

You can, of course, run it manually with php scripts/generate-reader.php && hugo. But I already have a GitHub Action that deploys my blog on every new commit to the main branch, so I added a new workflow at .github/workflows/generate-reader.yml to generate the reader daily at 00:00 UTC:

name: Generate Reader RSS

on:
  schedule:
    - cron: '0 0 * * *'  # Daily at 00:00 UTC
  workflow_dispatch:

jobs:
  generate-reader:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
      
      - name: Setup PHP
        uses: shivammathur/setup-php@v2
        with:
          php-version: '8.1'
      
      - name: Generate reader RSS
        run: php scripts/generate-reader.php
      
      - name: Setup GIT configuration
        run: |
          git config user.name github-actions
          git config user.email github-actions@github.com          
      
      - name: Commit and push changes
        run: |
          git add content/reader/
          if git diff --staged --quiet; then
            echo "No changes to commit"
          else
            git commit -m "Auto-generate reader RSS feed"
            git push origin main
          fi          

One could easily add the hugo build step to this workflow itself (since the script only generates the content/ file). But in my case I wanted to reuse my existing “build and deploy” workflow, so I edited it to be triggered when this new workflow completes.

on:
  workflow_run:
    workflows: ["Generate Reader RSS"]  # the workflow's name: field, not its filename
    types:
      - completed

πŸ“ This might sound confusing because I haven’t shared my “build and deploy” workflow yet. If this is strange to you send me a comment mentioning this and I will try to clarify better or give a more complete example.

Hope you like this! And feel free to peek at what I am reading anytime! You might find the blogs as interesting as I do.

As always, if this was helpful, please click the like button below!