RSS

I recently had a GitHub issue suggesting I add an RSS feed for my blog. RSS is a technology that tickles me just right (distributed, tiny, multi-purpose, open), and one I've been interested to learn more about for ages. So, let's do that!

An aggregator

Before I can start building an RSS generator into my website (which is a story for another day), I should have a way to test it. That means finding myself an RSS aggregator. I settled on newsboat for now, but I'm also interested in sfeed for maximum automation potential. Automation is a rabbit hole unto itself, however, so I'll content myself with a terminal UI application for the purposes of this post.

Understanding a feed's format

Poking around a couple of blog RSS feeds (1), (2) and the relevant w3schools tutorial, I've determined that the feed itself needs a title, link and description, and each post therein needs the same. There seem to be a bunch of optional tags too, of which I'll include pubDate but omit the rest for now, to keep things simple.

<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0">

<channel>
    <title>Just Testing</title>
    <link>https://ktyl.dev/blog/index.html</link>
    <description>Vaguely technical "blogging" from Cat</description>

    <item>
        <title>RSS</title>
        <link>https://ktyl.dev/blog/2022/6/3/rss.html</link>
        <description>A foray generating a blog's RSS feed</description>
    </item>
</channel>

</rss>

Phew! Writing XML by hand makes me pine for JSON, but with any luck we shouldn't have to do any more of that.
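As a quick sanity check on the structure above, a few lines of Python can confirm that a feed carries the required tags. This is only a sketch: `missing_tags` and `REQUIRED` are names made up for illustration, and it checks just the three tags identified above rather than the full RSS specification. The sample feed deliberately leaves out one description so there is something to report.

```python
import xml.etree.ElementTree as ET

# Tags this post identified as required on the channel and on every item.
REQUIRED = ("title", "link", "description")

def missing_tags(feed_xml: str) -> list[str]:
    """Return a description of each required tag missing from the feed (hypothetical helper)."""
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    if channel is None:
        return ["channel"]
    problems = [f"channel/{tag}" for tag in REQUIRED if channel.find(tag) is None]
    for index, item in enumerate(channel.findall("item")):
        problems += [f"item[{index}]/{tag}" for tag in REQUIRED if item.find(tag) is None]
    return problems

# Same shape as the hand-written feed, minus the item's description.
FEED = """<rss version="2.0">
<channel>
    <title>Just Testing</title>
    <link>https://ktyl.dev/blog/index.html</link>
    <description>Vaguely technical "blogging" from Cat</description>
    <item>
        <title>RSS</title>
        <link>https://ktyl.dev/blog/2022/6/3/rss.html</link>
    </item>
</channel>
</rss>"""

print(missing_tags(FEED))
```

An empty list means the channel and every item have all three tags.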

Dynamic XML generation

It's at this stage that most sane people would turn to their CMS, site builder, or some other library to generate their feed and be done with it. Sadly, I am not one of those people, and so I roll up my sleeves to consider modifications to my blog build process (which again, I'll cover in more detail some other day) to see where I can shoehorn in an RSS generator.

While looking at others' feeds, I thought about what ought to trigger rebuilding my feed, as that will decide where it ought to go in the build process. Primarily I want to use it for new blog posts, but I thought maybe I should retain some kind of manual-update capability in case I wanted to talk about site updates. However, thinking about that for another few seconds, I realised I could just make blog posts for those things. If I can think of a reason later I'll change it, but for now I'll just automatically generate my feed from my blog content.


To start with, I'll just generate an empty feed with the correct title and links, to make sure I can get a feed onto the site in the right place and format. My strategy for generating a lot of my site basically amounts to enthusiastic deployment of Python's print() and shell redirections into files, all triggered from rules in a Makefile. So, the first step is to add a Make rule to generate the appropriate file.

BLOG_RSS = $(BLOG_OUT_DIR)/index.xml

$(BLOG_RSS): $(BLOG_PAGES)
    python scripts/mkblogrss.py $(BLOG_PAGES) > $@

blog: $(BLOG_TARGETS) $(BLOG_RSS)
    ...

mkblogrss.py is a Python script which takes the path of every post in $(BLOG_PAGES) and spits out the finished XML. I can then make this rule a dependency of my already existing blog rule and have the index be generated every time the blog is rebuilt. I've omitted a lot of the Makefile plumbing, but if you're interested you can see the whole thing here.
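For a sense of scale, the first version of such a script can be tiny. The following is a guess at its shape rather than the actual contents of mkblogrss.py: it accepts the post paths that the Make rule passes in, ignores them for now, and prints a channel with no items so the shell redirect has something to write. The function name `empty_feed` is an assumption.

```python
import sys

def empty_feed(title: str, link: str, description: str) -> str:
    """Build an RSS 2.0 document with channel metadata and no items (hypothetical helper)."""
    return "\n".join([
        '<?xml version="1.0" encoding="utf-8" ?>',
        '<rss version="2.0">',
        "<channel>",
        f"    <title>{title}</title>",
        f"    <link>{link}</link>",
        f"    <description>{description}</description>",
        "</channel>",
        "</rss>",
    ])

if __name__ == "__main__":
    # Post paths arrive as arguments from the Make rule; unused until items are generated.
    post_paths = sys.argv[1:]
    print(empty_feed(
        "Just Testing",
        "https://ktyl.dev/blog/index.html",
        'Vaguely technical "blogging" from Cat',
    ))
```

Because the script only writes to standard output, the `> $@` redirect in the Make rule is what decides where the feed ends up.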

Now that I have the index.xml file being generated with some static XML, it's time to generate it dynamically from the state of my blog source files. My blog is built by taking source Markdown files and converting them to HTML with Python's markdown module. Using my existing Python scripts as reference, I used mostly regex to format an XML item for each Markdown file. While testing in newsboat against a local webserver, I found my feed was displaying without any formatting, unlike other feeds I was checking for reference. This turned out to be because I wasn't escaping the HTML tag characters < and >, so newsboat was likely interpreting them as just more XML, which must have been confusing for it. This was easily fixed with a little regex:

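import re

# Escape '&' first so the entities added below are not escaped a second time.
description = re.sub('&', '&amp;', description)
description = re.sub('<', '&lt;', description)
description = re.sub('>', '&gt;', description)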
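The standard library can also do this escaping. As an alternative sketch, `xml.sax.saxutils.escape` replaces `&`, `<` and `>` in one call. The `make_item` helper below is hypothetical, as is the publication date, which I've assumed from the post's URL. It also shows one way to produce the pubDate tag mentioned earlier.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from xml.sax.saxutils import escape

def make_item(title: str, link: str, body_html: str, published: datetime) -> str:
    """Format one <item>, escaping text so HTML markup survives as XML character data."""
    return "\n".join([
        "    <item>",
        f"        <title>{escape(title)}</title>",
        f"        <link>{escape(link)}</link>",
        f"        <description>{escape(body_html)}</description>",
        # RSS 2.0 expects RFC 822 style dates; format_datetime emits the compatible RFC 2822 form.
        f"        <pubDate>{format_datetime(published, usegmt=True)}</pubDate>",
        "    </item>",
    ])

print(make_item(
    "RSS",
    "https://ktyl.dev/blog/2022/6/3/rss.html",
    "<p>Feeds &amp; readers</p>",
    datetime(2022, 6, 3, tzinfo=timezone.utc),  # date assumed from the post's URL
))
```

Note that `usegmt=True` requires a datetime whose timezone is UTC, so dates read from elsewhere would need converting first.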

That's it! At least as far as I can tell, I now have a working RSS feed. All that's left to do is to add a link to the website proper, resolve the GitHub issue and publish this blog post. You can find my shiny new RSS feed here!

Laters!