generating an rss feed

ktyl 2022-06-03 15:50:38 +01:00
# RSS
I recently had a [GitHub issue](https://github.com/ktyldev/ktyl.dev/issues/1) suggesting I add an RSS feed for my blog.
RSS is a technology that tickles me just right (distributed, tiny, multi-purpose, open), and one I've been interested to learn more about for ages.
So, let's do that!
## An aggregator
Before I can start building an RSS generator into my website (which is a story for another day), I should have a way to test it.
That means finding myself an RSS aggregator - I settled on [`newsboat`](https://newsboat.org/) for now, but I'm also interested in [`sfeed`](https://codemadness.org/sfeed.html) for maximum automation potential.
Automation is a rabbit hole unto itself, however, so I'll content myself with a terminal UI application for the purposes of this post.
## Understanding a feed's format
Poking around a couple of blog RSS feeds [(1)](https://drewdevault.com/blog/index.xml), [(2)](https://reese.ovine.xyz) and the [relevant w3schools tutorial](https://www.w3schools.com/xml/xml_rss.asp), I've determined that the feed itself needs a title, link and description, and each post therein gets the same (strictly speaking, an item only needs a title or a description, but I'll include all three).
There seem to be a bunch of optional tags too, of which I'll include `pubDate` but omit the rest for now, to keep things simple.
```xml
<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0">
  <channel>
    <title>Just Testing</title>
    <link>https://ktyl.dev/blog/index.html</link>
    <description>Vaguely technical "blogging" from Cat</description>
    <item>
      <title>RSS</title>
      <link>https://ktyl.dev/blog/2022/6/3/rss.html</link>
      <description>A foray into generating a blog's RSS feed</description>
    </item>
  </channel>
</rss>
```
Phew!
Writing XML by hand makes me pine for JSON, but with any luck we shouldn't have to do any more of that.
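The hand-written example above leaves out `pubDate`, so here is a rough sketch of what an item with one might look like when built in Python. RSS 2.0 dates follow RFC 822, which `email.utils.format_datetime` from the standard library produces. The helper name `make_item` and the noon timestamp are placeholders for illustration, not something taken from my build.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from xml.sax.saxutils import escape


def make_item(title, link, description, published):
    """Return one <item> element as a string, including a pubDate tag."""
    return "\n".join([
        "<item>",
        f"<title>{escape(title)}</title>",
        f"<link>{escape(link)}</link>",
        f"<description>{escape(description)}</description>",
        # format_datetime renders an RFC 822 style date for aware datetimes
        f"<pubDate>{format_datetime(published)}</pubDate>",
        "</item>",
    ])


if __name__ == "__main__":
    # Placeholder timestamp: the post's date with an arbitrary time
    published = datetime(2022, 6, 3, 12, 0, tzinfo=timezone.utc)
    print(make_item(
        "RSS",
        "https://ktyl.dev/blog/2022/6/3/rss.html",
        "A foray into generating a blog's RSS feed",
        published,
    ))
```

The `escape` calls are there so that a stray `<` or `&` in a title or description doesn't break the surrounding XML.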
## Dynamic XML Generation
It's at this stage that most sane people would turn to their CMS, site builder, or some other library to generate their feed and be done with it.
Sadly, I am not one of those people, so I roll up my sleeves to consider modifications to my blog build process (which again, I'll cover in more detail some other day) to see where I can shoehorn in an RSS generator.
While looking at others' feeds, I thought about what ought to trigger rebuilding my feed, as that will decide where it ought to go in the build process.
Primarily I want to use it for new blog posts, but I thought maybe I should retain some kind of manual-update capability in case I wanted to talk about site updates.
However, thinking about that for another few seconds, I realised I could just make blog posts for those things.
If I can think of a reason later I'll change it, but for now I'll just automatically generate my feed from my blog content.

---

To start with, I'll just generate an empty feed with the correct title and links, to make sure I can get a feed onto the site in the right place and format.
My strategy for generating a lot of my site basically amounts to enthusiastic deployment of Python's `print()` and shell redirections into files, all triggered from rules in a Makefile.
So, the first step is to add a Make rule to generate the appropriate file.
```make
BLOG_RSS = $(BLOG_OUT_DIR)/index.xml

# The recipe line must start with a tab
$(BLOG_RSS): $(BLOG_PAGES)
	python scripts/mkblogrss.py $(BLOG_PAGES) > $@

blog: $(BLOG_TARGETS) $(BLOG_RSS)
...
```
`mkblogrss.py` is a Python script which takes the path of every post in `$(BLOG_PAGES)` and spits out the finished XML.
I can then make this rule a dependency of my already existing `blog` rule and have the index be generated every time the blog is rebuilt.
I've omitted a lot of the Makefile plumbing, but if you're interested you can see the whole thing [here](https://github.com/ktyldev/ktyl.dev/blob/main/makefile).
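The real script is in the repository, but as a rough sketch of just the first step (an empty feed printed to stdout for Make to redirect), it could look something like this. The function name `empty_feed` and the constants are hypothetical, with the channel text borrowed from the hand-written example earlier. The post paths are accepted as arguments but not used yet.

```python
import sys

# Channel details copied from the hand-written feed; adjust to taste
FEED_TITLE = "Just Testing"
FEED_LINK = "https://ktyl.dev/blog/index.html"
FEED_DESCRIPTION = 'Vaguely technical "blogging" from Cat'


def empty_feed():
    """Return a feed with a title, link and description but no items."""
    return "\n".join([
        '<?xml version="1.0" encoding="utf-8" ?>',
        '<rss version="2.0">',
        "<channel>",
        f"<title>{FEED_TITLE}</title>",
        f"<link>{FEED_LINK}</link>",
        f"<description>{FEED_DESCRIPTION}</description>",
        "</channel>",
        "</rss>",
    ])


if __name__ == "__main__":
    # Make passes every post path as an argument; an empty feed ignores them
    post_paths = sys.argv[1:]
    print(empty_feed())
```

Because the script only prints, the `> $@` redirection in the Make rule is what actually writes `index.xml`.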
Now that I have the `index.xml` file being generated with some static XML, it's time to generate it dynamically from the state of my blog source files.
My blog is built by taking source Markdown files and converting them to HTML with Python's `markdown` module.
Using my existing Python scripts as reference, I used mostly regex to format an XML item for each Markdown file.
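As an illustration of the regex approach rather than a copy of the actual script, here is a minimal sketch that pulls a title and link out of one post. It assumes the title is the first top-level Markdown heading and that posts live under a `<year>/<month>/<day>/<name>.md` path. `item_for_post` and `BLOG_URL` are names made up for this example, and the description is left out because it depends on the Markdown conversion.

```python
import re
from xml.sax.saxutils import escape

BLOG_URL = "https://ktyl.dev/blog"  # assumed base URL, taken from the example feed


def item_for_post(path, source):
    """Build an <item> from a post's path and Markdown source using regex."""
    # Title: the first top-level heading, such as "# RSS" (not "## ...")
    title_match = re.search(r"^# (.+)$", source, flags=re.MULTILINE)
    title = title_match.group(1).strip() if title_match else "Untitled"

    # Link: assumes the path ends in <year>/<month>/<day>/<name>.md
    path_match = re.search(r"(\d{4}/\d{1,2}/\d{1,2}/[^/]+)\.md$", path)
    if path_match is None:
        raise ValueError(f"unexpected post path: {path}")
    link = f"{BLOG_URL}/{path_match.group(1)}.html"

    return "\n".join([
        "<item>",
        f"<title>{escape(title)}</title>",
        f"<link>{escape(link)}</link>",
        "</item>",
    ])


if __name__ == "__main__":
    print(item_for_post("blogs/2022/6/3/rss.md", "# RSS\nI recently had..."))
```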
While testing in `newsboat` against a local webserver, I found my feed was displaying without any formatting, unlike other feeds I was checking for reference.
This turned out to be because I wasn't encoding the HTML tag characters `<` and `>` any differently, so `newsboat` was likely interpreting them as just more XML, which must have been confusing for it.
This was easily fixed with a little regex:
```py
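import re

# Escape '&' first so the entities added below are not escaped a second time
description = re.sub('&', '&amp;', description)
description = re.sub('<', '&lt;', description)
description = re.sub('>', '&gt;', description)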
```
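For comparison, and not what my script does, the standard library's `xml.sax.saxutils.escape` covers `&` alongside `<` and `>` in a single call. A small sketch, where `escape_description` is a hypothetical wrapper and the sample HTML is made up:

```python
from xml.sax.saxutils import escape, unescape


def escape_description(html):
    """Escape HTML so it can sit inside an RSS <description> element."""
    # escape() replaces '&' before '<' and '>', so nothing is escaped twice
    return escape(html)


if __name__ == "__main__":
    html = "<p>Cats &amp; <em>feeds</em></p>"
    escaped = escape_description(html)
    print(escaped)
    # Reversing the escaping recovers the original HTML for rendering
    print(unescape(escaped) == html)
```

An aggregator parsing the feed undoes this escaping, which is how the HTML formatting survives the trip through XML.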
That's it!
At least as far as I can tell, I now have a working RSS feed.
All that's left to do is to add a link to the website proper, resolve the GitHub issue and publish this blog post.
You can find my shiny new RSS feed [here](https://ktyl.dev/blog/index.xml)!
Laters!