That is a good idea.
59 requirements, including Django, seems pretty heavy though?
For my own RSS feed, I use this 48-line Python file with no dependencies outside the standard library:
https://github.com/no-gravity/atomfeed.py
It takes an array with the entries as input, not a web page. But I guess the HTML parsing should take no more than another few lines? For HTML parsing, I have good experiences with the lxml module which is in the Debian repos. It is fast and works pretty well.
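To give a feel for "another few lines" of HTML parsing, here is a minimal sketch that swaps lxml for the standard library's html.parser, so the zero-dependency property is kept. It only matches `<a>` tags carrying one class (a crude stand-in for a CSS selector like `a.post`), and the `LinkExtractor` / `extract_entries` names and the `{"title", "link"}` output shape are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the text and href of every <a> carrying a given class.

    A crude stand-in for a CSS selector like "a.post"; it does not
    understand descendant selectors, ids, or attribute selectors.
    """

    def __init__(self, css_class, base_url):
        super().__init__()
        self.css_class = css_class
        self.base_url = base_url
        self.entries = []
        self._href = None  # absolute href of the <a> we are inside, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self.css_class in classes and attrs.get("href"):
            # Resolve relative links against the page URL.
            self._href = urljoin(self.base_url, attrs["href"])
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the link text.
            title = " ".join("".join(self._text).split())
            self.entries.append({"title": title, "link": self._href})
            self._href = None


def extract_entries(html, css_class, base_url):
    parser = LinkExtractor(css_class, base_url)
    parser.feed(html)
    parser.close()
    return parser.entries
```

With lxml you would get real CSS selector or XPath support instead; the stdlib version trades that flexibility for having nothing to install.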
I recently added the python-feedgen module for creating feeds in my blog generator: https://github.com/oxalorg/genox/commit/3a73013ffe82930b1a7e...
I always love removing dependencies and simplifying software. I will try and switch to a simpler implementation like yours, thanks for sharing!
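For anyone weighing a switch like that, a stdlib-only Atom generator really can be this small. The following is a minimal sketch, not the linked atomfeed.py (whose code isn't reproduced here): the `build_atom_feed` name and the `{"title", "link"}` entry shape are assumptions. It reuses each link as the entry's `id` and stamps `updated` with the build time, because RFC 4287 requires an `id` and an `updated` element on the feed and on every entry, plus a feed-level author unless every entry has one.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"


def build_atom_feed(title, feed_id, author, entries):
    """Return an Atom 1.0 document (as a str) for a list of entry dicts.

    Hypothetical entry shape: {"title": ..., "link": ...}.
    """
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    feed = ET.Element("feed", xmlns=ATOM_NS)
    ET.SubElement(feed, "title").text = title
    ET.SubElement(feed, "id").text = feed_id
    ET.SubElement(feed, "updated").text = now
    # A feed-level author is required unless every entry has its own.
    ET.SubElement(ET.SubElement(feed, "author"), "name").text = author
    for item in entries:
        entry = ET.SubElement(feed, "entry")
        ET.SubElement(entry, "title").text = item["title"]
        ET.SubElement(entry, "link", href=item["link"])
        # Reuse the link as the permanent entry id (an assumption).
        ET.SubElement(entry, "id").text = item["link"]
        # No real publication date is known, so use the build time.
        ET.SubElement(entry, "updated").text = now
    body = ET.tostring(feed, encoding="unicode")
    return '<?xml version="1.0" encoding="utf-8"?>\n' + body


print(build_atom_feed(
    "Example feed", "https://example.com/", "Example Author",
    [{"title": "First post", "link": "https://example.com/posts/1"}],
))
```

Using the link as the id keeps things stateless, at the cost that a changed URL will look like a brand-new entry to readers.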
Glad you’re finding the tool interesting! A short blog post behind it: https://kschaul.com/post/2023/04/16/feedmaker-quickly-genera...
And the GitHub url (hopefully easy to host your own instance): https://github.com/kevinschaul/feedmaker
Looks like you're hosting this on fly.io - PAYG model. You could probably host this for free on Cloudflare Workers; 100k requests/day on the free tier; static content (the homepage) is free & unlimited.
Edit: The catch is the 10ms CPU cap per request - you'd need a super lean implementation. Django's too heavy for that.
Well, someone already did with JS: https://github.com/ProfessorManhattan/rss-worker
Python alone takes many milliseconds to start, unless they give you some allowance for interpreter overhead.
I wrote a similar thing in Go (using chromedriver, so it could handle pages that need JS).
Handled most things nicely, but I found a few sites where I wanted multiple selections to be combined into one document.
I emailed the result to myself, turning any images into attachments; this meant my “feed reader” had read/unread tracking that synced across devices, some html support, folders, offline viewing, etc.
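The email-as-feed-reader trick can be sketched in Python with the standard library's `email.message.EmailMessage`. This is one plausible way to do it, not the commenter's Go code: images are attached as inline MIME parts that the HTML references through `cid:` URLs. The `build_feed_email` name and the `__IMG0__`, `__IMG1__`, ... placeholder convention are hypothetical.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def build_feed_email(subject, sender, recipient, text, html, images):
    """Build a text+HTML email whose images travel as inline MIME parts.

    `html` refers to images with hypothetical placeholders __IMG0__,
    __IMG1__, ...; `images` is a list of (bytes, subtype) tuples such
    as (png_bytes, "png").
    """
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(text)  # plain-text fallback part

    # A fixed domain skips the hostname lookup; any value works for cids.
    cids = [make_msgid(domain="feed.invalid") for _ in images]
    for i, cid in enumerate(cids):
        # cid: URLs use the Content-ID without its angle brackets.
        html = html.replace(f"__IMG{i}__", "cid:" + cid[1:-1])
    msg.add_alternative(html, subtype="html")

    # Attaching images to the HTML part turns it into multipart/related.
    html_part = msg.get_payload()[1]
    for (data, subtype), cid in zip(images, cids):
        html_part.add_related(data, "image", subtype, cid=cid)
    return msg
```

Sending it is then a job for `smtplib`, e.g. `smtplib.SMTP(host).send_message(msg)`, with the host and credentials being whatever your mail provider needs. Because the mail client stores the images, offline viewing comes for free.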
The good news: made it to the front page.
The bad news: so did the 503 page.
In some ways a good thing, no? It shows you've got work to do on optimisation for large audiences. A free stress test, if you will (unless you're on a host that charges per hit or for excess bandwidth).
It did load eventually for me. I thought it was broken since there were no styles, but it looks like that's intentional.
Seems to be hosted using fly.io
https://github.com/RSS-Bridge/rss-bridge is what I've been using for the same purpose.
You can just use an XSLT stylesheet like this one: https://wwwcip.cs.fau.de/~oc45ujef/misc/src/atom.xsl
xsltproc includes a handy --html flag that lets you process the HTML source file directly.
Can you also generate+use the XSLT stylesheet dynamically from a form input so that you can use a single meta-stylesheet for multiple sites?
Oh, and is your brother coming to the party?
I made a CGI program that ran CSS selectors against URLs and returned the output. I debated making it public and then realized I probably didn't want to run an open proxy. I'm curious how long this will last.
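One common way to keep a selector-over-URL service from turning into an open proxy is to refuse any URL whose host isn't on an explicit allowlist. Below is a minimal sketch of that check; `ALLOWED_HOSTS` and `is_allowed` are hypothetical names. A real deployment would also have to consider redirects from an allowed host to somewhere else.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: the only hosts this service will fetch from.
ALLOWED_HOSTS = {"example.com", "blog.example.org"}


def is_allowed(url):
    """Accept only plain http(s) URLs whose host is on the allowlist."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    # .hostname is already lowercased and strips user info and port.
    return (parts.hostname or "") in ALLOWED_HOSTS
```

An allowlist is blunt compared with trying to block "bad" destinations, but it sidesteps the whole class of tricks (internal IPs, odd schemes, DNS rebinding) that a blocklist has to anticipate.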
Not the same, but this gives me an idea… what if there were a map-reduce for DOMs as a web primitive? Imagine if I could make a DOM (or feed) that was some selection and transformation of another DOM.
You have just re-invented XLST.
*XSLT
https://www.w3schools.com/xml/tryxslt.asp?xmlfile=cdcatalog&... give it a whirl!
Should be able to achieve this without selectors with HTML to Markdownish (something like Firefox's Reader mode).
Oh, so this is what Reader mode does.
The few times I actually tried it, it worked badly, with huge chunks of text content missing from the page. It makes me wonder whether the modern web has made the task so difficult that even a browser can't pull it off, or whether they just weren't trying to do a good job with the feature.
I love this.
Has anyone tested to see if it works with Blogtrottr which will email you whenever there's a new item in an RSS feed?
Just asking since this doesn't seem to even include a date field in the RSS, and of course no guid. So I'm wondering how compatible it winds up being.
Dates shouldn't matter. The feed has id elements, which are what identify entries. Atom has no guid element, so I would expect this to work with any reader.
But is this producing ID elements? And if so, based on what, since they don't seem to be coming from any CSS selectors? That's my question.
It seems to use the link as the ID based on clicking a few examples on the site. An ok option for this type of thing.
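Why the id is what matters can be seen from how a reader might decide what is new. The sketch below is an assumption about typical reader behaviour, not how Blogtrottr or any specific reader actually works: entries are keyed purely on `<id>` and dates are ignored, which also means a link-derived id that changes (say, a new query string) shows up as a fresh item. `new_entries` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def new_entries(feed_xml, seen_ids):
    """Return (id, title) pairs for unseen entries and record their ids.

    Entries are matched only by <id>; <updated>/<published> are ignored.
    """
    root = ET.fromstring(feed_xml)
    fresh = []
    for entry in root.iter(ATOM + "entry"):
        entry_id = (entry.findtext(ATOM + "id") or "").strip()
        if entry_id and entry_id not in seen_ids:
            seen_ids.add(entry_id)
            fresh.append((entry_id, entry.findtext(ATOM + "title", default="")))
    return fresh
```

Polling the same feed twice yields nothing new the second time, which is all the dedup a reader needs; a stable, concrete id is the part that has to be right.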
I wish they had concrete, accurate id and created_at. IIRC these attributes are fixed in AT.
The same can be done with FreshRSS.