> I tried several Python libraries to extract HTML content, but none worked as well as the readability one used by Firefox. Since it’s a JavaScript package, I had to resign myself to introducing an optional dependency on Node.js
Mmm, no you don't. In fact, even if you install it with NPM—which you don't have to do, of course—Readability was designed to run in the browser, not on Node.
The "JavaScript" → "Node" logical leap that people make (even in starkly inappropriate circumstances like this one) shows how much damage the warped traditions of the package.json cult have done to good, clear-thinking reasoning.
fwiw my reasoning wasn't javascript -> node, it was more that I was biased towards doing all significant work on the backend. So it didn't even occur to me that I could add some ad hoc logic to fetch the article HTML from js, pass it to readability and insert the in the UI, which I guess is what you suggest.
Having this logic available in the backend was convenient for me, anyway. I'm using it also for the "send to kindle" feature, which I couldn't have if content cleaning was done browser side. Having it in the backend also opens the option to save the content in the db to skip loading/parsing latency and preserving it long term.
> it didn't even occur to me that I could add some ad hoc logic to fetch the article HTML from js, pass it to readability and insert the in the UI
I don't know what you mean by "ad hoc". Again, Readability was written to run in the browser. It operates on the DOM, not HTML. If there's anything ad hoc going on, it would be (a) the fake DOM that Gijs wrote[1] so you can run it when all you have is HTML instead of an object graph, and (b) logic involved in shelling out to a separate NodeJS process from Python. These are hacks on top of hacks.
> which I guess is what you suggest
I wasn't suggesting anything. I'm making an observation about how illogical the JS,-therefore-Node cliché is. If I were going to suggest anything, it would be "don't use Readability" since it isn't a good fit for this use case. If "use Readability" were a requirement, then I would suggest, for the benefit of yourself and your brethren, rewriting Readability in Python or creating a binary Python module using either QuickJS and Readability plus Gijs's fake DOM, or Haxe.
> If I were going to suggest anything, it would be "don't use Readability" since it isn't a good fit for this use case.
It is a perfect fit for the user experience I was going for, even if it adds development and operational complexity --both of which were a lower priority for this project, as I stressed in the blog post.
Mmm, no you don't. In fact, even if you install it with NPM—which you don't have to do, of course—Readability was designed to run in the browser, not on Node.
The "JavaScript" → "Node" logical leap that people make (even in starkly inappropriate circumstances like this one) shows how much damage the warped traditions of the package.json cult have done to good, clear-thinking reasoning.