
I used to do this to track job postings at target companies, and for a few other purposes, back in the heady era of 2008. One of the hassles is that some pages don't neatly contain their useful data in JSON, so the tiny deploy script becomes a little more complicated to support the various steps:

1. Collection: logins, curl

2. Pretty print: jq / tidy

3. Selectors: Beautiful Soup / jq

4. Annotation: svn / git commit

The GitHub Actions angle is new and welcome though.
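The four steps above can be sketched in a few lines of standard-library Python. This is a hedged illustration, not anyone's actual scraper: the JSON payload is a canned stand-in for a real fetch, and the file name and git commands are hypothetical.

```python
import json

# 1. Collection: normally urllib or curl against the target site;
#    a canned payload stands in here so the sketch runs offline.
raw = '{"jobs": [{"id": 1, "title": "SRE", "company": "Acme"}]}'

# 2. Pretty print (the jq/tidy step): stable, diff-friendly formatting,
#    so day-to-day changes show up as small, readable diffs.
pretty = json.dumps(json.loads(raw), indent=2, sort_keys=True)

# 3. Selectors (the jq/Beautiful Soup step): pull out the fields to track.
titles = [job["title"] for job in json.loads(raw)["jobs"]]

# 4. Annotation: write the snapshot and commit it, so version history
#    records every edit the company makes to the ad. The git calls are
#    left as comments since they need a repo to run in:
with open("jobs.json", "w") as f:
    f.write(pretty)
# subprocess.run(["git", "add", "jobs.json"])
# subprocess.run(["git", "commit", "-m", "daily snapshot"])
```

The sort_keys and indent arguments matter more than they look: without deterministic formatting, every commit is one giant diff and the history is useless.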



Oh, and I should mention: it's pretty cool to see how companies change job ads on a daily basis. Beyond just the spelling fixes, I've regularly seen places put up help-wanted postings for like a day and remove them the next. Or tack on an 'on call 24/7/365' statement. Or reuse a job posting but tack on the word 'Manager.'


That's so interesting. Next time I'm job seeking I'll definitely look into doing this.


If you're interested in Australia or New Zealand, we can chat. I have a scraper running on my spare laptop now, and am going to try porting it to GitHub Actions so it keeps running without me.

The source data is all HTML rather than JSON though: I have to scrape the index pages, parse out job IDs, and then re-scrape the individual job listings. Having it all in a SQLite database is more useful than the site's default search: e.g. all jobs that don't include the phrase "right to live and work in this location", all jobs that contain email addresses, GROUP BY advertiser - features I wish for but don't expect will ever be added to the source site.
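A minimal sketch of what those queries might look like, assuming a hypothetical `jobs` table; the schema, advertiser names, and sample rows are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE jobs (
    id INTEGER PRIMARY KEY,
    advertiser TEXT,
    description TEXT)""")
conn.executemany(
    "INSERT INTO jobs (advertiser, description) VALUES (?, ?)",
    [
        ("Acme", "Must have right to live and work in this location."),
        ("Acme", "Contact hiring@acme.example for details."),
        ("Blob Co", "Great role, apply within."),
    ],
)

# Jobs that don't demand existing work rights.
no_rights = conn.execute(
    "SELECT id FROM jobs WHERE description NOT LIKE "
    "'%right to live and work in this location%'"
).fetchall()

# Jobs whose ad includes an email address (crude substring match;
# a real query might use a stricter pattern).
with_email = conn.execute(
    "SELECT id FROM jobs WHERE description LIKE '%@%'"
).fetchall()

# Posting counts per advertiser - the GROUP BY the source site lacks.
per_advertiser = conn.execute(
    "SELECT advertiser, COUNT(*) FROM jobs GROUP BY advertiser"
).fetchall()
```

Once the scraped listings are rows in a table, each of these "wishlist" searches is a one-liner, which is the whole argument for SQLite over the site's search box.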


In retrospect it mostly didn't work. What I really learned from this is that help-wanted pages are a formality. I figure places that hire people who read HN pay recruiters, and applying directly to the company less often 'skips the line' and more often finds a direct line to the trash can, since H-1B hiring rules require employers to try to find local talent first.



