
> (BTW, if this is gonna be done, we should also block all AI bots and search engines at the IP address level.)

That's neither possible nor necessary.

Not possible, because in the general case you simply can't differentiate real people from scrapers without device attestation. For an extreme example of this, see the RECAP the Law project[1], which gives real human users a browser extension that scrapes as they browse.

Not necessary, because scrapers collecting AI training data are an entirely separate problem, unrelated to marketing, and because robots.txt will stop the majority of search-engine indexing, which is all that's needed. Actively blocking the engines isn't required: all the big ones are well-behaved, and once they stop indexing HN, marketers won't care about HN for SEO-related influence campaigns anymore.
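To make the robots.txt point concrete, here's a sketch of a blanket disallow rule and how a well-behaved crawler would honor it. The robots.txt content and the URL are illustrative, not HN's actual configuration; the check uses Python's standard-library `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Illustrative blanket rule: every user agent, every path disallowed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler consults the rules before fetching any URL
# (hypothetical example URL):
allowed = parser.can_fetch("Googlebot", "https://news.ycombinator.com/item?id=1")
print(allowed)  # False
```

This is exactly why robots.txt suffices against the major engines but not against bad actors: the file is advisory, and only crawlers that choose to call the equivalent of `can_fetch` will respect it.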
