The Follower Factory (nytimes.com)
432 points by gregkerzhner on Jan 27, 2018 | 165 comments


The New York Times can graph and identify bots, and other publications / bloggers have been able to identify networks of them. Supposedly, the best and brightest get recruited by companies like Twitter and Facebook, but for some reason they're incapable of identifying and shutting these things down?

What is all the hubbub about machine learning, and things like neural networks, if they aren't being actively employed by the tech giants?

There's only a couple possible scenarios I can come up with for why this continues to occur:

1) The best and the brightest actually don't work for any of these companies; they're just constantly trying to catch up to teams of developers more highly skilled than them. They are bright, and decent, but ultimately mostly average.

2) The developers in these companies are on par with those who can graph and identify these networks of malicious and fake content makers, but they just don't care because # == $.

3) The companies actually have their head so far in the sand that they don't have the technology, or the resources, to combat this.

I find it hard to believe that people outside of the ecosystem of these companies have more skill, knowledge, or capabilities (1). It also pains me to think that hundreds of millions or billions of dollars simply cannot run a company that can combat this (3).

So that seems to leave complacency (2), which is disturbing and definitely should send up more red flags about whether we should be giving these networks any attention at all.

I'd love to hear other possibilities or maybe more information to back up any of the other scenarios.


I used to work at Twitter, so not sure I can get into specific numbers, but they remove hundreds of thousands to millions of accounts every day (literally every day). It's particularly tricky because bots aren't inherently against the ToS (e.g. the earthquake bot cited in the article), so you can't just ban based on "Is this account behaving like a human?".

Many "bots" though are actually giant networks of humans being orchestrated and they share things that are compelling to a subset of real users (e.g. Hillary did this bad thing... lock her up), so now real users are retweeting and muddying any signal that was there to detect a bot network. It's an incredibly challenging problem at scale and you basically only see the < 1% that Twitter missed.

They've got a pretty sizable team that works on it so hopefully in a few years they'll have solved it much like Gmail has mostly solved spam.


Maybe it's the "one click to share" behavior - an extremely low-resolution action - that is the problem? Perhaps things would be easier to fix if the human engagement component were more than "can you press a button?"


> much like Gmail has mostly solved spam.

Gmail has not solved spam. Gmail and Microsoft dump so much legitimate e-mail that it's basically made e-mail totally unusable:

http://penguindreams.org/blog/how-google-and-microsoft-made-...


No they don't. They, like most abuse sensitive recipients, don't accept mail from small indie servers, so people know not to use small indie servers to send mail.


I’ve read this a lot, often as a counter argument to running your own mail servers, and I believe it to be mostly FUD.

I run a small SaaS and it sends out quite a lot of notification emails to customers (they are paying me for these emails, they aren’t spam or newsletters). I was originally using SendGrid but their deliverability rates were pretty bad and I was about to hit a ceiling for my plan, which would have meant the monthly cost would more than double, which was unaffordable for me.

I decided to try setting up my own mail server. Everyone said don’t do it, your emails won’t be delivered, it’s too much work, etc. I set up a VPS with OVH as I’d heard their policies regarding email spam are pretty strict (I contacted them beforehand to clarify: if they find out you are spamming they cut you off ASAP to protect their other customers). The IP I got was actually on one blacklist, but I got that resolved in a few days.

The deliverability rate was better than SendGrid and it was costing me 1/10th of the price. One thing that I would assume helped is that the domain had already been in use for a few years, and I set up proper DKIM and SPF records.

The server has been running for a year and a half now, and the only bounces I get are due to invalid or suspended email accounts. Most of my customers are small to medium sized businesses, a lot of which run their own Exchange servers - with SendGrid those were usually the ones who were bouncing my mail due to it being on a blacklist. Now admittedly just being accepted doesn’t mean it is ending up in a user’s inbox, but I haven’t had any complaints from paying customers.


It's not just small indie servers. They do "reputation scoring" of senders that penalize them for things like sending plaintext mail. See https://groups.google.com/forum/#!msg/mozilla.governance/WWK... for more details of one example.


Right! We can't trust those "small indie servers" on the world wide web. Who knows where they've been?! Make sure you use mail servers furnished by Google, Microsoft, and other NSA-approved facilitators. They only cost $10 / mo / user, that is easily affordable for anyone who is not a Russian agent tricking the unwashed masses into preventing Hillary Clinton from ascending to her rightful place as ----- errrm, uh, from "corrupting democracy".

Any non-Google, non-Microsoft mailers will be blocked, lest users be asked to renew their VIAGRA prescription from a Canadian pharmacy, or a non-compliant political message be received. It's for your own good, citizen!

Thank you, wise and great Google! Thank you, wise and great Microsoft! I wouldn't know how to use this free, non-controlled computer network without your benevolent work!

unsnark: Seriously, the way that Google et al have captured "anti-spam" mechanisms and converted them into "pay a big company for your mail" mechanisms is extremely sad, and speaks to the true forces at work behind all the warm and fuzzies about an "open web".


Even if that were true, then your parent's statement still holds: "basically made e-mail totally unusable"

If it's down to a few big senders and the rest is rejected, they might as well get rid of SMTP, close it off, and just talk a proprietary protocol among each other.


I'm not convinced about this generalization. Case in point is the mail relay server of our company which definitely counts as a small indie server and whose email is accepted and not marked as spam.

Of course we have configured SPF, DKIM and a proper reverse DNS lookup on both IPv4 and IPv6, but that’s just being a good email-citizen anyways.


I think it was the 99% Invisible podcast that did an interesting report about whole networks of people who were paid in Mexico to spread election messages; and these networks are often reused by gangs or other organizations to promote stuff or harass opponents. So in this case, they're not bots. They're actual humans, because it's cheaper to do that than pay people to write scripts.


I don't remember that 99pi. Was it Reply All?

https://gimletmedia.com/episode/112-the-prophet/

After Andrea is attacked by a stranger in Mexico City, she just wants to figure out who the guy was. Investigating this question drops her right into the middle of one of Mexico’s biggest conspiracies.


Ah yes, sorry you're right. It was Reply All.


Additionally, you can train a human to become better at astroturfing over time, and train your network as a whole to be resilient to partial takedowns.

These things are much easier to orchestrate than bots, especially if your specialization is not CS but scamming / manipulating opinions, and in many cases, especially with scale, human labor can actually be more expensive.

Of course, subsidizing the labor to impoverished countries definitely helps; these networks tend to lack the sophistication used in more sensitive political or industrial topics.

Sometimes for national or global campaigns you'll find a mix of both cheap labor, a small team of experienced astroturfers with a good grip on their persona acquisition and management, and bot networks, each targeting a different set of demographics. If your target demo is sufficiently ignorant or radicalized in their beliefs, it doesn't take much to convince them and bots will do just fine.


It seems to me the simplest answer is that they're fully aware and not doing anything about it. It doesn't take the best and brightest to figure it out. I've purchased thousands of twitter and facebook followers before for $5 on fiverr just for fun. There's no reason twitter or facebook can't do the same and identify the fake accounts. The logical conclusion is that they don't want to.


I agree. Just buy them. If they spend $100K a year on buying fake accounts and then banning those accounts, that will be more effective than messing about paying a team of 10 engineers to do pattern analysis. If they spent $2M a year on buying and then banning fake accounts, they would squash it completely.

They're not doing it because they like the big follower counts themselves. They're in on the con.


How does paying the botters market rate raise their cost of entry?


The price of a single bot is amortized across thousands upon thousands of follows. If it costs a penny to have one bot issue one follow, and a single bot performs 5,000 follows, then that bot has earned $50. If Twitter were to buy follows for the market rate (one penny) then ban those accounts, it would drive the market rate for bots up to (using our hypothetical example) $50 per follow. (This is an oversimplified example, of course, but reducing the average number of follows that a given bot can perform would indeed have the effect of raising the market rate, though we can continue to argue over the magnitude; it's self-evident that if generating bots didn't have fixed cost overheads, then we'd see fewer bots with implausibly inhuman numbers of followed accounts.)
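
To put rough numbers on the example above, here is a minimal sketch of the amortization argument. It assumes, as the comment does, that a bot account has to recoup about $50 (what it earns from 5,000 one-cent follows) and that this is recovered purely from selling follows. The function names are hypothetical and the figures are the comment's own illustrative ones, not real market data.

```python
def revenue_cents(price_cents_per_follow: int, follows_per_bot: int) -> int:
    """Total a single bot account earns over its lifetime, in cents."""
    return price_cents_per_follow * follows_per_bot


def break_even_cents_per_follow(bot_cost_cents: int, follows_before_ban: int) -> float:
    """Price per follow needed to recoup one bot's cost before it is banned."""
    if follows_before_ban <= 0:
        raise ValueError("a bot must perform at least one follow")
    return bot_cost_cents / follows_before_ban


# Figures from the comment: one cent per follow, 5,000 follows per bot.
earned_cents = revenue_cents(1, 5000)

# Assumption: the bot must recoup what it used to earn. The sooner the
# account is banned, the more each follow has to cost.
for follows in (5000, 100, 1):
    price = break_even_cents_per_follow(earned_cents, follows)
    print(f"{follows:>5} follows before ban -> ${price / 100:.2f} per follow")
```

The point is only the direction of the effect: anything that cuts the number of follows a bot completes before being banned raises the break-even price, whatever the real magnitudes are.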


This seems liable to cause unintended side effects...

https://en.wikipedia.org/wiki/Cobra_effect


I disagree. The cobra effect occurred because the English provided a market for cobras where there was no market before. There's already a large market for followers and twitter adding itself to that market in such a limited way does not add significant new demand.


The Reagan administration tried this in the Iran-Contra affair. They only succeeded in incentivizing more terrorism and abduction of US citizens.


Twitter has a much higher cost for false positives. If 1% of the accounts this article thought were bots were actually legitimate, no big deal. If Twitter banned that 1%, that's ten thousand pissed off people.

Also, bot makers don't care about fooling the New York Times, they care about fooling Twitter. If Twitter applied this analysis, the bot makers would adapt to it. It only looks like it works until you start to use it.


> If Twitter applied this analysis, the bot makers would adapt to it.

The problem with the arms-race argument here is that it implies that, just because an arms race is inevitable, the only acceptable course of action is total capitulation. Twitter is bigger than any of the botters, and if it wanted it could raise the cost of entry of botting high enough that nobody would be able to afford the bots in any quantity large enough to make a difference. They just simply don't want to.

(As for false positives, I've seen plenty of people whose accounts were suspended for reasons unclear to them; Twitter seems to care very little whether its captive userbase is pissed off or not.)


I don't think it implies that unless you've already accepted they aren't doing anything. I think it's entirely possible that a lower class of bots is routinely blocked, but a higher class is always mutating to get past the latest filtering, and so has a large, fairly stable population.


> Twitter is bigger than any of the botters, and if it wanted it could raise the cost of entry of botting high enough that nobody would be able to afford the bots in any quantity large enough to make a difference.

How would they do this without triggering enough false positives to destroy their product?

I mean seriously, if you are capable of doing this that well, you could probably walk into a job paying >>$750k/yr at any of the large tech companies, because they'd be falling all over themselves trying to get at your magic.


> How would they do this without triggering enough false positives to destroy their product?

This implies that Twitter has demonstrated that they care about practicing discretion when suspending accounts, and that they care what 99% of their (human) userbase thinks, and that Twitter's product is Twitter accounts. But they haven't; they don't; and it's not. The state of being pissed off at Twitter is the perpetual existence of all Twitter power users. Unless you're a celebrity, Twitter doesn't care what you say about Twitter. And the only thing important to Twitter's business model is keeping celebrities on Twitter. The reason that they don't give a damn about cracking down on bots isn't because they don't have enough $750k engineers, it's because dealing a blow to the egos of their Influencers by banning their botted followers would threaten their bottom line, both in terms of outrage from the 1% of Twitter accounts that matter and in terms of depressing their wildly inflated metrics.


Twitter trashed my very real account, which was tied to a disused email address, after I tried to log in from my phone and misremembered my password. They refused to lift a finger to help me get it back despite me offering to provide all sorts of other verification information.


That’s nonsense. It’s trivially easy, and as noted elsewhere they can just add a verification check for suspect accounts.

It’s easy to see how much harder Facebook and Google accounts are to create bots for than Twitter. Twitter doesn’t care.


Seems like, if you detect a bot, you should impose a captcha on every x actions. Not enough to ruin your customer's experience, but enough to make botting harder.


I would be absolutely stunned if Twitter et al. hadn't already tried this...


I wouldn't be surprised at all. Go find an openly racist account and report them. Two weeks later Twitter will agree with you... but not ban the account.


I’ve come to the conclusion after reporting hundreds of obvious spam/racism/hate/etc accounts that twitter just doesn’t care. It increases their metrics and that’s all they want. Makes the share holders happy to see numbers going up and sadly in turn it does increase ad views.


I would be curious to know how this failed, if they did try it. Are captchas automatable? You could make it a bit better by challenging every X actions, and have X decline with a failed captcha.
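
The "challenge every X actions, and have X decline with a failed captcha" idea can be sketched roughly as below. This is only an illustration: the `ChallengePolicy` class, the starting interval and the choice to halve on failure are all assumptions of mine, not anything Twitter is known to do.

```python
from dataclasses import dataclass


@dataclass
class ChallengePolicy:
    """Per-account captcha schedule (hypothetical names and defaults)."""
    interval: int = 20              # actions allowed between captchas
    min_interval: int = 1           # never challenge more often than every action
    actions_since_challenge: int = 0

    def record_action(self) -> bool:
        """Count one action; return True when a captcha is now due."""
        self.actions_since_challenge += 1
        return self.actions_since_challenge >= self.interval

    def record_result(self, passed: bool) -> None:
        """Reset the counter; halve the interval after a failed captcha."""
        self.actions_since_challenge = 0
        if not passed:
            self.interval = max(self.min_interval, self.interval // 2)


policy = ChallengePolicy(interval=4)
due = [policy.record_action() for _ in range(4)]  # captcha due on the 4th action
policy.record_result(passed=False)                # failure: interval drops to 2
```

A real human who passes keeps the same interval, while an account that keeps failing is quickly challenged on every action. It does nothing about captcha-solving services, which is a separate problem.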


Captcha has issues in that:

- captcha that isn't hard for humans is often easy to automate

- captcha that is hard to automate frustrates a lot of humans too

- and to add insult to injury you can "automate" the hard ones too by paying something like $1 per 1000 captchas (think Amazon Mechanical Turk-style services)


> They just simply don't want to.

This really does seem to be the case. And I find it incredibly distasteful that it was used by DJT in his campaign and continues to be used. It was rarely a place where politicians posted anything, and now it seems to be the first place where any of them do.


> It was rarely a place where politicians posted anything

Twitter was founded in 2006 and Barack Obama joined in 2007. Today he has 15.4k tweets. Seems like a lot to me.


There are plenty of ways besides suspending accounts to make the cost of botting too high to be profitable. A captcha for example is fairly frictionless for a typical user.


It seems to me that there may be a more benign explanation: Facebook/Twitter/etc are in a large-scale, iterative, adversarial game with many opponents.

It might be relatively easy for humans to catch clusters of fake followers, but that doesn’t scale. If you try to create heuristics to catch the fake followers, the adversaries will try to adapt. If you try to learn rules, you can update your heuristics faster, but you need training data, your adversaries could learn too, and adversaries could try data poisoning attacks.

It seems like a really tricky situation for Facebook and Twitter.


I think that you are right, but only in the qualitative sense, not the quantitative. Spam networks have shown us that taking out a few big players - the largest few sub-graphs of the network - can have dramatic effects on spam rates, reducing them by e.g. a factor of 20. I’m confident that Twitter could identify and eliminate a large percentage of the problem overnight. So far they have not even acknowledged that there is a problem.

O.P. asked for alternative explanations and people are trying to give these companies the benefit of the doubt. There are actually some much less generous explanations, such as that they are knowingly in the pay of forces adversarial to US interests.


How do you "take out" an attacker? They will just create more accounts. The accounts themselves don't have to cluster with each other. And let's say you do find a large set of accounts and ban them. New fake accounts are being constantly created and sold and used.


That's where various fingerprints (the sort of thing, incidentally, AI/ML should be pretty good at sorting out) should come in to play.

Spammers, ultimately, act like spammers, and bots like bots. Unlike legitimate accounts, which typically either a) follow a small number of related accounts or b) are fairly-well-known high-profile accounts (and yes, pseudonymous accounts may be high-profile), bots either need to act within specific networks or be very indiscriminate in whom they follow / link with. And they're tremendously active.

Whilst Google+ is not Facebook, Stone Temple Consulting ran an analysis of 500,000 profiles in 2015 to see what posting activity looked like.[1] I'm fairly familiar with the methods used as I'd pioneered most of them on a smaller, 50k sample.

Looking at public posts only (not private posts, and not comments), there are roughly 53,000 G+ users posting 100+ times monthly, and 106,000 posting 50+ times monthly.

From an independent analysis, I've estimated that Facebook traffic is, very roughly, 10x greater than G+ (as of August, 2015). So you might bump these values up by an order of magnitude to get Facebook-level scales of operation. Twitter is actually far closer to G+ in scale.[2]

That is, the search space for bots is roughly 500k - 1m accounts of major active players. And it should be possible, even with only computer-assisted human intervention, to make a pretty good cut through that space.

So the problem is not all that intractable.
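
For anyone checking the arithmetic behind the "500k - 1m" figure, here is a minimal sketch. The two G+ counts and the 10x factor are the ones quoted above; the names are just illustrative.

```python
# Figures quoted above from the Stone Temple G+ analysis (public posts only).
GPLUS_POSTING_100_PLUS_MONTHLY = 53_000
GPLUS_POSTING_50_PLUS_MONTHLY = 106_000

# Rough Facebook-to-G+ traffic ratio estimated above (as of August 2015).
FACEBOOK_SCALE_FACTOR = 10


def scaled_search_space(active_accounts: int, factor: int) -> int:
    """Scale a count of highly active accounts up by a traffic ratio."""
    return active_accounts * factor


low = scaled_search_space(GPLUS_POSTING_100_PLUS_MONTHLY, FACEBOOK_SCALE_FACTOR)
high = scaled_search_space(GPLUS_POSTING_50_PLUS_MONTHLY, FACEBOOK_SCALE_FACTOR)
print(f"Facebook-scale search space: {low:,} to {high:,} accounts")
```

Since Twitter is described as closer to G+ in scale, its search space would presumably sit nearer the unscaled figures.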

________________________________

Notes:

1. https://www.stonetemple.com/real-numbers-for-the-activity-on... STC's Eric Enge credits me on that page.

2. https://www.reddit.com/r/dredmorbius/comments/3hp41w/trackin...


How did we tackle spam?


SPF, DKIM and DMARC?

I jumped down the rabbit hole and set up my own mail server about a year ago. With postfix configured to look at the above I get zero spam. Looking at logs, I do get lots of connection attempts from random IPs in China and such, but my postfix unilaterally rejects anyone that doesn't have good DNS settings.


Spammers can and do trivially set up SPF, DKIM and DMARC.


My belief is that they are still profiling their spammers, and are not acknowledging anything specific as a sort of 'poker face.'

>people are trying to give these companies the benefit of the doubt.

I think that's a good thing; we're all too susceptible to cynicism. Being charitable is perfectly fair (and ideal) when dealing with people who haven't demonstrated a propensity to deceive (or to operate with malicious intent).


Perhaps they should try shadow-ban-plus. Banned accounts are only visible to, and can see a random subset of users, at random times. If a cluster member looks for other bots he may or may not find them because either he is banned or the target is banned, or both, or there is a glitch, and there is no way to tell.


It would be awesome if the New York Times shared the tools they've developed for this article to identify and analyze bots!

I usually hate those scroll-responsive animated web pages, but the scrolling illustrations and data visualizations in this article were particularly well done and pretty amazing. Not just pretty (the initial face sequence) and clever (the scrolling iPhone) but also actually useful and relevant (like the subscriber graph data visualizations expanding over time). I would love to know more about the tools they used to make those too.

A plea to NYT from a subscriber: Please share some of that great software you've developed, and publish it on your github account!!!

https://github.com/NYTimes


Investigations graphics editor here — as it happens a lot of the tools we use are open source! The interactive components were all built with Svelte (https://svelte.technology) and Rollup (https://rollupjs.org/guide/en).

We'd like to eventually open source some of the stuff we built to do the analysis as well, though it depends on time and priorities.


Thank you for those links, and your great work. If Twitter refuses to do the kind of investigation and analysis that you performed because they're making too much money from bots, then the free press and open source community needs to take up the slack!


Hey Rich! The graphics are on point for this article. I really liked the unfading faces at the start. Also, the Times should put a tip jar on these kinds of articles. I'm not an avid enough reader to subscribe, but I would want to show appreciation for any good articles I find.


On the graphics, it looks like the density changes for the rest of the dates whenever the identified bot families show up (creating the differently colored columns). When the identified bot families stop, the density changes back. Is that an artifact of how you're displaying it, or is it an indication that the followers before/after are also bots, but they've been mixed with bots created at other dates, where the bot families identified are just the low quality (non-mixed) bot families?


[flagged]


You gotta at least give Rich credit for contributing critical bug fixes back to open source tools he uses like Svelte, even when he's under a deadline!

https://twitter.com/Rich_Harris/status/956961714578317312

https://github.com/sveltejs/svelte/pull/1137

It seems like Svelte is worth taking a good look at!

https://github.com/sveltejs

I've been shopping around for a "magical disappearing UI framework" to re-implement my old jQuery pie menu component. I was considering Polymer because it was minimal and followed the latest web standards for components, shadow dom, events, etc. How does Svelte compare and interoperate with Polymer?

http://www.donhopkins.com/mediawiki/index.php/JQuery_Pie_Men...


NYT has a history of hiring engineering rockstars. I believe the creators of libraries like BackboneJS, UnderscoreJS, and D3 were at NYT when they invented those amazing pieces of work.


I think Bostock had already created D3 by the time the NYT hired him: https://www.reddit.com/r/dataisbeautiful/comments/3k3if4/hi_...


In fact, the creator of Backbone and Underscore — Jeremy Ashkenas — worked on this article until he left the NYT last year (to join the creator of D3, Mike Bostock, at https://observablehq.com)


Your argument that NYT is better at finding bots only makes sense if you assume that NYT found all the bots that exist. A more likely explanation is that everyone who goes looking for bots uses a different methodology and finds a different subset of the bots that exist.

Thus NYT can find some bots that Twitter missed, especially by using labor intensive methods that don't scale to all of Twitter, but there are plenty that NYT missed too. Their methods aren't necessarily better than Twitter's, just different (and likely much more expensive). Note also that the NYT people don't have to care about their false positive rate and don't have to adapt their methods to adversaries since this is a one time analysis.


Think of it like cancer. Identifying cancer, saying “You have a growth”, is one problem in medicine. There are ways of doing this and the cancer is just fine being identified; it is none the wiser.

The moment you start to treat cancer is the moment you realize that identifying and treating the problem are leagues apart. Cancer mutates, cancer squirms, cancer goes incognito until the storm has blown over. By attempting to treat cancer, we’ve made it our adversary and it now will fight against our treatment. UV treatments cause mutations that the cancer might use to avoid further treatment, some drugs target specific causes of cancer which the cancer will then change to, and some treatments just kill the patient (Just as Twitter can’t just ban everyone). There is a balance that these large tech companies need to find just as a physician needs to determine the correct path.

I know this is a very optimistic view of the problem, but it aligns with my experience working, generally, in this space at similar scale.


For the longest time, spammers and fake accounts on Twitter were just a name with a long ass random number on the end. I guess that particular cancer hit a local optimum where Twitter was pumping its numbers aka misleading the shareholders.


4) It's not hard to identify bots with 95% accuracy, which is good enough for reporting, but you need 99.9% accuracy in order to delete their accounts or else too many genuine accounts will be wrongly terminated.
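
As a rough illustration of why the gap between 95% and 99.9% matters, here is a minimal sketch. It reads "accuracy" as precision, i.e. the share of flagged accounts that really are bots, and the count of one million flagged accounts is purely hypothetical.

```python
def wrongly_terminated(flagged_accounts: int, precision: float) -> int:
    """Genuine accounts deleted if every flagged account is terminated.

    `precision` is the fraction of flagged accounts that really are bots.
    """
    if not 0.0 <= precision <= 1.0:
        raise ValueError("precision must be between 0 and 1")
    return round(flagged_accounts * (1.0 - precision))


FLAGGED = 1_000_000  # hypothetical number of accounts flagged as bots

for precision in (0.95, 0.999):
    victims = wrongly_terminated(FLAGGED, precision)
    print(f"precision {precision:.1%}: {victims:,} genuine accounts terminated")
```

Under these assumptions the difference is tens of thousands of wrongly banned people versus about a thousand, which is fine for a news story but not for an automated ban.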


Super easy to solve: identify likely bots, suspend the account, and let the owner do something trivial that only a human can do. False positives are solved, but the cost of maintaining large numbers of bots becomes prohibitive. Also, high-profile non-bot accounts should be trivial to identify with near certainty.


They could join the rest of the internet and serve a Captcha to the suspicious accounts!


They already do this, but instead of a CAPTCHA they require an SMS. I don’t know if they have the option to enter a CAPTCHA instead?


How would they know at scale which removed accounts were controlled by a human? Some people may just not want to return after the ban. Bots can probably create new accounts and complain too, and a few complaining high-profile Twitter users whose cases filter out to the media and get verified by journalists don't really make a good study.


Intervention programmes of almost any sort face this problem, and it's part of an ongoing monitoring and evaluation effort.

Ultimately: you monitor direct and indirect effects, offer an appeals and communications process, and try to sort out who is and is not participating in good faith.


There are actions other than deletion which can be undertaken.

You might target suspect accounts for increased validation -- Captcha or other elements, or simply logging them out more frequently. Changing programmatic elements such that they cannot be scripted as easily (this is one of the primary credible arguments I've seen against APIs for large public services).

(Craigslist, and several other services, have applied what's effectively a service-degradation level to accounts suspected of malicious activity. It's not a hard fail, and it's possible for a real human to get past the hassle, but it slows down attackers considerably.)


It shouldn’t be too hard to get at networks: identify an account with suspect followers, then look at those followers’ common follows and follow dates. They can spread out the follows, but to get it done in any reasonable time the follow times are still going to be pretty close.
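
A minimal sketch of that idea, assuming you already have each follower's id and follow time for the suspect account, plus who each follower follows. Twitter's public API does not necessarily expose follow timestamps, so treat the input format, function names and thresholds as hypothetical.

```python
from typing import Dict, List, Set, Tuple


def find_follow_bursts(
    follows: List[Tuple[str, int]], window_seconds: int, min_size: int
) -> List[List[str]]:
    """Group followers whose follow times fall within one short window.

    `follows` holds (follower_id, unix_timestamp) pairs for a single account.
    """
    ordered = sorted(follows, key=lambda pair: pair[1])
    bursts: List[List[str]] = []
    start = 0
    while start < len(ordered):
        end = start
        # Extend the window while the next follow is close to the first one.
        while (end + 1 < len(ordered)
               and ordered[end + 1][1] - ordered[start][1] <= window_seconds):
            end += 1
        if end - start + 1 >= min_size:
            bursts.append([follower for follower, _ in ordered[start:end + 1]])
            start = end + 1
        else:
            start += 1
    return bursts


def common_follows(followees: Dict[str, Set[str]], accounts: List[str]) -> Set[str]:
    """Accounts followed by every member of a suspected burst."""
    sets = [followees.get(account, set()) for account in accounts]
    return set.intersection(*sets) if sets else set()
```

Bursts that also share a long list of common follows are the ones worth a closer look. A careful operator can spread follows out or add noise, so this would be one signal among several rather than a detector on its own.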


I mean prolly somewhere near 40-50% of Twitter's 330 million MAU are fake. The company would drop 50% or more in value if they ever decided to delete those accounts, so incentives are aligned for Twitter to encourage bots by making it as easy as possible (not adding captchas, etc.) and not enforcing policy.


All relevant parties at FB, Twitter, Yelp, and TA are very much aware of fake accounts, fake reviews, and any other bot-related traffic. Yes, they do have teams dedicated only to that. So the real question should be: what actions are being taken to counteract social bots? This is not something any company would want to go public about. So basically, we don't know.


Your premise is centered around tech companies wanting to shut these things down. Is that actually true?


And that they want it to a degree that exceeds the cost of doing it, and of keeping up in that arms race. Chances are they don't.


> The New York Times can graph and identify bots, and other publications / bloggers have been able to identify networks of them.

Because it takes a lot of manpower to do so on a wide scale and almost always raises the question of fairness - the NYT (and others) focus on the egregious examples, but if Twitter would do the same (and kill the followers) their customers (the big celebs) would complain, and when they say "okay, bye Twitter", the normal users who want to follow them would disappear... thus reducing the popularity of Twitter for advertisers. The bots don't count as advertisement recipients, so Twitter doesn't have advertisers complaining about fraud from these fake followers.

In addition, the existing (moderation) manpower is focused on keeping the system from being overrun by actual, human-powered Nazi and other troll accounts - fake followers are waaaay on the bottom of the priority list. They don't generate bad press headlines about shitstorms or harassment.


For (2), developers aren't the ones making the decision to deprioritize bot detection. The decision makers don't care because the ROI on detecting/removing bots is likely negative, whereas using those ML people to improve ad targeting and engagement is much higher.


Yeah exactly. All the devs are working on business prioritized projects. Removing bots is not a business priority. Nothing to do with how smart the devs are.


(2) except the blame falls on company leadership - developers are not in charge of funding teams.

Leadership funds teams that generate positive ROI. A successful bot-hunting team will likely decrease revenue... although kicking out bots makes the platform more sustainable.

It's not just social media. The same question kicks around retail sites like Amazon - why are external researchers better at identifying fraudulent reviews? Because review factories propping up shoddy Chinese imports are very profitable for Amazon. These days customer obsession is just a "nice to have".


This is not a technological challenge, but a management one. Why would a product built to expand social graphs delete nodes that further its size? Not gonna happen. Ever.


> I find it hard to believe that people outside of the ecosystem of these companies have more skill, knowledge, or capabilities

The methods described in this article would be corporate espionage. I don't know what the legal implications would be, but ethically, I'm personally a bit frightened by the idea of a company as large and in possession of so much personal information as Twitter getting into the espionage business.


The object of a political botting campaign is to promote views that are complementary to one's own political agenda and remove those that are not. With an automated system that removes discourse that corresponds to patterns used by bots, a social adversary would only have to adjust their bots to resemble the voices they want to silence, for a perfectly platform-sanctioned way to silence people they disagree with.


It's not in the company's interest to ban these bots but if it really is that easy to track, we should do it from a third party.

A Twitter/Facebook/whatever client plugin that flags or hides bots seems feasible.

It's even in a certain business interest to track bot traffic for more accurate click through tracking so perhaps there's even a market need for this.


The work of finding exploits to these algorithms probably requires less talent than the work of researching and implementing them.


> There's only a couple possible scenarios I can come up with for why this continues to occur:

Or....this has nothing to do with talent and more to do with market dynamics - they lack any monetary incentive to change.


Q: What do you get when you cross the Weekly World News with a Baleen whale? A: The American electorate.


4) The tech has yet to achieve an acceptable level of accuracy.


Irrelevant to the content of the article, but I am such a fan of the direction the NYT has taken with their interactive articles. Breaking up long-winded articles via relevant images is no longer sufficient to maintain most readers' attention. The NYT may just reach their 10 million subscriber goal [1] via these very enjoyable (often scroll-triggered) animations and charts/graphics and I hope they continue with these efforts.

[1] https://digiday.com/media/new-york-times-enlisting-interacti...


I like it, but it's also a bit confusing. I use a read it later service (Instapaper) to read articles from everywhere and interactive ones like this tend to break it.

That in itself is okay, it's just that you don't realize it's broken unless you open the full article and look for interactive bits.


I understand what you are getting at, but it's really a problem with Instapaper. We shouldn't really expect NYTimes to not exploit the full power of web technologies simply because read-it-later services may not present them in the same way. The readers have to adapt to the content, and not the other way around.


To some degree, yes, but then it's also a problem at Pocket, Evernote, Readability, ReadKit, Safari Reading List, ...

If the content breaks all of their parsers and scrapers because it's tied up in these custom dynamic components, then I wouldn't really expect all of the parsers to handle that themselves. (Of course one pure text standard would be nice.)

The bigger issue to me is that the omissions are silent.


I think what we need is a standard packaging format that will allow services to download interactive articles and all their dependencies easily. Then they can just render it in a web view.


> interactive [articles] like this tend to break [Instapaper].

I would argue it is squarely the other way around.


Completely agreed. This was an enjoyable experience for me, even on a phone. And it didn’t seem to come at the cost of journalistic quality.


Hmmm... It seemed to lose track of the scroll position completely by the end for me (Firefox on android). Possibly ad blocker related. Interesting piece, though.


> Reporting was contributed by Manuela Andreoni, Jeremy Ashkenas, Laurent Bastien Corbeil, Nic Dias, Elise Hansen, Michael Keller, Manuel Villa and Felipe Villamor. Research was contributed by Susan C. Beachy, Doris Burke and Alain Delaquérière.

I already knew he worked for NYT, but still: I wonder how much it matters that the creator of Coffeescript was second author on this article.


Those names are alphabetical, so I think he just wins by virtue of his natural name advantage :)


Hah, I've worked for academia for too long :)


Some branches of academia (high energy physics) list authors alphabetically.


I don't think you can really compare papers with thousands of authors to papers with a handful.


Opposite side. I opened the article, tried to scroll, and switched to Firefox's Reader Mode instead.

Just give me the motherlovin' text.

Side note: NYTimes formats tables in nonstandard ways that break both Firefox Reader Mode and Pocket (these use the same parser AFAIU). That's annoying. Standards exist for reasons.


Completely agree. Having enjoyed these pleasant improvements, I am considering subscribing again.


The "Families of bots" chart was so incredibly cool it floored me.


The social media platforms are not 'struggling to respond.' They have ignored the issue almost completely because it inflates their metrics and enables them to over-bill advertisers. This in turn corrodes the trust that people have in the web in general.

The prevalence of bots really puts stress on the whole idea that pageviews on free websites are the same thing as subscribers to paid services. It is very simple to mass manufacture fake views and fake users on free services. It is not so simple to massively fabricate metrics around paying users.


They do combat it—but only just enough to keep the system in balance.

IMO they should completely ignore it. If they let the bots run completely wild then like/follower/retweet/comment counts will all become completely useless and probably disappear completely.

The internet was a better place before every piece of content had a number associated with it. A number that serves no real purpose other than to manipulate you into thinking it is more or less legitimate/important than you would otherwise know at first glance.


Looks like this story has already triggered an investigation by New York Attorney General Eric Schneiderman!

"New York attorney general launches investigation into bot factory after Times exposé. Louise Linton, Randy Bryce, and Clay Aiken all bought fake Twitter followers from a company called Devumi."

https://www.vox.com/policy-and-politics/2018/1/27/16940426/e...

Eric Schneiderman @AGSchneiderman Retweeted The New York Times

Impersonation and deception are illegal under New York law. We’re opening an investigation into Devumi and its apparent sale of bots using stolen identities.

https://twitter.com/AGSchneiderman/status/957289783490957312...


It's been amusing to see fake follower buying explode on Instagram these last few years in the photography / modeling space.

Photographers buy 50-100k fake followers, which you can easily see because each one of their posts barely gets 200 likes, but people who don't check or don't know how the game is played think you're a big deal. With your 100k followers you can now reach out to models, who also bought a ton of followers themselves, and you can work together because now you're at a similar level of perceived clout, neither one of you is stepping down to collaborate with a "nobody".

When you work together, you think you will get exposure to each other's fan base, but in reality 95% of your followers are artificial, nobody wins in the end. Maybe you get some insta-fame from regular people casually using IG who have no idea how this stuff works.

You'll notice that brands rarely ever sponsor the type of people above because every time they let them run a promotion nobody ends up buying their products as a result of the shoutout. And that's because these people purchased their digital fame, and fake followers don't have credit cards to buy flat tummy tea or hair gummies or whatever they're peddling that day.

It's reminiscent of steroids, you probably wish you didn't have to take them, but if you're going to be playing that specific game, you have to take them or you'll be left out by those who do cheat.

Growing your Instagram following organically these days is crazy hard. Both I and people I know with all sorts of numbers are mostly seeing constant loss of followers (fake accounts being taken down?) despite constant posting schedules of quality content. Discoverability through hashtags is dead as far as I can tell. It's reminiscent of trying to get big as an iOS app developer through the App Store alone, it's just not going to happen anymore, it's not 2007.

People (specifically photographers) are looking for an alternative to IG, but there's not one in sight. Ello is OK, but it's an artist circlejerk. 500px is a botfest. Steemit seems interesting but it's also being overrun by garbage content.
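
A rough sketch of the like-count check described earlier in this comment: compare typical likes per post against the follower count. The function names and the 1% cutoff are assumptions for illustration only, not anything Instagram publishes, and a low ratio is a hint rather than proof.

```python
from statistics import median


def engagement_rate(followers: int, likes_per_post: list[int]) -> float:
    """Median likes per post divided by follower count (0.0 if no data)."""
    if followers <= 0 or not likes_per_post:
        return 0.0
    return median(likes_per_post) / followers


def looks_inflated(followers: int, likes_per_post: list[int],
                   min_rate: float = 0.01) -> bool:
    """Flag accounts whose engagement rate falls below min_rate.

    min_rate is a hypothetical cutoff chosen for this example.
    """
    return engagement_rate(followers, likes_per_post) < min_rate


# 100k followers but only ~200 likes per post, as in the comment above
print(looks_inflated(100_000, [180, 200, 210]))  # rate is 0.002
# 2k followers with similar like counts
print(looks_inflated(2_000, [180, 200, 210]))    # rate is 0.1
```

The median is used instead of the mean so that a single viral post does not hide an otherwise quiet account.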


Why don't the fake followers automatically like those posts?


It's actually a service that these companies offer, but it's significantly more expensive over time since it's a continuous activity vs. a one time thing. I imagine it's a lot more maintenance on their end.

It's one thing to have x fake accounts follow you once, but it's another thing altogether to script the same x accounts to like every one of your posts every time.

It's actually a downward spiral for people playing this game: you buy 100k followers, not cheap, but you can take that hit once. However if you want to look legit, you have to constantly pay for those accounts to like your posts, which over months can add up to a really unpleasant bill. It's especially unpleasant because none of those followers are monetizable, since they're not real, and you're now on the hook to keep paying for likes in perpetuity, or people might call you out.

It's a reason why often you will see the nouveau-famous influencers claim that "IG shut down my account because my content was too saucy" as a way to save face and stop paying for likes. In reality they closed the account themselves and started from scratch because they could no longer afford the bill.

Again, most have no idea, watching this from the outside.


Thanks for clarifying. I had assumed it was all a big scam but didn't know exactly how it worked.


These companies have sold investors on the value of their social graph. Nobody was quite sure what, exactly, that value is, but I think it is clear that it has one.

Early on, celebrities organically grew followings. This gave them a direct, bidirectional channel to their followers - there’s a value of the service, more than the graph. Later arrivals to the platform had a problem, and so did the service provider: how do you promote yourself to a broad subset of the network, quickly?

Recommendation systems have been around for a long time, so that’s part of the solution. But if you wanted a quick spin up, word of mouth and algorithmic solutions weren’t going to go that far. And what if you had an unpopular brand and were looking to improve it? Those things almost worked against you. Deliberate promotion is another outlet, but people are looking for viral magnification of their message. This is, I would argue, a real value to the network. Spontaneous memetic propagation.

This is possibly the only real value of the network at scale. But you can’t sell it, at least not directly. The appearance of deliberate, centralized control does not yield the desired result. People can distinguish emergent network behavior from facile promotion very easily, subconsciously even. The value of the network is in social ties. So people whose friends genuinely like something - they are more likely to like that thing, too.

If the service provider is going to leverage the power of the network for themselves, they need these bot networks. They need to sell indirect access to the network, because direct access is not the desired product. And the network effect is very powerful, as we increasingly see isolated subgroups creating virtual echo chambers, and others paying to exploit and shape those subgraphs.

These networks are powerful and potentially dangerous artifacts. I think we should have asked in 2009, could something like the Arab Spring happen in a developed, stable nation? But nobody was asking that, I certainly wasn’t. It is an interesting time to be alive.


The guy behind the top 1 restaurant in London on TripAdvisor said that he used to write fake reviews.

Right now it's not too difficult to identify patterns that tell real accounts or comments from fake ones, but as AI learns to write better and create more believable images and videos, I wonder how long it will take to create a bot account indistinguishable from a real one.

Maybe this should be the Turing Test of our time.


It depends what you are looking for. A bot today might even have a real insight, just by the luck of mixing two random points together. Just look at the top picks of r/subredditsimulator.

But it won't be able to hold a longer conversation - that seems like a far ways off.


> A bot today might even have a real insight, just by the luck of mixing two random points together

When playing Apples to Apples or Cards Against Humanity, and everyone puts their best attempts in a pile, add a random card from the deck. In many of the games I've played the deck wins by a long-shot.


>I wonder how long it will take to create a bot account indistinguishable from a real one.

They are already pervasive. That’s why I stopped using Twitter. It is literally impossible to tell if you are reacting to an actual human’s tweet, or just being manipulated by bots.


It's a little strange to me that this is news, honestly. You've been able to go on Fiverr and get fake followers for almost a decade now.


Everyone knows that, but that doesn't tell you anything about the distribution and quality of people's online followings.


I think they were expecting something more sinister or nefarious to pop up from the investigation, but it didn’t. So they ran a long piece without much of a point, but with some cool animations and random celebs sprinkled in for the outrage factor.


Yup. Almost every bit of 'trickery' people and companies use to promote themselves or their works has been a thing since the days of internet forums and mailing lists. Whether that be sockpuppet accounts, fake reviews, fudging the stats counters or anything else.

The only reason it's seen as some 'new' thing is because many journalists seemingly didn't use the internet that much prior to the days of social media, and like many people online seem to think a lot of patterns are newer than they really are.


Or they’ve been using it for a while and thought of the problem as the guys posting blatant herbal viagra & work from home scams, not realizing that the spammers weren’t just the bottom feeders any more.


What if we all agreed that Internet points just don’t mean that much? It is ridiculous that in my lifetime the general public flipped from “Internet is for dweebs” to thinking numbers on a website are actually worth caring about.


> if we all agreed that Internet

Who is the 'we' in that exactly? This is like saying 'what if we all agreed that a fancy car or trappings of wealth don't mean anything' or 'what if we all agreed that an Ivy Degree doesn't mean anything'? [1]

[1] I am offering some marketing help gratis for a wealth advisor today (side project). I am going to tell her to buy a fancy car because it will work better than the soccer mom vehicle she now drives in landing and impressing clients that she is trying to tap. I don't need any research to prove that out either. I know people respond to cues like this (I own said cars) so I know it's a fair bet that 'we' is never going to get together and not care about that type of thing. So it's an investment in marketing and advertising, not a cost of transportation. Now maybe you or HN readers don't think the same way and it wouldn't sway you and you don't care. Doesn't matter. Enough people care to make it a viable strategy. If I see a large number of Twitter followers I might browse the first page or so and quickly determine how 'legit' they are. But most people don't do that. They focus on the number only, right?


What I don't get is how appropriating someone's persona and account details (photo, etc) is not fraudulent?

I suppose it is hard to prove, since they purchased these profiles from a foreign entity. But surely Devumi knew many of these profiles were stolen from real people without their consent.


The pseudocode in the article is a curious touch. I wonder where they're going with this. You've gotta love the convenient 'when' statement.


I was wondering about that. It magically sets up an event handler and callback with the next statement? Yes please.


It's pretty good that this kind of open secret gets coverage in the mainstream media, but it's disappointing that this article contents itself with a bit of celebrity shaming and only hammering on Twitter, probably because they're a bit more transparent (technically speaking) and easier to take down.

Other networks (i.e. Facebook/Instagram) are probably just as bad, and if so they are generating untold amount of ad revenue based on fake traffic and accounts.


Not sure the coverage will do any good anyway. First thing I did after reading the Times piece was to see just how easy it is. Not only is it easy, it's CHEAP! I paid GB £8 for 500 Instagram followers. Within 10 minutes I had just shy of 1,000 new followers. So they under promise and over deliver, all the while playing to people's ego. That buying followers is a thing does not surprise me.

I wonder how many others bought followers in response to this article.
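
For scale, the per-follower price implied by those numbers can be worked out as below. This is only a sketch: the helper name is made up for the example, and 1,000 is a rounded stand-in for "just shy of 1,000".

```python
def price_per_follower(total_paid: float, followers: int) -> float:
    """Return the cost of one follower in the same currency as total_paid."""
    if followers <= 0:
        raise ValueError("followers must be positive")
    return total_paid / followers


paid_gbp = 8.0      # amount paid, from the comment above
promised = 500      # followers ordered
delivered = 1_000   # assumption: rounds up "just shy of 1,000"

print(f"promised:  £{price_per_follower(paid_gbp, promised):.3f} per follower")
print(f"delivered: £{price_per_follower(paid_gbp, delivered):.3f} per follower")
```

That works out to roughly 1.6p per follower as ordered and about 0.8p as delivered.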


It's sad that such dubious companies exist, but the saddest part is that celebs are paying literally tons of money to keep themselves 'uberfamous' while these companies are counting the cash they just earned from them.

Even with newly created twitter accounts you get followers in like a day, whereby some of them are fake accounts, mostly using erotic profile pictures of some kind...

P.s. I like the NYT interactive articles, very nicely done with lots of details.


Everybody is in on it except for the casual folk, who I imagine make up the vast majority of the user base and have no clue what's going on behind the scenes.


Will Twitter ever handle this? Given how easily this journalist found this pattern I'm pretty sure it would be trivial to start banning these accounts with their internal metrics.


From the article:

"In an email, Mr. Leal said that buying followers for his business generated more than enough new revenue to pay for the expense. He was not worried about being penalized by Twitter, Mr. Leal said. “Countless public figures, companies, music acts, etc. purchase followers,” he wrote. “If Twitter was to purge everyone who did so there would be hardly any of them on it.”


> “If Twitter was to purge everyone who did so there would be hardly any of them on it.”

Ban a few, even just for a few weeks, and the others will stop. That's how society works most of the time.


Google made websites manually go through and disavow low quality inbound links to have ranking penalties removed. Twitter could suspend people and have them disavow followers. People having to clean out hundreds of thousands of fake followers would probably kill the practice pretty quickly.


Why would they want to do that? If they allow this to occur they can post better growth numbers to their shareholders.


Bingo. They have the capabilities to clean up their platform. But that would mean huge drops in “engagement” metrics.


Basic capitalistic economic theory says that you must seek to maximize profits long term.

If you allow fake accounts to flourish, you tarnish your brand and hurt your platform in the long term.

Of course, theory is one thing but the people in charge of making decisions might have no motivation to prioritize the long term viability of the company if they themselves don't really care about the company and the shareholders but just want to extract maximum amounts of value for themselves...


Yes, basic capitalistic economic theory is wrong. Many companies will do whatever it takes to maximize short term profits, as long as they can reasonably get away with that (i.e., not go to jail).


I can't speak to the past, but going forward, information businesses don't need to last forever, they can be useful in "waves". See Myspace/Groupon. Both have fallen out of "favor" but the brands still have value to "late adopters" or maybe even better "those lost to tech". I think we'll see more info businesses do this wave approach that is not about building the next GE, but really about short term exponential capital growth.


Hey, reading the article made me realize there's space for a tool to clean out all those bots from your follower list. People like @chefsymon who are now a bit embarrassed by their previous acquisitions.


I just searched for tools like this out of curiosity. There are a few but they all seem pretty sketchy, like only allowing one run or requesting dubious twitter permissions to run an analysis that shouldn't require auth-ing at all.


Manageflitter is pretty good. It identifies likely fake followers and lets you block them. https://manageflitter.com/


Devumi is a reseller. The services they use charge around 0.1/1000 followers, or even less, with pools of 200k+ users per service and 3/4 pools (just use Google; for Facebook many accounts are even real, since collecting tokens is fairly easy). And the people who sell these services at those prices are resellers themselves, using an API to control those bots on really EVERY service you can think of (even Telegram). Revenue made from likes etc. may be really a small part; just think about the power in elections plus the really small price.


I don't care about people paying for extra followers. If that makes you feel special, be my guest. What I very much do care about is these bots that are being used by foreign nations and other bad actors to influence our national conversations and even our cultural zeitgeist. To propose a variant on Newton's third, for each powerful positive innovation, there is an equally powerful negative consequence. I wonder what can be done to stop this.


The US has been doing this for ages. The difference is that they used to pay mainstream media people in foreign countries to spread their vision of the world, and therefore poison the well of international news and commentary. Now the US is tasting a little bit of that in the form of social media, which is more polarized and difficult to control.


That doesn't make it OK for the individual people who live with it. This idea that it's somehow just payback for past bad actions by a government that most of them weren't aware of is no help to regular folk who are drowning in a sea of misinformation.


This is not a justification, it is just an explanation of what is happening. It is a well-known state of affairs throughout the world, where foreign powers constantly jockey to spread political disinformation. Only now this is also becoming reality inside the US because of changes in patterns of media consumption spearheaded by companies such as FB and Google.


Sorry for misreading your tone. You make a good point and I agree with your objective assessment.



I have found that my YouTube account, that has no intention of being famous and had a couple followers, is now getting a constant trickle of new subscribers every day, with funny names that look autogenerated. It may have some relation with changes in YouTube monetization that will demand a minimum number of subscribers. I hope my account will not be jeopardized by this.


What I'd like to know is what is the difference between paying an influencer to tweet about your product with their account vs paying Devumi to tweet about your product with their accounts?

Is the first any more legitimate?


At the least, disclosure.


I was personally amused at the number of purchasers of these fake followers who "did not respond to request for comment".


From the article: Devumi sells Twitter followers and retweets to celebrities, businesses and anyone who wants to appear more popular or exert influence online. Drawing on an estimated stock of at least 3.5 million automated accounts, each sold many times over, the company has provided customers with more than 200 million Twitter followers, a New York Times investigation found.


When any new social network arises, is it worth building up a network of fake accounts on it?


To this day I maintain that social media is a failed experiment and should be left behind in the dust for objective measurements.


Social networking IS the Internet in many ways. It’s been a natural extension of the Internet since the early days of forums and things like IRC.

To say social networking is a failed experiment is to say that the Internet is a failed experiment - nonsense.

Now I think what you mean to say is that you don’t like the UX of algorithmic feeds and firehose-style content/advertising. That’s fair.

But to write off social networking as a concept? That’s silly. The internet is so powerful because it is social.


The internet is inherently social, we don't need to reify that aspect in explicit sites for 'social networking'. It's a weird cargo-culty skeuomorphism of a natural thing. I think we need maybe point-to-point and broadcast comm standards but not social networking sites that try to trap you and own you.


I agree in the sense that any social media website which aims to create social interactions on a larger scale than the vast majority of humans are used to is a breeding ground for behavior like anonymous harassment etc. IMO the forum-type model is the best type of social media that I've seen from a community and usability perspective.


Love Adam Ferriss's work. He did the collages at the start of this piece; highly recommend checking out his website [1].

[1] https://adamferriss.com/


Wow, thanks for the link. His stuff is very cool.


Is that you, Adam Ferriss?


just a bot


beep


bloop


You should have signed up for Old Glory Robot Insurance!

https://www.youtube.com/watch?v=KXnL7sdElno


Is there an executive summary of the article?


Upvoting you because this was a long read.

My summary - There are a lot of twitter users with fake followers. This company, Devumi, behind selling fake followers claims they're not really fake because they buy the followers on an influencer marketplace so they're "real" to the buyer (obviously seems questionable). Some people have their social identities stolen which are in turn used to create these fake profiles. Lots of celebrities, athletes, and other famous people have bought fake followers — some directly, others through marketing agencies, and others possibly completely without their knowledge. Most people that the NYT called out denied having knowledge of the purchase of fake followers or blamed a rogue employee. The bots interact in rings with each other so their activity patterns are detectable with analysis. Twitter is not necessarily incentivized to solve the problem because there are probably some bots it has not detected and considers real active users. The founder of the company lists a fake address as residence and fake university degrees as credentials.

It is a long read, but the article has several interactive features that keep it interesting.



Just read it.


there's bots on twitter.



