Around 2010 I was working at Urban Airship (now just Airship), which offered an API for mobile app developers to send push notifications. We put a ton of effort into making sure we didn't drop pushes and that our API was highly available. At one point very early on I was restoring pushes out of error emails in mutt (not great, I know, I know).
The reason? Not some unrealistic view of ourselves as Google or Amazon, but because we knew we served pushes for a popular pill reminder app (among other reasons).
So just because you don't have Google's or Amazon's scale, the data you store and process may be as precious to your users as Amazon's shopping carts are to Amazon. Over 10 years later there is a lot more medication in my life, and I'm really proud of the sometimes-ridiculous lengths we went to in order to get those pushes delivered.
Ironically, Urban Airship (now just Airship) itself is a great example of something that people use even though they don't need it - due to hype. You have a mobile app and it needs to send notifications, but you don't have any in-house expertise on mobile notifications. Management says, "ok, let's outsource this to somebody who has some expertise!" They look around and find Urban Airship - the leader in mobile app notification handling!
So you "integrate" with Urban Airship, per management dictates (the same reason you used Hadoop, Cassandra and Kafka). They give you this big complicated presentation, but you discover that, in order to use their "service", you have to:
1) instrument your application to send their API a notification every time something happens in your application.
2) set up a workflow in their system so they can tell you when something has happened that needs a notification
3) instrument your application to read a dataset from them telling you when to send a mobile notification
4) actually send the mobile notification
You realize that they're charging your company a ton of money to do... absolutely nothing. You're still doing everything, responsible for everything, and they're nothing but completely empty overhead. They could have been replaced by a Postgres instance running on a single underpowered EC2 node (and it would be more reliable). But management sees "leader in mobile app notifications" the same as they saw "leader in distributed computing loads" and insisted that you use this thing that exists because it exists.
It's entirely possible that for your use case, Airship provided you with no value. But, just as a thought experiment:
- Did you also build the audience segmentation system (and UI) for your marketing team?
- Did you build a scheduled / triggered push system with per-device (or user account) rate limiting?
- Did you build the streaming data interconnect to pipe audience data back to anywhere else in your organization?
- Did you build the capability to remove devices that uninstalled your app from your notification list (surprisingly tricky, at least a few years ago, and APNS and Google will start getting very upset with you if you don't do this)?
- Did you build an embeddable notification inbox with state synced across devices?
- Did you build a CDN for in-app notifications with video / images?
Because Airship did, and they will sell you all of that in one package. They'll charge an arm and a leg for it, but they do a lot more than just sit as a middleman between you and APNS when sending notifications. The relatively simple backend tech is definitely not their product, and if that's all you used from them then I would understand your frustration.
I built, all on my own, a dual-platform multi-app on-demand push notification service around the same time.
I can assure you, Urban Airship was not providing "absolutely nothing". If they hadn't been so expensive, we'd have used them and it would have saved me a shitload of time. Open-source support for push at the time was rudimentary (I assume that's gotten better? I haven't looked at it in about a decade), and in any case, doing it at any scale with even somewhat-good reliability involved a lot of work and dragging in tons of extra services (at least a database and some kind of message queue, unless you wanted to write those yourself).
I built one and then was forced to port it over to Urban Airship for "enterprise" reasons. I can assure you, Urban Airship is providing "absolutely nothing".
If you had certain scale assurances—a set number of apps you're sending to, never more than low five figures of pushes in a batch, never overlapping sends at the same time, et c.—it could be a lot simpler because you could skip most of the queue management work, so sure, for some workloads and usage patterns UA probably wasn't helpful enough to be worth the money. It wasn't for us, either, but that's because we needed to charge for it and couldn't mark it up even higher than UA already did.
[EDIT] Now, somewhat after that 2010 date, maybe 2012 or 2013 IIRC, UA re-structured their pricing and offering quite a bit, which made them even less appealing to us, since their focus seemed to be value-add features for marketers on top of push, which was something we had no use for whatsoever—it's possible that by then they were indeed more trouble than they were worth if you just needed push messaging, even putting aside the cost of the service itself. If your marketing folks were big on using that as a "channel", though, it was probably very much worth it. They had a lot of features for that kind of thing.
Seems wrong to edit this in, so I'll make this post for lookers-on to see what creating a reliable push messaging service looked like in the early '10s:
1) Your app needs to register with Apple to get a push token, using the app's push certificate (which you'd have to generate if your app used push messaging). It needs to send that to a server you control, because that's basically your "to" address to send messages. These were not to be treated as permanent, so to do it right you also had to be ready for them to change.
Then, a similar story for Android, but shoddier, less efficient, and worse-engineered, though also higher-level—so, situation normal for Android vs. iOS in (at least) the early '10s :-) So you're looking at two similar but necessarily entirely separate implementations of this sort of functionality, to cover both platforms.
2) You need to store all those tokens your apps are sending you.
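To make steps 1 and 2 concrete, here's a minimal sketch of the server side, assuming Flask and SQLite purely for illustration (the route, table, and field names here are all invented):

```python
import sqlite3
from flask import Flask, request

app = Flask(__name__)

def db():
    conn = sqlite3.connect("push.db")
    conn.execute("""CREATE TABLE IF NOT EXISTS device_tokens (
        app_id   TEXT NOT NULL,
        platform TEXT NOT NULL,   -- 'apns' or 'gcm'
        token    TEXT NOT NULL,
        UNIQUE (app_id, platform, token)
    )""")
    return conn

@app.route("/register", methods=["POST"])
def register():
    body = request.get_json()
    conn = db()
    # Tokens aren't permanent, so re-registration has to be cheap and idempotent.
    conn.execute(
        "INSERT OR IGNORE INTO device_tokens (app_id, platform, token) VALUES (?, ?, ?)",
        (body["app_id"], body["platform"], body["token"]),
    )
    conn.commit()
    return "", 204
```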
3) To send the push, you need to generate a bunch of individual messages for each address. For Apple, you'll be using the app's push cert to authenticate over a raw (I think? Memory's fuzzy) UDP connection, then blindly firing a stream of data at it (no immediate return statuses, it's a one-directional communication channel). For Google, you'll need to manually chunk these into sets of a certain max size and post the chunks one at a time to some Web endpoint. For an app with one million users sending a broadcast notification to all of them, that's one million messages each time you send. Google's system required that you implement exponential backoff, too, if you didn't want to get banned.
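For flavor, the Google half of step 3 might look roughly like this. The endpoint and payload shape are from memory of the legacy GCM/FCM HTTP API, so treat the exact URL, field names, and the 1000-token cap as assumptions rather than gospel:

```python
import time
import requests

GCM_URL = "https://fcm.googleapis.com/fcm/send"  # legacy HTTP endpoint
CHUNK = 1000  # max registration_ids per request, per the old docs

def send_broadcast(server_key, tokens, message, max_retries=5):
    headers = {"Authorization": f"key={server_key}",
               "Content-Type": "application/json"}
    for i in range(0, len(tokens), CHUNK):
        payload = {"registration_ids": tokens[i:i + CHUNK],
                   "notification": {"body": message}}
        for attempt in range(max_retries):
            resp = requests.post(GCM_URL, json=payload, headers=headers)
            if resp.status_code < 500:
                break  # success or permanent failure; don't hammer on 4xx
            # Exponential backoff, as Google required if you didn't want a ban.
            time.sleep(min(2 ** attempt, 60))
```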
4) Oh, if you're sending to multiple apps, you need a way to manage (add, update) the certs (or, for Google, IIRC, an API token of some kind?) for them and to select the correct one for a given push.
5) Google would tell you at send time if an address was bad (IIRC—it has been about a decade since I touched this stuff). For Apple, you had to wait a while after a send, then open a new connection and ask for a list of bad tokens for that app, which it will happily spit at you as another UDP data stream. These will be uninstalls, folks who've turned off push for your app, whatever. You have to remove those from your token list, facing vague threats of service bans if you keep sending to bad addresses for too long.
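The cleanup half of step 5 is at least simple once you have the bad-token list; a sketch against the hypothetical device_tokens table from earlier:

```python
def purge_bad_tokens(conn, app_id, platform, bad_tokens):
    # Feedback-service tokens are uninstalls and opt-outs; keep sending to
    # them and the platforms start threatening you, so delete promptly.
    conn.executemany(
        "DELETE FROM device_tokens WHERE app_id = ? AND platform = ? AND token = ?",
        [(app_id, platform, t) for t in bad_tokens],
    )
    conn.commit()
```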
6) If your scale is non-tiny and you need any amount of delivery guarantees or reliability, it's plain by now that you need a queue of some kind. So if you're over that line but not too far over it, you hack something together in PostgreSQL or what have you, probably make some minor but OK-for-your-needs errors in the implementation, and live with it. If your needs are greater than that, this is where you start looking at shit like AMQP, which solves a ton of your problems while giving you an ongoing maintenance headache. Retries, fanout to multiple workers, aggregating failure data, et c. There's a lot going on there.
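The hack-it-together-in-PostgreSQL option looks something like the sketch below in its modern form, using SKIP LOCKED so concurrent workers don't grab the same job. (SKIP LOCKED only arrived in Postgres 9.5, so the early-'10s version of this was uglier: advisory locks or optimistic UPDATE tricks.) Table and function names are invented:

```python
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS push_jobs (
    id       BIGSERIAL PRIMARY KEY,
    payload  JSONB NOT NULL,
    attempts INT NOT NULL DEFAULT 0,
    done     BOOLEAN NOT NULL DEFAULT FALSE
);
"""

def work_one(conn, send):
    """Claim one job, run it, record the outcome. Returns False when idle."""
    with conn.cursor() as cur:
        cur.execute("""
            SELECT id, payload FROM push_jobs
            WHERE NOT done
            ORDER BY id
            LIMIT 1
            FOR UPDATE SKIP LOCKED
        """)
        row = cur.fetchone()
        if row is None:
            conn.commit()
            return False
        job_id, payload = row
        try:
            send(payload)  # hand off to the APNS/GCM sender of your choice
            cur.execute("UPDATE push_jobs SET done = TRUE WHERE id = %s", (job_id,))
        except Exception:
            # Leave the row for a retry; the attempt count feeds failure stats.
            cur.execute("UPDATE push_jobs SET attempts = attempts + 1 WHERE id = %s",
                        (job_id,))
        conn.commit()
    return True

# usage: conn = psycopg2.connect("dbname=pushes"); work_one(conn, print)
```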
7) Do your senders want to schedule pushes for the future? Now you get to implement cron-for-push-messages, including a UI for it.
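Cron-for-push reduces, at minimum, to a send_at column plus a polling loop that promotes due rows into the live queue. A sketch, with all table names invented:

```python
import time

def promote_due(conn):
    with conn.cursor() as cur:
        # Flag due rows and grab their payloads in one statement, so the
        # inserts below agree exactly with what was marked.
        cur.execute("""
            UPDATE scheduled_pushes SET enqueued = TRUE
            WHERE send_at <= now() AND NOT enqueued
            RETURNING payload
        """)
        for (payload,) in cur.fetchall():
            cur.execute("INSERT INTO push_jobs (payload) VALUES (%s)", (payload,))
    conn.commit()

def scheduler_loop(conn, poll_seconds=30):
    while True:
        promote_due(conn)
        time.sleep(poll_seconds)
```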
8) Do your senders want to send individualized messages? Now you need to hook into some kind of datastore that ties those push tokens to other user data so you can fill that stuff in while generating messages. And you need at least a rudimentary templating system.
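The rudimentary templating can be genuinely rudimentary; a sketch using nothing but the standard library (the per-user attributes are assumed to come from whatever datastore holds them):

```python
from string import Template

def render(template_text, user_attrs):
    # safe_substitute leaves unknown placeholders alone instead of raising,
    # which beats crashing a million-message send on one bad record.
    return Template(template_text).safe_substitute(user_attrs)

render("Hi $first_name, your order #$order_id shipped!",
       {"first_name": "Sam", "order_id": 1234})
```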
9) Do your senders want to send messages in response to user actions in the app? Now you need a way to ingest potentially very bursty and high-volume traffic from a ton of clients, and connect those to configurable push-send triggers. Probably this'll be another thing you need to route through your queuing system, if you don't want to badly over-spend on servers while not having great reliability under load.
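Step 9's trigger path, reduced to its skeleton: ingest the event, match it against configured triggers, and hand the result to the queue rather than sending inline. This reuses the Flask app from the registration sketch; the TRIGGERS table and the enqueue_push helper are hypothetical:

```python
TRIGGERS = {
    # event name -> message template; in real life this is per-app config
    "cart_abandoned": "You left something in your cart!",
}

@app.route("/event", methods=["POST"])
def ingest_event():
    body = request.get_json()
    template = TRIGGERS.get(body["event"])
    if template:
        # Never send inline: bursty client traffic goes through the queue,
        # so a spike costs you latency rather than dropped pushes.
        enqueue_push(body["app_id"], body["token"], template)  # hypothetical helper
    return "", 202
```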
10) Push history and stats reporting is probably gonna be needed, at some point, by someone.
Yeah, sure, that's all true - but Urban Airship doesn't handle any of that stuff for you. You still have to do all of it, just with a call to Airship somewhere in the middle.
This lines up very well with my recollection of the state of things circa 2011 when I evaluated the options for cross-platform push notifications and ended up recommending UA. I actually really _wanted_ to build the system in-house, because it seemed interesting, but I couldn't justify our tiny startup spending all that time that would be better spent on features our clients were asking for.
3) and 5) in particular were significant hurdles -- allowing UA to handle the API and TOS differences between Apple and Google was a major benefit. It wasn't just a matter of not wanting to implement it ourselves, but the risk of implementing it incorrectly and getting banned from the service.
Small nitpick: "UDP is considered a connectionless protocol because it doesn't require a virtual circuit to be established before any data transfer occurs."
Well, sure. Logically, your language/libraries almost certainly present it as something resembling a persistent connection, though, probably not a ton different from how they present TCP (just with fewer features).
However, technically correct is the best kind of correct :-)
I may also be mis-remembering, and APNS might use TCP. Either way, you're spraying bits at a socket and not getting any kind of feedback (until later, on a separate connection), aside from getting dropped if your cert doesn't check out (again, IIRC).
Particularly in the earlier days before APNS revamped their APIs to provide better feedback, yeah - you could spray and pray and honestly it'd work pretty well, but oh boy was doing anything complex difficult. UA legitimately simplified a solid chunk of that, and you could just hammer their API however you pleased rather than needing to maintain stable connections to APNS.
The place I worked did eventually drop them and start using our own Redis queue + long-running process, and that was plenty sufficient for what we did. We lost the "+1 to iOS's counter" feature (this was prior to background processing of notifications existing at all), but we had solid evidence that it simply wasn't providing us much benefit for the complexity/cost compared to just a numberless dot.
Oh no, I forgot about badges! IIRC early on, all you could do on iOS was set them to a specific number, so you had to track that state server-side (by having the app report when a notification was seen). There was no "take whatever it is and add one"; you just (optionally) set an exact integer in the message payload and that number would go on the badge. That was a pain.
Yep, that's exactly what it was. UA had a feature that let you say to increment/decrement by N, and they maintained a cache + cleared it appropriately when the app launched.
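The server-side bookkeeping being described is small in the simple case; a sketch, with a plain dict standing in for Redis or a database table:

```python
badge_counts = {}  # token -> current badge count; Redis or SQL in real life

def next_badge(token, delta=1):
    # Early iOS only accepted an absolute number in the payload's "badge"
    # field, so the server has to own the running count.
    badge_counts[token] = badge_counts.get(token, 0) + delta
    return badge_counts[token]

def on_app_opened(token):
    # The app phones home on launch; notifications were seen, reset to zero.
    badge_counts[token] = 0
```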
Obviously we could do that too with some additional storage, but handling it across multiple OSes and weird iOS + Android lifecycle details was enough of a pain that we didn't end up doing it.
Slightly unrelated, but can anybody explain why private RSS feeds are not a more popular choice for "client push" when push lag of a couple of hours isn't a big deal? The "pill reminder" app made me think about how easy it would be to host a private RSS feed for each client that the backend updates on a schedule, and clients just... check the RSS feed for an update.
Maybe I'm missing something here, but it feels like you don't need some fancy realtime queue service for most push notifications, so why not use something that offloads the checks to the client?
RSS is a neat standard, but if you're writing your own app, you might as well use a custom JSON payload that's exactly what you need. If you use RSS but never actually use an RSS reader or expose it to the public, you may end up deviating from the standard, or doing extra work to be compliant with the standard that never gets used.
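In other words, the "private feed" collapses to something like this sketch: a JSON endpoint the client polls with its last-seen timestamp (endpoint shape and field names invented):

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

# (user_id, unix_timestamp, message) rows; a real database in real life.
NOTIFICATIONS = [("alice", 1700000000.0, "Time to take your pill")]

@app.route("/feed/<user_id>")
def feed(user_id):
    since = float(request.args.get("since", 0))
    # The client remembers the newest timestamp it has seen and passes it
    # back, so each poll returns only what's new.
    items = [{"at": at, "message": msg}
             for uid, at, msg in NOTIFICATIONS
             if uid == user_id and at > since]
    return jsonify(items)
```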
As far as push notifications, the reality is that "push" is a bit of a marketing term. Every time your phone wakes up and gets on the network, it contacts Apple or Google and sees if there are any pending pushes to be delivered. By using the "push notification" systems, you get to piggyback on that highly optimized connection already provided by Apple or Google. These pushes have to be much smaller than your typical RSS feed and are intended for high priority delivery.
An alternative, if you don't want your data going through the "push" networks, is to have your app "background fetch". You might want to do this if it's too much data (like an RSS feed would be) or if it's sensitive data (Apple and Google can obviously read all push messages). For this case, iOS and Android have special "background fetch" facilities, which give your app a chance to pre-load data. News apps are the canonical example.
> it feels like you don't need some fancy realtime queue service for most push notifications, so why not use something that offloads the checks to the client
Mostly because the realtime queue already exists, it's provided mostly for free, it comes with extra features like sounds and badges, it was available before background fetches, and business people know to ask for "push" by name.
I think it’s because most of the time a lag of a couple hours is in fact a big deal. Let’s look at some common use cases:
- If I want push notifications for meetings, I want to get them exactly 10 minutes before my meeting, not 2 hours after, or 10 minutes before but +/- 30 minutes
- If a customer is submitting a critical bug to my bug tracker, I want to know immediately so I can start damage control.
- If a developer on my team responds to my comments on a Pull Request at 3PM, I don’t want to get the notification after I’ve clocked out at 5, causing us to essentially lose a day of work.
In practice, I don’t think there are many use cases where people don’t care about the punctuality of push notifications. And if you start by tackling those use cases and then an important customer says “hey, I actually need punctuality for this new class of notifications”, then you’ll suddenly need to rebuild your whole notifications stack.
Yeah, I didn’t look at them at all today. I’m still alive. As a user, I don’t care about how punctual they are since I don’t look at them until the end of my work day.
At least for today, for you. Phone systems go down all the time and when that happens, people DO die. No one dies because a notification gets delayed a few hours.
Not that there is anything wrong with that, and in many use cases it would be a fine alternative. But sometimes you really do need to push data to the user even if they aren’t polling for it.
Pill reminders would be something that you’d want to push. You can’t be sure the user is running your app (even in the background), and the actual state changes are rare. So some kind of mechanism where the client doesn’t initiate the check would be better.
Using a long poll for email or Slack or something like that can make more sense in some scenarios.
I think they are differentiating between systems that perform polling and systems that can receive interrupts (maybe not quite the right word here). I don't think that has anything to do with being publicly addressable here; in either case you sort of have to assume that devices can find each other before the difference even makes sense.
Good point! In the mobile space, a certain walled-garden platform traditionally killed off any app 10 minutes after you left it, and even to this date does not guarantee that your app will be allowed to call home on any semi-regular basis. This makes it impossible to guarantee delivery without using the officially sanctioned push system. And if we're doing push already, why bother doing it differently on other platforms? (even if they support cross-app coordination to make the polling battery-efficient)
> Maybe I'm missing something here, but it feels like you don't need some fancy realtime queue service for most push notifications, so why not use something that offloads the checks to the client?
Depending on the use case, you may well depend on the real-time aspect of push notifications.
If my door bell rang while using headphones, I'd want to know immediately.
I'd also be pretty pissed if a messenger took more than 5 seconds between sending and receiving on the other end, even if "closed" or in the background.
Totally fair. I mostly meant this as a replacement for notification systems that aren't super time dependent. Even 5 minute pings would probably be enough for most use cases.
I have built a notification sending service / gateway, and it is a pain to do, if only because Apple has no idea how to build proper web-based APIs.
For small apps, that's another service to build and maintain. Airship serves well that purpose.
Now, if you are a large company with a large volume of pushes, then it doesn't make sense to pay Airship; build a service yourself and have a team to maintain it. Even the smallest service, if it deals with Apple's (and Google's) external APIs, will have trouble from time to time, and you will need a team to 'passively' monitor and maintain it.
For a company with fewer than 100 people, that doesn't really make sense. But for a later-stage company, it totally makes sense to have your own service.
Anyway, just my 2 cents, as I have done both: built my own notification service for a startup, and used Urban Airship for my own personal projects.
Is that how it works? I would've thought it would be something like Twilio, where you hit an endpoint when you want to send a message, maybe with an extra step to set up notifications on the user's end in the first place.
In Twilio's case, after a half hour of work you could end up with some sort of send_text(number, msg) function/method that you can just use all over the place, and if anything changes in the text message world they handle it.
I'd rather do that than have to deal with the carriers, etc. The way you describe sounds like a pain with vendor lock-in on top.
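For reference, the wrapper being described really is about this small; a sketch against Twilio's Messages API (the URL shape is real, though double-check it against current docs):

```python
import requests

def send_text(number, msg, account_sid, auth_token, from_number):
    # One POST, basic auth, form-encoded body; Twilio deals with the carriers.
    resp = requests.post(
        f"https://api.twilio.com/2010-04-01/Accounts/{account_sid}/Messages.json",
        auth=(account_sid, auth_token),
        data={"From": from_number, "To": number, "Body": msg},
    )
    resp.raise_for_status()
```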
When it comes to mobile push notifications, your mobile app registers itself with the platform's (Apple or Google) push notification service and gets a token. It then has to send your server that token and your server can send pushes to it by talking to the platform's notifications endpoint and quoting that token.
These third-party push notification services are essentially overlays on top of the above, don't actually change the workflow at all (the above is the bare minimum you'd need for the functionality to work) but introduce extra complexity, moving parts, and more importantly, extra rent you now need to pay to them.
The main point of this article wasn't that these things are never needed or don't solve a real problem, but rather that a lot of people use them "because hype" without a good understanding of their problems, the strengths and weaknesses of these tools, etc.
UNPHAT is a good approach. Any webshop essentially has Amazon's "a failed add to cart will lose us money" problem, and there are a lot of webshops out there, so it's even a common problem! Cassandra may be a good choice even for a smaller webshop (I'm not familiar with it myself, and it may be a poor trade-off in terms of ROI, but that's another thing), but it's definitely not a good fit for a lot of other things, like batch jobs.
At my current $dayjob we run a low-volume high-profit B2B business; we have a few hundred customers, and it'll never be more than a few thousand – several tens of thousands at the most if we end up wildly successful. It's a whole forest of microservices and will scale to the moon, but it's also pretty complex and difficult to work with. All that for a fairly small customer base using a fairly basic web UI they rarely log in to. We do actually have high-availability requirements in the core business, but that's separate from the management UI. It's a good example of not having understood the actual problems and just using something "because I read about it on Hacker News".
I would actually go farther and say the comment you are responding to supports the article's thesis.
They talk about remediating issues with email alerts. That is exactly the kind of approach you can take when you are nowhere near Google's scale, instead of building a complex system to ensure all of these remediations can happen automatically without any human intervention. Sounds like they took exactly the right approach of not over-engineering a solution prematurely.
I intentionally wrote the story without referring to the article to let people think for themselves about the meaning, and it makes me very happy that this was what you thought!
Ironically we kind of relied on e-mail alerts for too long and would kinda sorta break Gmail. While it was probably just the web/IMAP interface or browser/client itself struggling, trying to bulk delete tens of thousands of emails while hundreds more per second were coming in often had incoherent results.
I'm confused why a pill reminder app is driven by the internet at all. Wouldn't it make more sense to have the pill reminder app send a local notification at the appropriate time?
Excellent comment! A lot of projects already failed the "too much complexity" problem by just using the Internet when the Internet is unnecessary.
I remember a project I worked on in the past where the requirement was to have the app display a notification to the user at a certain time, like an alarm clock. Pretty simple, and could be implemented with a line or two of code as a locally-generated notification.
But no, the lead declared that we needed a push notification from a server in order to do this properly. Even after the difference between a local notification and a push notification was explained to him (he didn't know that local notifications existed), he insisted that notifications = push notifications, and we must have a server. AND that server needed to run a complex web service. AND it needed a way for devices to log in with credentials, so it needed a database. AND it needed to be fault tolerant since this was a critical user journey in the app. AND the clients had to check in with the server periodically, since their location might have changed to a different time zone, therefore the push notification time had to change. AND this AND that AND so on. All for what was essentially an alarm clock--something you could implement in the client code running on the device in 5 minutes.
Can we assume that the unstated purpose of push vs local was the ability to query various notifications without the need to separately upload queries somewhere else?
> app send a local notification at the appropriate time
IIRC this didn't exist at the time but is indeed the ideal solution. (some more discussion of this downthread)
If pill reminders were sync'd between devices (or between mobile and web versions of the product), then push notifications would still be necessary to inform one side or the other when to resync data, if nothing else.
Still, 2010 was a wildly different time in mobile. We were all still figuring a lot out (clearly)!
Man, not to take away from your point or efforts, but imagine missing your pill reminder because you’re on a road trip and you didn’t have cell coverage in Montana.
My recollection is also that local notifications didn't exist yet in 2010 on iOS, or had just been added. I tried to confirm it but it's actually pretty hard to find that information, it seems. Could be wrong. But, I was heavy in mobile and push notifications just after 2010, so I'm fairly sure that's right. I distinctly remember that as a feature that was added, at some point.
It's entirely possible the app author made a poor choice. We have no control over their code. All we could do is try to deliver our meager little component as competently as possible.
What I’ve seen more often is software engineers preparing for their next job (or the job they want next) by pushing to build a project with a tool chain that they want to have on their resume. It’s not so much that they want to use X because Google does and it solves their problem, but that they want to look like they know how to use X because then maybe Google will be interested in hiring them—and if not then at least they know that they’re as smart as all those stuck up Googlers anyway.
A good friend of mine and I are principal engineers at our company. I tend to think we are well respected and have done a good job over the years. A few years ago we read our company's job ads and realized that neither of us should even bother applying for our own jobs, because our resumes didn't have all the latest stuff that was asked for. Since then I constantly add new stuff to new projects even if it's not strictly necessary. One factor is keeping the resume up to date. Another is that you learn something, and the new stuff may actually be better. You just have to be willing to back out. Microservices would be an example: a lot of people wanted to do them, but it turned out that in our case the architecture only added overhead without benefits, so we scaled it back.
Resume driven development is a completely rational choice for employees. And companies have brought it on themselves with stupid job requirements and hiring policies.
There's a compromise to be made, and it does happen on the rare occasions when management can pull their head out of their asses long enough to realize that optimizing for throughput is a bad case of Goodhart's Law.
Sometimes you let engineers do things that don't necessarily need to be done because it makes them happy.
Not everything that might make them happy, not the biggest most ridiculous thing they say will make them happy (but objectively don't actually know until they try), but a few things, and consistently.
What I see happen often enough is that someone who finally has no fucks left to give takes big gambles to get permission to work on something, and what makes it resume fodder isn't that it does in fact look good on the resume, but that they have already begun to check out before the project even started.
With the exception of some progressives, almost everyone who talks about retention or revolving doors only starts talking about it after the proverbial barn doors are open and all the best animals have already left. Nothing you do at this point will get back the people you couldn't afford to lose and already did. You're really fighting the last war at this point.
I see that as a cultural or communication failure in your organization, not something intrinsic to software development.
If you have to sneak tech that isn't vetted into projects, then you haven't made the case to management well enough, or management is running their company into the ground.
You'd be able to accomplish a lot more with a lot less headache and cost by vetting new tech before going full-RDD in a critical system.
This is the main reason, and you can't blame engineers. The only way to raise your salary is to job hop, so the only point of having a job is to train for your next job.
Also, as a company trying to hire people you need to use the latest fads to attract new employees. It is a vicious cycle for sure.
There is no room in this dysfunctional industry to use simple proven technology that is appropriate to your size.
I think you can definitely blame them, and in fact I'd expect CTOs and managers to discourage employees from doing that. A small silly decision can later cost millions in rewrites or maintenance.
In the end we'll all benefit if not every brochure website needs 50 microservices and the latest NoSQL solution. We're the ones who end up maintaining these systems.
Yeah, I don't get this "You can't blame people for doing the totally self-centered thing." I can and will. That's basically the point of blame: to shift consequences for a harmful action back to the person who made the choice.
That's not the only thing we should do, of course. Systemic problems are better fixed with systemic changes. But there's nothing wrong with holding people accountable.
Unless they have a real stake in the company (as in non-trivial equity grants) why the hell would they prioritize making you additional money (or saving money) by using tooling that will make them a less valuable hire in the future?
They are getting paid to accomplish a task for you, they are NOT getting paid more if they use tools that would be more effective but hurt their future prospects.
You can set guidelines around tooling if you want to prevent engineers from doing this, but otherwise assume that part of the reason they're willing to show up for the salary they get paid is to learn more and eventually make more.
If they can't make more through the company making more (equity) don't expect them to prioritize that.
I disagree. You're not paying engineers to develop their own careers, you're paying them to act strictly in the best interests of the organization they are working for. It's not just about "accomplishing tasks", but about using their judgement to accomplish them in a manner that best serves the company's stated objectives. That is what "professionalism" means, at least to me.
Is it in the best interests to lose interested and engaged engineers but to save 10% on cloud computing fees?
Is it simply making the most money?
Is it picking a tool you're comfortable with, but is hard to hire for?
Is it accomplishing the "stated objectives" even when those objectives are wrong, or immoral, or illegal?
Basically - unless I have equity (and more than .01 of a %) then the "organization I work for" is ME. I'm managing my time, and renting it out to the current highest bidder. I will absolutely work to ensure that bidder is happy and satisfied with the results of hiring me - but I WILL NOT devalue future earnings to do so.
Is there anything more to your comment than a long elaboration of cash-focused self-centeredness?
Certainly you seem aggressively unaware of anything other than "the company", setting aside other relevant parties, like colleagues, users, and society at large. While also ignoring quite a lot of things that even engineers value in their work besides money.
Econ 101 is an ok first-cut model, but it's pretty bad as a religion. There are more things, Horatio.
> Is there anything more to your comment than a long elaboration of cash-focused self-centeredness?
Big ooof, given the entire discussion is predicated around using simple and plain tooling to save a company money, and then berating developers who dare to learn new tools on the job. It turns out part of learning is picking the wrong tool occasionally. Who knew?
You can discourage that. And people move on to jobs where they can pad their resume, and it gets harder and harder to hire people because they don't see a future in your "boring tech stack".
Which came first, the chicken or the egg? Are companies constantly changing to hip new stacks to attract talent, or are developers constantly scrambling to gain experience with hip new stacks in order to be employable by companies that are constantly changing to hip new stacks?
Developers complain "I need to learn NewTech because OldTech is old and no cool company uses it anymore." And hiring managers complain "We need to move from OldTech to NewTech because we can't find developers interested in OldTech anymore."
My observation is that both occur and the balance tends to vary by company. I've been in the room when technology N+1 was selected because there weren't many developers for technology N, but I've also seen developers working cool and trendy things into projects that don't strictly need it (where there is a sliding Overton Window around what is "cool" or "trendy").
Just as there are people who'd leave if the stack is "boring" there are people who'd leave if the tech stack is insane. Personally I wouldn't want to work in a team that prioritizes resume driven development. It's usually teams that care very little about the company or product, their only motivation is tinkering with shiny tech. That's not for me.
Means to an end. I'd work in a team that prioritises resume-driven development for about a year, to brush myself up on all the new stuff. I think it'd be foolish not to, honestly.
It comes down to whether you see the software you write or yourself as the important product of writing software. Paradoxically, I see myself as the most important product. Usually my work is a tradeoff between being productive and teaching myself. If I'm being too productive, I back off and experiment more and teach myself concepts.
You can learn Kafka, Map Reduce, Elixir, DynamoDB and shiny new JS framework X like Meteor or whatever, or you can just go deeper on the fundamentals and the traditional tech you already know (learning SQL to a deeper level, becoming really good in Java/.NET/PHP/vanilla JS etc etc).
I think the job market has room for both; and to me the second path is more enjoyable and sustainable but to each his own.
I agree but we can put more focus on the product, code quality and users if we don't practice resume driven development that much. And still earn a nice paycheck.
For sure. It costs when you put it in. It costs in maintenance (or in working around unmaintainable systems). And then it costs to take it out again.
I was brought in on a cleanup job a few years back and discovered that key parts of their infrastructure went through a system nobody understood and everyone was scared to touch. It was using Docker, for no obvious reason, and it was version-locked to a very specific release in the 0.9 series.
After some archaeology, I found the original commits. They were from an engineer who wasn't there long. On a hunch, I googled him. I found a video of a conference talk on Docker where he exaggerated the scope and value of what he had done. So I ripped it out again, making the system simpler, more reliable, and less scary for the staff.
“Management requirements were that the system would scale to that level. Probably overly optimistic on their part, but it did give me the experience to work at a really high-scale place like here!”
I know a project where management mandated a super-scalable key-value store, and the total amount of data will never exceed 4 MB (yes, plain megabytes).
"I would have used other solutions such as....but there were people in the room who had more 'tenure', 'experience' and 'clout' who insisted on mapreduce so that's what my leadership team bought in on".
I'd love to be able to say this in an interview one day. And I'd be telling the truth (not about Mapreduce but something else, let me tell you about the time I lost an argument with a Senior/Staff Engineer who wanted to write an entire PHP library for log files on a single-serving linux host instead of using cron and logrotate).
But like you mentioned, it's a type of question I've yet to be asked. Depending on the room (ala "read the room") I might volunteer that bit of information in a more 'interview friendly' way.
I ask the equivalent DS question in a lot of my interviews.
Generally people tell me about the hot new XGBoost model they fitted, and I ask them how much better it was than logistic regression.
9/10 they don't have a good answer.
That being said, my favourite answer was from a candidate who told me that Kaggle contestants didn't use log reg, and therefore it was a bad choice.
Oddly enough, we didn't hire that candidate.
OTOH, I've worked at places where they wouldn't hire you if you didn't talk about your awesome neural networks/XGboost experience, so it really depends on the interviewer.
Especially as more senior engineer I would love to discuss issues like this and talk about pros, cons and trade offs of different decisions because that’s the world I live in. But from my experience most interviewers aren’t interested.
Also known as "career driven development". Controversial idea: I don't think this is necessarily a bad thing. If an org is using sexy tooling that people want to use and looks good on a CV, then it keeps engineers happy and attracts new engineers who want to work with it.
Disclaimers: provided it's not an actually terrible fit for whatever you're trying to do, or the overhead is enormous, etc.
The funny thing is, Google doesn't care that you have three years' Kubernetes experience. They want you to solve leetcode problems.
So, it would have been better for everybody if you chose a boring technology, kept the system running with no hassle and no 3am pages, and spent the quiet evening studying on algorithm questions. But apparently that's not sexy enough.
Years ago my current team was horrible about this.
Perhaps the worst case (but there are so many): we were 99% Java. We hit containerization, and a team member hard-sold introducing a container that did a relatively small job but was written in Go instead of the Java/Python/whatever already present.
Team member literally resigns to go work at a Go-Centric job the day they got the merge request approved. I heard about the resignation IN the code review meeting.
Luckily it is/was a tiny thing that hasn't needed much maintenance.
One of the funny things about this being about the "cool kids" is that the people chasing the trends were often viewed as the "cool kids", but could only stay that way for a certain amount of time until the tech debt from over-engineering everything caught up with them.
So in other words... we actually are being hyper-rational, it's the people who discard any experience in anything that's > 5 years old that are creating the problem.
It's called RDD (Resume Driven Development) in my parts.
It's sad that software engineers do it to each other, because this year's RDD project is next year's legacy rewrite.
If people could be mature and thoughtful and select technologies that have a natural sympathy for the domain and problem they are trying to solve, we'd all be a bit happier.
In my mind, you are shirking your duty if you practice RDD.
The paradoxical thing about this for me is that both the most junior and senior engineers fall victim to this line of thinking.
I have done so myself, on both ends of the experience/skill spectrum. As a junior it was because the tech was exciting, and the cool kids were doing it so obviously it was the Right Way to do things. As a senior because I'd been a part of the scramble to scale short-sighted systems in a startup that had found product/market fit and didn't want that pain again.
Turns out you can build the most scalable thing and people still won't flock to your product. And you wasted all that iteration and learning time.
What I have learned is to plan only for the next order of magnitude from a sensible starting point. If you are creating a new e.g. AirBnB, you will need a minimum viable performance, let's say 1000 users at a time on the site. After that, you plan for 10000, then 100000, etc. You might get multiple orders of magnitude from the same improvements or changes, but you only ever need to get one more step.
Why? Your team and expertise will be different in 3 years. You might get bought-out or a new thing might come along that suits your model really well. Eventually, you might have enough money to have an entire data centre with 100 support staff but you certainly can't plan for that on day one.
> You might get bought-out or a new thing might come along that suits your model really well.
Once you get bought out, you'll likely need to integrate or replatform onto the buyer's tech. Just look at Google acquisitions: they spend the next few years paying down tech debt and scaling. So don't waste your time on that pre-liquidity-event; spend that period on product improvements and market expansion.
> What I have learned is only to plan for the next order of magnitude from a sensible starting point.
This is exactly what Google suggested. In particular, it was Jeff Dean who shared this piece of advice in one of his talks on scaling Google's infrastructure.
I do the same. But I've done it enough times now that I usually start at the 10k number, just to survive a huge press surge without trouble. Hardware and software are good enough now that this isn't usually much of an overhead.
I have spent more time in my career than many people think is healthy trying to plot out contingency plans for 'what if' situations. It stops being irritating the moment the building is on fire and all of a sudden people want to listen to me.
My relationship with YAGNI has ebbed and flowed over the years, and as with any 'problematic' relationship, you may or may not see your own relationship clearly but people who have it worse are still easy to identify. Whether you then ask if you're like that is a matter of wisdom.
What we hear about, what we interview about, what we write about is these cool huge projects from the heroes we want, but the heroes we need are the people who figure out how to solve problems in ways that leave the option of solving them differently in the future. It's a little dangerous to say things like that out loud because lots of people hear that as "I'm going to build a configuration engine that I can use to swap out implementations at startup/runtime," which is pretty much the merry-go-round we are all stuck on half the time.
No, what I mean is go back way old school, taking some notes from Bertrand Meyer, and arrange your code so that there are 'spots' where major changes in functionality seem to naturally fit. I may have mangled this story in my head over the years into a parable, but I still recall hearing my uncles waxing poetic about the Chevy straight-6 engine block that was popular during the height of the muscle car era. This engine did not have a particularly high horsepower-to-displacement ratio. But in those days engine bays were fairly empty, and the straight-6 was overbuilt just enough that it was a dream to modify, and easy enough to work on that many people did. They bored it out for higher displacement, modified it for higher compression ratios (naturally aspirated or blown), hung additional accessories off of it, you name it. Many people were running around with cars that had over 50% more horsepower than the stock version, and some crazy bastards went considerably higher.
In short, this engine was not that particularly great, but it was full of potential.
As someone who may or may not become the next Google, (it's been a long time since I worked for anyone who shared opinions like that, when Google was smaller and only dreamed of being as big as they are now), I don't want a system full of features. I want a system full of possibilities. IF we wanted to do this, you would add it here, and verify it here. But we don't, not yet, so if you could just not break 'here' please, that would be great.
Nobody is collecting those stories and patterns. They're hidden away in Meyer, Fowler, dropped as throw-away lines in lectures, and discussed at length over coffee, noodles, or beers but never written down. Elsewhere they are really hidden in older aphorisms like 'premature optimization', "single responsibility" and that ilk, so much so that they risk becoming bad advice.
There are a few too many of his examples that I've seen in the real world.
We were trying to bid on hosting an application that required Cassandra and Kubernetes. The application was built that way; that was how it currently ran. The customer wanted to move to Azure and try to save a bit on hosting. We reached the conclusion that they could use CosmosDB with the Cassandra API enabled, and given that it was just a single container it could actually just run in Azure Websites, no need for Kubernetes. Then we started to dig into what the application did and how much traffic it received. It was just a few hundred HTTP requests per day, and a stable 2GB of data. This should just have been a tiny Java application and SQLite or MariaDB. We did not win the bid; neither the customer nor the developers seemed pleased with our conclusions.
Kafka is another fan favorite with developers. It's not that Kafka isn't great, it is, but if you are only pushing 2500 messages per day, it's a bit heavy on infrastructure; maybe just stuff those messages in your database.
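At 2500 messages/day, "stuff those messages in your database" can even keep Kafka-ish semantics: an append-only table plus a per-consumer offset. A sketch, assuming Postgres with invented table and column names (conn is a psycopg2 or any DB-API connection):

```python
DDL = """
CREATE TABLE IF NOT EXISTS messages (
    id    BIGSERIAL PRIMARY KEY,
    topic TEXT NOT NULL,
    body  JSONB NOT NULL
);
CREATE TABLE IF NOT EXISTS consumer_offsets (
    consumer TEXT PRIMARY KEY,
    last_id  BIGINT NOT NULL DEFAULT 0
);
"""

def poll(conn, consumer, topic, batch=100):
    with conn.cursor() as cur:
        cur.execute("SELECT last_id FROM consumer_offsets WHERE consumer = %s",
                    (consumer,))
        row = cur.fetchone()
        last_id = row[0] if row else 0
        cur.execute(
            "SELECT id, body FROM messages"
            " WHERE topic = %s AND id > %s ORDER BY id LIMIT %s",
            (topic, last_id, batch),
        )
        rows = cur.fetchall()
        if rows:
            # Advance the offset with the read, like a Kafka consumer group.
            cur.execute(
                "INSERT INTO consumer_offsets (consumer, last_id) VALUES (%s, %s)"
                " ON CONFLICT (consumer) DO UPDATE SET last_id = EXCLUDED.last_id",
                (consumer, rows[-1][0]),
            )
    conn.commit()
    return rows
```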
We've built infrastructure for large national projects (in a small country), and you can easily serve an entire population with a single MariaDB instance on a modern server and a few VMs for the application.
Depends on what you do with it. For our needs it's great: low maintenance, easy to deploy, and stable. But I do understand where you're coming from; Kafka can be terrifying when it goes bad.
There's also "You Are Not Microsoft", i.e. you do not have millions of users who would be impacted if this API changed. Sure, if you're at a scale where a few thousand people would be impacted, or at the severity where whoever is affected would be really affected (e.g. medical software), do not break compatibility. But if your product is used by a dozen people and it's not keeping them alive, it's maybe okay if it breaks. Sure, you alienate those dozen people, but you can't keep a company running on a dozen users.
And besides, if you have a dozen users, you can literally afford to spend time fixing their specific setups. That's the other flip side. People assume that because Microsoft/Google/etc. provide impersonal, poor customer service, they should provide equally impersonal, poor customer service. Bad customer service is a trade-off you make to have scale. If you don't have scale, you can avoid that trade-off.
I don't think the reason I'm about to discuss is behind all of these stories, but I think it is a persistent force in our industry.
I am somewhat convinced that a not-insignificant portion of engineers is under-employed in an intellectual sense. When $BIGCO announces the release of a new tool that supposedly helps with their planet-scale needs, the engineer just lights up at the thought of needing to dig into a new challenge. (The impulse to grow is well and good!) It also helps that places like HN buzz with new tools, it is good resume material, and a vague feeling of not being left behind if said thing takes off.
The real fix is to get away from domains/companies that bore you in this way, but it is not an easy realization. There are a ton of interesting problem domains out there! I did this and am much happier with my career.
I'm not sure I agree. I'd prefer working on a boring MVC software, being proud of it being reliable and having the time to work on features my customers will genuinely like rather than extinguishing fires every weeks and adding band-aids everywhere..
OK. But... some of us do need to use Cassandra? Some of us do need to use Kafka? Like, more than 5 companies definitely do work that requires that level of scale...
Of course bad engineers make bad engineering decisions, and most engineers are bad, but that isn't really interesting to point out imo.
> How much data do you have exactly?
Petabytes or exabytes at every company I've worked at, none of which are names mentioned in this article.
> But have you done the math?
Yeah.
I mean, I've read this before, so I'm gonna stop here. I get it. Engineers are bad at their jobs so they default to picking the technologies they've heard of instead of the technologies that make sense. But like... working with massive amounts of data is not that rare anymore. It wasn't in 2017, it definitely isn't in 2022.
You're essentially agreeing with the author's premise. They're saying that you should think about your business domain problem and choose technologies that solve that problem, even if they aren't the big, flashy technologies used by Big Tech.
If your business domain involves handling peta/exabytes of data, then by all means Cassandra is right for you! Most companies don't handle nearly that much data however, so using Cassandra for a database to manage only a few gigabytes of data is overkill.
Of course I agree with that. I just think that it boils down to:
a) These tools are for companies that do XYZ
b) Lots of engineers are bad at choosing tools
both of which are pretty obvious. Except it also sorta makes this other point "you aren't Google" but... a lot of us actually do stuff at scale. A lot of us. A lot don't, I'm sure.
This is a good point but the end is wrong. Your new hires won't go to Google, you just won't get any hires. Because I guarantee you're not paying Google wages.
This is such a great thing to keep in mind. As a junior dev trying to build independent projects for the first time, one of the biggest issues is figuring out what the appropriate level of complexity should be. Building a shed in the backyard doesn't require advanced skyscraper architectural design software. You just want the shed to be robust, not flimsy, and for that much simpler tools (a level, a square, a plumb line) work fine. Otherwise you end up never building anything because all your time goes into trying to configure the advanced tools you read about, and it turns into another dead-end mess.
It doesn't hurt to know that such advanced tools exist, but I'd think that part of getting an introductory job at any decent firm would involve being trained on how their advanced tools and systems are used in practice.
Some developers underestimate the cost of building slow webapps.
For example Magento. Magento is extremely slow. I know companies who pay hundreds of dollars for hosting per month to make it faster.
Working with Magento always means thinking about scalability and complex architecture.
But in reality webservers are extremely fast, even cheap ones. If Magento were built as a good app, you would only start to worry about scalability once you had a million-dollar webshop.
I interpret the problem a little differently. I don't think the reason people want to copy Google is out of a misguided delusion that their app is going to have the same scale. At least not most of the time. The reason is that software is so bad. Plain and simple. Google is one company whose products seem to be reliable. Most software is a soul-sucking exercise in futility to use. It errors out and does unexpected things randomly. In contrast, Google does a great job of providing a quality experience. People naturally want to know how they do that so well. And to do so at such high scale just adds another layer of impressiveness.
people want to copy google because 1) they like smart, complex things 2) they want to work on complex things.
nobody got an award and praise for choosing the simplest thing that could work, doing that and building a service that is so reliable that nobody knows it actually exists (ie no problems or incidents).
also, the tragedy of software development is that the squeaky wheel gets the grease and that pyromaniacs are working as firefighters, and are rewarded and promoted based on their firefighting skills.
in contrast a well-built house, with proper fire-retardant materials, sprinkler systems and proper ventilation is meh and not exciting.
i constantly have to ask the “kids” at work: do you need this? what tps do you need to support? how are you going to maintain this (at what cost)? do you really want to reinvent the wheel?
constantly. i am afraid that most people nowadays practice resume driven development.
k8s has its use cases, but i am always baffled by people refusing to use the abstractions the cloud they're on already provides (things like vms, load balancers, autoscaling groups) in the name of using some hot new tech, with benefits that are not clear to them (again, nothing against using it if you understand it and it fits your use case, but it's not what i reach for as the first thing)
Managed Kubernetes has such low overhead though, and gives you so many options to play with later on (across clouds, on-prem, serverless). Some cloud providers don't even charge you for the control plane, so you're not even paying for the overhead.
If you've got a legacy application that doesn't play nice in containers, I get why you would stay away, but if you're writing something new- it's so easy to take advantage of K8s.
Kubernetes is a good tool to run a service inside a VM and front the service with an LB. And if you ever have a need for more than that- you're ready.
I think your argument is that K8s adds a lot of complexity, but that's just not true if you're using a managed service. If all you need is a couple of workers and ingress, you're set up in a few minutes
using a managed service for what? the k8s control plane?
so what are you actually using? containers?
> And if you ever have a need for more than that- you're ready.
If you ever need more than that you can evaluate it at that time. K8s is creating the maintenance nightmares of tomorrow today.
all clouds have services dedicated to orchestrating resources (cloudformation in aws land, deployment manager in gcp, etc etc). K8s is not better suited for this purpose.
I did a job interview today and showed one of my projects and its code, explaining some of the reasons why it's written the way it is.
Much of it is fairly straightforward and simple. That's fine, because it wouldn't really get a lot of advantage from more complexity. Even so, I felt almost ... embarrassed by the simplicity. It's not that I actually am, I think the code and overall project I showed is pretty good and something I'm proud of, it's just that explaining "yeah I just did it like this, pretty simple really" doesn't really sound all that impressive over "we integrated such-and-such fancy tools with the so-and-so pattern using the blockchain cloud nano-architecture running on o31h!"
This thinking you're Google is the opposite of technical debt. Technical debt is avoiding doing something and then paying a recurring price for the expediency; this is paying a price up front by doing something which is then never used.
I think his comment is spot on. Many problems have a naturally limited number of users/objects. You can scale up. Scaling up is seen as sacrilege, but scaling out is expensive in human labor. Scaling up is inexpensive in human labor, often inexpensive in hardware, and sufficient for many problems.
A terabyte of RAM has been a technical solution to a number of problems in genomics for quite some time, and the players were quite willing to pony up the $250K it took to buy that a decade ago. Amusingly, the problem can be converted to... mapreduce, and then it doesn't need the RAM.
People don't do this because they misunderstand they aren't Google.
But because it's cool and will look good on a resume.
Esp when, you know, applying to Google.
The author is 100% correct here, but it's not going to change anything, and I'll tell you why.
Companies LOVE requiring all of these sexy over-engineered tools in the requirements for the jobs they post.
I've definitely used these overkill solutions in my own work, sometimes because they were the right tool for the job, but many times simply because I knew that one day in the future, I'd be applying for another job, and that job would likely require that I have experience in these technologies. It sucks, but it's the way the world works for now.
This has a couple of downstream effects too. If I'm trying to hire, I need to include at least some of those sexy tech stacks in my posting if I want to get good applicants. Why? For the same reason I need to use those things myself even when they're not the best tool for the job.
The folks applying to my role know that they're not going to work for me for the rest of their lives and want to been able to check the right boxes when it comes time apply for their next job. Then of course when they get hired in we have to use at least some of these technologies even when we don't totally need them, because after all, it said right on the job posting that we use those technologies.
I saw the inverse of this too. There was a team managing a series of small-ish but incredibly important databases. They had these things running on Microsoft SQL Server and were doing a beautiful job. There was exactly zero reason to migrate them to something else. But... when the folks in this department went to apply for other jobs, inside the company or out, they were greeted with: "You're not using Snowflake??!!", "You're not doing Cloud?!!", "You don't even have an EMR cluster??!!".
Not surprisingly, after a while this group had a hard time hiring because folks saw it as a career dead end.
TL;DR: people do what they're incentivized to do. Stop incentivizing dumb behaviors.
> There was a team managing a series of small-ish but incredibly important databases. They had these things running on Microsoft SQL Server and were doing a beautiful job. There was exactly zero reason to migrate them to something else. But... when the folks in this department went to apply for other jobs, inside the company or out, they were greeted with: "You're not using Snowflake??!!", "You're not doing Cloud?!!", "You don't even have an EMR cluster??!!"
Exactly. A lot of people who have been doing a great job for years wouldn't even get hired by their own company. A few years ago I realized this was true of myself. Since then I've been making an effort to keep myself hireable, even if the new tech wouldn't necessarily make sense for the company. It makes sense for my own career survival.
The solution is to separate "we use" from "we're looking for". E.g., a job post can say:
We use Kafka, big-scale-databigness, BuzzwordDb, and CoolFramework. We're looking for smart engineers with an open mind and a commitment to building high-quality, robust systems. Minimum qualifications:
* Several years developing user-facing web applications
* Experience with OOP in a language like Java, C#, Smalltalk, or similar
> If I'm trying to hire, I need to include at least some of those sexy tech stacks in my posting if I want to get good applicants.
Idk, I think after 10-15 years in the business not all senior devs get so excited by buzzwords. Some do, sure, but others just want to work on a high-quality code base with smart colleagues, an interesting product, and good work-life balance. I couldn't care less if you threw Kafka in there, but hey, that's just me.
But sure, the younger you are, the more inclined you'll be to prioritize shiny tech.
From the article (and I find this attitude rampant!):
Software engineers go crazy for the most ridiculous things. We like to think that we’re hyper-rational, but when we have to choose a technology, we end up in a kind of frenzy — bouncing from one person’s Hacker News comment to another's blog post until, in a stupor, we float helplessly toward the brightest light and lay prone in front of it, oblivious to what we were looking for in the first place.
> MapReduce/Hadoop is a soft target at this point because even the cargo culters have realized that the planes ain’t en route.
Side question: what is the state of Hadoop? I remember it had a huge ecosystem, but a lot of it seemed half-baked and looks to be dying, so I'm curious how popular it still is, what other tools people are using for massive data transforms, and which parts of the ecosystem (Pig, Hive, etc.) ended up being popular vs. dead ends.
From what I've seen, the trend amongst today's same class of user (i.e. startups with analytics/log data) is to use Redshift or BigQuery, or just to farm out the equivalent work to a third party (e.g. Datadog).
I like the UNPHAT idea, as long as it's executed in an agile way. That is, at any given moment, perform whichever stages can be done safely without feeling blocked by the others, then iterate, trying in each iteration to do every stage better. For example, after reading papers, don't consider the earlier stages "done": revisit your understanding of the problem in light of what you've learned, and so on.
You may not have Google's scale problems, but even if you're much smaller than them, if you have a 24/7 web presence you still have the same browser compatibility, security, accessibility, latency, analytics, monitoring, dependency management, compliance, etc. problems that Google has.
Not every piece of technical complexity is designed to solve a scaling problem.
I agree. I learned Kubernetes, we use it at our company, and to be honest we could live without it, but what we had before was just ugly, due to a lack of good standardization. Now we have GitOps and can easily use tools like Argo Workflows, Argo CD, and Argo Events to run practically any kind of reasonable activity and make it reusable, and everyone can see how everything works, at least on a dynamically created k3d cluster provisioned with Terraform for development/testing/learning purposes. It's also easy to present everything you create at a high level and communicate it to others, e.g. with draw.io.
It doesn't solve every kind of issue, of course, because software development is a very complex area, but it makes it easier to reason about a solution with others, so we can build what the client actually wants without lots of dirty hacks and reinventing the wheel.
Do we need all these tools and concepts just to cooperatively build some secure APIs and a front-end, test them automatically, and ensure the solution stays highly available under light or moderate load?
Probably not, but without them it would be much harder, even though our solution is used only internally and handles up to a few hundred requests per second, not millions.
With Kubernetes I can set up almost everything on my own with just virtual machines or servers and some networking change requests, and assuming the infrastructure is stable, a solution deployed with Kubernetes will probably be stable too. There are also managed Kubernetes clusters in the cloud, where the cost might seem high for smaller companies, or there might be worries about cloud security, but once you understand how to build proper cloud-native solutions with Kubernetes, everything becomes much simpler.
However, the learning curve is steep, so a beginner might still struggle to use Kubernetes or other advanced setups efficiently without good guidance from documentation and/or meetings.
> ... one student’s company had chosen to architect their system around Kafka. This was surprising because, as far as I could tell, their business processed just a few dozen very high value transactions per day.
I found this immensely funny, as it's a business case that could (and should) be handled with a basic SQL database like SQLite.
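For a sense of proportion: a few dozen high-value transactions per day is a rounding error for SQLite. A minimal sketch of what that "architecture" could look like (table and column names are made up for illustration):

    # Hypothetical sketch: a few dozen transactions/day needs nothing
    # fancier than a single SQLite file.
    import sqlite3

    conn = sqlite3.connect("transactions.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS transactions (
            id         INTEGER PRIMARY KEY,
            created_at TEXT    DEFAULT (datetime('now')),
            amount_usd REAL    NOT NULL,
            status     TEXT    NOT NULL DEFAULT 'pending'
        )
    """)

    def record_transaction(amount_usd):
        # "with conn" opens a transaction: commit on success, rollback on error
        with conn:
            cur = conn.execute(
                "INSERT INTO transactions (amount_usd) VALUES (?)",
                (amount_usd,),
            )
            return cur.lastrowid

    def pending_transactions():
        return conn.execute(
            "SELECT id, created_at, amount_usd FROM transactions"
            " WHERE status = 'pending' ORDER BY created_at"
        ).fetchall()

At that volume, a nightly copy of the .db file is a plausible backup strategy, and there's no cluster to babysit.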
One can even say that, for those at these larger companies, you are also not part of the elite core products. You are not on Google Search or AWS cloud and so on; you are on that crappy analytics dashboard that's slow and confusing.
The reason a lot of people choose something out of Google's book is not that they're confused about not being Google (which is actually a pretty insulting and also idiotic assumption), but that they want to benefit from Google putting a stamp of approval on certain tech and pouring resources into it. Tech changes fast and disappears quickly, there is a lot of it out there, most of it is readily available, and so we make use of whatever signals we can gather to navigate this space.
Which doesn't mean you can close your eyes and pick whatever tech for whichever use case (duh?), but Amazon or Google doing it differently in the past, or a company not being as big as them, is certainly not a great argument against using parts of their stack.
But you do need to have _an_ argument for why using their tech is good.
> benefit from Google putting a stamp of approval on certain tech and pouring resources into it
On its own this is still not a great argument; Google pouring resources into solving a problem does not help you unless you _also_ have that problem.
Choosing technology that's battle-tested at scale and under crazy load seems extremely reasonable, just from an "I need this thing not to have a shitload of bugs" perspective. If you were choosing between two libraries, and one was used by 5 people and one by 5 million, it'd be hard to go with the one used by 5 people, even if it were a better fit for the problem.
Sure, but I think this is not the shape of the situation people are generally facing when we say "you're not Google". E.g., you often see choices between NoSQL-esque solutions like BigTable, Dynamo, etc. and traditional relational DBs.
Off topic! This author combined the folksy 'ain't' with the fancy 'en route'. Was it jolting for you too? Not only for the weird combo but for its awkward meter. :) -- I apologize in advance for this comment.