A consulting customer came to me a year ago: their data production was growing from 200TB/year to over 6PB/year, and their budget couldn't sustain that jump (or anywhere close to it).
Having come from the mass-facilities and data center space with MagicJack, I knew the wholesale costs of bandwidth, power, and drives were continuously falling.
There are certain clients and use cases that need access to their data all of the time, and whose very bones are built on collaboration (genomics, for example).
For example, this client is now storing 6PB of data with us, 3 copies in separate data centers. We are half the price of S3 on storage, and we include all the bandwidth for free, limited to 10GigE per PB stored. This has worked out extremely well - we came out at about 20% (!!!!) of Amazon's price once you factor in bandwidth.
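To make that concrete, here's a rough back-of-the-envelope in Python. Every unit price and the egress volume are my assumptions for illustration (roughly in line with published list prices, ignoring volume discounts), not figures from the actual deal:

    # Back-of-the-envelope monthly cost comparison. All unit prices and the
    # egress volume are assumptions for illustration, not actual contract terms.
    GB_PER_PB = 1_000_000          # decimal GB per PB, as storage is billed

    stored_pb = 6                  # 6 PB stored
    egress_pb_per_month = 2        # hypothetical collaboration-heavy workload

    s3_storage_per_gb = 0.023      # $/GB-month, assumed S3 Standard list price
    s3_egress_per_gb = 0.09        # $/GB, assumed first-tier egress rate

    s3_monthly = (stored_pb * GB_PER_PB * s3_storage_per_gb
                  + egress_pb_per_month * GB_PER_PB * s3_egress_per_gb)

    # "Half the price of S3" on storage, with bandwidth included in the price
    ours_monthly = stored_pb * GB_PER_PB * (s3_storage_per_gb / 2)

    print(f"S3 (storage + egress):  ${s3_monthly:,.0f}/month")    # ~$318,000
    print(f"Ours (bandwidth incl.): ${ours_monthly:,.0f}/month")  # ~$69,000
    print(f"Ratio: {ours_monthly / s3_monthly:.0%}")              # ~22%

Volume discounts would narrow the gap somewhat, but the shape holds: at this scale the egress line item rivals the storage line item.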
There are lots of challenges we faced, like overzealous neighbors in the environment, storing lots of small objects, and heavy usage of ancillary features like metadata. But for customers of any size, putting the "tax" on bandwidth solves a lot of these business cases. I see why Amazon does that.
AWS is truly great, but as you get into very high scale (specifically in storage - 2PB+), it becomes extremely cost-prohibitive.
It makes a lot of sense: the egress pricing is what lets them run loss-making products. Otherwise everyone would use S3 together with Google Compute Engine and Azure databases (let's assume each would be cheapest in its category). In this scenario all providers would lose out.
In the current world, they can keep prices for some products below cost but make their money on bandwidth and the other services people are forced to use to avoid egress traffic.
"In the current world, they can keep prices for some products below costs but make their money with bandwidth and the other services people are forced to use to avoid egress traffic."
Which AWS products are loss leaders?
S3 storage pricing is not exactly cheap. Neither is EC2 instance pricing.
"Otherwise everyone would use S3 together with Google compute engine and Azure databases (let's assume they'd be cheapest). In this scenario all providers would lose out."
No, S3 would do well, GCE would do well, Azure would do well. Providers only lose out to the extent their products no longer compete on merit alone.
I can imagine that this is a good reason for the pricing. Otherwise they could make bandwidth cheaper, so that people who cannot move everything can at least move part of their applications.
I think the three providers are smart enough to know why they charge that much for bandwidth, and this is the only reason I can think of for why all three of them charge that much. And I'm pretty sure that some products run at a loss - they do at nearly every company. But AWS won't tell us which ones.
It's reasonable to think that S3 is loss-making or about breakeven on its own, but recoups costs through bandwidth charges.
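A toy model of that claim, with every figure invented purely for illustration:

    # Hypothetical unit economics for "storage at ~breakeven, margin on egress".
    # Every number below is made up to illustrate the shape of the argument.
    storage_price = 0.023   # $/GB-month charged
    storage_cost  = 0.021   # $/GB-month fully loaded (drives, power, ops)
    egress_price  = 0.09    # $/GB charged
    egress_cost   = 0.01    # $/GB wholesale transit/peering

    gb_stored = 1_000_000   # a 1 PB tenant
    egress_ratio = 0.10     # reads out 10% of stored data per month

    storage_margin = gb_stored * (storage_price - storage_cost)
    egress_margin  = gb_stored * egress_ratio * (egress_price - egress_cost)

    print(f"storage margin: ${storage_margin:,.0f}/month")  # $2,000
    print(f"egress margin:  ${egress_margin:,.0f}/month")   # $8,000

Under those (made-up) numbers, a tenant reading back just 10% of their data per month generates 4x more margin from egress than from storage.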
I guess the latency between AWS Frankfurt and GC Belgium should be low enough (5-10ms) to use the combination for most applications, e.g. storing large amounts of data at one provider and renting compute at the other for processing. The latency shouldn't be an issue there as long as the throughput is high enough.
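For bulk transfer, a 5-10ms RTT mostly matters through the TCP bandwidth-delay product. A quick sketch of the per-stream window needed to keep a fast link full (the 10 Gbps figure is just an example):

    # In-flight bytes a single TCP stream needs: bandwidth * delay.
    def window_needed(throughput_gbps: float, rtt_ms: float) -> float:
        return throughput_gbps * 1e9 / 8 * (rtt_ms / 1000)

    for rtt_ms in (5, 10):
        w = window_needed(10, rtt_ms)  # saturating 10 Gbps
        print(f"RTT {rtt_ms} ms -> ~{w / 2**20:.1f} MiB in flight per stream")

That works out to roughly 6-12 MiB, well within modern TCP window scaling (and easy to split across parallel streams), which supports the point that throughput rather than latency is the binding constraint.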
Can confirm this: storage for a lot of stuff is in S3 and compute is GCP preemptibles. Works if you have a small dataset which requires a large volume of compute.
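As a minimal sketch of that split (bucket and key are placeholders, AWS credentials assumed to be available on the GCP instance), pulling input from S3 onto a preemptible worker is just ordinary boto3 - with the caveat that S3 egress charges apply to every byte pulled:

    # Cross-cloud read: a GCP preemptible worker fetching its input from S3.
    # Bucket/key are placeholders; AWS credentials come from the environment.
    import boto3

    s3 = boto3.client("s3")

    def fetch(bucket: str, key: str, dest: str) -> None:
        # Download one object to local disk for processing.
        s3.download_file(bucket, key, dest)

    fetch("example-genomics-bucket", "samples/run-001.bam", "/tmp/run-001.bam")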
Sorry for the delay - yes, it's ploid.io - nothing up there yet. We've been in stealth mode while we've been building the system for our first client, the HudsonAlpha Institute for Biotechnology.
Feel free to ping me at brandon at ploid.io - happy to share any insight we've gained!