
That's exactly why I started Ploid.

A consulting customer came to me a year ago: their data production had grown from 200TB/year to over 6PB/year, and their budget couldn't sustain that jump (or anything close to it).

Having come from the mass-facilities and data center space with MagicJack, I knew the wholesale costs of bandwidth, power and drives were continuously falling.

Some clients and use cases need access to their data all of the time, and their very foundation is collaboration (Genomics).

For example, this client is now storing 6PB of data with us, 3 copies in separate data centers. We are half the price of S3 on storage, and we include all the bandwidth for free, limited to 10GigE per PB stored. This has worked out extremely well - we were about 20% (!!!!) of the price of Amazon after you factor in bandwidth.
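A rough sketch of the arithmetic behind those two figures, assuming "half the price" refers to the storage charge alone and "20%" to the total bill including egress. The function names and the normalized prices are illustrative placeholders, not real AWS or Ploid rates.

```python
SECONDS_PER_DAY = 86_400


def total_cost(stored, egress, storage_price, egress_price):
    """Bill for one period: storage charge plus egress charge."""
    return stored * storage_price + egress * egress_price


def implied_egress_multiple(storage_ratio, total_ratio):
    """Egress spend as a multiple of storage spend at the metered provider.

    If flat = storage_ratio * S and flat = total_ratio * (S + E),
    then E / S = storage_ratio / total_ratio - 1.
    """
    return storage_ratio / total_ratio - 1.0


def link_capacity_bytes(bits_per_second, days):
    """Upper bound on bytes a fully saturated link can move in `days` days."""
    return bits_per_second / 8 * days * SECONDS_PER_DAY


# Figures from the comment: 0.5x on storage, 0.2x on the total bill.
egress_multiple = implied_egress_multiple(0.5, 0.2)

# Normalized placeholder prices (1.0 per unit), NOT real rates.
metered = total_cost(stored=1000, egress=1500, storage_price=1.0, egress_price=1.0)
flat = total_cost(stored=1000, egress=1500, storage_price=0.5, egress_price=0.0)

# One saturated 10 Gbit/s link over a 30-day month, in decimal petabytes.
monthly_cap_pb = link_capacity_bytes(10e9, 30) / 1e15

print(egress_multiple, flat / metered, monthly_cap_pb)
```

Under those assumptions, egress charges on the metered bill would be about 1.5 times the storage charges, and a saturated 10GigE link could move at most roughly 3.24 PB in a 30-day month for each PB stored.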

There were lots of challenges we faced, like overzealous neighbors in the environment, storing lots of small objects, and heavy use of ancillary features like metadata, and they show up with customers of any size. By putting the "tax" on bandwidth, a lot of these business cases are solved. I see why Amazon does that.

AWS is truly great, but as you get into very high scale (specifically in storage - 2PB+), it becomes extremely cost prohibitive.



"By putting the "tax" on bandwidth, a lot of these business cases are solved. I see why Amazon does that."

However, S3 has the same egress pricing as EC2. Do you think it's really a "business case tax" they're applying across all services?


It makes a lot of sense to be able to run loss making products. Otherwise everyone would use S3 together with Google compute engine and Azure databases (let's assume they'd be cheapest). In this scenario all providers would lose out.

In the current world, they can keep prices for some products below costs but make their money with bandwidth and the other services people are forced to use to avoid egress traffic.


"In the current world, they can keep prices for some products below costs but make their money with bandwidth and the other services people are forced to use to avoid egress traffic."

Which AWS products are loss leaders?

S3 storage pricing is not exactly cheap. Neither is EC2 instance pricing.

"Otherwise everyone would use S3 together with Google compute engine and Azure databases (let's assume they'd be cheapest). In this scenario all providers would lose out."

No, S3 would do well, GCE would do well, Azure would do well. Providers only lose out to the extent their products no longer compete on merit alone.


I can imagine that this is a good reason. Otherwise they could make bandwidth cheaper so that people who cannot move everything can at least move part of their applications.

I think the three providers are smart enough to know why they charge that much for bandwidth. And this is the only reason I could think of why all 3 of them charge that much. And I'm pretty sure that some products run at a loss, they do for nearly every company. But AWS won't tell us which ones.

It's reasonable to think that S3 is loss making or about breakeven on its own but recoups costs due to bandwidth charges.


There's still latency, you know ;).


I guess the latency between AWS Frankfurt and GC Belgium should be low enough (5-10ms) for most applications, e.g. storing a large amount of data at one provider and renting compute instances for processing at the other. The latency shouldn't be an issue there, as long as the throughput is high enough.
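To put rough numbers on that, here is a back-of-the-envelope model in which a bulk transfer costs a fixed number of round trips plus size divided by throughput. The 10 ms round trip is the upper end of the estimate above; the 1 TB size, the 10 Gbit/s throughput, and the single round trip are assumptions for illustration, and the model ignores TCP ramp-up and per-object request overhead.

```python
def transfer_seconds(size_bytes, throughput_bits_per_s, rtt_s, round_trips=1):
    """Estimated wall time: round-trip overhead plus time to stream the payload."""
    return round_trips * rtt_s + size_bytes * 8 / throughput_bits_per_s


# Assumed workload: 1 TB moved over a 10 Gbit/s path with a 10 ms round trip.
size_bytes = 1e12
throughput = 10e9
rtt = 0.010

total = transfer_seconds(size_bytes, throughput, rtt)
latency_share = rtt / total

print(f"total: {total:.2f} s, latency share: {latency_share:.6%}")
```

With these assumed figures the round trip is a vanishing fraction of the total, which is why throughput dominates for large sequential reads. The picture changes for many small objects, where `round_trips` grows with the object count.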


Can confirm this: storage for a lot of stuff is in S3 and compute is GCP preemptibles. Works if you have a small dataset which requires a large volume of compute.


Is that cheaper than using Google for storage as well? Or are there other reasons for that setup?


Bit of both. There's no point moving it, as the automation/clients that dump data to S3 make it quite hard to change.


GCS supports the S3 API modulo resumable uploads (we do them differently): https://cloud.google.com/storage/docs/migrating

Feel free to send me a note, my contact info is in my profile (I helped build preemptible VMs and I'm sort of fascinated you're doing this).


Yes - I think it directly applies to EC2 as well. It's still an underlying commodity.


Could you link to ploid please? Neither google nor bing can find it from that name.



Dead? I get either site not found or a domain parking page.


Maybe it's under a different domain name. I was hoping to draw @bkruse out to tell us where we can sign up for Ploid because I'm interested too.


(repost)

Sorry for the delay - yes it's ploid.io - nothing up there yet. We've been in stealth mode while we've been building the system for our first client - HudsonAlpha Institute for Biotechnology. Feel free to ping me at brandon at ploid.io - happy to share any insight we've gained!


Yes, you've tickled my curiosity too but I can't find you.





