Whether Postgres (or similar) can handle the data depends not only on the data *...

anarazel · on May 31, 2019

> I.e., if you are generating reports, running aggregations over a large amount of data you definitely need some parallelism and Postgres isn't designed to handle these loads (certainly not petabytes). Even aggregating 100's of GB probably requires (or at least is more cost effective using) multiple machines.

Aggregating 100s of GB isn't much of a problem for PG these days. Yes, you can be faster - obviously - but it works quite well. And the price for separate systems (duplicated infrastructure, duplicated data, out-of-sync systems, ...) is noticeable as well.

But yea, for many petabytes of data you either have to go to an entirely different system, or use something like Citus.

Disclaimer: I work on PG, and I used to work for Citus. So I'm definitely biased.

throwaway082729 · on May 31, 2019

I'm genuinely curious. Can PG handle hundreds of users querying 100s of GB, sometimes the same set of tables, at the same time?

anarazel · on May 31, 2019

Well. You're going to run out of CPU and memory bandwidth pretty quickly. So you'd need replicas to share processing load.

But I honestly don't think hundreds of users each querying 100s of GBs is all that common.

throwaway082729 · on May 31, 2019

Again, genuinely curious to learn if you've experience with enterprise companies using 'big data' technologies.

anarazel · on May 31, 2019