Whether Postgres (or similar) can handle the data depends not only on the data size but also on what you want to do with the data.
E.g., if you are generating reports or running aggregations over a large amount of data, you definitely need some parallelism, and Postgres isn't designed to handle these loads (certainly not petabytes). Even aggregating hundreds of GB probably requires (or at least is more cost-effective using) multiple machines.
Now Hadoop may not be a particularly efficient solution unless you need hundreds of machines. But there is a limit to what a non-parallel, single-machine database can do. There are other solutions in between.
And you really don't have to be Twitter or Google to handle a significant amount of data these days. People are recording much more data in the hope of generating new insights, and they do need tools to process that data.
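To make the parallelism point concrete, here is a minimal Python sketch of the partial-aggregate-then-combine pattern that parallel aggregation generally relies on, whether the workers are cores or machines. The function names and the toy `(key, value)` rows are made up for illustration, and threads only stand in for real workers; this is not how any particular database implements it.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(rows):
    """Aggregate one chunk: per key, keep (count, sum) rather than the average."""
    acc = defaultdict(lambda: [0, 0.0])
    for key, value in rows:
        acc[key][0] += 1
        acc[key][1] += value
    return dict(acc)


def combine(partials):
    """Merge the per-chunk states into a single (count, sum) state per key."""
    total = defaultdict(lambda: [0, 0.0])
    for partial in partials:
        for key, (count, value_sum) in partial.items():
            total[key][0] += count
            total[key][1] += value_sum
    return dict(total)


def parallel_average(chunks, workers=4):
    """Average per key: aggregate chunks concurrently, then combine and finalize."""
    # Threads are only a stand-in here; a real system would use separate
    # processes or machines, each scanning its own slice of the data.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_aggregate, chunks))
    return {key: s / c for key, (c, s) in combine(partials).items()}
```

The averages are only computed at the end because an average of averages is wrong when chunks have different sizes; carrying `(count, sum)` is what makes the work splittable.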
> I.e., if you are generating reports, running aggregations over a large amount of data you definitely need some parallelism and Postgres isn't designed to handle these loads (certainly not petabytes). Even aggregating 100's of GB probably requires (or at least is more cost effective using) multiple machines.
Aggregating hundreds of GB isn't much of a problem for PG these days. Yes, you can be faster, obviously, but it works quite well. And the price for separate systems (duplicated infrastructure, duplicated data, out-of-sync systems, ...) is noticeable as well.
But yeah, for many petabytes of data you either have to go to an entirely different system, or use something like Citus.
Disclaimer: I work on PG, and I used to work for Citus. So I'm definitely biased.