I've yet to see empirical proof that stored procedures don't scale well. If you write an entire application in your SQL dialect of choice, that likely won't scale well, but putting data access behind an API should scale well whether that API is in Rails, Spring, or a stored procedure.
I don't know what you consider "empirical proof", but it seems obvious to me. And should be obvious to anyone competent who has ever had to scale stuff. A system fails to scale when it has a bottleneck, and that bottleneck gets overwhelmed. You make it scale better by scaling the bottleneck, which can be done by moving work out of the bottleneck, or by parallelizing it in some way.
The natural bottleneck for any system that has to synchronize data is the locking around synchronizing that data, because things that do not need synchronization can easily be parallelized. You therefore scale that bottleneck until a more fundamental one emerges.
In a standard database-driven website, that bottleneck is always in the database, and therefore your scaling limit is the capacity of your database. Following the usual scaling advice, you need to move work out of the database or remove the database as a scaling limit.
Moving work from stored procedures to the application is an example of moving work out of the database. So is having queries run against read-only replicas instead of the read/write master. Sharding your database and moving to a distributed NoSQL architecture are examples of removing the bottleneck.
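As a minimal sketch of the read-replica idea (psycopg2 style; the connection strings and the users table are made up for illustration), reads go to a replica while writes still hit the primary:

```python
import psycopg2

# Made-up connection strings; in practice these come from configuration.
PRIMARY_DSN = "host=db-primary dbname=app"
REPLICA_DSN = "host=db-replica dbname=app"

def get_user(user_id):
    # Reads go to a read-only replica, keeping load off the primary.
    with psycopg2.connect(REPLICA_DSN) as conn, conn.cursor() as cur:
        cur.execute("SELECT id, name FROM users WHERE id = %s", (user_id,))
        return cur.fetchone()

def rename_user(user_id, new_name):
    # Writes still have to go to the read/write primary.
    with psycopg2.connect(PRIMARY_DSN) as conn, conn.cursor() as cur:
        cur.execute("UPDATE users SET name = %s WHERE id = %s", (new_name, user_id))
```

Reads from replicas can of course be slightly stale, which is acceptable for a lot of web traffic.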
Of the two approaches, the much simpler and safer one is to move work out of the database. Going NoSQL is cool, but unless you really know what you're doing, it is unlikely to buy you what you wanted and leaves you open to obscure data consistency problems.
When it comes to data validation, none of what you just said applies, because you need to perform that work inside a transaction. You can validate the data outside your database and then confirm that nothing has changed (optimistic concurrency control), but you can do that just as easily inside the database, with lower latency and greater throughput, because many databases have OCC built in (though under heavy contention OCC can produce many more aborts than other concurrency control mechanisms, so be careful). If your data model lets you afford relaxed consistency, you can use a database with a relaxed consistency model and, again, get far better performance than an ad-hoc solution in your application.
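For concreteness, application-side OCC usually looks something like this rough sketch (psycopg2 style; the accounts table and its version column are made up). The same check-and-bump can just as easily live in a stored procedure:

```python
def update_balance(conn, account_id, delta):
    """Optimistic concurrency control with a version column (made-up schema).

    Read without holding a lock, validate in the application, then make the
    write conditional on the version not having changed in the meantime.
    """
    with conn.cursor() as cur:
        cur.execute("SELECT balance, version FROM accounts WHERE id = %s",
                    (account_id,))
        balance, version = cur.fetchone()

        if balance + delta < 0:          # application-side validation
            raise ValueError("insufficient funds")

        cur.execute(
            "UPDATE accounts SET balance = %s, version = version + 1 "
            "WHERE id = %s AND version = %s",
            (balance + delta, account_id, version),
        )
        if cur.rowcount != 1:            # someone else got there first
            conn.rollback()
            raise RuntimeError("write conflict, retry the transaction")
    conn.commit()
```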
It's hugely unclear to me why you think that skirting transactional requirements by performing work in your application is less complex than using a NoSQL database (or a database that uses MVCC or can otherwise provide long-lived read snapshots).
Frankly, I also disagree that for most websites the bottleneck is the database. For many websites, database latency and throughput constraints never become the dominant factor in end-to-end requests: there are all the layers a request has to pass through before it even reaches the database, the request rate is relatively low (commodity relational databases on commodity hardware can easily handle many thousands of requests per second, and IIRC Google Search only had to handle 40k rps from real clients in a recent press release), and there is plenty of inefficient code elsewhere in the stack.
Just put transactions in the application. With row-level locking, the odds of running into contention problems are low. If you've been careful to always lock tables in the same order, then deadlocks are a non-issue. At that point, having longer-running transactions does not matter. What matters is contention for the internal locking mechanisms inside the database, and limiting how much extraneous work the database does improves that.
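A sketch of what I mean by lock ordering (psycopg2 style; the accounts and ledger tables are made up): as long as every transaction takes its row locks in the same global order, two concurrent transactions can block each other but never deadlock.

```python
def transfer(conn, from_id, to_id, amount):
    with conn.cursor() as cur:
        # Take the row locks in a fixed global order (lowest id first),
        # regardless of which account is the source and which the destination.
        for account_id in sorted((from_id, to_id)):
            cur.execute("SELECT id FROM accounts WHERE id = %s FOR UPDATE",
                        (account_id,))

        cur.execute("UPDATE accounts SET balance = balance - %s WHERE id = %s",
                    (amount, from_id))
        cur.execute("UPDATE accounts SET balance = balance + %s WHERE id = %s",
                    (amount, to_id))
        cur.execute("INSERT INTO ledger (from_id, to_id, amount) VALUES (%s, %s, %s)",
                    (from_id, to_id, amount))
    conn.commit()
```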
Now it is easy to screw up an application. It is easy to screw up a database design. It is easy to screw up queries and query plans. But all of those are fixable in relatively straightforward ways. And once you do that, you will wind up with database throughput as your bottleneck.
As for NoSQL, the problem is this: moving to that architecture requires taking an up-front hit on transactional complexity, usually requires several times the hardware (data has to be stored multiple times to survive hardware failure), puts a lot of stress on your network and adds latency, and is really easy to screw up. Just using a popular out-of-the-box solution is not enough - see https://aphyr.com/tags/jepsen for a list of real failure modes in systems that will look perfectly fine in testing.
It is a necessary challenge to accept if you want to go beyond a certain scale. But you should not accept that challenge unless you have good reason to do so.
The thing you're missing is that unless your validation logic is really complicated or you have a really fast network (think Infiniband, which is still not commodity), it takes less time and fewer resources to perform the validation logic in the database (once the locks are taken, which has to happen regardless) than it does to marshal the data over the network to your application for processing, often by orders of magnitude. That is why more or less everyone trying to win transaction-processing benchmarks uses stored procedures.
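To make the round-trip cost concrete, here's a rough sketch of the two shapes (psycopg2 style; the orders schema and the post_order procedure are made up). The application-side version pays a network round trip per statement while holding locks; the in-database version pays one.

```python
# Application-side validation: each statement is a network round trip,
# all while the row lock is held.
def post_order_app_side(conn, order_id):
    with conn.cursor() as cur:
        cur.execute("SELECT status, total FROM orders WHERE id = %s FOR UPDATE",
                    (order_id,))
        status, total = cur.fetchone()
        if status != "pending" or total <= 0:      # validation runs in the app
            conn.rollback()
            raise ValueError("invalid order")
        cur.execute("UPDATE orders SET status = 'posted' WHERE id = %s", (order_id,))
    conn.commit()

# In-database version: one round trip; a (made-up) post_order procedure
# performs the same check-and-update right next to the data.
def post_order_in_db(conn, order_id):
    with conn.cursor() as cur:
        cur.callproc("post_order", [order_id])
    conn.commit()
```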
FWIW, there's lots of concrete evidence of this beyond the fact that benchmarks almost always use stored procedures. If you look at the very best-performing storage systems, like MICA or FaRM, which achieve 100 million tps or more, you'll see that a huge amount of their optimization comes from circumventing the OS networking stack, taking advantage of built-in queueing mechanisms in NICs, and offloading work to avoid taking up cache lines and cores from the processing CPUs. Many extremely recent database designs also take this approach (separating the transaction processors from the executors) for similar reasons, like Bohm, Deuteronomy, and the SAP HANA Scale-out Extension, as well as deterministic database systems like Calvin. Given that these are literally the best-performing database systems out there, I have a hard time accepting that reducing network latency and doing as much processing work as possible in the database isn't the correct way to achieve high performance.
Using stored procedures also allows (in theory) for analysis of the allowed transactions and the conflicts between them, which can boost performance further: you might be able to guarantee ahead of time that transactions can't conflict with each other, reuse cached data because nothing could have changed, order work into epochs and guarantee that there are no deadlocks, or periodically check whether a record is particularly highly contended and rebalance or split it where commutativity allows - all of these options and more have been explored in recent database research. It's often difficult for an application to guarantee any of this, because several web servers may be talking to the database at once without knowing what the others are doing, and that's doubly true if you allow ad-hoc queries. Not knowing whether a transaction will finish quickly (as is the case when an application is allowed to hold locks) also greatly increases the chances of contention and/or disconnection, which can lead to further issues.
To be doubly clear--I think stored procedures are a massive pain in the ass and there are lots of good reasons to do exactly what you're proposing above. But performance is not one of them.
Please see my response there. It's not as simple as you're implying; often your throughput suffers if you can't handle requests fast enough. MICA, for instance, injects packets directly into L3 to avoid cache misses; it's important that MICA handle these requests as fast as possible because otherwise the L3 cache will fill up, the queues will start to back up, and ultimately they'll put backpressure on incoming packets and slow down the whole system (greatly decreasing throughput). The fact that it has tail latencies measured in microseconds is good on its own, but its real benefit is its effect on throughput!
I believe poor database performance is often caused by people who are new to the subject writing their applications in a way that runs into the N+1 query problem.
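The classic shape of the problem, sketched with made-up tables: one query for the posts, then one extra query per post for its author, instead of a single join.

```python
# N+1: one query for the posts, then one more query per post for its author.
def titles_with_authors_slow(conn):
    with conn.cursor() as cur:
        cur.execute("SELECT id, title, author_id FROM posts")
        rows = cur.fetchall()
        result = []
        for _post_id, title, author_id in rows:
            cur.execute("SELECT name FROM users WHERE id = %s", (author_id,))
            result.append((title, cur.fetchone()[0]))
        return result

# Same data in a single round trip with a join.
def titles_with_authors_fast(conn):
    with conn.cursor() as cur:
        cur.execute("SELECT p.title, u.name "
                    "FROM posts p JOIN users u ON u.id = p.author_id")
        return cur.fetchall()
```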
I think we're in agreement that writing applications in the database isn't a good idea.
By "empirical proof", I mean that I have never seen anyone conclusively prove that switching data access from stored procedures to ad hoc SQL through an application was the cause of performance improvement. I've seen proof that removing complex application logic from the database fixed a problem, or that better indexing improved performance, but I've never seen anyone prove that removing stored procedures alone fixed the problem.
Further, there's nothing in your proposed solution of scaling out with read-only replicas that precludes the use of stored procedures for data access. As a huge fan of both databases and separation of concerns, I think stored procedures make tremendous sense. They let a domain expert tune the database as needed, even going so far as rewriting queries, providing optimizer hints, or changing the physical data model for better performance without ever touching application code.
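A rough sketch of what that separation looks like from the application side (psycopg2 style; the active_customers procedure is made up). The application depends only on the procedure's interface, so the domain expert can change everything behind it without an application release.

```python
def find_active_customers(conn, region):
    # The application depends only on this call; how the (made-up)
    # active_customers procedure is implemented (query shape, hints,
    # underlying tables) can change without touching this code.
    with conn.cursor() as cur:
        cur.callproc("active_customers", [region])
        return cur.fetchall()
```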
Although I may or may not be competent, I have scaled "stuff" in a database. But I do appreciate you getting a vague ad hominem into the first sentence. Bravo.
The typical bottlenecks I found were almost always I/O: either crappy storage, crappy indexing, or some combination of the two. Modern databases typically aren't bottlenecked by the lock manager.
I'd also agree that moving work out to a NoSQL database is particularly tricky. For three years I maintained the .NET Riak client and helped developers make better decisions when they were considering moving away from an RDBMS.
I like to say that scalability is like a Mack truck: good if you have to move a lot of stuff, but not necessarily the best tool for getting your groceries. Moving logic from stored procedures to the application adds latency and network traffic. That is never going to be good for how fast you process an individual request. However, it can let your system handle more requests per second.
That's only true if your in-flight transactions don't tie up resources of their own, and if those resources don't exceed the cost of just running the logic in place. Otherwise you're not trading latency for throughput, you're sacrificing both. Unless you have a very heavy application-level computation that doesn't require repeated access to the data, that's usually not the case, especially if you're trying to process requests at line rate (NIC queues get exhausted just holding all the pending requests in memory, and you start to suffer from lock manager contention, page buffer swapping, etc.).
But my experience with Oracle specifically is that handling connections is just general overhead, while the real failure modes have to do with overloading random latches somewhere deep inside of the system. I also found that logic in stored procedures caused us to hit limits faster than leaving it in the application.
The specific case I saw this in was a simple set of queries to test whether you were in an A/B test and, if not, to assign you to a random variant. After I left, my application logic was moved into a complex stored procedure, and they then had scalability limitations that they hadn't had before. For political reasons they declared success and ran fewer A/B tests...
(I've used a lot of other databases as well, but Oracle is the only one I've really pushed to its scalability limits.)
In my experience, the database is usually the bottleneck because of I/O, not because of load on the database engine itself, assuming you know how to design and index a database.