
I don't follow. A reboot is downtime. Of course your architecture must tolerate downtime when it happens, but it's lost money either way: your hardware isn't doing any useful work while rebooting. So the more computers you have, the more money is lost. At small scale that's insignificant, but at large scale it can become significant, so there's more incentive to reduce downtime.


A reboot, a software deployment (kernel upgrade), server replacement, etc. are all the same process. That simplifies things dramatically. You can micro-optimize the 30s it takes to reboot a server, or you can simplify a runbook to have one process for any “deployment”. Different scenarios require different things but for most “web scale” things that need to be overprovisioned anyway, I’d take the simpler process.


These servers don't take 30s to reboot. Some servers take many minutes. It's a lot.


Worse, some just don't come back without manual intervention. Power supplies don't last forever and might run fine while the machine is on, but after a reboot... boom, gone.


I'd prefer kexec to kpatch, then
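(For context: kpatch applies fixes to a running kernel without rebooting, while kexec boots into a new kernel directly from the running one, skipping firmware/POST, which is usually the slowest part of a server reboot. A rough sketch of the kexec path, with illustrative kernel/initrd paths, assuming kexec-tools is installed:)

```shell
# Stage the new kernel in memory (paths are illustrative; point these
# at the actual vmlinuz/initrd on your system).
kexec -l /boot/vmlinuz-new --initrd=/boot/initrd.img-new --reuse-cmdline

# Jump straight into the staged kernel, bypassing firmware init.
# On systemd machines, `systemctl kexec` does the same after a clean
# service shutdown.
kexec -e
```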


Spin new servers up before you take the old ones down. Effectively zero loss of time for that service.
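A toy sketch of that ordering, where `spawn` and `retire` are hypothetical stand-ins for your real provisioning/health-check and drain/stop steps:

```python
def rolling_replace(fleet, spawn, retire):
    """Replace every server in `fleet`, always bringing the new
    instance up (and healthy) before draining the old one, so the
    live count never dips below the original fleet size."""
    live = list(fleet)
    for old in fleet:
        new = spawn()    # provision and health-check the replacement first
        live.append(new)
        retire(old)      # only then drain and stop the old instance
        live.remove(old)
    return live
```

The point is purely the ordering: capacity is briefly n+1, never n-1, so the reboot (or replacement) costs no serving time.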


Sounds like something to fix rather than to paper over?


Isn't it more significant at smaller scale? That is, if you have fewer computers serving requests, the downtime of a single system will be more pronounced (as opposed to rebooting one machine out of 20 in a rack).


If it isn’t an emergency patch, we do all our maintenance at low traffic times (e.g. the middle of the night local time for the data center). Your capacity planning is based on peak traffic, so you can afford to have more machines out during low traffic times.


Yes, you need to overprovision the servers a little bit.

But you get a much simpler process.

Process ain't free either.



