uv and ruff are a great counterexample to all those people who say "never reinve...

CrendKing · 2025-06-23T21:31:20 1750714280

I believe most of the time this phrase is said to an inexperienced artisan who has no idea how the current system works, what its shortcomings are, and how to improve upon it. Think of an undergraduate student who tries to solve the Goldbach conjecture. What usually happens is that he either fails to reinvent the wheel or reinvents the exact same wheel, which has no value. The phrase certainly does not apply to professionals.

dwattttt · 2025-06-24T04:50:57 1750740657

Even then, you know what's a good way to learn about how the current system works etc, maybe even the best way? I've got many failed projects behind me, and 0 regrets.

eviks · 2025-06-23T17:58:07 1750701487

They didn't reinvent the wheel, "just" replaced all the wood with more durable materials to make it handle rotation at 10 times the speed

doug_durham · 2025-06-23T21:54:05 1750715645

A big part of the "magic" is that there is a team of paid professionals maintaining and improving it. That's more important than it being written in Rust. If uv were forked it would devolve to the level of pip over time.

socalgal2 · 2025-06-23T18:48:57 1750704537

I'd be curious to know exactly what changed. Python -> Rust won't make network downloads faster nor file I/O faster. My naive guess is that all the speed comes from choosing better algorithms and/or parallelizing things. Not from Python vs Rust (though if it's hard to parallelize in Python and easy in rust that would certainly make a difference)

ekidd · 2025-06-23T19:30:30 1750707030

I've translated code from Ruby to Python, and other code from Rust to Python.

Rust's speed advantages typically come from one of a few places:

1. Fast start-up times, thanks to pre-compiled native binaries.

2. Large amounts of CPU-level concurrency with many fewer bugs. I'm willing to do ridiculous threading tricks in Rust I wouldn't dare try in C++.

3. Much lower levels of malloc/free in Rust compared to some high-level languages, especially if you're willing to work a little for it. Calling malloc in a multithreaded system is basically like watching the Millennium Falcon's hyperdrive fail. Also, Rust encourages abusing the stack to a ridiculous degree, which further reduces allocation. It's hard to "invisibly" call malloc in Rust, even compared to a language like C++.

4. For better or worse, Rust exposes a lot of the machinery behind memory layout and passing references. This means there's a permanent "Rust tax" where you ask yourself "Do I pass this by value or reference? Who owns this, and who just borrows it?" But the payoff for that work is good memory locality.

So if you put in a modest amount of effort, it's fairly easy to make Rust run surprisingly fast. It's not an absolute guarantee, and there are a couple of traps for the unwary (like accidentally forgetting to buffer I/O, or benchmarking debug binaries).

the8472 · 2025-06-23T19:05:08 1750705508

NVMe hungers, keeping it fed is hard work. Doing some serial read, decompress, checksum, write loop will leave it starved (QD<1) whenever you're doing anything but the last step. Disk IO isn't async unless you use io_uring (well ok, writeback caches can be). So threads are almost a must to keep NVMe busy. Conversely, waiting for blocking IO (e.g. directory enumeration) will keep your CPU starved. Here too the answer is more threads.
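As a rough illustration of the "more threads" point, here is a minimal Python sketch that runs many read, decompress, checksum, write chains at once instead of one after another. The function names (`unpack_one`, `unpack_all`), the file layout, and the worker count are all made up for the example, and it assumes CPython, where `zlib`, `hashlib` (on large buffers), and blocking file IO release the GIL. It is not how uv does it, just the general shape.

```python
import hashlib
import tempfile
import zlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def unpack_one(src: Path, dst_dir: Path) -> str:
    """Read, decompress, checksum, and write a single file (one serial chain)."""
    data = zlib.decompress(src.read_bytes())
    digest = hashlib.sha256(data).hexdigest()
    (dst_dir / src.stem).write_bytes(data)
    return digest


def unpack_all(sources: list[Path], dst_dir: Path, workers: int = 8) -> list[str]:
    """Run many serial chains at once so several reads and writes are in flight."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda src: unpack_one(src, dst_dir), sources))


def demo() -> bool:
    """Create 16 compressed files, unpack them in parallel, and verify the result."""
    with tempfile.TemporaryDirectory() as tmp:
        src_dir, dst_dir = Path(tmp, "src"), Path(tmp, "dst")
        src_dir.mkdir()
        dst_dir.mkdir()
        payloads = [bytes([i]) * 50_000 for i in range(16)]
        sources = []
        for i, payload in enumerate(payloads):
            path = src_dir / f"file{i}.z"
            path.write_bytes(zlib.compress(payload))
            sources.append(path)
        digests = unpack_all(sources, dst_dir)
        expected = [hashlib.sha256(p).hexdigest() for p in payloads]
        written = all(
            (dst_dir / f"file{i}").read_bytes() == p for i, p in enumerate(payloads)
        )
        return digests == expected and written
```

Whether this actually keeps a fast disk busy depends on file sizes and the worker count, so it would need measuring on real hardware.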

globular-toast · 2025-06-23T20:15:54 1750709754

There is a talk about it from one of the authors here: https://www.youtube.com/watch?v=gSKTfG1GXYQ

tl;dw Rust, a fast SAT solver, micro-optimisation of key components, caching, and hardlinks/CoW.
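For the "caching, and hardlinks" item, a minimal sketch of the idea in Python: unpack a package once into a cache directory, then populate each environment by hardlinking files out of the cache rather than copying them. The `link_tree` helper and the directory layout are hypothetical and only illustrate the technique, not uv's actual cache format.

```python
import os
import shutil
import tempfile
from pathlib import Path


def link_tree(cache_dir: Path, env_dir: Path) -> None:
    """Populate env_dir from cache_dir using hardlinks, falling back to copies."""
    for src in cache_dir.rglob("*"):
        dst = env_dir / src.relative_to(cache_dir)
        if src.is_dir():
            dst.mkdir(parents=True, exist_ok=True)
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        try:
            os.link(src, dst)  # same file on disk, no data copied
        except OSError:
            shutil.copy2(src, dst)  # e.g. cache and env on different filesystems


def demo() -> bool:
    """Install a fake cached package into a fake environment and check the content."""
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp, "cache", "pkg-1.0")
        (cache / "pkg").mkdir(parents=True)
        (cache / "pkg" / "__init__.py").write_text("VERSION = '1.0'\n")
        env = Path(tmp, "env", "site-packages")
        link_tree(cache, env)
        installed = env / "pkg" / "__init__.py"
        return installed.read_text() == "VERSION = '1.0'\n"
```

The trade-off with hardlinks is that editing an installed file in place also edits the cached copy, which is one reason a CoW clone is preferable where the filesystem supports it.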

captnswing · 2025-06-23T20:47:52 1750711672

Extremely interesting presentation from Charlie Marsh about all the optimizations https://youtu.be/gSKTfG1GXYQ?si=CTc2EwQptMmKxBwG

socalgal2 · 2025-06-24T01:20:30 1750728030

Thanks. So from the video the biggest wins were

1. the way they get the metadata for a package.

packages are in zip files. zip files have their TOC at the end. So, instead of downloading the entire zip they just get the end of the file, read the TOC, then from that download just the metadata part

I've written that code before for my own projects.

2. They cache the results of packages unzipped and then link into your environment

This means there are no files being copied on the 2nd install. Just links.

Both of those are huge time wins that would be possible in any language.

3. They store their metadata as a memory dump

So, on loading there is nothing to parse.

Admittedly this is hard (impossible?) in many languages. Certainly not possible in Python and JavaScript. You could load binary data but it won't be useful without copying it into native numbers/strings/ints/floats/doubles etc...

I've done this in game engines to reduce load times in C/C++ and to save memory.

It'd be interesting to write some benchmarks for the first 2. The 3rd is a win but I suspect the first 2 are 95% of the speedup.
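The first win (reading only the end of the zip to find the metadata) can be sketched with Python's standard `zipfile` module. This is a local simulation under stated assumptions: `CountingReader` is a made-up in-memory stand-in for a remote file, and the fake wheel contents are invented. A real client would have to turn each seek/read into an HTTP Range request, which is not shown here.

```python
import io
import zipfile


class CountingReader(io.BytesIO):
    """In-memory stand-in for a remote file; counts the bytes actually read."""

    def __init__(self, data: bytes) -> None:
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


def build_fake_wheel() -> bytes:
    """Build a hypothetical wheel-like zip: a large payload plus a small METADATA file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("pkg/data.bin", bytes(range(256)) * 1000)
        zf.writestr("pkg-1.0.dist-info/METADATA", "Name: pkg\nVersion: 1.0\n")
    return buf.getvalue()


def read_metadata(archive: bytes) -> tuple[str, int]:
    """Return the METADATA text and how many bytes had to be read to get it."""
    remote = CountingReader(archive)
    with zipfile.ZipFile(remote) as zf:  # reads the table of contents at the end
        name = next(n for n in zf.namelist() if n.endswith(".dist-info/METADATA"))
        text = zf.read(name).decode()  # seeks straight to that one member
    return text, remote.bytes_read


archive = build_fake_wheel()
text, bytes_read = read_metadata(archive)
print(f"read {bytes_read} of {len(archive)} bytes")
```

In this toy archive only a small fraction of the bytes is touched, which is the whole point: the large payload is never read.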

jerpint · 2025-06-23T19:11:11 1750705871

From just my observations they basically parallelized the install sequence instead of having it be sequential (among many other optimizations most likely)

jerf · 2025-06-23T21:05:15 1750712715

It became a bit of a meme, especially in the web development space, that all programs are always waiting on external resources like networks, databases, disks, etc., and so scripting languages being slower than other languages doesn't matter and they'll always be as fast as non-scripting languages.

Even on a single core, this turns out to be simply false. It isn't that hard to either A: be doing enough actual computation that faster languages are in fact perceptibly faster, even, yes, in a web page handler or other such supposedly-blocked computation or B: without realizing it, have stacked up so many expensive abstractions on top of each other in your scripting language that you're multiplying the off-the-top 40x-ish slower with another set of multiplicative penalties that can take you into effectively arbitrarily-slower computations.

If you've never profiled a mature scripting language program, it's worth your time. Especially if nobody on your team has ever profiled it before. It can be an eye-opener.

Then it turns out that for historical path reasons, dynamic scripting languages are also really bad at multithreading and using multiple cores, and if you can write a program that can leverage that you can just blow away the dynamic scripting languages. It's not even hard... it pretty much just happens.

(I say historical path reasons because I don't think an inability to multithread is intrinsic to the dynamic scripting languages. It's just they all came out in an era when they could assume single core, it got ingrained into them for a couple of decades, and the reality is, it's never going to come fully out. I think someone could build a new dynamic language that threaded properly from the beginning, though.)

You really can see big gains just taking a dynamic scripting language program and turning it into a compiled language with no major changes to the algorithms. The 40x-ish penalty off the top is often in practice an underestimate, because that number is generally from highly optimized benchmarks in which the dynamic language implementation is highly tuned to avoid expensive operations; real code that takes advantage of all the conveniences and indirection and such can have even larger gaps.

This is not to say that dynamic scripting languages are bad. Performance is not the only thing that matters. They are quite obviously fast enough for a wide variety of tasks, by the strongest possible proof of that statement. That said, I think it is the case that there are a lot of programmers who have no idea how much performance they are losing in dynamic scripting languages, which can result in suboptimal engineering decisions. It is completely possible to replace a dynamic scripting language program with a compiled one and possibly see 100x+ performance improvements on very realistic code, before adding in multithreading. It is hard for that not to manifest in some sort of user experience improvement. My pitch here is not to give up dynamic scripting languages, but to have a more realistic view of the programming language landscape as a whole.

RhysU · 2025-06-23T22:08:34 1750716514

> Then it turns out that for historical path reasons, dynamic scripting languages are also really bad at multithreading and using multiple cores...

What would a dynamic scripting language look like that wasn't subject to this limitation? Any examples? I don't know of contenders in this design space--- I am not up on it.

Tuna-Fish · 2025-06-23T23:39:48 1750721988

The big difference from Python is probably having to use a real tracing GC instead of automatic reference counting. For a single-threaded program, refcounts are beneficial in multiple ways, being fairly cheap, having a smooth performance profile, maintaining low resident set size, and providing deterministic freeing.

But because of the way cache coherency for shared, mutated memory works, parallel refcounting is slow as molasses and will always remain so.

I think Ruby has always used a tracing GC, but it also still has a GIL for some reason?

jerf · 2025-06-24T13:40:45 1750772445

It would look pretty much the same. It would just have been written to be multithreaded from the beginning, and lack the long list of restrictions and caveats and "but it doesn't work with our C extensions" and such. There wouldn't be a dozen major libraries trying to solve the problem (which, contrary to many people's intuition, is often a sign that a language lacks a good solution). This is part of why I say there's no fundamental reason this can't be done, it's just a historical accident.

dgb23 · 2025-06-24T06:53:09 1750747989

There are dynamic languages that were built with concurrency in mind like Clojure. It’s also a surprisingly fast language considering it’s both dynamic and functional.

socalgal2 · 2025-06-23T21:24:34 1750713874

I'm not trying to suggest that you can't do faster computation in a lower-level language. But, a package manager doesn't do much computation. It mostly downloads, decompresses, and writes files. Yes, it has to solve constraints but that's not a bottleneck given most projects have at most a few 100 dependencies and not millions.

I don't know python but in JavaScript, triggering 1000 downloads in parallel is trivial. Decompressing them, like in python, is calling out to some native function. Decompressing them in parallel in JS would also be trivial (no idea about python). Writing them in parallel is also trivial.

jerf · 2025-06-24T13:44:59 1750772699

Congratulations! You have proved that it is impossible for uv to be way, way faster than Python-based package managers!

....

Unfortunately, there seems to be a problem here.

When reality and theory conflict, reality wins.

It sounds like you've drunk the same Kool-Aid I was referring to in my post. It's not true. When you're playing with 50x-100x slowdowns, if not more, it's really quite easy to run into user-perceptible slowdowns. A lot of engineers grotesquely underestimate how slow these languages are. I suspect it may be getting worse over time due to evaporative cooling, as engineers who do understand it also tend to have one reason or another to leave the language community at some point, and I believe (though I cannot prove) that as a result the dynamic scripting language communities are actually getting worse and worse at realizing how slow their languages are. They're really quite slow.

socalgal2 · 2025-06-24T17:42:09 1750786929

You seem to be implying rust = fast, the end. I'm implying algorithms and design choices = fast. Those decisions generally (though not always) are far more effective at speed than language choice.

I watched the video linked above on uv. They went over the optimizations. The big wins had nothing to do with rust and everything to do with design/algo choices.

You could have also done without the insults. You have no idea who I am and my experiences. I've shipped several AAA games written in C/C++ and assembly. I know how to optimize. I also know how dynamic languages work. I also know when people are making up bullshit about "it's fast because it's in rust!". No, that is not why it's fast.

collinmanderson · 2025-06-25T17:07:59 1750871279

I agree there are a lot of big wins in uv that tools written in python could take advantage of, and ultimately I think uv is fast because they're obsessed with making it fast, which is why they chose to use rust. I don't see that same level of speed obsession with the other tools.

Instead of "It's fast because it's in rust", I'd say: "It's fast because they chose to use rust for their python tool, which means they care a lot about speed."

physicsguy · 2025-06-23T19:49:12 1750708152

The package resolution is a big part of it, it's effectively a constraint solver. I.e. if package A requires package B constrained between version 1.0 < X <= 2.X and Package B requires package C between... and so on and so on.

Conda rewrote their package resolver for similar reasons
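To make the "constraint solver" framing concrete, here is a deliberately naive backtracking resolver in Python. The package index, the integer versions, and the half-open `(min, max)` ranges are all invented for illustration. Real resolvers (uv's, Conda's) use far more sophisticated algorithms; this only shows why picking the newest version of everything can fail and force a backtrack.

```python
# Hypothetical index: name -> version -> {dependency: (min_inclusive, max_exclusive)}
INDEX = {
    "a": {2: {"b": (1, 3)}, 1: {"b": (1, 2)}},
    "b": {2: {"c": (2, 3)}, 1: {"c": (1, 2)}},
    "c": {1: {}},
}


def resolve(requirements, pinned=None):
    """Depth-first search with backtracking; newest candidates are tried first."""
    pinned = dict(pinned or {})
    if not requirements:
        return pinned
    (name, (lo, hi)), rest = requirements[0], requirements[1:]
    if name in pinned:
        # Already chosen: the existing pin must satisfy this new constraint.
        return resolve(rest, pinned) if lo <= pinned[name] < hi else None
    for version in sorted(INDEX[name], reverse=True):
        if lo <= version < hi:
            deps = list(INDEX[name][version].items())
            result = resolve(rest + deps, {**pinned, name: version})
            if result is not None:
                return result
    return None  # no candidate worked; the caller backtracks


# b=2 needs c>=2, which does not exist, so the search falls back to b=1.
print(resolve([("a", (1, 3))]))  # → {'a': 2, 'b': 1, 'c': 1}
```

With a few hundred packages and many versions each, the search space for this naive approach grows quickly, which is where a smarter solver earns its keep.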

0cf8612b2e1e · 2025-06-23T18:09:01 1750702141

The history of Python package management is clear that everyone thinks they can do a better job than the status quo.

psunavy03 · 2025-06-23T18:10:38 1750702238

In this case, they were right.

dwattttt · 2025-06-24T05:34:03 1750743243

I would say in many cases they were right; the history of Python package management is littered with winners as well as losers.

lmm · 2025-06-24T01:29:38 1750728578

Python package management was notoriously awful. The problem wasn't that people were trying to do things better, it was that they weren't; every new Python dependency management tool just repeated the mistakes of all the previous Python dependency management tools. uv is the first one to break the cycle (and it's probably not a coincidence that it's the first one to not be written in Python).

nonethewiser · 2025-06-24T14:56:37 1750776997

Poetry broke the cycle. Unified toolchain, lock file, single configuration file, full dependency graph, dev dependencies. uv is faster which is great but Poetry was a huge step in the right direction and still a good tool.

nickelpro · 2025-06-23T19:16:24 1750706184

uv is purely a performance improvement, it changes nothing about the mechanics of Python environment management or packaging.

The improvements came from lots of work from the entire python build system ecosystem and consensus building.

0cf8612b2e1e · 2025-06-23T19:36:22 1750707382

Disagree in that uv makes switching out the underlying interpreter so straightforward. Becomes trivial to swap from say 3.11 to 3.12. The pybi idea.

Sure, other tools could handle the situation, but being baked into the tooling makes it much easier to bootstrap different configurations.

nickelpro · 2025-06-23T19:37:45 1750707465

Yes, it's faster and better than pyenv, but the mechanism it's using (virtual environments) is not a uv invention.

uv does the Python ecosystem better than any other tool, but it's still the standard Python ecosystem as defined in the relevant PEPs.

pityJuke · 2025-06-23T20:26:30 1750710390

Are the lock files standardised, or a uv-specific thing?

nickelpro · 2025-06-23T21:06:40 1750712800

uv has both a uv-specific implementation, and support for standard PEP 751 lockfiles

collinmanderson · 2025-06-25T17:16:12 1750871772

Worth noting that the uv-specific implementation has more features than the standard PEP 751 lockfiles, so uv plans to keep using its own implementation by default. https://github.com/astral-sh/uv/issues/12584

globular-toast · 2025-06-23T20:20:07 1750710007

Actually not true. One of the main differences with uv is you don't have to think about venvs any more. There's a talk about it from one of the authors at a recent PyCon here: https://www.youtube.com/watch?v=CV8KRvWKYDw (not the same talk I linked elsewhere in the thread).

nickelpro · 2025-06-23T21:10:54 1750713054

How do you think uv works?

It creates a venv. Note we're talking about the concept of a virtual environment here, PEP 405, not the Python module "venv".

globular-toast · 2025-06-24T06:42:51 1750747371

I said you don't have to think about venvs any more. It's great that we have a standard way to implement them, but this is only necessary in the first place because of the way Python is. Now we have a tool that enforces a workflow that creates virtualenvs without you having to know about them and therefore not screwing them up with ad hoc pip installs etc.

lmm · 2025-06-24T01:27:38 1750728458

The implementation details don't matter. uv might follow PEP 405 but it could work just as well without doing so. The point is that it doesn't give you the bunch of extra footguns that any other Python package management does.

nickelpro · 2025-06-24T02:34:51 1750732491

It matters immensely that it follows PEP 405, it makes uv the implementation detail. You can swap out uv for any other project management tool or build frontend and nothing needs to change about the development environment.

This is the entire purpose of the standards.

lmm · 2025-06-24T03:16:22 1750734982

> You can swap out uv for any other project management tool or build frontend and nothing needs to change about the development environment.

> This is the entire purpose of the standards.

That seems to amount to saying that the purpose of the standards is to prevent progress and ensure that the mistakes of early Python project management tools are preserved forever. (Which would explain some things about the last ~25 years of Python project management I guess). The parts of uv that follow standards aren't the parts that people are excited about.

dagw · 2025-06-24T10:08:51 1750759731

> The parts of uv that follow standards aren't the parts that people are excited about.

I disagree. Had uv not followed these standards and instead gone off and done their completely own thing, it could not function as a drop in replacement for pip and venv and wouldn't have gotten anywhere near as much traction. I can use uv personally to work on projects that officially have to support pip and venv and have it all be transparent.

nickelpro · 2025-06-24T18:14:34 1750788874

There are no parts of uv that don't follow standards.

The standards have nothing to do with the last 25 years of Python project management, the most important ones (PEP 517/518) are less than 10 years old.

aragilar · 2025-06-24T11:43:14 1750765394

uv only exists because of those standards and therefore can make assumptions that earlier tools could not.

blitzar · 2025-06-24T07:53:57 1750751637

> How do you think uv works?

Don't know, don't care. It thinks about these things, not me.

akoboldfrying · 2025-06-24T00:17:32 1750724252

True, but then all software is developed for this reason.

henry700 · 2025-06-23T18:57:01 1750705021

Of course they do, this tends to happen when the history is it being hot flaming garbage.

mort96 · 2025-06-23T18:27:58 1750703278

Honestly "don't reinvent the wheel" makes absolutely no sense as a saying. We're not still all using wooden discs as wheels, we have invented much better wheels since the neolithic. Why shouldn't we do the same with software?

simonw · 2025-06-23T19:04:51 1750705491

When asked why he had invented JSON when XML already existed, Douglas Crockford said:

The good thing about reinventing the wheel is that you can get a round one.

https://scripting.wordpress.com/2006/12/20/scripting-news-fo...

idle_zealot · 2025-06-23T20:37:48 1750711068

You can get a round one. Or you can make yet another wonky shaped one to add to the collection, as ended up being the case with JSON.

simonw · 2025-06-23T21:12:27 1750713147

What makes JSON wonky?

Personally the only thing I miss from it is support for binary data - you end up having to base64 binary content which is a little messy.

idle_zealot · 2025-06-23T22:07:40 1750716460

Quoted keys, strict comma rules, very limited data types, are the main ones. There are a host of others if you view it through the lenses of user-read/write, and a different set of issues if you view it as a machine data interface. Trying to combine the two seems fundamentally misguided.

collinmanderson · 2025-06-25T17:21:22 1750872082

I consider JSON's very limited data types to be part of what makes it so good.

Myrmornis · 2025-06-24T02:10:41 1750731041

Lack of comments seems like a big one seeing as it's so widely used for "configuration". It's a big enough downside that VSCode and others have violated it via ad-hoc extensions of the format.

The comma rules introduce diff noise on unrelated lines.

psunavy03 · 2025-06-23T22:12:00 1750716720

Insert the xkcd about 15 competing standards . . .

oblio · 2025-06-23T22:50:47 1750719047

Standards do die off, up to a point. XML is widely used but the last time I really had to edit it in anger working in DevOps/web/Python was a long time ago (10 years ago?).

At this point XML is the backbone of many important technologies that many people won't use or won't use directly anymore.

This wasn't the case circa 2010, when I doubt any dev could have really avoided XML for a bunch of years.

I do like XML, though.

mort96 · 2025-06-27T13:28:09 1751030889

Probably the world's most over-used and misused comic strip. JSON wasn't created as a response to a situation where there were too many data interchange standards.

haiku2077 · 2025-06-23T18:52:51 1750704771

Right, wheels are reinvented every few years. Compare tires of today to the ones 20 years ago and the technology and capability is very different, even though they look identical to a casual eye.

My primary vehicle has off-road capable tires that offer as much grip as a road-only tire would have 20-25 years ago, thanks to technology allowing Michelin to reinvent what a dual-purpose tire can be!

nightpool · 2025-06-23T20:16:29 1750709789

> Compare tires of today to the ones 20 years ago and the technology and capability is very different, even though they look identical to a casual eye

Can you share more about this? What has changed between tires of 2005 and 2025?

haiku2077 · 2025-06-23T20:44:02 1750711442

In short: Better materials and better computational models.

https://www.caranddriver.com/features/a15078050/we-drive-the...

> In the last decade, the spiciest street-legal tires have nearly surpassed the performance of a decade-old racing tire, and computer modeling is a big part of the reason

(written about 8 years ago)

aalimov_ · 2025-06-23T18:51:57 1750704717

I always took this saying as meaning that we don’t re-invent the concept of the wheel. For example, the Boring Company and Tesla hoping to reinvent the concept of the bus/train (iirc your car goes underground on some tracks and you get to bypass traffic and not worry about steering).

A metal wheel is still just a wheel. A faster package manager is still just a package manager.

haiku2077 · 2025-06-23T18:53:57 1750704837

That's not how I've ever seen it used in practice. People use it to mean "don't build a replacement for anything functional."

sashimi-houdini · 2025-06-24T06:49:06 1750747746

I also like Dan Luu's take (starting with a Joel Spolsky quote)

“Find the dependencies — and eliminate them.” When you're working on a really, really good team with great programmers, everybody else's code, frankly, is bug-infested garbage, and nobody else knows how to ship on time.

We had a similar attitude, although I'd say that we were a bit more humble. We didn't think that everyone else was producing garbage but, we also didn't assume that we couldn't produce something comparable to what we could buy for a tenth of the cost. From talking to folks at some competitors, there was a pretty big cultural difference between how we operated and how they operated. It simply didn't occur to them that they didn't have to buy into the standard American business logic that you should focus on your core competencies, that you can think through whether or not it makes sense to do something in-house on the merits of the particular thing instead of outsourcing your thinking to a pithy saying.[0]

[0] https://danluu.com/nothing-works/

rocqua · 2025-06-24T00:12:33 1750723953

I came here to (wrongly) say that wooden disks were never used as wheels, and that it all started with spokes. Some checking showed that, in fact, the oldest known wheels include a lot of solid disks. E.g: https://en.m.wikipedia.org/wiki/Ljubljana_Marshes_Wheel

Hopefully this can disabuse others of similar mistaken memory.

bmitc · 2025-06-23T22:55:25 1750719325

Ruff is actually a good example of the danger of rewrites. They rewrote tools but not all of the parts of the tools.

jjtheblunt · 2025-06-23T18:25:36 1750703136

> an order of magnitude better

off topic, but i wonder why that phrase gets used rather than 10x which is much shorter.

BeetleB · 2025-06-23T19:44:51 1750707891

Short answer: Because the base may not be 10.

Long answer: Because if you put a number, people expect it to be accurate. If it was 6x faster, and you said 10x, people may call you out on it.

screye · 2025-06-23T19:00:04 1750705204

It's meant to signify a step change. Order of magnitude change = no amount of incremental changes would make up for it.

In common conversation, the multiplier can vary from 2x - 10x. In the context of some algorithms, orders of magnitude can be over the delta rather than absolutes. E.g., an algorithm sees a 1.1x improvement over the previous 10 years. A change that shows a 1.1x improvement by itself overshadows an order of magnitude more effort.

For salaries, I've used order-of-magnitude to mean 2x. Good way to show a step change in a person's perceived value in the market.

bxparks · 2025-06-23T18:42:44 1750704164

I think of "an order of magnitude" as a log scale. It means somewhere between 3.16X and 31.6X.
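Those bounds fall out of rounding on a log scale: a ratio is "one order of magnitude" when log10 of it rounds to 1, i.e. it lies between 10^0.5 and 10^1.5. A quick Python check (the helper name `orders_of_magnitude` is just for illustration):

```python
import math


def orders_of_magnitude(ratio: float) -> int:
    """Round log10(ratio) to the nearest integer."""
    return round(math.log10(ratio))


lower, upper = 10 ** 0.5, 10 ** 1.5
print(f"{lower:.2f}x to {upper:.1f}x")  # → 3.16x to 31.6x

print(orders_of_magnitude(3))   # → 0
print(orders_of_magnitude(5))   # → 1
print(orders_of_magnitude(40))  # → 2
```

By this reading, 5x does count as an order of magnitude and 3x does not.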

jjtheblunt · 2025-06-23T19:22:00 1750706520

yeah that's what i meant with 10x, like it's +1 on the exponent, if base is 10. but i'm guessing what others are thinking, hence the question.

bxparks · 2025-06-24T03:01:09 1750734069

The problem is that 10x appears to be a linear scale. It could mean 9.5x to 10.5x if it's supposed to have 2 significant digits. Or it could be 5x to 15x if it meant to have 1 significant digit.

jjtheblunt · 2025-06-24T17:51:03 1750787463

good point

refulgentis · 2025-06-23T18:41:23 1750704083

"10x" has been cheapened / heard enough that it is, de facto, a more general statement than a literal interpretation would indicate (e.g. "10x engineer"; you don't hear that much around these parts these days).

Order of magnitude faces less of that baggage, until it does :)

psunavy03 · 2025-06-23T22:12:45 1750716765

Would you say it faces . . . orders of magnitude less baggage?

fkyoureadthedoc · 2025-06-23T18:29:34 1750703374

- sounds cooler

- 10x is a meme

- what if it's 12x better

Scene_Cast2 · 2025-06-23T18:28:17 1750703297

10x is too precise.

bmacho · 2025-06-23T19:53:53 1750708433

Because it's not 10x?

chuckadams · 2025-06-23T19:11:42 1750705902

Because "magnitude" has cool gravitas, something in how it's pronounced. And it's not meant to be precise, it just means "a whole lot more".

neutronicus · 2025-06-23T19:11:26 1750705886

5x faster is an order of magnitude bc of rounding

zzzeek · 2025-06-24T02:00:45 1750730445

ruff does not support custom plugins so is useless to me