I'll bite. How do i train/make and/or use LoRA, or, separately, how do i fine-tu...

techwizrd · 2025-07-29T15:52:57 1753804377

We have been fine-tuning models using Axolotl and Unsloth, with a slight preference for Axolotl. Check out the docs [0] and fine-tune or quantize your first model. There is a lot to be learned in this space, but it's exciting.

0: https://axolotl.ai/ and https://docs.axolotl.ai/

arkmm · 2025-07-29T16:59:49 1753808389

When do you think fine tuning is worth it over prompt engineering a base model?

I imagine with the finetunes you have to worry about self-hosting, model utilization, and then also retraining the model as new base models come out. I'm curious under what circumstances you've found that the benefits outweigh the downsides.

reissbaker · 2025-07-29T18:30:13 1753813813

For self-hosting, there are a few companies that offer per-token pricing for LoRA finetunes (LoRAs are basically efficient-to-train, efficient-to-host finetunes) of certain base models:

- (shameless plug) My company, Synthetic (https://synthetic.new), supports LoRAs for Llama 3.1 8b and 70b. All you need to do is give us the Hugging Face repo and we take care of the rest. If you want other people to try your model, we charge usage to them rather than to you. (We can also host full finetunes of anything vLLM supports, although we charge by GPU-minute for full finetunes rather than the cheaper per-token pricing for supported base model LoRAs.)

- Together.ai supports a slightly wider range of base models than we do, with a bit more config required, and any usage is charged to you.

- Fireworks does the same as Together, although they quantize the models more heavily (FP4 for the higher-end models). However, they support Llama 4, which is pretty nice although fairly resource-intensive to train.

If you have reasonably good data for your task, and your task is relatively "narrow" (e.g. finding a specific kind of bug rather than general-purpose coding, or extracting a specific kind of data from legal documents rather than general-purpose reasoning about social and legal matters), finetunes of even a very small model like an 8b will typically outperform even very large SOTA models on that task, by a pretty wide margin, while being a lot cheaper to run. For example, if you find yourself hand-coding heuristics to fix some problem you're seeing with an LLM's responses, it's probably more robust to just train a small model finetune on the data and have the finetuned model fix the issues rather than writing hardcoded heuristics. On the other hand, no amount of finetuning will make an 8b model a better general-purpose coding agent than Claude 4 Sonnet.
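
To make the "efficient-to-train, efficient-to-host" point about LoRAs a bit more concrete, here is a rough sketch of the arithmetic. LoRA freezes a weight matrix W and learns a low-rank update B·A instead, so the trainable parameter count scales with the rank rather than with the full matrix size. The 4096 x 4096 shape and rank 16 below are illustrative assumptions, not the dimensions of any particular model, and the helper names are made up for this example.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one (d_out x d_in) weight matrix.

    The frozen weight W is left untouched; LoRA learns A (rank x d_in)
    and B (d_out x rank), and the effective weight is W + B @ A
    (usually scaled by alpha / rank).
    """
    return rank * d_in + d_out * rank


def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters updated when the same matrix is fully fine-tuned."""
    return d_in * d_out


# Hypothetical square projection matrix and rank, chosen only for illustration.
d = 4096
rank = 16

full = full_param_count(d, d)
lora = lora_param_count(d, d, rank)
ratio = lora / full

print(f"full fine-tune: {full:,} params")
print(f"LoRA (r={rank}): {lora:,} params")
print(f"LoRA / full:    {ratio:.4f}")
```

Because only A and B differ between finetunes, a host can in principle keep one copy of the base model in memory and swap small adapters per request, which is presumably why per-token pricing is feasible for LoRAs but not for full finetunes.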

delijati · 2025-07-29T22:50:52 1753829452

Do you maybe know if there is a company in the EU that hosts models (DeepSeek, Qwen3, Kimi)?

reissbaker · 2025-07-30T04:27:04 1753849624

Most inference companies (Synthetic included) host in a mix of the U.S. and EU — I don't know of any that promise EU-only hosting, though. Even Mistral doesn't promise EU-only AFAIK, despite being a French company. I think at that point you're probably looking at on-prem hosting, or buying a maxed-out Mac Studio and running the big models quantized to Q4 (although even that couldn't run Kimi: you might be able to get it working over ethernet with two Mac Studios, but the tokens/sec will be pretty rough).

seunosewa · 2025-08-11T09:48:00 1754905680

When prompt engineering isn't giving you reliable results.

tough · 2025-07-29T18:18:51 1753813131

mostly only for narrow applications, where your fine tune lets you use a smaller model locally that is specialised and trained for your specific use case

whimsicalism · 2025-07-29T17:32:12 1753810332

finetuning rarely makes sense unless you are an enterprise, and even there it doesn't in most cases.

syntaxing · 2025-07-29T16:22:41 1753806161

What hardware do you train on using Axolotl? I use Unsloth with Google Colab Pro.

notpublic · 2025-07-29T15:35:58 1753803358

https://github.com/unslothai/unsloth

I'm not sure if it contains exactly what you're looking for, but it includes several resources and notebooks related to fine-tuning LLMs (including LoRA) that I found useful.

qcnguy · 2025-07-29T16:31:26 1753806686

LLM fine tuning tends to destroy the model's capabilities if you aren't very careful. It's not as easy or effective as with image generation.

nxobject · 2025-07-30T11:14:19 1753874059

My very cursory understanding -- at least from Unsloth's recommendations -- is that you have to work very hard to preserve reasoning/instruct capabilities [1]: for example to "preserve" Qwen3's reasoning capabilities (however that's operationalized), they suggest a fine-tuning corpus that's 75% chain of thought to 25% non-reasoning. Is that a significant issue for orgs/projects that currently rely on fine-tuning?

[1] https://docs.unsloth.ai/basics/qwen3-how-to-run-and-fine-tun...
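
For what it's worth, a 75/25 mix like the one described above is easy to set up before handing data to whatever trainer you use. This is a minimal sketch in plain Python, assuming each example is a dict and that you already have two separate pools of examples. `mix_corpus`, its parameters, and the toy records are hypothetical names for illustration and are not part of Unsloth's API.

```python
import random


def mix_corpus(reasoning, non_reasoning, reasoning_fraction=0.75, seed=0):
    """Sample a shuffled fine-tuning set with a fixed share of reasoning examples.

    The total size is the largest one both pools can supply at the
    requested ratio, so neither pool is oversampled.
    """
    if not 0.0 < reasoning_fraction < 1.0:
        raise ValueError("reasoning_fraction must be strictly between 0 and 1")

    rng = random.Random(seed)
    total = int(min(len(reasoning) / reasoning_fraction,
                    len(non_reasoning) / (1.0 - reasoning_fraction)))
    n_reasoning = round(total * reasoning_fraction)
    n_plain = total - n_reasoning

    # Sample without replacement from each pool, then interleave randomly.
    mixed = rng.sample(reasoning, n_reasoning) + rng.sample(non_reasoning, n_plain)
    rng.shuffle(mixed)
    return mixed


# Toy pools standing in for real chain-of-thought and plain instruction data.
cot_pool = [{"kind": "cot", "id": i} for i in range(300)]
plain_pool = [{"kind": "plain", "id": i} for i in range(200)]

mixed = mix_corpus(cot_pool, plain_pool, reasoning_fraction=0.75)
share = sum(ex["kind"] == "cot" for ex in mixed) / len(mixed)
print(len(mixed), share)
```

Whether that ratio actually preserves reasoning ability on your task is a separate question that only evaluation can answer; the sketch only enforces the mix.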

israrkhan · 2025-07-29T23:13:51 1753830831

do you have a suggestion or a way to measure if model capabilities are getting destroyed? how does one measure it objectively?

mensetmanusman · 2025-07-30T12:14:19 1753877659

These are now questions at the cutting edge of academic research. It might be computationally unknowable until checked.

RALaBarge · 2025-07-30T00:05:27 1753833927

Ask it the same series of questions after training that you posed before training started. Is the quality lower?
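
A rough sketch of that before/after check, assuming you tag each question with a capability area so a drop shows up per area rather than as one blended number. The models here are toy lookup functions standing in for real calls to the base and fine-tuned models, and exact string match is only a placeholder for whatever quality metric you actually trust. All names below are hypothetical.

```python
from collections import defaultdict


def score_by_area(model, eval_set):
    """Return {area: accuracy} for a callable model(prompt) -> answer string."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in eval_set:
        total[item["area"]] += 1
        if model(item["prompt"]).strip() == item["expected"]:
            correct[item["area"]] += 1
    return {area: correct[area] / total[area] for area in total}


def regressions(before, after, tolerance=0.05):
    """Areas whose accuracy dropped by more than `tolerance` after training."""
    return {
        area: (before[area], after.get(area, 0.0))
        for area in before
        if before[area] - after.get(area, 0.0) > tolerance
    }


EVAL_SET = [
    {"area": "arithmetic", "prompt": "2+2", "expected": "4"},
    {"area": "arithmetic", "prompt": "3*3", "expected": "9"},
    {"area": "capitals", "prompt": "capital of France", "expected": "Paris"},
    {"area": "capitals", "prompt": "capital of Japan", "expected": "Tokyo"},
]

# Toy stand-ins: the "tuned" model has forgotten one capital.
BASE_ANSWERS = {"2+2": "4", "3*3": "9",
                "capital of France": "Paris", "capital of Japan": "Tokyo"}
TUNED_ANSWERS = {"2+2": "4", "3*3": "9",
                 "capital of France": "Paris", "capital of Japan": "Kyoto"}


def base_model(prompt):
    return BASE_ANSWERS[prompt]


def tuned_model(prompt):
    return TUNED_ANSWERS[prompt]


before = score_by_area(base_model, EVAL_SET)
after = score_by_area(tuned_model, EVAL_SET)
dropped = regressions(before, after)
print(dropped)
```

This only catches regressions in the areas you thought to include, so the broader the held-out question set, the more it tells you.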

israrkhan · 2025-07-30T06:38:03 1753857483

That series of questions will measure only a particular area. I am concerned about destroying model capabilities in some other area that I do not pay attention to, and have no way of knowing.

simonh · 2025-07-30T07:26:35 1753860395

Isn’t that a general problem with LLMs? The only way to know how good it is at something is to test it.

svachalek · 2025-07-29T16:19:40 1753805980

For completeness, for Apple hardware MLX is the way to go.

w10-1 · 2025-07-29T18:37:29 1753814249

MLX github: https://github.com/ml-explore/mlx

get started: https://developer.apple.com/videos/play/wwdc2025/315/

details: https://developer.apple.com/videos/play/wwdc2025/298/

minimaxir · 2025-07-29T15:27:39 1753802859

If you're using Hugging Face transformers, the library you want to use is peft: https://huggingface.co/docs/peft/en/quicktour

There are Colab Notebook tutorials around training models with it as well.

otabdeveloper4 · 2025-07-30T06:56:48 1753858608

> So what's the big secret about LLM LoRA?

No clear use case for LLMs yet. ("Spicy" aka pornography finetunes are the only ones with broad adoption, but we don't talk about that in polite society here.)

AlecSchueler · 2025-07-30T11:30:57 1753875057

Where do we speak about it? It feels like the biggest use for these models right now is for deepfakes and other harassment, but few people in the industry want to talk about it while continuing to enable it.

jasonjmcghee · 2025-07-29T22:57:52 1753829872

brev.dev made an easy-to-follow guide a while ago, but apparently Nvidia took it down or something when they bought them?

So here's the original:

https://web.archive.org/web/20231127123701/https://brev.dev/...

electroglyph · 2025-07-29T19:33:53 1753817633

unsloth is the easiest way to finetune due to the lower memory requirements

pdntspa · 2025-07-29T19:49:44 1753818584

Have you tried asking an LLM?