Pumpkin Escobar

Pumpkin Escobar@lemmy.world · 6 days ago

DNFTA

Pumpkin Escobar@lemmy.world · 2 months ago

First, a caveat: you’ll need a beefy GPU to run the larger models, though there are some smaller models that perform pretty well.

Here’s a medium amount of extra information for you or anyone else who might want to get into running models locally:

Tools

Ollama - A great app for downloading/managing/running models locally
OpenWebUI - A web app that provides a UI like the ChatGPT web app, but can use local models
continue.dev - A VS Code extension that can use Ollama to give you a GitHub Copilot-like AI assistant running against a local model (it can also connect to Anthropic Claude, etc.)
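Ollama also serves a local REST API (by default on port 11434), which is what front ends like OpenWebUI and continue.dev talk to. Below is a minimal sketch of calling its `/api/generate` endpoint with only the Python standard library. It assumes a default local install and that the model name used (`llama3.1`) has already been pulled; the prompt and timeout are illustrative.

```python
import json
import urllib.request

# Ollama's default local address; change it if the server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming /api/generate request for a locally pulled model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (needs a running Ollama server)."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        body = json.load(resp)
    # With "stream": False the whole completion comes back in the "response" field.
    return body["response"]


# Usage, assuming `ollama pull llama3.1` has been run:
# print(generate("llama3.1", "Summarize what quantization does in one sentence."))
```

Setting `"stream": False` keeps the example simple; without it, Ollama streams the answer back as a sequence of JSON lines.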

Models

You can browse the available models at https://ollama.com/library?sort=featured

Model size is measured by parameter count. Generally, models with more parameters are better (“smarter”, more accurate), but it’s very challenging and slow to run anything over about 25b parameters on consumer GPUs. I find 8-13b parameter models to be a sort of sweet spot. The 1-4b parameter models are meant more for really low-power devices; they’ll give you OK results for simple requests and summarizing, but they’re not going to wow you.

If you look at the ‘tags’ for the models listed below, you’ll see things like 8b-instruct-q8_0 or 8b-instruct-q4_0. The q part refers to quantization, i.e. shrinking/compressing a model, and the number after it is roughly the number of bits stored per weight, so smaller numbers mean more aggressive compression. Note the size of each tag and how it shrinks as the quantization gets more aggressive. You can roughly think of this size as “how much VRAM do I need to run this model”. I aim for q8 models, or fp16 if they fit on my GPU. I wouldn’t use anything below q4 quantization; there seems to be a lot of quality loss below q4. Models can run partially or even fully on a CPU, but that’s much slower. Ollama doesn’t yet support the NPUs found in newer laptops/processors, but work is happening there.
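The “size ≈ VRAM” rule of thumb is just parameter count × bits per weight ÷ 8. A rough sketch of that arithmetic, plus the “fp16 if it fits, else q8, never below q4” preference, is below. The bits-per-weight values are for llama.cpp/GGUF block formats (q8_0 and q4_0 store an fp16 scale per 32 weights); the fixed 1 GB overhead for context/KV cache is an assumption, and real model files can differ a bit because some tensors are kept at higher precision.

```python
from typing import Optional

# Approximate bits stored per weight for common GGUF quantization types.
# q8_0 and q4_0 keep one fp16 scale per block of 32 weights, which adds 0.5 bit.
BITS_PER_WEIGHT = {"fp16": 16.0, "q8_0": 8.5, "q4_0": 4.5}

# Preference order: fp16 if it fits, then q8, and nothing below q4.
PREFERENCE = ("fp16", "q8_0", "q4_0")


def weights_gb(params_billion: float, quant: str) -> float:
    """Rough size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


def pick_quant(params_billion: float, vram_gb: float,
               overhead_gb: float = 1.0) -> Optional[str]:
    """Highest-precision quantization whose weights plus overhead fit in VRAM."""
    for quant in PREFERENCE:
        if weights_gb(params_billion, quant) + overhead_gb <= vram_gb:
            return quant
    # Nothing at q4 or above fits: expect CPU offload, or pick a smaller model.
    return None
```

Under these assumptions an 8b model is about 16 GB at fp16, 8.5 GB at q8_0 and 4.5 GB at q4_0, so `pick_quant(8, 24)` returns `"fp16"` and `pick_quant(8, 12)` returns `"q8_0"`. A 35b model only fits a 24 GB card at `"q4_0"`, and `pick_quant(35, 16)` returns `None`.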

Llama 3.1 - The 8b instruct model is pretty good, with decent speed and good quality. It’s a good “default” model to use.
Llama 3.2 - This model was just released yesterday. I’m only seeing the 1b and 3b models right now. They’ve replaced the 8b model with an 11b one, and I’m assuming the 11b model will be my new go-to when it’s available.
DeepSeek Coder v2 - A great coding assistant model
Command-r - A more niche model, mainly useful for RAG (retrieval-augmented generation). It’s only available as a 35b parameter model, so it’s not all that feasible to run locally.
Mistral Small - A really good model, in the ballpark of Llama. I haven’t had quite as much luck with it as with Llama, but it is good, and I just saw that a new version was released 8 days ago, so I’ll need to check it out again.

Pumpkin Escobar@lemmy.world · 2 months ago

It’s a good thing that real open source models are getting good enough to compete with or exceed OpenAI.

Pumpkin Escobar@lemmy.world · 2 months ago

Figuring out how to move to Forgejo has been on my list; I need to do it soon, before the migration process breaks or gets awful.

Pumpkin Escobar@lemmy.world · 4 months ago

Take Ollama, for instance: either the whole model runs in VRAM and compute is done on the GPU, or it runs in system RAM and compute is done on the CPU. Running models on the CPU is horribly slow; you won’t want to do it for large models.

LM Studio and others let you run part of the model on the GPU and part on the CPU, splitting the memory requirements, but it’s still pretty slow.

Even the smaller 7B parameter models run pretty slowly on the CPU, and the huge models are orders of magnitude slower.

So technically, more system RAM will let you run some larger models, but you’ll quickly figure out you just don’t want to.
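One common way to see why CPU and split setups are slow: generating each token means reading (roughly) every weight once, so decode speed is bounded by memory bandwidth. A back-of-the-envelope sketch of that model is below. The bandwidth figures are illustrative assumptions (a high-end consumer GPU vs. dual-channel DDR5), and compute, KV cache and PCIe transfer costs are ignored, so treat the result as an upper bound.

```python
# Illustrative bandwidth figures (GB/s), not measurements.
GPU_BANDWIDTH_GBS = 900.0
CPU_BANDWIDTH_GBS = 80.0


def tokens_per_second(model_gb: float, gpu_fraction: float,
                      gpu_bw_gbs: float = GPU_BANDWIDTH_GBS,
                      cpu_bw_gbs: float = CPU_BANDWIDTH_GBS) -> float:
    """Upper-bound decode speed when each generated token streams every weight once.

    gpu_fraction is the share of the weights held in VRAM
    (1.0 = fully on the GPU, 0.0 = fully in system RAM).
    """
    if not 0.0 <= gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must be between 0 and 1")
    # Time spent streaming the GPU-resident and CPU-resident parts of the weights.
    gpu_seconds = model_gb * gpu_fraction / gpu_bw_gbs
    cpu_seconds = model_gb * (1.0 - gpu_fraction) / cpu_bw_gbs
    return 1.0 / (gpu_seconds + cpu_seconds)
```

For an 8.5 GB model (an 8b model at q8_0) this gives roughly 106 tokens/s fully on the GPU, about 9 fully on the CPU, and only about 17 with a 50/50 split: the slow CPU half dominates, which matches the experience that splitting is “still pretty slow”.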

Pumpkin Escobar@lemmy.world · 5 months ago

FWIW, they didn’t merge it; they closed the PR without merging (link to the line that still exists on master).

The recent comments come from the announcement of the Ladybird browser project, which is forked from browser code in SerenityOS; I guess people are digging into who wrote the code.

Not arguing that the new comments on the PR are good/bad or anything, just a bit of context.

Pumpkin Escobar@lemmy.world · 5 months ago

I’ve been 100% Linux for like 6-9 months now, and these stories make me thankful I finally made the switch.

I’ve tried to make the switch 3-4 times in the past and was stopped by 2 main things:

Drivers / Laptops were tough to get set up
Gaming

The experience was so much better this time, and I really have no regrets. I don’t imagine I’ll ever run Windows again outside of a VM.

Pumpkin Escobar@lemmy.world · 5 months ago

There are tons of remote jobs out there, probably a higher percentage among startups. Most remote companies will have people in different time zones and some sort of core hours they expect people to be online, but with some discussion you’ll probably be able to find one that’s accommodating.

One good site to start looking:

https://wellfound.com/remote

Good luck

Pumpkin Escobar@lemmy.world · 5 months ago

Elon “Nick Cannon” Musk

Pumpkin Escobar@lemmy.world · 6 months ago

Battle.net running in Bottles works OK. I did have an issue with Battle.net running under X for a while; switching to Wayland worked. Whatever the problem was seems to have been fixed.

Pumpkin Escobar@lemmy.world · 6 months ago

Btrfs will be fine. I use Btrfs on a standard Arch install with Timeshift for managing snapshots, and it works well.

Pumpkin Escobar@lemmy.world · 8 months ago

Just a note: the Orange Pi drivers are not in great shape. It’s getting better, but I have a cluster of Raspberry Pis for development, bought an Orange Pi without first reading much about them, and it’s rough. Rockchip CPUs are great, and the driver/firmware situation is improving, but it’s something I’d read up on before buying one.

I’d still look at the N100: it’s about 2.5x the performance of a Raspberry Pi 5, and being x86, it gives you more options than ARM.

Pumpkin Escobar@lemmy.world · 8 months ago

There are a lot of tiny PCs these days that can output 4K video and audio. If you want to go as cheap as possible, look for something with an N100 or N200 CPU; they tend to be super cheap and perform well. I’ve got one of the GMKtecs and this wireless keyboard+mouse, and it works really well from the couch.

There are cheaper/other options, but to get you started:
https://www.amazon.com/GMKtec-Windows-Computer-Business-G3-dp-B0CQ4XQ2WG/dp/B0CQ4XQ2WG
https://morefine.com/collections/pc-box (specifically the M9)

Pumpkin Escobar@lemmy.world · 10 months ago

TPM & Secure Boot. Look into sbctl for Secure Boot if you’re not on a distro that uses the signed shim, like Ubuntu. I know some people hate Secure Boot, but storing the unlock key in the TPM is at least much more secure than having the key sitting on a USB drive.

Tang - network-based unlock. If you have a separate Raspberry Pi or something, you can set it up as a Tang server. You’ll want that thing encrypted too; set it up to require a manual unlock so that if someone boosts your servers, the Tang server never comes up and the storage server won’t either.

Or just manually unlock the server with a password every boot?

That’s roughly my prioritized/preferred list
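One common tool for the TPM and Tang options with LUKS is Clevis, which the comment above doesn’t name; it’s swapped in here as an example. The sketch below only builds and prints `clevis luks bind` commands for the `tpm2` and `tang` pins rather than running them. The device `/dev/sdb1` and the URL `http://tang.lan` are hypothetical, binding prompts for an existing LUKS passphrase, and unlocking at boot additionally needs Clevis support in your initramfs, which varies by distro.

```python
import json
import shlex
from typing import Dict, List


def clevis_bind_cmd(device: str, pin: str, config: Dict[str, str]) -> List[str]:
    """Build, but do not run, a `clevis luks bind` command for a single pin."""
    compact = json.dumps(config, separators=(",", ":"))
    return ["clevis", "luks", "bind", "-d", device, pin, compact]


def tpm2_bind_cmd(device: str, pcr_ids: str = "7") -> List[str]:
    # PCR 7 records the Secure Boot state, so the TPM only releases the key
    # when the machine boots through the expected, signed boot chain.
    return clevis_bind_cmd(device, "tpm2", {"pcr_ids": pcr_ids})


def tang_bind_cmd(device: str, url: str) -> List[str]:
    # The Tang server must be reachable at boot for this unlock to work;
    # binding adds a key slot, so the LUKS password still works as a fallback.
    return clevis_bind_cmd(device, "tang", {"url": url})


# Hypothetical device and Tang URL, for illustration only.
for cmd in (tpm2_bind_cmd("/dev/sdb1"), tang_bind_cmd("/dev/sdb1", "http://tang.lan")):
    print(shlex.join(cmd))
```

Because Clevis adds a new key slot instead of replacing the passphrase, the manual “unlock with a password every boot” option keeps working as a fallback.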

Pumpkin Escobar@lemmy.world · 1 year ago

It’s the same. I picked up an Orange Pi 5 Plus on sale and didn’t even think about the kernel and module driver situation. It’s rough. Joshua-Riek/ubuntu-rockchip and the other contributors do great work to un-fuck the situation and get a non-screwy Ubuntu install cobbled together, but in the issue comments even he gives off a “well, the situation is shit” sort of vibe.

I won’t buy another rockchip sbc.

Pumpkin Escobar@lemmy.world · 1 year ago

Are there any alternatives for people with gluetun allergies?