Step 03 · The project · Easy · 12 min

🧱 Choosing the hardware

Which mini-PC, how much RAM, which SSD? The jargon-free buying guide for a machine that runs a coding agent and AI models locally.

Before you buy anything, good news: you don’t need a beast of a machine. A coding agent and compact AI models run just fine on a small PC costing a few hundred euros. But there’s one criterion that matters more than all the others, and it’s not the one people usually look at. Let’s untangle it together, no jargon.

The three things that really matter (in order)

Forget the marketing. To run AI locally, here’s what counts, ranked from most to least important.

1. RAM, by far the number-one factor

This is the key decision. To answer you, an AI model has to fit entirely in memory. Without enough RAM, the model simply won’t load, or it crawls painfully. Here are the concrete tiers:

16 GB: the bare minimum. Enough to run the coding agent (which can lean on a cloud model), but too tight for a real local AI model. Avoid it if you can.
32 GB: comfortable. You can run a 30-billion-parameter model in a quantized (compressed) version, which already covers a huge range of needs. A solid entry point.
64 GB: the sweet spot. This is where local AI gets really serious: you load big models and still keep headroom for the system and the agent. If you’re torn between 32 and 64, take 64.
96 GB and up: the luxury. For the heaviest models or for running several things in parallel. Nice, but not necessary to start.
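To see where these tiers come from, here’s a rough sketch. It assumes the common rule of thumb that a quantized model’s weights take about parameters × bits per weight ÷ 8 bytes, and it adds a fixed 12 GB of headroom for the system, the agent and the model’s working memory. The 4.5 bits per weight and the 12 GB of headroom are illustrative assumptions, not measurements.

```python
from typing import Optional

# RAM tiers discussed in this guide (GB); 128 stands in for "96 GB and up".
TIERS_GB = [16, 32, 64, 96, 128]


def model_size_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate weight size: billions of parameters x bits / 8 bits per byte."""
    return params_billion * bits_per_weight / 8


def smallest_tier(params_billion: float, bits_per_weight: float = 4.5,
                  headroom_gb: float = 12.0) -> Optional[int]:
    """Smallest tier holding the weights plus assumed headroom (OS, agent,
    context). Returns None when even the largest tier is too small."""
    needed = model_size_gb(params_billion, bits_per_weight) + headroom_gb
    for tier in TIERS_GB:
        if needed <= tier:
            return tier
    return None


for params in (8, 30, 70):
    size = model_size_gb(params)
    print(f"{params}B model: ~{size:.0f} GB of weights -> {smallest_tier(params)} GB tier")
```

Under these assumptions a 30B model lands in the 32 GB tier and a 70B model needs 64 GB, in line with the tiers above; a longer context or a heavier agent eats into the headroom.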

2. Memory bandwidth, the hidden speed

Less well known, but decisive for comfort. Memory bandwidth is the speed at which the processor reads RAM, and a classic AI model has to read essentially all of its weights from memory to produce each word. The higher the bandwidth, the faster the words come out (measured in tokens per second). A machine with fast memory “types” its text before your eyes; a slow machine doles it out word by word. Keep an eye on this point, especially on Macs (see below), where unified memory is particularly fast.
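A back-of-the-envelope way to connect bandwidth to speed, assuming each token requires reading every weight once from memory: the ceiling on tokens per second is roughly bandwidth divided by model size. Real speeds land below this ceiling. The bandwidth figures (about 90 GB/s for a dual-channel DDR5 mini-PC, about 270 GB/s for fast unified memory) and the 17 GB model are illustrative assumptions; check your own machine’s spec sheet.

```python
def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Ceiling on generation speed: full passes over the weights per second."""
    return bandwidth_gb_s / model_size_gb


MODEL_GB = 17.0  # assumed: a ~30B dense model quantized to ~4.5 bits

# Assumed peak bandwidth figures, for illustration only.
MACHINES = {
    "mini-PC, dual-channel DDR5": 90.0,
    "fast unified memory": 270.0,
}

for name, bandwidth in MACHINES.items():
    ceiling = max_tokens_per_second(bandwidth, MODEL_GB)
    print(f"{name}: at most ~{ceiling:.0f} tokens/s")
```

Tripling the bandwidth triples the ceiling, which is why the same model feels so much snappier on fast memory.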

3. The NVMe SSD, fast and roomy

The models are big: count on ~20 GB per model, and you’ll quickly collect several. Aim for an NVMe SSD of at least 1 TB. Choose NVMe (not an older SATA SSD) because loading 20 GB into memory at launch should take a few seconds, not a coffee break.
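The arithmetic behind “NVMe, 1 TB” fits in a few lines. This sketch assumes typical sequential read speeds (about 0.55 GB/s for SATA, about 5 GB/s for a PCIe 4.0 NVMe drive) and reserves a hypothetical 200 GB for the system, Docker images and projects; real loads can be slower than a drive’s peak.

```python
MODEL_GB = 20.0  # order of magnitude per model, from this guide

# Assumed sequential read speeds in GB/s (check your drive's spec sheet).
READ_SPEED_GB_S = {"SATA SSD": 0.55, "NVMe PCIe 4.0 SSD": 5.0}


def load_seconds(size_gb: float, read_gb_s: float) -> float:
    """Time to read a model file once from disk at the given speed."""
    return size_gb / read_gb_s


def models_that_fit(ssd_gb: float, reserved_gb: float = 200.0,
                    model_gb: float = MODEL_GB) -> int:
    """How many models fit after reserving space for everything else (assumed)."""
    return max(0, int((ssd_gb - reserved_gb) // model_gb))


for drive, speed in READ_SPEED_GB_S.items():
    print(f"{drive}: ~{load_seconds(MODEL_GB, speed):.0f} s to load {MODEL_GB:.0f} GB")
print(f"1 TB SSD: room for ~{models_that_fit(1000)} models of {MODEL_GB:.0f} GB")
```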

So what about the processor?

It matters less than you think for this use. Any AMD Ryzen or Intel Core from the last two or three generations does the job easily. Don’t pay extra for the fastest CPU; put that money into RAM. It’s the RAM that decides what you’ll be able to do.

Which form factor?

Several families, all valid. I’ll stay neutral: choose based on your budget and your preferences.

Barebones or ready-to-use mini-PCs: Minisforum, Beelink, GMKtec, ASUS NUC. The most flexible choice: compact, frugal, and on the barebones versions you add the RAM and SSD yourself, so you can max out memory cheaply.
Apple Mac mini (M chips): an excellent alternative, and even a well-kept secret for local AI. Its unified memory is fast and shared between CPU and GPU: a 64 GB Mac mini runs models that a consumer graphics card can’t hold. Small caveat: it runs macOS, not Linux.

iGPU or dedicated graphics card?

The big question, and the answer might surprise you. For this use:

A dedicated NVIDIA graphics card makes models much faster. But it’s capped by its VRAM (24 to 32 GB even on high-end consumer cards), it adds noise, power draw and cost, and it rarely fits in a mini case.
Most people get along just fine with a mini-PC combining a processor with an integrated GPU (iGPU) and lots of RAM, running compact MoE models (clever models that activate only part of themselves for each answer, see Choosing your local model).

The trade-off, plainly: a dedicated GPU gives you speed, but limits your model size and adds noise. Abundant RAM gives you big models, slower but silent. To start, the “lots of RAM, no GPU” approach is the simplest and most worry-free.
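Why do MoE models pair so well with “lots of RAM”? Their memory footprint follows the total number of parameters, but the data read for each token follows only the active ones. This sketch applies the rule of thumb that speed is capped at bandwidth ÷ bytes read per token; the 30B-total / 3B-active split, the 4.5 bits per weight and the 90 GB/s bandwidth are illustrative assumptions.

```python
BITS_PER_WEIGHT = 4.5   # assumed quantization level
BANDWIDTH_GB_S = 90.0   # assumed: mini-PC with dual-channel DDR5


def gigabytes(params_billion: float) -> float:
    """Approximate size of this many quantized parameters, in GB."""
    return params_billion * BITS_PER_WEIGHT / 8


def speed_ceiling(active_params_billion: float) -> float:
    """Tokens/s upper bound if each token reads all ACTIVE weights once."""
    return BANDWIDTH_GB_S / gigabytes(active_params_billion)


# (name, total parameters, active parameters per token), in billions.
MODELS = [("dense 30B", 30, 30), ("MoE 30B, 3B active", 30, 3)]

for name, total, active in MODELS:
    print(f"{name}: needs ~{gigabytes(total):.0f} GB of RAM, "
          f"at most ~{speed_ceiling(active):.0f} tokens/s")
```

In this idealized model, both need the same RAM, but the MoE has a speed ceiling roughly ten times higher, which is what makes a GPU-less mini-PC usable.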

Frugal and silent: perfect for 24/7

We’ll say it again because it matters: these machines draw 10 to 30 W at idle and are nearly silent. That’s exactly what you want for a box running permanently in a corner of the office. A dedicated GPU disturbs that calm a bit; it’s up to you whether the speed is worth it.
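To put “10 to 30 W” into euros, a quick sketch. It assumes the machine draws a constant average power around the clock (real usage adds load peaks) and uses a hypothetical electricity price of €0.25/kWh; plug in your own tariff. The 60 W comparison point for a box with a dedicated GPU is also hypothetical.

```python
HOURS_PER_YEAR = 24 * 365
PRICE_EUR_PER_KWH = 0.25  # hypothetical tariff; use your own


def yearly_cost_eur(avg_watts: float, price_eur_per_kwh: float = PRICE_EUR_PER_KWH) -> float:
    """Energy cost of running a machine 24/7 at a constant average draw."""
    kwh = avg_watts * HOURS_PER_YEAR / 1000
    return kwh * price_eur_per_kwh


# 10 and 30 W bracket the idle range quoted above; 60 W is a hypothetical
# box with a dedicated GPU, for comparison.
for watts in (10, 30, 60):
    print(f"{watts} W around the clock: ~{yearly_cost_eur(watts):.0f} EUR/year")
```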

Don’t want to buy? The VPS option

Let’s be honest: you can also buy nothing and rent a server in the cloud, a VPS (virtual private server). It’s a virtual machine hosted by a provider (Hetzner, OVH, Scaleway, DigitalOcean…), billed monthly, already running Linux and reachable from anywhere. Everything else in this course (agent, Docker, networking, deployment) applies just the same.

It’s a route I’ve tested less than the homemade mini-PC, so take it for what it’s worth, trade-offs included:

For: no hardware to buy, nothing to plug in, a public IP and bandwidth from the get-go, and you scale power up or down in a few clicks.
Against: it’s a subscription (from a few euros to a few dozen per month, ticking even while you sleep), your data lives on someone else’s machine, and above all the big local LLM isn’t part of the deal. Affordable VPSes have no GPU, so you fall back on the hybrid approach: orchestrator in the cloud, with the VPS serving your projects and, at best, small models. GPU offers exist, but the price climbs fast.

My take: to learn and host projects without buying anything, a small VPS is a perfectly decent playground. To run real local models, the heart of this site, nothing replaces a machine of your own, with its RAM and, if you want, its GPU. It’s up to you where to draw the line.

The recommendation table

Three profiles, depending on your budget and ambition.

Profile	RAM	What for	Reference
Discovery budget	32 GB	Coding agent + a small quantized local model	Entry-level barebones mini-PC
Comfort (recommended)	64 GB	The sweet spot: big local models + agent, headroom everywhere	Minisforum / Beelink 64 GB, or Mac mini 32 GB
Heavy-duty	96 GB+	The heaviest models, several workloads in parallel	Mac mini 64 GB, or mini-PC + dedicated GPU if you want the speed

If you should remember just one line: aim for “Comfort,” 64 GB. It’s the best fun/price ratio, and you won’t feel cramped six months from now.

Enough generalities: here are concrete models, several of which I’ve put through the test bench. Four families, depending on what you want to do.

The all-rounder mini-PC, Minisforum M2

My recommended starting point: compact, frugal, silent, and built to run 24/7 in a corner. It runs the agent and a quantized local model without breaking a sweat. It’s the soundest balance of footprint, performance and price to get started. → My Minisforum M2 review on Frandroid

Unified memory, Mac mini M4, Mac Studio & Framework Desktop

If local AI is what you’re really after, unified memory is a secret weapon. CPU and GPU share a single, very fast memory, which lets you load big models that no consumer graphics card can hold.

Mac mini (M4, M5) / Mac Studio: the best in class on the Apple side, with ultra-fast unified memory in a tiny, silent machine. The latest M5 chips are faster still, and the Mac Studio scales very high on memory for the heaviest models. (Reminder: macOS, not Linux, see the box above.)
Framework Desktop: my favorite on the x86 unified-memory side. It’s built around an AMD chip with generous, very fast unified memory, and it’s repairable and open like everything Framework makes. An excellent host for beefy local models, and it runs Linux. → My Framework Desktop review on Frandroid

Why unified memory is a game changer (and how to spot it)

This is the most important technical point in this whole guide, so let’s take our time. On a classic PC, there are two separate memories: the processor’s RAM, and the graphics card’s VRAM. The GPU can only use its VRAM, often 8, 12 or 16 GB, and not a byte more. It’s a wall.

Unified memory breaks that wall: CPU and GPU share a single, very fast pool of memory. So you can allocate a large share of that memory to the GPU when an AI model needs it. Concretely: a 64 GB unified-memory machine can present, say, 48 GB “as VRAM” to a model. No consumer graphics card can do that. This is what lets a little Mac mini or a Framework Desktop load models that a €1,500 RTX simply cannot hold.
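The wall is easy to check with arithmetic. This sketch asks whether a model’s quantized weights, plus an assumed 2 GB of working memory (context cache, buffers), fit in the memory the GPU can actually use. The 48 GB share on a 64 GB unified machine is the example from the paragraph above; the 16 GB card, the 4.5 bits per weight and the 2 GB overhead are illustrative assumptions.

```python
BITS_PER_WEIGHT = 4.5  # assumed quantization level
OVERHEAD_GB = 2.0      # assumed working memory beyond the weights


def fits_on_gpu(params_billion: float, gpu_visible_gb: float) -> bool:
    """True if weights + overhead fit in the memory the GPU can address."""
    weights_gb = params_billion * BITS_PER_WEIGHT / 8
    return weights_gb + OVERHEAD_GB <= gpu_visible_gb


GPU_VISIBLE_GB = {
    "64 GB unified memory, 48 GB presented to the GPU": 48.0,
    "discrete card with 16 GB of VRAM": 16.0,
}

for machine, visible in GPU_VISIBLE_GB.items():
    verdicts = {f"{p}B": fits_on_gpu(p, visible) for p in (8, 30, 70)}
    print(machine, verdicts)
```

With these assumptions, a 70B model fits in the unified machine’s 48 GB, while the 16 GB card already stops short of a 30B model.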

Recognizing unified-memory platforms

Not all chips are equal on this ground. How to tell them apart:

Apple Silicon (M chips: M4, and the brand-new M5): the reference. Very fast unified memory and dynamic, automatic GPU allocation: you have nothing to tune, macOS gives the model what it needs. The M4/M5 Pro and Max push bandwidth even higher, and a well-equipped Mac Studio swallows the heaviest models. The simplest path, and often the fastest.
AMD Ryzen AI Max (codename “Strix Halo,” in the Framework Desktop): the x86/Linux equivalent, and it’s beefy. It has a very wide memory bus, and you set the GPU’s share in the BIOS (the “Variable Graphics Memory,” often up to 96 GB on well-equipped models). The best of both worlds: the flexibility of Linux and unified memory. My favorite for a serious local AI workshop under Linux.
Intel Core Ultra (Lunar Lake, and the new “Panther Lake” generation): shared memory too, with an integrated GPU that’s improving fast on the AI side. Panther Lake clearly raises the bar compared to Lunar Lake. GPU allocation generally remains a notch less generous and less flexible than on Apple or AMD Strix Halo, but the gap is closing.
The deciding criterion. Whatever the brand, look at two numbers: the total amount of memory, and the share the platform can present to the GPU. Apple does it dynamically, Strix Halo via the BIOS, Intel more modestly. The higher these two numbers, the bigger the models you load.
Classic PC + dedicated graphics card: no unified memory, so the GPU is limited to its fixed VRAM. Very fast, but capped (see just below).

The question to ask before buying: “how much memory can this machine present to the GPU for AI?” On a unified platform, the answer is counted in tens of GB. On a classic PC + GPU, it’s fixed by the card’s VRAM.

High bandwidth, a PC with a dedicated graphics card

Aiming for maximum speed, or also doing creative work (image generation, video, training)? Then a real graphics card with its own dedicated memory makes full sense: its bandwidth crushes that of an iGPU, and the tokens fly. My tip for value per GB of memory: the GeForce RTX 5060 Ti 16 GB, with 16 GB of VRAM at a reasonable price, the sweet spot for running good models fast without blowing the budget.

CUDA or ROCm? The software behind the GPU

A graphics card only helps AI if the software layer follows. Two worlds:

CUDA (NVIDIA): NVIDIA’s compute platform. It’s the de facto standard: nearly the entire AI ecosystem is designed for CUDA first, so things work right away, with no tinkering. The smoothest path. That’s why I recommend an RTX 5060 Ti 16 GB on the dedicated-GPU side: broad support, and the best price / VRAM ratio right now.
ROCm (AMD): AMD’s open equivalent. It’s improving fast and unbeatable on price per GB of VRAM on the Radeon side. But support remains a notch rougher: depending on the card and the tool, you sometimes have to get your hands dirty.

Plainly: you want it to just work → NVIDIA / CUDA. You’re comfortable tinkering and you’re hunting GB of VRAM at the best price → AMD / ROCm.
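Not sure which stack a machine already has? Here’s a minimal heuristic sketch. It only checks whether each vendor’s command-line tools are on the PATH (nvidia-smi ships with the NVIDIA driver; rocm-smi or amd-smi come with ROCm), and whether you’re on an Apple Silicon Mac, where the GPU is driven through Apple’s Metal API. Finding a tool doesn’t guarantee that your AI software was built for that backend.

```python
import platform
import shutil


def detect_gpu_stack() -> str:
    """Heuristic guess at the GPU software stack, from tools on the PATH."""
    if shutil.which("nvidia-smi"):
        return "CUDA: NVIDIA driver tools found"
    if shutil.which("rocm-smi") or shutil.which("amd-smi"):
        return "ROCm: AMD ROCm tools found"
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "Metal: Apple Silicon Mac"
    return "none: no GPU stack detected (CPU only, or tools not on PATH)"


print(detect_gpu_stack())
```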

The ports to check before buying

Before clicking “order,” run through this little checklist:

2.5 GbE Ethernet

Not mandatory, but nice to have: fast wired networking makes transferring models or serving a project more comfortable. Wi-Fi will get you by; a cable is more reassuring.
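What does 2.5 GbE buy you concretely? Network speeds are quoted in gigabits, file sizes in gigabytes, so divide by 8. This sketch assumes about 90% of the nominal link speed survives protocol overhead (an assumption) and that the disks at both ends keep up.

```python
def transfer_seconds(size_gb: float, link_gbit_s: float, efficiency: float = 0.9) -> float:
    """Time to copy size_gb gigabytes over a link rated in gigabits per second."""
    gigabytes_per_s = link_gbit_s / 8 * efficiency  # bits -> bytes, minus assumed overhead
    return size_gb / gigabytes_per_s


for link in (1.0, 2.5):
    print(f"{link:g} GbE: ~{transfer_seconds(20, link):.0f} s for a 20 GB model")
```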

Enough USB

You need enough to plug in a keyboard, an install stick and an external drive for backups. Check that there are at least two or three ports.

SODIMM RAM, upgradeable

The detail that changes everything: on barebones models, the RAM comes as SODIMM sticks that you install yourself. You choose your capacity and can upgrade later. For this project, steer clear of machines where the RAM is soldered and fixed, except unified-memory machines (Macs, Framework Desktop), whose memory is chosen at purchase and can’t be changed afterward.