Glossary

All the technical vocabulary in this guide, explained simply. No jargon for its own sake: just what you need to follow along, in plain words. The dotted-underlined words in the guides link back here.

AI & models

Context window: How much text a model can 'see' at once: your request, the files, the history. Measured in tokens. Beyond it, the model forgets the start. The bigger it is, the more memory it eats.
Embeddings: Turning a text into a list of numbers (a "vector") that captures its meaning. Two texts close in meaning yield close vectors: this is the basis of semantic search and RAG.
Fine-tuning: Lightly re-training an existing model on your own data to specialize it. Powerful but heavy: for most uses, a good prompt or RAG is enough and far cheaper.
Frontier model: The most advanced models of the moment, at the cutting edge of what AI can do (Claude, GPT, Gemini…). Huge, they run in the cloud, not on your mini-PC. In a hybrid setup, one of them often acts as the orchestrator while smaller local models handle the rest.
Hallucination: When a model states something false with full confidence: a function that does not exist, a made-up source. It is not 'lying', it is predicting plausible text. Hence the golden rule: verify, especially code.
Inference: Running an already-trained model to produce an answer. Distinct from training: here you only use the model. It is what your mini-PC does when it generates text.
LLM: A Large Language Model: a neural network trained on huge amounts of text to predict the next token (roughly, the next word), which is what lets it write, summarize, code and answer. It is the engine behind an agent like Claude.
MoE (Mixture of Experts): A 'clever' model split into specialized sub-networks (the 'experts'), only a small fraction of which fire on each token. The result: a model that is big on paper but fast and frugal at runtime. Ideal for local AI.
Open source (vs open-weight): For a model, truly 'open source' would mean opening everything: not just the weights, but also the training code, the data and a free licence. In practice, most so-called 'open' models (Qwen, Llama…) are open-weight, not open source: you get the weights, rarely the full recipe. Calling them 'open source' is a common shorthand, but the nuance matters.
Open-weight model: A model whose 'weights' (the trained parameters) are published and freely downloadable. You can run it on your own machine, no permission needed: this is what makes local AI possible. Careful, 'open weights' does not mean 'open source' (see the 'Open source (vs open-weight)' entry above).
Parameters: A model's internal 'knobs', set during training. Counted in billions (a '30B' model has 30 billion). As a rule, the more there are, the more capable the model, and the more memory it needs to run.
Prompt: The instruction you give the model: the question, the directive, the context. Writing a good prompt (being precise, giving examples, framing it) radically changes the quality of the answer.
Quantization: Compressing a model to fit in less memory by storing its parameters with less precision (for example, 4 bits each instead of 16). The quality loss is often imperceptible, and it is what makes local AI possible on a mini-PC (a 30B model drops from ~60 GB to ~20 GB).
RAG: Retrieval-Augmented Generation. First fetch the right passages from your documents (via embeddings), then hand them to the model so it answers grounded in them. The classic remedy against hallucinations.
Token: The chunk of text a model handles: not quite a letter, not quite a word, more a fragment (often a syllable or word-piece). Context size and, in the cloud, billing are counted in tokens. Rule of thumb: 1 token ≈ 4 characters of English text.
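
The Token and Quantization entries both rest on simple arithmetic, which can be sketched in a few lines of Python. This is a rough estimate, not a measurement: the function names are illustrative, the characters-per-token ratio is the rule of thumb quoted in the Token entry, and the memory figure counts the weights alone. Real quantized files come out somewhat larger than the raw figure, because they also store scaling data and the runtime needs room for the context, which is presumably how the Quantization entry arrives near ~20 GB for a 30B model.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token count from the ~4 characters per token rule of thumb."""
    return math.ceil(len(text) / chars_per_token)


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1 billion bytes).

    One billion parameters at 8 bits each take 1 GB, hence the formula.
    """
    return params_billion * bits_per_weight / 8


print(estimate_tokens("How much text can a model see at once?"))
print(weights_gb(30, 16))  # 16-bit weights: 60.0
print(weights_gb(30, 4))   # 4-bit weights: 15.0
```

Treat both numbers as lower bounds when sizing a machine: the actual tokenizer and the actual model file are the only reliable sources.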

Agents

Agent: An LLM given tools (read files, run commands, search the web) that loops on its own until it reaches a goal. That is the difference between a chatbot that answers and an assistant that acts.
Agent loop: The cycle an agent repeats: think → act (call a tool) → observe the result → start again, until the task is done. This loop is what makes it autonomous.
Hook: An automatic action triggered by an event: e.g. run the linter every time the agent edits a file. A hook is the 'when X, do Y' that does not rely on the model's goodwill.
MCP: Model Context Protocol: an open standard for plugging tools and data sources into an agent without reinventing the connection each time. A universal socket between the agent and the rest of the world.
Memory file: A file the agent re-reads every session to remember the project: conventions, decisions, context. Since it starts fresh each time, this is its long-term memory, written down in black and white.
Orchestrator: The 'lead' agent that thinks, splits the work and delegates to other agents or tools. In a hybrid setup, it is often the one kept in the cloud (a big model) while the local machine executes.
Skill: A reusable know-how you teach the agent once: a procedure, a command, a mini how-to stored in a file. You then invoke it with a single word instead of re-explaining everything.
Sub-agent: A secondary agent spawned by the main one for a focused task (explore the code, run a search). It works on its own and returns only its conclusion, keeping the lead agent’s context uncluttered.
Tools (tool use): The capabilities an agent can trigger beyond generating text: read/write a file, run a command, call an API. Tools are what turn a talkative model into an assistant that actually does things.
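
The think → act → observe cycle described in the Agent loop entry can be sketched without any real model. Everything here is hypothetical and for illustration only: `fake_model` is a scripted stand-in for an LLM call, `TOOLS` is a toy registry with a single tool, and the step limit is an arbitrary safety cap. A real agent would replace `fake_model` with a call to an actual model.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> plain Python function.
TOOLS: dict[str, Callable[[str], str]] = {
    "word_count": lambda text: str(len(text.split())),
}


def fake_model(history: list[str]) -> dict:
    """Scripted stand-in for an LLM: picks the next step from the history."""
    if not any(line.startswith("observation:") for line in history):
        # Nothing observed yet: ask for a tool call.
        return {"action": "word_count", "input": "the quick brown fox"}
    # A result is available: wrap up with a final answer.
    count = history[-1].split(": ")[1]
    return {"final": f"The text has {count} words."}


def run_agent(goal: str, max_steps: int = 5) -> str:
    history = [f"goal: {goal}"]
    for _ in range(max_steps):
        step = fake_model(history)                      # think
        if "final" in step:
            return step["final"]                        # task done
        result = TOOLS[step["action"]](step["input"])   # act (call a tool)
        history.append(f"observation: {result}")        # observe
    return "stopped: step limit reached"


print(run_agent("count the words"))
```

The step limit matters in practice: a loop that runs on its own needs a way to stop when the model never reaches a final answer.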

Hardware

CPU (processor): The central processor, the machine’s general-purpose “brain”. For local AI it matters less than you’d think: any recent CPU will do; better to put the money into RAM.
CUDA: NVIDIA’s compute platform. The de facto standard: nearly the entire AI ecosystem targets CUDA first, so “it just works”. The smoothest path on the dedicated-GPU side.
GPU (graphics card): The graphics processor. Great at the massively parallel math of AI, it speeds up inference a lot. A dedicated card gives speed but is capped by its VRAM and adds noise and power draw.
iGPU (integrated GPU): The GPU built into the processor, sharing system RAM instead of having its own VRAM. Slower than a dedicated card, but frugal, silent, and on unified-memory platforms able to load big models.
Memory bandwidth: The speed at which the processor reads memory. Since a model re-reads its active weights from memory to produce each token, bandwidth dictates how fast text comes out (tokens per second). The “hidden speed” people forget to check.
NVMe SSD: The fastest mainstream storage drive today. Essential here: loading a ~20 GB model into memory should take seconds, not minutes. Aim for 1 TB minimum, as models pile up fast.
RAM: The machine’s working memory. For local AI it is THE number-one factor: a model must fit entirely inside it to run. Too little RAM and the model won’t load, or crawls.
ROCm: AMD’s open equivalent of CUDA. AMD cards are often very competitive on price per GB of VRAM and ROCm is improving fast, but support is still a notch rougher: depending on the card and tool, you sometimes have to get your hands dirty.
Unified memory: A single, fast memory shared between the CPU and the GPU (Apple M chips, AMD Strix Halo…). A large share can be handed to the GPU: a 64 GB machine can thus load models no consumer graphics card can hold.
VRAM: The memory built into a graphics card. Very fast, but fixed and limited (often 8 to 32 GB): a model cannot exceed the card’s VRAM. This is the “wall” that unified memory tears down.
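
The RAM and Memory bandwidth entries imply two quick checks before buying hardware: does the model fit, and how fast can it possibly go? A minimal sketch follows, under stated assumptions: the function names are illustrative, the 4 GB reserve for the system and context is a guess rather than a measured value, and the bandwidth figures in the example are placeholders, not specifications of any real machine. The speed figure is a ceiling that assumes the weights are read once per token; real throughput will be lower.

```python
def fits_in_memory(model_gb: float, memory_gb: float, reserve_gb: float = 4.0) -> bool:
    """True if the model plus a safety reserve fits in RAM (or VRAM)."""
    return model_gb + reserve_gb <= memory_gb


def tokens_per_second_ceiling(bandwidth_gb_s: float, weights_read_gb: float) -> float:
    """Upper bound on speed: each token requires reading the weights once."""
    return bandwidth_gb_s / weights_read_gb


# Illustrative numbers only: a ~20 GB model on two hypothetical machines.
print(fits_in_memory(20, 64))               # True
print(fits_in_memory(20, 16))               # False
print(tokens_per_second_ceiling(100, 20))   # 5.0
print(tokens_per_second_ceiling(1000, 20))  # 50.0
```

For a Mixture of Experts model, `weights_read_gb` would be the size of the experts active per token rather than the whole file, which is why such models feel faster than their total size suggests.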

Networking & access

API: A “socket” through which two programs talk. Your agent calls a cloud model’s API; your project exposes an API that other programs query. It is the interface, not a human, doing the talking.
API key: The password that lets your program use an API (and often what usage is billed against). Treat it as a secret: never in plaintext in the code, never pushed to GitHub.
Cloudflare Tunnel: A service that cleanly exposes a local project on the Internet, with a real domain name and HTTPS, without opening a single port on your router. The machine dials out to Cloudflare, never the reverse: simpler and safer.
DNS: The Internet’s phone book: it translates a human name (mydomain.com) into the IP address machines understand. When you set up a domain, DNS is what you configure.
Firewall: The filter deciding which network connections are allowed in or out. Well configured (close everything, open only the strict minimum), it is a basic building block to avoid exposing the machine.
IP address: The number that identifies a machine on a network (e.g. 192.168.1.20). Local (on your router) or public (on the Internet): it is the address at which your mini-PC is reached.
SSH: Secure Shell: the protocol to drive a machine remotely from the terminal, encrypted. This is how you’ll command your mini-PC from your laptop, with no keyboard or screen attached to it.
SSH key: A pair of cryptographic keys (one public, one private) that replaces the password for SSH. You drop the public one on the server and guard the private one: password-free login, and far safer.
Tailscale: A private “mesh” VPN that links all your devices into one encrypted network, as if side by side, wherever they are. The simplest and safest way to reach your mini-PC from outside.
VPN: Virtual Private Network: an encrypted tunnel between your devices over the Internet. Everything passes through it shielded from prying eyes, as if the machines were on the same local network.
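
The API key entry advises keeping the key out of the code. One common way to do that, sketched here, is to read it from an environment variable at startup. The variable name `EXAMPLE_API_KEY` and both function names are hypothetical; use whatever name your provider or tool documents.

```python
import os


def load_api_key(var_name: str = "EXAMPLE_API_KEY") -> str:
    """Read the key from the environment instead of hard-coding it."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key


def masked(key: str) -> str:
    """Safe form for logs: hide everything except the last 4 characters."""
    return "*" * max(len(key) - 4, 0) + key[-4:]
```

With this pattern the key lives in your shell profile or in a systemd environment file that is not committed to Git, and only `masked(key)` ever appears in logs.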

System & tools

CLI: Command-Line Interface: a program you drive by typing commands in the terminal rather than clicking. Most coding agents (Claude Code, OpenCode) are CLIs: lightweight, scriptable, and perfect over a remote connection.
Commit: A dated snapshot of your code at a given moment, with a short message describing the change. It is the unit of Git history: you can always come back to it.
Container: A lightweight, isolated “box” holding an application and its dependencies, ready to run identically anywhere. Lighter than a virtual machine, it is Docker’s basic unit.
Docker: The tool that packages an application with everything it needs into an isolated container. Each project lives in its own bubble, without polluting the system or the others: install, throw away, start over, breaking nothing.
Docker image: The frozen template from which containers are launched: a ready-to-use snapshot of the application. You download an image and start as many identical containers from it as you like.
Git: The system that keeps your code’s history: each change (a “commit”) is recorded, you can roll back, compare, work with others. The indispensable safety net, especially when an agent edits your files.
Linux: The free and open operating system that runs the vast majority of servers. Light, stable, fully keyboard-drivable: the ideal foundation for an always-on machine.
Markdown: A simple way to write formatted text with a few symbols: a hash for a heading, two stars around a word for bold, dashes for lists. Readable as-is, converted to a web page afterwards. It is the format of memory files, READMEs and almost all technical docs.
Ollama: The simplest tool to run AI models locally: one command to download a model, another to talk to it. It handles memory and quantization, and exposes a local API for your projects.
Repository (repo): The “binder” of a Git-tracked project: all its code and full history. It lives locally on your machine, and usually a “remote” copy is hosted on GitHub for backup and sharing.
Shell: The program that interprets your commands in the terminal (bash, zsh…). It understands what you type, chains commands together and runs your scripts.
sudo: The command that runs an action with administrator (“superuser”) rights. You put it in front of a command when it touches the system. Use it wisely: with those rights, you can break everything.
systemd: The conductor of services on Linux. It starts your programs at boot, restarts them if they crash and keeps them alive around the clock. You hand it the agent, the tunnel, your projects.
Terminal (command line): The window where you type text commands to drive the machine, no mouse or buttons. Intimidating at first, but it is the most direct and powerful way, and the natural home of agents.
tmux: A 'terminal multiplexer': it keeps your sessions alive even after you disconnect. Essential for remote work: you launch an agent over SSH, close the laptop, and it keeps running. You find it intact when you reconnect.
Ubuntu: One of the most widespread Linux distributions for both desktop and server. Stick to its “LTS” versions (Long-Term Support, like 24.04), stable and maintained for years: the safe pick for this project.
VPS: Virtual Private Server: a machine rented in the cloud, billed monthly, already on Linux and reachable anywhere. An alternative to the homemade mini-PC when you don’t want to buy anything, but without a big GPU and with a subscription to pay.
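
The Ollama entry mentions a local API for your projects. A minimal sketch of calling it from Python with only the standard library is given below. It assumes Ollama is running on the same machine at its default address and that a model has already been downloaded; the helper names and the model name in the usage comment are illustrative, not part of this guide.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumption: unchanged).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Prepare a non-streaming generation request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the request and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]


# Usage, once the server is running and the model is installed:
# print(generate("llama3.2", "Explain what a token is in one sentence."))
```

Because this is plain HTTP on localhost, any language your project uses can talk to the model the same way, with no API key and no cloud account.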