# Lean Models - Run Frontier-Scale AI on Your Hardware

An inference runtime that runs massive open-weight MoE models on consumer GPUs by intelligently offloading experts across VRAM, RAM, and SSD. From Qwen3.5 to Arcee Trinity - 35B to 398B parameters on your hardware.

[Get Started](#quick-start) | [View Models](/models)

## How It Works

### Expert Offloading Engine
Run MoE models of up to 398B parameters on consumer GPUs. Only active experts are loaded into VRAM - cold experts live in RAM and on SSD, with speculative prefetching that predicts which experts the router will need next.
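
A minimal sketch of the offloading idea, assuming a simple LRU policy - the class and method names are hypothetical, not the engine's actual API. Experts are promoted into a fixed number of VRAM slots on demand, demoted to RAM when evicted, and speculatively staged from SSD based on router predictions:

```python
from collections import OrderedDict

class ExpertCache:
    """Illustrative three-tier expert cache: VRAM (hot), RAM (warm), SSD (cold)."""

    def __init__(self, vram_slots: int):
        self.vram_slots = vram_slots
        self.vram = OrderedDict()  # expert_id -> weights, kept in LRU order
        self.ram = {}              # expert_id -> weights staged in host memory

    def _load_from_ssd(self, expert_id: int):
        # Stand-in for reading one expert's weight file from NVMe.
        return f"weights-{expert_id}"

    def get(self, expert_id: int):
        """Return an expert's weights, promoting them into VRAM if needed."""
        if expert_id in self.vram:               # VRAM hit
            self.vram.move_to_end(expert_id)
            return self.vram[expert_id]
        weights = self.ram.pop(expert_id, None)  # RAM hit, else fall back to SSD
        if weights is None:
            weights = self._load_from_ssd(expert_id)
        if len(self.vram) >= self.vram_slots:    # evict the coldest VRAM expert to RAM
            evicted_id, evicted = self.vram.popitem(last=False)
            self.ram[evicted_id] = evicted
        self.vram[expert_id] = weights
        return weights

    def prefetch(self, predicted_ids):
        """Speculatively stage experts the router is likely to select next."""
        for expert_id in predicted_ids:
            if expert_id not in self.vram and expert_id not in self.ram:
                self.ram[expert_id] = self._load_from_ssd(expert_id)
```

Prefetching ahead of the router's actual selection is what turns a slow SSD read into a fast RAM hit by the time the expert is needed.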

### `.lmpack` Model Format
File-per-expert packaging enables mmap-based memory management. The OS kernel handles caching automatically through its page cache - hot experts stay resident in RAM, cold experts page in from NVMe on demand.
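
A rough sketch of what file-per-expert mmap looks like from user space; the directory layout, file naming, and dtype below are assumptions for illustration, not the actual `.lmpack` spec:

```python
import mmap
from pathlib import Path

import numpy as np

def map_expert(pack_dir: str, layer: int, expert: int) -> np.ndarray:
    """Memory-map one expert's weight file; bytes page in from NVMe on first touch."""
    # Hypothetical layout: <pack_dir>/layer_XX/expert_YYY.bin
    path = Path(pack_dir) / f"layer_{layer:02d}" / f"expert_{expert:03d}.bin"
    with open(path, "rb") as f:
        buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # No copy is made: the kernel's page cache keeps frequently used experts
    # resident in RAM and drops cold pages under memory pressure.
    return np.frombuffer(buf, dtype=np.float16)
```

Keeping one file per expert lets the kernel cache, evict, and read ahead each expert independently, which is the point of the format.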

### Built for Performance
Flash attention, multi-GPU pipeline parallelism, and speculative router prefetch. 93% VRAM cache hit rate. Output validated bit-identical against llama.cpp. OpenAI-compatible API included.

## Quick Start

```bash
curl -sSf https://leanmodels.ai/install.sh | sh
lean pull lean-agent-35b
lean run lean-agent-35b
```
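
Once a model is running, it serves the OpenAI-compatible API mentioned above. The host, port, and lack of an API key below are assumptions; use the address printed by `lean run`:

```python
import requests

# Hypothetical local endpoint; substitute the address printed by `lean run`.
BASE_URL = "http://localhost:8080/v1"

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        "model": "lean-agent-35b",
        "messages": [
            {"role": "user", "content": "Explain mixture-of-experts in one sentence."}
        ],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```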

## Hardware Tiers

Three-tier memory hierarchy: VRAM, RAM, and NVMe SSD.

| Tier | VRAM | RAM | NVMe | Models |
|------|------|-----|------|--------|
| Minimal | 12 GB | 16 GB (32 GB for lean-coder-80b) | 1.8 TB | lean-agent-35b, lean-coder-80b |
| Prosumer | 24 GB | 32 GB (64 GB recommended) | 1.8 TB | lean-agent-122b |
| Enthusiast | 48 GB | 64 GB (128 GB recommended) | 1.8 TB | lean-reason-397b, lean-think-398b |
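
As a quick sanity check, a machine qualifies for the highest tier whose minimums it meets. A small helper (hypothetical, using the VRAM/RAM/NVMe minimums from the table; the per-model RAM notes are ignored for simplicity):

```python
# (tier, min VRAM GB, min RAM GB, min NVMe GB) taken from the table above
TIERS = [
    ("Minimal", 12, 16, 1800),
    ("Prosumer", 24, 32, 1800),
    ("Enthusiast", 48, 64, 1800),
]

def highest_tier(vram_gb: float, ram_gb: float, nvme_gb: float) -> str | None:
    """Return the highest tier this machine satisfies, or None if below Minimal."""
    best = None
    for name, min_vram, min_ram, min_nvme in TIERS:
        if vram_gb >= min_vram and ram_gb >= min_ram and nvme_gb >= min_nvme:
            best = name
    return best

print(highest_tier(vram_gb=24, ram_gb=64, nvme_gb=2000))  # Prosumer
```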

---

The intelligence is already in open-weight MoE models. We solve the engineering problem of fitting them in memory.

[View Models](/models) | [Benchmarks](/benchmarks)
