Find Your Perfect Local AI Model

Compare system requirements, performance benchmarks, and get personalized recommendations based on your hardware.
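The fit question every card below answers is the same: does the model's RAM figure fit in your system memory, and does its VRAM figure fit on your GPU? A minimal sketch of that check in Python, using a few entries from this page's table (the hardware numbers in the example are placeholders for your own):

```python
# Figures (disk GB, RAM GB, VRAM GB) copied from the catalog below.
MODELS = {
    "llama3.1:8b": (4.7, 8, 6),
    "qwen2.5-coder:7b": (4.4, 8, 6),
    "llama3.3": (43, 48, 44),
}

def fits(model, ram_gb, vram_gb):
    """Return (can_run, can_gpu_offload): system RAM must cover the model's
    RAM figure; full GPU acceleration needs VRAM to cover its VRAM figure."""
    _, need_ram, need_vram = MODELS[model]
    return ram_gb >= need_ram, vram_gb >= need_vram

# Example machine: 32GB RAM, 12GB VRAM.
for name in MODELS:
    ram_ok, gpu_ok = fits(name, ram_gb=32, vram_gb=12)
    print(name, "fits" if ram_ok else "too big",
          "(GPU offload)" if gpu_ok else "(CPU only)")
```

Models that exceed VRAM but fit in RAM will still run, just at the much slower CPU rates shown in each card's performance row.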

Showing 57 of 57 models

All-MiniLM

Sentence Transformers · 23M

Compact and fast embedding model. Good balance of quality and speed for semantic search.

Category: embedding · Tags: embedding, fast, lightweight
Size: 0.05GB · RAM: 0.5GB · VRAM: 0.25GB · Context: 0.25K
Performance (tokens/sec): CPU 800 · Mid GPU 3000 · High GPU 8000
ollama pull all-minilm
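Embedding models like this one map text to fixed-length vectors; semantic search then ranks documents by cosine similarity to the query vector. A minimal sketch with made-up 4-dimensional vectors (a real model such as All-MiniLM produces 384 dimensions, e.g. via Ollama's embeddings API):

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product over the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical embeddings; in practice these come from the model.
query = [0.9, 0.1, 0.0, 0.2]
docs = {
    "cats": [0.8, 0.2, 0.1, 0.3],
    "stocks": [0.0, 0.9, 0.4, 0.1],
}
best = max(docs, key=lambda d: cosine(query, docs[d]))
print(best)  # the document whose vector points the same way as the query
```

The same ranking loop works unchanged for any embedding model on this page; only the vector dimension changes.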

BakLLaVA

SkunkworksAI · 7B

Mistral-based vision model. Fast multimodal inference with good image understanding.

Category: vision · Tags: vision, multimodal, fast
Size: 4.5GB · RAM: 8GB · VRAM: 6GB · Context: 4K
Performance (tokens/sec): CPU 15 · Mid GPU 58 · High GPU 110
ollama pull bakllava

BGE-M3

BAAI · 568M

Multilingual embedding model supporting 100+ languages. Great for cross-lingual retrieval.

Category: embedding · Tags: embedding, multilingual, retrieval
Size: 1.1GB · RAM: 2GB · VRAM: 1.5GB · Context: 8K
Performance (tokens/sec): CPU 250 · Mid GPU 1200 · High GPU 3000
ollama pull bge-m3

Code Llama

Meta · 34B

Meta's specialized coding model based on Llama 2. Excellent for code completion and generation.

Category: coding · Tags: coding, programming, infill
Size: 20GB · RAM: 26GB · VRAM: 22GB · Context: 16K
Performance (tokens/sec): CPU 5 · Mid GPU 28 · High GPU 65
ollama pull codellama

CodeGemma

Google · 7B

Google's coding variant of Gemma. Optimized for code completion and generation.

Category: coding · Tags: coding, google, efficient
Size: 4.3GB · RAM: 8GB · VRAM: 6GB · Context: 8K
Performance (tokens/sec): CPU 18 · Mid GPU 72 · High GPU 135
ollama pull codegemma

Command R

Cohere · 35B

Cohere's model optimized for RAG and tool use. Excellent at following complex instructions.

Category: general · Tags: rag, tool-use, enterprise
Size: 20GB · RAM: 26GB · VRAM: 22GB · Context: 128K
Performance (tokens/sec): CPU 5 · Mid GPU 26 · High GPU 60
ollama pull command-r

Command R+

Cohere · 104B

Cohere's largest model. Optimized for RAG, tool use, and complex multi-step tasks.

Category: general · Tags: rag, tool-use, enterprise
Size: 60GB · RAM: 68GB · VRAM: 64GB · Context: 128K
Performance (tokens/sec): CPU 2 · Mid GPU 10 · High GPU 30
ollama pull command-r-plus

DeepSeek Coder V2

DeepSeek · 16B

Advanced coding model with MoE architecture. Excellent for complex programming tasks.

Category: coding · Tags: coding, programming, moe
Size: 9.4GB · RAM: 14GB · VRAM: 12GB · Context: 128K
Performance (tokens/sec): CPU 10 · Mid GPU 45 · High GPU 95
ollama pull deepseek-coder-v2

DeepSeek R1

DeepSeek · 70B

State-of-the-art reasoning model with chain-of-thought capabilities. Excels at math, coding, and complex reasoning.

Category: reasoning · Tags: reasoning, math, coding
Size: 43GB · RAM: 48GB · VRAM: 44GB · Context: 64K
Performance (tokens/sec): CPU 2.5 · Mid GPU 14 · High GPU 42
ollama pull deepseek-r1

DeepSeek R1 14B

DeepSeek · 14B

Mid-sized distilled R1. Excellent reasoning capabilities for its size.

Category: reasoning · Tags: reasoning, math, distilled
Size: 8.5GB · RAM: 12GB · VRAM: 10GB · Context: 64K
Performance (tokens/sec): CPU 12 · Mid GPU 48 · High GPU 95
ollama pull deepseek-r1:14b

DeepSeek R1 7B

DeepSeek · 7B

Compact distilled version of R1. Strong reasoning in an efficient package.

Category: reasoning · Tags: reasoning, math, distilled
Size: 4.7GB · RAM: 8GB · VRAM: 6GB · Context: 64K
Performance (tokens/sec): CPU 18 · Mid GPU 72 · High GPU 135
ollama pull deepseek-r1:7b

DeepSeek R1 Distill (Qwen)

DeepSeek · 32B

Distilled version of R1 based on Qwen. Great reasoning in a smaller package.

Category: reasoning · Tags: reasoning, distilled, efficient
Size: 19GB · RAM: 24GB · VRAM: 20GB · Context: 32K
Performance (tokens/sec): CPU 6 · Mid GPU 28 · High GPU 65
ollama pull deepseek-r1-distill-qwen

Dolphin Llama 3

Cognitive Computations · 8B

Uncensored Llama 3 fine-tune. General purpose assistant without content restrictions.

Category: general · Tags: uncensored, helpful, chat
Size: 4.7GB · RAM: 8GB · VRAM: 6GB · Context: 8K
Performance (tokens/sec): CPU 18 · Mid GPU 72 · High GPU 135
ollama pull dolphin-llama3

Dolphin Mixtral

Cognitive Computations · 8x7B (47B)

Uncensored Mixtral fine-tune. Helpful, harmless, and honest without refusals.

Category: general · Tags: uncensored, moe, helpful
Size: 26GB · RAM: 32GB · VRAM: 28GB · Context: 32K
Performance (tokens/sec): CPU 5 · Mid GPU 24 · High GPU 55
ollama pull dolphin-mixtral

EverythingLM

Community · 13B

Extended context Llama model. Capable of handling very long documents.

Category: general · Tags: long-context, chat
Size: 7.9GB · RAM: 12GB · VRAM: 10GB · Context: 16K
Performance (tokens/sec): CPU 12 · Mid GPU 48 · High GPU 95
ollama pull everythinglm

Gemma 2

Google · 27B

Google's open model built from Gemini research. Available in 2B, 9B, and 27B sizes.

Category: general · Tags: google, efficient, instruct
Size: 16GB · RAM: 20GB · VRAM: 18GB · Context: 8K
Performance (tokens/sec): CPU 7 · Mid GPU 32 · High GPU 75
ollama pull gemma2

Gemma 2 2B

Google · 2B

Google's smallest Gemma. Fast and efficient for basic tasks.

Category: small · Tags: tiny, fast, google
Size: 1.6GB · RAM: 4GB · VRAM: 2.5GB · Context: 8K
Performance (tokens/sec): CPU 40 · Mid GPU 140 · High GPU 260
ollama pull gemma2:2b

Gemma 2 9B

Google · 9B

Google's efficient 9B model. Great performance in a compact size.

Category: general · Tags: google, efficient, instruct
Size: 5.4GB · RAM: 10GB · VRAM: 8GB · Context: 8K
Performance (tokens/sec): CPU 16 · Mid GPU 62 · High GPU 120
ollama pull gemma2:9b

Granite Code

IBM · 34B

IBM's enterprise-focused coding model. Trained on 116 programming languages.

Category: coding · Tags: coding, enterprise, ibm
Size: 20GB · RAM: 26GB · VRAM: 22GB · Context: 8K
Performance (tokens/sec): CPU 5 · Mid GPU 26 · High GPU 58
ollama pull granite-code

Llama 3.1 70B

Meta · 70B

Meta's powerful 70B model. Excellent reasoning and instruction following with 128K context.

Category: general · Tags: chat, instruct, multilingual
Size: 40GB · RAM: 48GB · VRAM: 44GB · Context: 128K
Performance (tokens/sec): CPU 3 · Mid GPU 15 · High GPU 45
ollama pull llama3.1:70b

Llama 3.1 8B

Meta · 8B

Meta's versatile 8B model with 128K context. Great balance of capability and efficiency.

Category: general · Tags: chat, instruct, multilingual
Size: 4.7GB · RAM: 8GB · VRAM: 6GB · Context: 128K
Performance (tokens/sec): CPU 18 · Mid GPU 75 · High GPU 140
ollama pull llama3.1:8b
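A 128K window is not free: the KV cache grows linearly with context length, on top of the model weights. A rough estimate, assuming a Llama-3.1-8B-style geometry (32 layers, 8 KV heads under grouped-query attention, head dimension 128, fp16 cache; these figures are assumptions, check the model card):

```python
def kv_cache_gib(layers, kv_heads, head_dim, context, bytes_per_val=2):
    """Memory for the K and V caches: 2 tensors x layers x heads x dim x tokens."""
    return 2 * layers * kv_heads * head_dim * context * bytes_per_val / 2**30

# Assumed Llama-3.1-8B-like geometry at the full 128K window.
full = kv_cache_gib(layers=32, kv_heads=8, head_dim=128, context=128 * 1024)
print(round(full, 1))  # 16.0 GiB of cache on top of the ~4.7GB weights
```

This is why runtimes typically default to a much smaller context (Ollama defaults to a few thousand tokens) and why the VRAM figures above assume modest windows; quantized KV caches shrink this proportionally.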

Llama 3.2

Meta · 3B

Efficient smaller models from Meta, perfect for on-device deployment. Available in 1B and 3B sizes.

Category: small · Tags: lightweight, fast, mobile
Size: 2GB · RAM: 4GB · VRAM: 3GB · Context: 128K
Performance (tokens/sec): CPU 35 · Mid GPU 120 · High GPU 200
ollama pull llama3.2

Llama 3.2 1B

Meta · 1B

Ultra-lightweight model from Meta. Perfect for edge devices and fast inference.

Category: small · Tags: tiny, fast, edge
Size: 0.75GB · RAM: 2GB · VRAM: 1.5GB · Context: 128K
Performance (tokens/sec): CPU 55 · Mid GPU 180 · High GPU 320
ollama pull llama3.2:1b

Llama 3.2 Vision

Meta · 11B

Multimodal model that can understand images and text. Available in 11B and 90B variants.

Category: vision · Tags: multimodal, image-understanding, vision
Size: 6.5GB · RAM: 10GB · VRAM: 8GB · Context: 128K
Performance (tokens/sec): CPU 12 · Mid GPU 45 · High GPU 90
ollama pull llama3.2-vision

Llama 3.2 Vision 90B

Meta · 90B

Meta's largest vision model. State-of-the-art multimodal understanding.

Category: vision · Tags: vision, multimodal, large
Size: 52GB · RAM: 60GB · VRAM: 56GB · Context: 128K
Performance (tokens/sec): CPU 2 · Mid GPU 10 · High GPU 28
ollama pull llama3.2-vision:90b

Llama 3.3

Meta · 70B

Meta's latest and most capable open model. Excellent for general tasks, coding, and reasoning with 128K context.

Category: general · Tags: chat, instruct, multilingual
Size: 43GB · RAM: 48GB · VRAM: 44GB · Context: 128K
Performance (tokens/sec): CPU 3 · Mid GPU 15 · High GPU 45
ollama pull llama3.3

LLaVA

LLaVA Team · 13B

Visual instruction-tuned model combining CLIP vision with Llama. Great for image understanding.

Category: vision · Tags: vision, multimodal, image-understanding
Size: 8GB · RAM: 12GB · VRAM: 10GB · Context: 4K
Performance (tokens/sec): CPU 10 · Mid GPU 40 · High GPU 85
ollama pull llava

LLaVA 34B

LLaVA Team · 34B

Large vision-language model. Excellent image understanding and reasoning.

Category: vision · Tags: vision, multimodal, large
Size: 20GB · RAM: 26GB · VRAM: 22GB · Context: 4K
Performance (tokens/sec): CPU 4 · Mid GPU 20 · High GPU 48
ollama pull llava:34b

Magicoder

ise-uiuc · 7B

OSS-Instruct trained coding model. Excels at generating clean, documented code.

Category: coding · Tags: coding, clean-code, documented
Size: 4.1GB · RAM: 8GB · VRAM: 6GB · Context: 16K
Performance (tokens/sec): CPU 18 · Mid GPU 72 · High GPU 135
ollama pull magicoder

Mistral 7B

Mistral AI · 7B

Highly efficient 7B model that punches above its weight. Great balance of speed and capability.

Category: general · Tags: efficient, fast, instruct
Size: 4.1GB · RAM: 8GB · VRAM: 6GB · Context: 32K
Performance (tokens/sec): CPU 20 · Mid GPU 80 · High GPU 150
ollama pull mistral

Mistral Large

Mistral AI · 123B

Mistral's most capable model. Excellent for complex reasoning and enterprise use.

Category: general · Tags: enterprise, reasoning, large
Size: 72GB · RAM: 80GB · VRAM: 76GB · Context: 128K
Performance (tokens/sec): CPU 1.5 · Mid GPU 8 · High GPU 25
ollama pull mistral-large

Mistral Small

Mistral AI · 24B

Latest Mistral model optimized for efficiency. Enterprise-grade quality in a compact size.

Category: general · Tags: enterprise, efficient, function-calling
Size: 14GB · RAM: 18GB · VRAM: 16GB · Context: 32K
Performance (tokens/sec): CPU 8 · Mid GPU 35 · High GPU 80
ollama pull mistral-small

Mixtral 8x7B

Mistral AI · 8x7B (47B)

Mixture of Experts model that activates only 2 experts per token. Fast inference with high quality.

Category: general · Tags: moe, efficient, multilingual
Size: 26GB · RAM: 32GB · VRAM: 28GB · Context: 32K
Performance (tokens/sec): CPU 5 · Mid GPU 25 · High GPU 60
ollama pull mixtral
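A back-of-the-envelope calculation shows why an MoE like Mixtral generates faster than its 47B total suggests: per layer, only 2 of the 8 expert feed-forward blocks run for each token, while attention and embeddings are shared. A sketch, where the ~1.7B shared-parameter figure is an assumption for illustration, not an official number:

```python
def moe_active_params(total_b, n_experts, n_active, shared_b):
    """Parameters touched per token: shared weights plus the active experts' share
    of the remaining (expert) parameters. All values in billions."""
    expert_b = (total_b - shared_b) / n_experts
    return shared_b + n_active * expert_b

# Mixtral 8x7B: 47B total, 2 of 8 experts active, ~1.7B assumed shared.
active = moe_active_params(total_b=47, n_experts=8, n_active=2, shared_b=1.7)
print(round(active, 1))  # roughly 13B active per token
```

Note that memory does not shrink the same way: all 47B parameters must still be resident, which is why the RAM/VRAM figures above match a dense ~47B model even though throughput resembles a ~13B one.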

Moondream

vikhyatk · 1.8B

Tiny but capable vision model. Perfect for edge devices and fast image analysis.

Category: vision · Tags: vision, tiny, fast
Size: 1.1GB · RAM: 2GB · VRAM: 1.5GB · Context: 2K
Performance (tokens/sec): CPU 35 · Mid GPU 120 · High GPU 220
ollama pull moondream

Mxbai Embed Large

Mixedbread AI · 335M

State-of-the-art embedding model. Top performance on MTEB benchmarks.

Category: embedding · Tags: embedding, rag, retrieval
Size: 0.67GB · RAM: 2GB · VRAM: 1GB · Context: 1K
Performance (tokens/sec): CPU 300 · Mid GPU 1500 · High GPU 4000
ollama pull mxbai-embed-large

Neural Chat

Intel · 7B

Intel's fine-tuned Mistral for conversational AI. Optimized for dialogue.

Category: general · Tags: chat, conversational, intel
Size: 4.1GB · RAM: 8GB · VRAM: 6GB · Context: 8K
Performance (tokens/sec): CPU 20 · Mid GPU 78 · High GPU 145
ollama pull neural-chat

Nomic Embed Text

Nomic AI · 137M

High-quality text embedding model. Perfect for RAG, semantic search, and similarity matching.

Category: embedding · Tags: embedding, rag, search
Size: 0.27GB · RAM: 1GB · VRAM: 0.5GB · Context: 8K
Performance (tokens/sec): CPU 500 · Mid GPU 2000 · High GPU 5000
ollama pull nomic-embed-text

Nous Hermes 2

Nous Research · 34B

Nous Research's flagship model. Excellent for roleplay, creative writing, and reasoning.

Category: general · Tags: roleplay, creative, reasoning
Size: 20GB · RAM: 26GB · VRAM: 22GB · Context: 4K
Performance (tokens/sec): CPU 5 · Mid GPU 26 · High GPU 58
ollama pull nous-hermes2

OpenHermes 2.5

Teknium · 7B

Powerful Mistral fine-tune trained on a large synthetic dataset. Great instruction following.

Category: general · Tags: instruct, chat, synthetic
Size: 4.1GB · RAM: 8GB · VRAM: 6GB · Context: 8K
Performance (tokens/sec): CPU 20 · Mid GPU 78 · High GPU 145
ollama pull openhermes

Orca Mini

Pankaj Mathur · 7B

Compact model with strong reasoning. Good balance between size and capability.

Category: small · Tags: reasoning, efficient, instruct
Size: 4GB · RAM: 8GB · VRAM: 6GB · Context: 4K
Performance (tokens/sec): CPU 22 · Mid GPU 85 · High GPU 160
ollama pull orca-mini

Phi-3

Microsoft · 14B

Microsoft's small language model. Surprisingly capable for its size, great for resource-constrained environments.

Category: small · Tags: efficient, small, reasoning
Size: 8.2GB · RAM: 12GB · VRAM: 10GB · Context: 128K
Performance (tokens/sec): CPU 15 · Mid GPU 55 · High GPU 110
ollama pull phi3

Phi-4

Microsoft · 14B

Microsoft's latest small model with exceptional reasoning. Trained on high-quality synthetic data.

Category: reasoning · Tags: reasoning, math, efficient
Size: 8.4GB · RAM: 12GB · VRAM: 10GB · Context: 16K
Performance (tokens/sec): CPU 14 · Mid GPU 52 · High GPU 105
ollama pull phi4

Qwen 2.5

Alibaba · 72B

Alibaba's flagship model with excellent multilingual support. Available from 0.5B to 72B.

Category: general · Tags: multilingual, chinese, chat
Size: 44GB · RAM: 50GB · VRAM: 46GB · Context: 128K
Performance (tokens/sec): CPU 2.8 · Mid GPU 14 · High GPU 40
ollama pull qwen2.5

Qwen 2.5 7B

Alibaba · 7B

Efficient 7B version of Alibaba's Qwen 2.5. Great multilingual support and reasoning.

Category: general · Tags: multilingual, chinese, efficient
Size: 4.4GB · RAM: 8GB · VRAM: 6GB · Context: 128K
Performance (tokens/sec): CPU 20 · Mid GPU 78 · High GPU 145
ollama pull qwen2.5:7b

Qwen 2.5 Coder

Alibaba · 32B

Specialized coding model with excellent code completion and generation. Supports 92 programming languages.

Category: coding · Tags: coding, programming, code-completion
Size: 19GB · RAM: 24GB · VRAM: 20GB · Context: 128K
Performance (tokens/sec): CPU 6 · Mid GPU 30 · High GPU 70
ollama pull qwen2.5-coder

Qwen 2.5 Coder 7B

Alibaba · 7B

Efficient coding model supporting 92 languages. Great for code completion on consumer hardware.

Category: coding · Tags: coding, programming, efficient
Size: 4.4GB · RAM: 8GB · VRAM: 6GB · Context: 128K
Performance (tokens/sec): CPU 20 · Mid GPU 80 · High GPU 150
ollama pull qwen2.5-coder:7b

SmolLM2

Hugging Face · 1.7B

Hugging Face's tiny but capable model. Perfect for on-device inference and testing.

Category: small · Tags: tiny, fast, efficient
Size: 1GB · RAM: 2GB · VRAM: 1.5GB · Context: 8K
Performance (tokens/sec): CPU 50 · Mid GPU 170 · High GPU 300
ollama pull smollm2

Snowflake Arctic Embed

Snowflake · 335M

High-performance embedding model. Top results on MTEB retrieval benchmarks.

Category: embedding · Tags: embedding, retrieval, high-quality
Size: 0.67GB · RAM: 2GB · VRAM: 1GB · Context: 1K
Performance (tokens/sec): CPU 300 · Mid GPU 1400 · High GPU 3500
ollama pull snowflake-arctic-embed

Solar

Upstage · 10.7B

Upstage's depth-upscaled model. Strong performance from its novel depth up-scaling training approach.

Category: general · Tags: chat, instruct, upscaled
Size: 6.3GB · RAM: 10GB · VRAM: 8GB · Context: 4K
Performance (tokens/sec): CPU 14 · Mid GPU 55 · High GPU 108
ollama pull solar

Stable Code

Stability AI · 3B

Stability AI's coding model. Good for code completion and generation tasks.

Category: coding · Tags: coding, small, fast
Size: 1.8GB · RAM: 4GB · VRAM: 3GB · Context: 16K
Performance (tokens/sec): CPU 35 · Mid GPU 130 · High GPU 240
ollama pull stable-code

StarCoder 2

BigCode · 15B

Code LLM trained on The Stack v2. Excellent for code completion across 600+ programming languages.

Category: coding · Tags: coding, code-completion, programming
Size: 9GB · RAM: 14GB · VRAM: 11GB · Context: 16K
Performance (tokens/sec): CPU 12 · Mid GPU 50 · High GPU 100
ollama pull starcoder2

TinyLlama

TinyLlama · 1.1B

Ultra-compact 1.1B model. Perfect for testing, edge devices, and resource-limited environments.

Category: small · Tags: tiny, fast, edge
Size: 0.64GB · RAM: 2GB · VRAM: 1GB · Context: 2K
Performance (tokens/sec): CPU 60 · Mid GPU 200 · High GPU 350
ollama pull tinyllama

Wizard Vicuna Uncensored

Eric Hartford · 13B

Uncensored model combining Wizard and Vicuna. Good for creative and unrestricted tasks.

Category: general · Tags: uncensored, creative, chat
Size: 7.9GB · RAM: 12GB · VRAM: 10GB · Context: 4K
Performance (tokens/sec): CPU 12 · Mid GPU 48 · High GPU 95
ollama pull wizard-vicuna-uncensored

WizardCoder

WizardLM · 33B

Evol-Instruct trained coding model. Strong on complex programming tasks.

Category: coding · Tags: coding, complex-tasks, evol-instruct
Size: 19GB · RAM: 24GB · VRAM: 21GB · Context: 8K
Performance (tokens/sec): CPU 5 · Mid GPU 26 · High GPU 58
ollama pull wizardcoder

Yi 1.5

01.AI · 34B

01.AI's bilingual model excelling in English and Chinese. Strong reasoning capabilities.

Category: general · Tags: bilingual, chinese, reasoning
Size: 20GB · RAM: 26GB · VRAM: 22GB · Context: 32K
Performance (tokens/sec): CPU 5 · Mid GPU 26 · High GPU 58
ollama pull yi

Yi Coder

01.AI · 9B

01.AI's specialized coding model. Strong performance on code generation benchmarks.

Category: coding · Tags: coding, programming, bilingual
Size: 5.4GB · RAM: 10GB · VRAM: 8GB · Context: 128K
Performance (tokens/sec): CPU 16 · Mid GPU 60 · High GPU 115
ollama pull yi-coder

Zephyr

Hugging Face · 7B

Hugging Face's DPO-aligned Mistral. Helpful and aligned assistant.

Category: general · Tags: chat, aligned, helpful
Size: 4.1GB · RAM: 8GB · VRAM: 6GB · Context: 32K
Performance (tokens/sec): CPU 20 · Mid GPU 78 · High GPU 145
ollama pull zephyr
馃 OllamaModels.com

Find the best local AI models for your hardware. Not affiliated with Ollama.