Can the NVIDIA RTX 4090 run Llama 3.1 8B?

Yes. The 4-bit version is a 4.9 GB download and fits within the 24 GB of VRAM. Expect roughly 177 tok/s.

What is the largest LLM that runs on the NVIDIA RTX 4090?

The largest model in the catalog that fits is Qwen 3.5 35B-A3B (21.2 GB at 4-bit). Expect roughly 471 tok/s.

How fast is the NVIDIA RTX 4090 for local LLMs?

Token generation is bound by memory bandwidth. At roughly 1008 GB/s, the NVIDIA RTX 4090 generates about 177 tok/s on an 8B-class model at 4-bit; speed scales inversely with model size.
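The inverse relationship between speed and model size can be sketched in a few lines of Python. This is a rough back-of-the-envelope model, not the method behind the figures on this page: the function name `estimate_tokens_per_second` and the `efficiency` factor of 0.85 are assumptions chosen for illustration, and real throughput also depends on the runtime, context length, and quantization format.

```python
def estimate_tokens_per_second(bandwidth_gb_s: float,
                               weights_read_gb: float,
                               efficiency: float = 0.85) -> float:
    """Rough decode-speed estimate for a bandwidth-bound model.

    Each generated token requires streaming the model weights from
    memory once, so tokens/s is roughly bandwidth / bytes-per-token.
    `efficiency` is an assumed fudge factor, not a measured value.
    """
    if weights_read_gb <= 0:
        raise ValueError("weights_read_gb must be positive")
    return efficiency * bandwidth_gb_s / weights_read_gb


if __name__ == "__main__":
    # RTX 4090 (~1008 GB/s) with a 4.9 GB 4-bit 8B-class model.
    print(f"{estimate_tokens_per_second(1008, 4.9):.0f} tok/s")
    # Doubling the bytes read per token halves the estimate.
    print(f"{estimate_tokens_per_second(1008, 9.8):.0f} tok/s")
```

With these assumed inputs the estimate lands in the same range as the 8B figure quoted above, which is all a model this simple can be expected to do.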

Does the whole model have to fit in VRAM?

For full GPU speed, yes. Runtimes such as llama.cpp can split layers between VRAM and system RAM, but every layer that spills into RAM slows generation noticeably.
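To see why spilled layers hurt so much, here is a minimal sketch that treats per-token time as the sum of reading the GPU-resident layers from VRAM and the remaining layers from system RAM. Everything in it is an assumption for illustration: the function name, equal-sized layers, and the example system RAM bandwidth of 60 GB/s. In llama.cpp the split itself is controlled with the `--n-gpu-layers` option; this sketch only models the effect.

```python
def split_tokens_per_second(model_gb: float, n_layers: int, gpu_layers: int,
                            vram_bw_gb_s: float, ram_bw_gb_s: float) -> float:
    """Estimate decode speed when only `gpu_layers` layers live in VRAM.

    Assumes equal-sized layers and purely bandwidth-bound decoding:
    time per token = VRAM read time + system RAM read time.
    """
    if not 0 <= gpu_layers <= n_layers:
        raise ValueError("gpu_layers must be between 0 and n_layers")
    per_layer_gb = model_gb / n_layers
    gpu_gb = per_layer_gb * gpu_layers
    cpu_gb = per_layer_gb * (n_layers - gpu_layers)
    seconds_per_token = gpu_gb / vram_bw_gb_s + cpu_gb / ram_bw_gb_s
    return 1.0 / seconds_per_token


if __name__ == "__main__":
    # Hypothetical 40 GB model with 80 layers; 60 GB/s RAM is an assumed figure.
    for gpu_layers in (80, 60, 40, 0):
        speed = split_tokens_per_second(40.0, 80, gpu_layers, 1008.0, 60.0)
        print(f"{gpu_layers:2d} layers on GPU: {speed:.1f} tok/s")
```

Because system RAM is an order of magnitude slower than VRAM in this setup, the RAM term dominates as soon as even a fraction of the layers spill over.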


Which LLMs does the NVIDIA RTX 4090 run?

The NVIDIA RTX 4090 has 24 GB of VRAM and roughly 1008 GB/s of memory bandwidth. Every model in the catalog that fits is listed below with its estimated generation speed. Largest pick: Qwen 3.5 35B-A3B, ~471 tok/s.
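The fit check behind such a listing can be sketched as a simple filter over download sizes. This is an illustration under stated assumptions, not the page's actual rule: the `Model` type, the `models_that_fit` function, and the 2 GB of headroom reserved for the KV cache and runtime overhead are all hypothetical, as is the 40 GB entry in the sample catalog.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    name: str
    q4_download_gb: float


def models_that_fit(catalog: List[Model], vram_gb: float,
                    headroom_gb: float = 2.0) -> List[Model]:
    """Return models whose 4-bit download fits in VRAM, largest first.

    `headroom_gb` is an assumed reserve for KV cache and runtime overhead.
    """
    budget = vram_gb - headroom_gb
    fitting = [m for m in catalog if m.q4_download_gb <= budget]
    return sorted(fitting, key=lambda m: m.q4_download_gb, reverse=True)


if __name__ == "__main__":
    catalog = [
        Model("Llama 3.1 8B", 4.9),
        Model("Qwen 3.5 35B-A3B", 21.2),
        Model("Hypothetical 70B", 40.0),  # made-up entry that should not fit
    ]
    for model in models_that_fit(catalog, vram_gb=24.0):
        print(f"{model.name}: {model.q4_download_gb} GB")
```

The first element of the returned list is the "largest pick" for the device under these assumptions.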

Technical specifications

Memory: 24 GB VRAM

Bandwidth: ~1008 GB/s

Memory type: Dedicated VRAM

Released: 2022-10

Models on the NVIDIA RTX 4090

62 of 73 models

Model	Vendor	Download (Q4)	Fits?	~Speed
Qwen 3.5 35B-A3B	Alibaba	21.2 GB	Runs	~471 tok/s
Qwen 3.6 35B-A3B	Alibaba	21.2 GB	Runs	~471 tok/s
Command R 35B	Cohere	21.2 GB	Runs	~40 tok/s
Qwen3-VL 32B	Alibaba	20.0 GB	Runs	~43 tok/s
EXAONE 4.5 33B	LG AI Research	20.0 GB	Runs	~43 tok/s
Qwen 3 32B	Alibaba	19.9 GB	Runs	~43 tok/s
Qwen 2.5 Coder 32B	Alibaba	19.9 GB	Runs	~43 tok/s
QwQ 32B	Alibaba	19.9 GB	Runs	~43 tok/s
DeepSeek R1 32B	DeepSeek	19.9 GB	Runs	~43 tok/s
Granite 4.0 H Small	IBM	19.4 GB	Runs	~157 tok/s
Nemotron 3 Nano 30B-A3B	NVIDIA	19.2 GB	Runs	~393 tok/s
Gemma 4 31B	Google	18.6 GB	Runs	~46 tok/s
Qwen 3 30B-A3B	Alibaba	18.5 GB	Runs	~428 tok/s
Qwen3-VL 30B-A3B	Alibaba	18.2 GB	Runs	~471 tok/s
Gemma 3 27B	Google	16.6 GB	Runs	~52 tok/s
Qwen 3.5 27B	Alibaba	16.4 GB	Runs	~52 tok/s
Qwen 3.6 27B	Alibaba	16.4 GB	Runs	~52 tok/s
Gemma 4 26B A4B	Google	15.3 GB	Runs	~372 tok/s
Mistral Small 3.1 24B	Mistral AI	14.6 GB	Runs	~59 tok/s
Devstral 24B	Mistral AI	14.6 GB	Runs	~59 tok/s
Magistral Small 1.2	Mistral AI	14.6 GB	Runs	~59 tok/s
Devstral Small 2 24B	Mistral AI	14.6 GB	Runs	~59 tok/s
Codestral 22B	Mistral AI	13.5 GB	Runs	~64 tok/s
GPT-OSS 20B	OpenAI	12.7 GB	Runs	~393 tok/s
Phi-4 Reasoning Vision 15B	Microsoft	9.1 GB	Runs	~94 tok/s
Qwen 3 14B	Alibaba	9.0 GB	Runs	~95 tok/s
DeepSeek R1 14B	DeepSeek	9.0 GB	Runs	~95 tok/s
Phi-4 14B	Microsoft	8.9 GB	Runs	~96 tok/s
Ministral 3 14B	Mistral AI	8.5 GB	Runs	~101 tok/s
OLMo 2 13B	Ai2	8.3 GB	Runs	~103 tok/s
Gemma 3 12B	Google	7.4 GB	Runs	~116 tok/s
Mistral Nemo 12B	Mistral AI	7.4 GB	Runs	~116 tok/s
Gemma 4 12B	Google	7.3 GB	Runs	~118 tok/s
Mellum 2 12B-A2.5B	JetBrains	7.3 GB	Runs	~565 tok/s
Qwen 3.5 9B	Alibaba	5.5 GB	Runs	~157 tok/s
GLM-4.6V-Flash	Z.ai	5.5 GB	Runs	~157 tok/s
Qwen 2.5 VL 7B	Alibaba	5.0 GB	Runs	~170 tok/s
Qwen 3 8B	Alibaba	5.0 GB	Runs	~172 tok/s
Granite 3.3 8B	IBM	5.0 GB	Runs	~172 tok/s
Llama 3.1 8B	Meta	4.9 GB	Runs	~177 tok/s
DeepSeek R1 8B	DeepSeek	4.9 GB	Runs	~177 tok/s
Gemma 4 E4B	Google	4.9 GB	Runs	~314 tok/s
Qwen3-VL 8B	Alibaba	4.9 GB	Runs	~177 tok/s
Ministral 3 8B	Mistral AI	4.9 GB	Runs	~177 tok/s
Gemma 3n E4B	Google	4.7 GB	Runs	~353 tok/s
Qwen 2.5 Coder 7B	Alibaba	4.6 GB	Runs	~186 tok/s
DeepSeek R1 7B	DeepSeek	4.6 GB	Runs	~186 tok/s
Mistral 7B	Mistral AI	4.4 GB	Runs	~196 tok/s
Gemma 4 E2B	Google	3.1 GB	Runs	~614 tok/s
Gemma 3 4B	Google	2.6 GB	Runs	~329 tok/s
Qwen 3 4B	Alibaba	2.4 GB	Runs	~353 tok/s
Qwen 3.5 4B	Alibaba	2.4 GB	Runs	~353 tok/s
Phi-4 Mini 3.8B	Microsoft	2.3 GB	Runs	~372 tok/s
Llama 3.2 3B	Meta	1.9 GB	Runs	~442 tok/s
DeepSeek-OCR	DeepSeek	1.8 GB	Runs	~2479 tok/s
Ministral 3 3B	Mistral AI	1.8 GB	Runs	~471 tok/s
DeepSeek R1 1.5B	DeepSeek	1.1 GB	Runs	~785 tok/s
Qwen 3 1.7B	Alibaba	1.0 GB	Runs	~831 tok/s
SmolLM2 1.7B	Hugging Face	1.0 GB	Runs	~831 tok/s
Llama 3.2 1B	Meta	0.7 GB	Runs	~1178 tok/s
Gemma 3 1B	Google	0.6 GB	Runs	~1413 tok/s
Qwen 3 0.6B	Alibaba	0.4 GB	Runs	~2355 tok/s

For a model to run entirely on the GPU, its 4-bit version must fit in VRAM. Models that do not fit still run on CPU + system RAM, but several times slower. · Data updated: 2026-06-11