NOVA BENCH
DGX Spark · Blackwell GPU · 11 Models
First Comprehensive DGX Spark Benchmark

How Fast Can a $4,699
Desktop Run LLMs?

11 models from 8B to 123B parameters, benchmarked across inference, coding, context scaling, and vision on NVIDIA's GB10 Blackwell architecture.

11 Models · 42.9 tok/s Peak Speed · 2,090 tok/s Prompt Eval · 123B Largest Model · 90B Vision Model
Generation Speed Ranking (tokens/second · higher = faster)
# · Model · Tier · Speed · Prompt · Size
Sweet Spot
27–32B
10–12 tok/s. Fast enough for interactive coding and research. Best quality-to-speed ratio on DGX Spark.
Ceiling
123B
Mistral Large runs at 2.28 tok/s. It fits in the 128GB of unified memory, but is too slow to be interactive for real work.
Task: Write a Python function for longest palindromic substring with docstring and type hints.
Highlight
DeepSeek-R1 generated 10,710 tokens of reasoning for one question — 39 minutes of chain-of-thought on desktop hardware. Qwen3 produced 8,180 tokens.
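For reference, the task is a standard interview problem. One possible solution of the kind the models were asked to produce (this is an illustrative expand-around-center sketch, not any model's actual output):

```python
def longest_palindromic_substring(s: str) -> str:
    """Return the longest palindromic substring of s.

    Uses expand-around-center: O(n^2) time, O(1) extra space.
    """
    if not s:
        return ""
    start, end = 0, 0
    for i in range(len(s)):
        # Try both an odd-length center (i, i) and an even-length center (i, i+1).
        for left, right in ((i, i), (i, i + 1)):
            while left >= 0 and right < len(s) and s[left] == s[right]:
                left -= 1
                right += 1
            # After the loop, (left + 1, right - 1) bounds the palindrome.
            if (right - 1) - (left + 1) > end - start:
                start, end = left + 1, right - 1
    return s[start : end + 1]
```

A dynamic-programming version is also common; expand-around-center is shown because it needs no extra memory.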
Test: Prompt processing — 19-token simple vs 130-token complex prompt.
Model · Short · Long · Scale · Gen
Discovery
Prompt eval scales 3–4x with longer inputs. Llama 8B: 574 → 2,090 tok/s. Generation speed stays constant — the bottleneck is decode, not prefill.
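The 3–4x claim can be checked directly from the Llama 8B numbers reported above:

```python
# Prefill (prompt eval) throughput reported for Llama 8B at two prompt lengths.
short_prompt_tps = 574    # 19-token prompt
long_prompt_tps = 2090    # 130-token prompt

# Longer prompts batch more tokens per forward pass, so prefill throughput climbs.
scale = long_prompt_tps / short_prompt_tps
print(f"prefill scaling: {scale:.2f}x")  # ~3.64x, within the stated 3-4x range
```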
Vision Model
Llama3.2-Vision 90B
90 billion parameter vision model on a $4,699 desktop
3.47 tok/s Generation · 6.02 tok/s Prompt Eval · 16.9 s Load Time

About This Benchmark

Nova Bench is the first comprehensive benchmark suite built specifically for the NVIDIA DGX Spark — a $4,699 desktop supercomputer powered by the GB10 Blackwell Superchip with 128GB of unified LPDDR5x memory.

All benchmarks were run on a single DGX Spark unit using Ollama 0.18.3 as the inference framework. Models range from 8B to 123B parameters, covering general inference, code generation, context scaling, and vision tasks.

Hardware

The DGX Spark features a GB10 Grace Blackwell architecture with 20 ARM cores (10x Cortex-X925 + 10x Cortex-A725), 128GB unified memory shared between CPU and GPU via NVLink-C2C, and a 4TB NVMe SSD. The GPU delivers up to 1 PFLOP of FP4 performance.

Methodology

For general inference, each model answered the same prompt: "Explain quantum computing in exactly 200 words." Coding benchmarks used the longest-palindromic-substring problem. Context scaling compared a 19-token simple prompt against a 130-token complex business-analysis prompt. All results are reproducible; raw data is available on GitHub and HuggingFace.
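Ollama reports per-request timing metadata alongside each response, which is how tok/s figures like these are typically derived. A minimal sketch of that calculation, assuming the standard `eval_count`/`eval_duration` (decode) and `prompt_eval_count`/`prompt_eval_duration` (prefill) fields, with durations in nanoseconds; the numbers below are illustrative, not measured data:

```python
def tokens_per_second(count: int, duration_ns: int) -> float:
    """Convert a token count and a nanosecond duration to tokens/second."""
    return count / (duration_ns / 1e9)

# Mock response mirroring the metric fields Ollama returns with a generation.
response = {
    "eval_count": 429,                    # decode tokens produced
    "eval_duration": 10_000_000_000,      # 10 s in ns -> 42.9 tok/s
    "prompt_eval_count": 130,             # prefill tokens processed
    "prompt_eval_duration": 62_200_000,   # ~62 ms in ns -> ~2,090 tok/s
}

gen_tps = tokens_per_second(response["eval_count"], response["eval_duration"])
prefill_tps = tokens_per_second(
    response["prompt_eval_count"], response["prompt_eval_duration"]
)
```

Separating decode and prefill this way is what makes observations like the 3–4x prompt-eval scaling measurable at all.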

Reproduce These Results

All benchmark data, scripts, and methodology are open source. Clone the repo and run on your own hardware:

git clone https://github.com/GOPITRINADH3561/nova-bench.git

Dataset available at: huggingface.co/datasets/G3nadh/dgx-spark-benchmarks