Agentic coding
Dolphin is an open-weight AI lab and decentralized inference network, offering quality comparable to centralized APIs at ~30% lower cost, verified across all modalities.
4.7M+ monthly downloads
90+ open-weight models
800+ GPUs in beta
Distribution plus price is the wedge. Quality match is the gate. Dolphin is built for the models developers already want to route.
Qwen, GLM, and DeepSeek have become the open-weight defaults for coding agents.
Claude Code and Codex-style products charge multiples of model run cost.
A cheaper, verified default can win share before expanding into broader inference.
Dolphin is not a generic compute marketplace. We ship models, run the network, and own the engine layer needed to make consumer GPUs useful for quality-tier inference.
90+ open-weight models shipped, including one of Venice's most popular models, developed in partnership with Venice.
Decentralized inference with 800+ beta GPUs and future partner inventory from Targon and Lium.
Sonar-flash for consumer-GPU single requests and Sonar for batched enterprise workloads.
Agentic coding is the most cost-sensitive inference workload at scale. Win the coding-model price war, become the OpenRouter default, then expand into multimodal and general inference.
Frontier open-weight coding models match or approach centralized API quality on key benchmarks.
High-volume coding agents feel inference cost immediately.
Once latency, cost, and quality stabilize, developers stop re-shopping inference routes.
~90%: expected volume from coding models
Qwen: frontier small coding family
GLM: frontier large open coding family
Comparable latency, throughput, and quality to centralized APIs at roughly 30% lower cost, verified with a proof-of-weights stack that works for text and multimodal.
Proofs are tiny and fast, require no user prompts, and scale to image, audio, and video weights.
Verification does not require retaining user prompts, avoiding the core privacy risk of logprob schemes.
LOGIC and logprob approaches do not scale cleanly beyond text. Dolphin's stack is designed for the full surface.
Full paper available to investors under NDA.
Generic engines waste VRAM that consumer cards need to fit Qwen-class models without quality loss. Dolphin built its own runtime where the network needed it most.
GPU-first runtime with native FP8, native C quantizer, hand-optimized kernels, compact KV layout, and GPU sampling.
A vLLM fork with proof-of-weights baked in for batched enterprise GPU workloads.
Local llama.cpp or vLLM users can swap to a Dolphin worker, keep private local inference, and earn POD on idle cycles.
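To make the VRAM argument concrete, here is a rough memory-budget sketch. It is not Dolphin's runtime: the model shape, the 24 GiB card, the 32k context, and the flat 1.5 GiB overhead are all hypothetical values chosen for illustration. It uses only the standard estimate of weight bytes plus key/value cache bytes for a single sequence.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical model dimensions, not taken from any real checkpoint."""
    params: int       # total parameter count
    n_layers: int     # transformer layers
    n_kv_heads: int   # key/value heads per layer
    head_dim: int     # dimension of each head


def weight_bytes(shape: ModelShape, bytes_per_param: int) -> int:
    """Memory for the weights at a given precision (2 = FP16, 1 = FP8)."""
    return shape.params * bytes_per_param


def kv_cache_bytes(shape: ModelShape, context_tokens: int, bytes_per_value: int) -> int:
    """KV cache for one sequence: a key and a value tensor per layer per token."""
    per_token = 2 * shape.n_layers * shape.n_kv_heads * shape.head_dim
    return per_token * context_tokens * bytes_per_value


def total_gib(shape: ModelShape, context_tokens: int, bytes_per_param: int,
              bytes_per_kv_value: int, overhead_gib: float = 1.5) -> float:
    """Weights + KV cache + an assumed flat overhead for activations and buffers."""
    used = (weight_bytes(shape, bytes_per_param)
            + kv_cache_bytes(shape, context_tokens, bytes_per_kv_value))
    return used / GIB + overhead_gib


# Assumed example: a 14B-parameter model on a 24 GiB consumer card.
HYPOTHETICAL_14B = ModelShape(params=14_000_000_000, n_layers=40,
                              n_kv_heads=8, head_dim=128)
CARD_VRAM_GIB = 24
CONTEXT_TOKENS = 32_768

fp16_gib = total_gib(HYPOTHETICAL_14B, CONTEXT_TOKENS, 2, 2)
fp8_gib = total_gib(HYPOTHETICAL_14B, CONTEXT_TOKENS, 1, 1)

print(f"FP16 weights + FP16 KV: {fp16_gib:.1f} GiB, fits: {fp16_gib <= CARD_VRAM_GIB}")
print(f"FP8 weights + FP8 KV:   {fp8_gib:.1f} GiB, fits: {fp8_gib <= CARD_VRAM_GIB}")
```

Under these assumed numbers the 16-bit configuration overflows the card while the 8-bit one leaves headroom, which is the kind of margin a compact KV layout and native FP8 are meant to protect. Real budgets depend on the actual model, batch size, and runtime.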
Dolphin has model distribution, paid partner demand, and live network supply before the V2 unlock.
Default uncensored model surface for Venice's roughly 1M users, with Venice Uncensored 2 in flight.
Beta GPUs are online under capped utilization while V2 is prepared.
Bittensor Targon runs the Dolphin worker on idle inventory; Lium feeds additional GPU supply.
4.7M+ monthly model downloads
40k VVV across 4 Venice-commissioned models
800+ GPUs online in beta
All network revenue goes to on-market POD buybacks. Rewards float with inference demand, so supply grows with usage instead of fixed emissions.
Image, audio, and video pricing is where consumer GPU economics matter most.
POD demand is tied directly to customer inference revenue.
The protocol chooses permissionless mining with rewards that respond to demand.
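The buyback loop above reduces to simple arithmetic. The sketch below is illustrative only: the function names, the POD price, and the emission figures are hypothetical, and it assumes the whole period's revenue is spent at a single average market price, which a real on-market buyback would not achieve exactly.

```python
def pod_bought_back(network_revenue_usd: float, avg_pod_price_usd: float) -> float:
    """POD purchased on-market if all network revenue goes to buybacks."""
    if avg_pod_price_usd <= 0:
        raise ValueError("POD price must be positive")
    return network_revenue_usd / avg_pod_price_usd


def buyback_exceeds_dilution(pod_emitted: float, network_revenue_usd: float,
                             avg_pod_price_usd: float) -> bool:
    """True when more POD is bought back than is newly emitted in the period."""
    return pod_bought_back(network_revenue_usd, avg_pod_price_usd) > pod_emitted


# Hypothetical period: $1,000 of inference revenue at an assumed $0.50 per POD.
bought = pod_bought_back(1_000.0, 0.50)
print(f"POD bought back: {bought:.0f}")
print("Buyback exceeds 1,500 POD emitted:",
      buyback_exceeds_dilution(1_500.0, 1_000.0, 0.50))
print("Buyback exceeds 2,500 POD emitted:",
      buyback_exceeds_dilution(2_500.0, 1_000.0, 0.50))
```

Because the amount bought back scales with revenue, buy pressure rises and falls with inference demand rather than following a fixed schedule.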
Free inference buys the OpenRouter listing slot, social proof, and volume needed to validate quality. Then Dolphin prices 30% below market.
Ship V2 worker, uncap rewards, and run datagen so emissions attract GPUs.
At 500+ GPUs, turn off datagen and open a free API, pursuing the OpenRouter listing once entity setup allows.
Trigger monetization at 1B daily tokens or by Q4 2026, pricing 30% below competitors.
60-90 days: time-boxed free inference window
500+ GPUs: threshold before opening the free API
1B daily tokens: monetization trigger
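The rollout gates above can be read as a small state function. This is one interpretation, not the team's actual rollout logic: the phase names are hypothetical, "Q4 2026" is assumed to mean 1 October 2026, and the entity-setup dependency and the 60-90 day free window are left out for brevity.

```python
from datetime import date
from enum import Enum


class Phase(Enum):
    BOOTSTRAP = "V2 worker shipped, rewards uncapped, datagen running"
    FREE_API = "datagen off, free API open"
    PAID = "priced below market"


GPU_THRESHOLD = 500                         # "500+ GPUs"
DAILY_TOKEN_TRIGGER = 1_000_000_000         # "1B daily tokens"
MONETIZATION_DEADLINE = date(2026, 10, 1)   # assumption: start of Q4 2026
DISCOUNT = 0.30                             # "30% below competitors"


def current_phase(gpus_online: int, daily_tokens: int, today: date) -> Phase:
    """Map network state to a rollout phase under the assumptions above."""
    if gpus_online < GPU_THRESHOLD:
        return Phase.BOOTSTRAP
    if daily_tokens >= DAILY_TOKEN_TRIGGER or today >= MONETIZATION_DEADLINE:
        return Phase.PAID
    return Phase.FREE_API


def dolphin_price(market_price: float, phase: Phase) -> float:
    """Free until monetization triggers, then a fixed discount to market."""
    return market_price * (1 - DISCOUNT) if phase is Phase.PAID else 0.0


phase = current_phase(gpus_online=650, daily_tokens=1_200_000_000,
                      today=date(2026, 6, 1))
print(phase.name, dolphin_price(10.0, phase))
```

Keeping the thresholds as named constants makes the gates easy to audit against the plan.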
The closest peers are missing two of three legs: model lab, production inference network, and custom inference engine with multimodal verification.
Morpheus, Inference.net, Akash, io.net, Render, Heurist, Venice, and Targon are decentralized or cheap.
Prime Intellect, Nous, Gensyn, and Pluralis carry research credibility but not production inference at scale.
Dolphin is not generic DePIN compute, not a crypto-only narrative, not frontier base-model R&D, and not pure wholesale.
Verification differentiation matters most: logprob-based approaches do not work for multimodal.
The bet is deliberately measurable: 1 trillion daily inference tokens, with network revenue exceeding emissions and POD buyback exceeding dilution.
LoRA marketplace, consumer subscription apps, and custom Discord-bot character surfaces.
OpenRouter slips past Q4 2026, V2 slips past Q3 2026, or Sonar-flash misses llama.cpp single-request performance.
Open-weight coding models are good enough for the distribution and cost layer to matter.
The team has already shipped models, network infrastructure, and the verification and engine work that make decentralized inference practical.
Luke as CEO plus five technical operators across research, infrastructure, and product execution.
Agentic-coding lead and assistant, distributed-training helpers, designer, UX dev, full-time UI dev, and node-operator support.
The first new headcount turns coding-model demand into a paid sandbox network surface.
Capital goes to the surfaces that turn Dolphin from a verified inference network into a larger AI developer platform.
Agentic-coding sandbox workload with a lead at $300k per year and assistant at $150k per year.
Industrial-scale B300 purchases, including capacity reserved for Dolphin model training.
Audio, image, and video network expansion; sharded inference for 70B+ models; and the web-scraper network, pulled forward.