DolphinNetwork

Open-weight AI lab and decentralized inference network.
Inference at ~30% lower cost, verified across all modalities.

4.7M+

monthly downloads

90+

open-weight models

800+

GPUs in beta

OpenRouter Is The Wedge

Distribution plus price is the wedge. Quality match is the gate. Dolphin is built for the models developers already want to route.

Agentic coding

Qwen, GLM, and DeepSeek have become the open-weight defaults for coding agents.

API margin

Claude Code and Codex-style products charge multiples of model run cost.

Liquidity wins

A cheaper, verified default can win share before expanding into broader inference.

Lab. Network. Engine.

Dolphin is not a generic compute marketplace. We ship models, run the network, and own the engine layer needed to make consumer GPUs useful for quality-tier inference.

Lab

90+ open-weight models shipped, including one of Venice's most popular models, developed in partnership with Venice.

Network

Decentralized inference with 800+ beta GPUs and future partner inventory from Targon and Lium.

Engine

Sonar-flash for consumer-GPU single requests and Sonar for batched enterprise workloads.

Coding Models First

Agentic coding is the most cost-sensitive inference workload at scale. Win the coding-model price war, become the OpenRouter default, then expand into multimodal and general inference.

Quality gate

Frontier open-weight coding models match or approach centralized API quality on key benchmarks.

Price lever

High-volume coding agents feel inference cost immediately.

Toolchain lock-in

Once latency, cost, and quality stabilize, developers stop re-shopping inference routes.

~90%

expected volume from coding models

Qwen

frontier small coding family

GLM

frontier large open coding family

Price You Can Verify

Comparable latency, throughput, and quality to centralized APIs at roughly 30% lower cost, verified with a proof-of-weights stack that works for text and multimodal.

CPU-only proof

Tiny, fast, no user prompts required, scalable to image, audio, and video weights.

No prompt storage

Verification does not require retaining user prompts, avoiding the core privacy risk of logprob schemes.

Multimodal proof

LOGIC and logprob approaches do not scale cleanly beyond text. Dolphin's stack is designed for the full surface.
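As a rough illustration of what a prompt-free, CPU-only weight check can look like, the sketch below uses a generic nonce-based spot check over weight bytes. It is not Dolphin's proof-of-weights scheme, which is described only in the paper under NDA. The chunk size and every function name here are hypothetical.

```python
import hashlib
import hmac
import secrets

CHUNK_SIZE = 1 << 20  # hypothetical 1 MiB chunks; not a value from the deck


def num_chunks(total_bytes: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Number of chunks needed to cover the weight file (ceiling division)."""
    return -(-total_bytes // chunk_size)


def chunk_digest(weights: bytes, index: int, nonce: bytes,
                 chunk_size: int = CHUNK_SIZE) -> str:
    """Hash one weight chunk together with a fresh nonce and its index."""
    chunk = weights[index * chunk_size:(index + 1) * chunk_size]
    return hashlib.sha256(nonce + index.to_bytes(8, "big") + chunk).hexdigest()


def make_challenge(total_chunks: int, samples: int) -> tuple[bytes, list[int]]:
    """Verifier side: pick a random nonce and random chunk indices."""
    nonce = secrets.token_bytes(16)
    indices = [secrets.randbelow(total_chunks) for _ in range(samples)]
    return nonce, indices


def respond(weights: bytes, nonce: bytes, indices: list[int],
            chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Worker side: answer the challenge from the locally held weights."""
    return [chunk_digest(weights, i, nonce, chunk_size) for i in indices]


def verify(reference: bytes, nonce: bytes, indices: list[int],
           answers: list[str], chunk_size: int = CHUNK_SIZE) -> bool:
    """Verifier side: recompute the digests from a reference copy and compare."""
    expected = respond(reference, nonce, indices, chunk_size)
    return len(answers) == len(expected) and all(
        hmac.compare_digest(a, e) for a, e in zip(answers, expected)
    )
```

A check like this touches no user prompts and treats weights as opaque bytes, so the same code applies to image, audio, or video checkpoints. On its own it only shows that a worker holds the expected bytes, not that those weights served a given request.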

Full paper available to investors under NDA.

Built For Consumer GPUs

Generic engines waste VRAM that consumer cards need to fit Qwen-class models without quality loss. Dolphin built its own runtime where the network needed it most.

Sonar-flash

GPU-first runtime with native FP8, native C quantizer, hand-optimized kernels, compact KV layout, and GPU sampling.

Sonar

A vLLM fork with proof-of-weights baked in for batched enterprise GPU workloads.

Worker swap

Local llama.cpp or vLLM users can swap to a Dolphin worker, keep private local inference, and earn POD on idle cycles.

Traction Across The Stack

Dolphin has model distribution, paid partner demand, and live network supply before the V2 unlock.

Venice

Default uncensored model surface for Venice's roughly 1M users, with Venice Uncensored 2 in flight.

Beta network

Beta GPUs are online under capped utilization while V2 is prepared.

Supply partners

Bittensor Targon runs the Dolphin worker on idle inventory; Lium feeds additional GPU supply.

4.7M+

monthly model downloads

40k VVV

across 4 Venice-commissioned models

800+

GPUs online in beta

The POD Flywheel

All network revenue goes to on-market POD buybacks. Rewards float with inference demand, so supply grows with usage instead of fixed emissions.

Multimodal edge

Image, audio, and video pricing is where consumer GPU economics matter most.

Revenue buybacks

POD demand is tied directly to customer inference revenue.

Demand rewards

The protocol chooses permissionless mining with rewards that respond to demand.
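The buyback arithmetic can be sketched in a few lines. This is a minimal model, assuming all revenue is spent at a single market price within one accounting period. The `Epoch` fields and function names are hypothetical, and the deck does not say what happens to POD after it is bought back.

```python
from dataclasses import dataclass


@dataclass
class Epoch:
    """One accounting period; all fields are illustrative assumptions."""
    revenue_usd: float    # customer inference revenue in the period
    pod_price_usd: float  # assumed average market price paid per POD
    pod_emitted: float    # POD paid out to workers as rewards


def pod_bought_back(epoch: Epoch) -> float:
    """All network revenue goes to on-market buybacks."""
    return epoch.revenue_usd / epoch.pod_price_usd


def net_pod_issued(epoch: Epoch) -> float:
    """Positive means net dilution; negative means buybacks outpaced emissions."""
    return epoch.pod_emitted - pod_bought_back(epoch)
```

For example, `Epoch(revenue_usd=1000.0, pod_price_usd=0.5, pod_emitted=1500.0)` buys back 2000 POD against 1500 emitted, so net issuance is negative.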

Free First, Paid Fast

Free inference buys the OpenRouter listing slot, social proof, and volume needed to validate quality. Then Dolphin prices 30% below market.

Ship V2

Ship the V2 worker, uncap rewards, and run datagen so emissions attract GPUs.

Open free API

At 500+ GPUs, turn off datagen and open a free API, starting the OpenRouter listing motion once entity setup allows.

Monetize

Trigger monetization by 1B daily tokens or Q4 2026, pricing 30% below competitors.

60-90d

time-boxed free inference window

500+

GPU threshold before opening free API

1B

daily token monetization trigger
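The rollout gates above can be read as a small state function. The sketch below is one interpretation, assuming "Q4 2026" means the first day of that quarter and that monetization starts at whichever trigger comes first. Phase names and parameters are hypothetical, and the 60-90 day free window is not modeled.

```python
from datetime import date

GPU_THRESHOLD = 500               # "500+" GPUs before the free API opens
TOKEN_TRIGGER = 1_000_000_000     # 1B daily tokens
DATE_TRIGGER = date(2026, 10, 1)  # assumption: Q4 2026 read as its first day
DISCOUNT = 0.30                   # priced 30% below competitors


def rollout_phase(v2_shipped: bool, gpus: int, entity_ready: bool,
                  daily_tokens: int, today: date) -> str:
    """Return the current go-to-market phase under the assumptions above."""
    if not v2_shipped:
        return "beta"       # capped beta network, V2 still in preparation
    if gpus < GPU_THRESHOLD or not entity_ready:
        return "datagen"    # emissions attract GPUs, no public API yet
    if daily_tokens >= TOKEN_TRIGGER or today >= DATE_TRIGGER:
        return "paid"       # monetization triggered by volume or by date
    return "free"           # free API open to build volume


def paid_price(market_price: float) -> float:
    """Price once monetization starts, relative to a competitor's price."""
    return market_price * (1 - DISCOUNT)
```

With these constants, a network at 500 GPUs and 100M daily tokens in mid-2026 is in the free phase, and a competitor price of $1.00 per unit maps to about $0.70.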

Structurally Different

The closest peers are missing two of three legs: model lab, production inference network, and custom inference engine with multimodal verification.

Inference peers

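Morpheus, Inference.net, Akash, io.net, Render, Heurist, Venice, and Targon are decentralized or cheap, but none pairs a production network with both a model lab and a custom engine with multimodal verification.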

Training peers

Prime Intellect, Nous, Gensyn, and Pluralis carry research credibility but not production inference at scale.

What we are not

Dolphin is not generic DePIN compute, not a crypto-only narrative, not frontier base-model R&D, and not pure wholesale.

Verification differentiation matters most: logprob-based approaches do not work for multimodal.

The 1T Token Bet

The bet is deliberately measurable: 1 trillion daily inference tokens, with network revenue exceeding emissions and POD buyback exceeding dilution.

Expansion

LoRA marketplace, consumer subscription apps, and custom Discord-bot character surfaces.

Kill criteria

OpenRouter slips past Q4 2026, V2 slips past Q3 2026, or Sonar-flash misses llama.cpp single-request performance.

Timing

Open-weight coding models are good enough for the distribution and cost layer to matter.
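Because the bet and the kill criteria are stated as thresholds, they can be written down as checks. The sketch below assumes calendar quarters, treats "dilution" as POD emitted in the period, and compares Sonar-flash to llama.cpp on single-request tokens per second. All three readings, and all names, are assumptions rather than definitions from the deck.

```python
from datetime import date

TARGET_DAILY_TOKENS = 1_000_000_000_000    # 1 trillion daily inference tokens
V2_DEADLINE = date(2026, 9, 30)            # assumption: end of Q3 2026
OPENROUTER_DEADLINE = date(2026, 12, 31)   # assumption: end of Q4 2026


def bet_is_met(daily_tokens: int, revenue_usd: float, emissions_usd: float,
               pod_bought: float, pod_emitted: float) -> bool:
    """All three conditions of the bet must hold at once."""
    return (daily_tokens >= TARGET_DAILY_TOKENS
            and revenue_usd > emissions_usd
            and pod_bought > pod_emitted)


def kill_reasons(openrouter_date: date, v2_ship_date: date,
                 sonar_flash_tps: float, llama_cpp_tps: float) -> list[str]:
    """Return the kill criteria that the given projections would trip."""
    reasons = []
    if openrouter_date > OPENROUTER_DEADLINE:
        reasons.append("OpenRouter slips past Q4 2026")
    if v2_ship_date > V2_DEADLINE:
        reasons.append("V2 slips past Q3 2026")
    if sonar_flash_tps < llama_cpp_tps:
        reasons.append("Sonar-flash misses llama.cpp single-request performance")
    return reasons
```

An empty list from `kill_reasons` means no criterion is tripped under the supplied projections.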

Technical Operators

The team has already shipped models, network infrastructure, and the verification and engine work that make decentralized inference practical.

Core team

Luke as CEO plus five technical operators across research, infrastructure, and product execution.

Next hires

Agentic-coding lead and assistant, distributed-training helpers, designer, UX dev, full-time UI dev, and node-operator support.

Wedge-led hiring

The first new headcount turns coding-model demand into a paid sandbox network surface.

Two Accelerants

Capital goes to the surfaces that turn Dolphin from a verified inference network into a larger AI developer platform.

CPU sandbox

Agentic-coding sandbox workload with a lead at $300k per year and assistant at $150k per year.

Owned GPUs

Industrial-scale B300 purchases, including capacity reserved for Dolphin model training.

Roadmap pull-forward

Audio, image, and video network expansion, sharded inference for 70B+ models, and a web-scraper network, all pulled forward.