Simple, transparent pricing
Lovable builds it. Cursor codes it. Nexlayer runs it.
$1 = 1,000 credits
One unified credit pool for compute, storage, and GPU. No separate bills. Scale predictably.
Free
5,000 credits
Get started with AI deployment
- 5,000 starter credits
- Any stack — Docker, Next.js, Python, Go
- Blackwell 32GB (90-min sessions)
- Zero-config service discovery
- 24/7 agent monitoring
- Custom domains with HTTPS
Pro
30,000 credits/month
For builders shipping to production
- 30,000 credits/month
- Everything in Free
- Blackwell 48GB VRAM
- Agent debugging
- Spending caps
- Priority support
- Team collaboration
Scale
300,000 credits/month
Production workloads at scale
- 300,000 credits/month
- Everything in Pro
- Blackwell 48GB VRAM
- RTX PRO 6000 96GB GDDR7
- Dedicated optimization agents
- Reserved GPU pools
- 99.9% uptime SLA
Enterprise
3,000,000 credits/month
Dedicated GPU cluster + credits. Everything else à la carte.
- 3,000,000 credits/month
- Dedicated GPU cluster (NVIDIA RTX PRO 6000 96GB GDDR7)
- Blackwell 48GB + RTX PRO 6000 96GB available
- Custom / BYO models (Mode 1 dedicated)
- All Scale GPU models unlocked (70B+ class)
Add-ons (à la carte)
- Seats — contact sales
- SSO & audit logs — contact sales
- SOC 2 / BAA / HIPAA — contact sales
- Dedicated account manager — contact sales
- Private networking — contact sales
GPUs & Models
Add `gpu` to your deployment and get production LLMs instantly — no CUDA setup, no model downloads, no cold-start waits.
Nexlayer is an agentic infrastructure platform, not a GPU bin-rental service. Your model runs on the same network as your app, next to your Postgres, your Redis, your background workers. That co-location is the edge — the inference endpoint is one hop over the network, not a trip across the public internet to a rack in another region.
Your stack, your model, one config
You choose the model, Nexlayer launches your environment, and your GPU is already provisioned.
Co-located with your data
1ms RTT to Postgres, RAG without internet egress, agent tool-calls stay in-network.
Cost you can see
$0.50 / $1.25 / $2.50 per GPU-hour. Per-second metered. No hidden per-token multiplier.
Switch models in 30 seconds
Change your GPU model, redeploy. The scheduler pins the new one. No vendor migration.
Auto multi-model routing
Flip on auto mode and Nexlayer picks the right model per workload — Llama for chat, DeepSeek for reasoning, Qwen for code, Gemma for throughput, Phi for edge latency. No model picking, no redeploys.
Catalog includes Llama 3.3 70B, DeepSeek-R1 reasoning, Qwen 2.5 Coder, Gemma 4 31B, Phi 3.5 Mini, Nomic embeddings, plus bring-your-own-model on Enterprise.
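As a rough mental model, auto mode maps a workload class to a model from the catalog above. A toy sketch of that policy in Python (illustrative only: the model identifiers and function are hypothetical stand-ins, not Nexlayer's scheduler API):

```python
# Toy sketch of auto-mode routing: workload class -> catalog model.
# Illustrative only; the real scheduler also weighs load, VRAM
# packing, and latency targets. Model IDs are hypothetical slugs.
ROUTES = {
    "chat":       "llama-3.3-70b",                 # general chat / Q&A
    "reasoning":  "deepseek-r1-distill-llama-70b", # chain-of-thought
    "code":       "qwen-2.5-coder-7b",             # sub-50ms TTFT autocomplete
    "throughput": "gemma-4-31b",                   # high-throughput chat
    "edge":       "phi-3.5-mini",                  # lowest-latency small model
    "embedding":  "nomic-embed",
}

def route(workload: str) -> str:
    """Pick a model for a workload class; fall back to a small shared model."""
    return ROUTES.get(workload, "llama-3.2-3b")

assert route("code") == "qwen-2.5-coder-7b"
```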
Hardware
RTX PRO 6000 · 96GB GDDR7 · Blackwell · 240 TFLOPS FP16
Current production fleet. Every Mode 2/3 model runs here.
Pricing by mode
Mode 3 — Shared Pinned · 500 credits/hr ($0.50/hr)
Workhorse inference. Autocomplete, chat, embeddings.
Best-effort, multi-tenant card. Sub-50ms TTFT on small models.
Mode 2 — Large Pinned · 1,250 credits/hr ($1.25/hr)
70B-class reasoning + chat. GPT-4 replacement class.
Dedicated slot on a 96GB card. Consistent latency.
Mode 1 — Dedicated (Enterprise) · 2,500 credits/hr ($2.50/hr)
Raw card, BYO model server (vLLM, TGI, custom CUDA).
Full 96GB VRAM, exclusive access. You own the runtime.
All rates at coefficient 1.0 (1,000 credits = $1). Per-second metered via the k8s meter.
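To make the per-second arithmetic concrete, here is a minimal sketch using the rates on this page (the constant and function names are illustrative, not a platform API):

```python
# Per-second GPU metering at coefficient 1.0 (1,000 credits = $1).
# Rates from this page: Mode 3 shared $0.50/hr, Mode 2 large pinned
# $1.25/hr, Mode 1 dedicated $2.50/hr.
CREDITS_PER_HOUR = {"mode3": 500, "mode2": 1250, "mode1": 2500}

def gpu_cost(mode: str, seconds: int) -> tuple[float, float]:
    """Return (credits, dollars) for a metered GPU session."""
    credits = CREDITS_PER_HOUR[mode] * seconds / 3600
    return credits, credits / 1000  # 1,000 credits = $1

# A 90-minute Free-tier session on a shared card:
credits, dollars = gpu_cost("mode3", 90 * 60)
print(f"{credits:.0f} credits (~${dollars:.2f})")  # 750 credits (~$0.75)
```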
Model catalog
CHAT & CONTENT — GPT-4-REPLACEMENT CLASS
| Model | Use case | Mode | Access |
|---|---|---|---|
| Llama 3.3 70B | General chat, Q&A, content moderation. | Mode 2 · ~70GB | Scale, Enterprise |
| DeepSeek-R1 Distill Llama 70B | Top-tier reasoning at ~$0.60 / M tokens. | Mode 2 · ~70GB | Scale, Enterprise |
REASONING — CHAIN-OF-THOUGHT
| Model | Use case | Mode | Access |
|---|---|---|---|
| DeepSeek-R1 Distill Qwen 32B | Mid-tier reasoning, ~50% faster than 70B. | Mode 2 · ~34GB | Scale, Enterprise |
| DeepSeek-R1 Distill Qwen 14B | Cost-effective reasoning. | Mode 3 · ~16GB | Scale, Enterprise |
| DeepSeek-R1 Distill Qwen 7B | Budget reasoning, small footprint. | Mode 3 · ~8GB | Scale, Enterprise |
CODE — COMPLETION + REVIEW
| Model | Use case | Mode | Access |
|---|---|---|---|
| Qwen 2.5 Coder 7B | Autocomplete / completion, sub-50ms TTFT. | Mode 3 · ~8GB | Free (auto), Pro, Scale, Enterprise |
| Qwen 2.5 Coder 32B | Full code review / agent work. | Mode 2 · ~34GB | Scale, Enterprise |
HIGH-THROUGHPUT CHAT — SMALL, FAST, SHARED
| Model | Use case | Mode | Access |
|---|---|---|---|
| Gemma 4 31B | Multi-modal + high-throughput general chat, RedHat FP8. | Mode 2 · ~34GB | Scale, Enterprise |
| Llama 3.1 8B | Workhorse chat, packs well with siblings. | Mode 3 · ~9GB | Free (auto), Pro, Scale, Enterprise |
| Llama 3.2 3B | Low-latency edge inference. | Mode 3 · ~4GB | Free (auto), Pro, Scale, Enterprise |
| Phi 3.5 Mini | 25-40ms TTFT, Microsoft small-model family. | Mode 3 · ~4GB | Free (auto), Pro, Scale, Enterprise |
EMBEDDINGS
| Model | Use case | Mode | Access |
|---|---|---|---|
| Nomic Embed | 768-dim, MTEB-competitive. | Mode 3 · ~1GB | Free (auto), Pro, Scale, Enterprise |
BRING-YOUR-OWN MODEL — MODE 1
| Model | Use case | Mode | Access |
|---|---|---|---|
| custom | Any HF / custom weights; run vLLM / TGI / your own server. | Mode 1 · ≤96GB | Enterprise |
Free plan uses auto mode — the scheduler picks a small shared model for you. Upgrade to Pro+ to pin a specific one, or Scale+ to unlock 70B-class and Gemma 4.
Feature comparison
Price is the easy comparison. Features are the real one. Every provider in this table rents GPU cycles — only one gives you the platform that turns a model into an app.
| Feature | Nexlayer | Together | RunPod | Replicate | Fireworks |
|---|---|---|---|---|---|
| **Multi-model per card (Mode 3).** Multiple small models share one card, packed by the scheduler — sub-50ms TTFT at a fraction of dedicated-slot cost. | ✓ | ✗ | ✗ | ✗ | ✗ |
| **Full app hosting + GPU routing.** Deploy your app, database, workers, and the model from one config — routing handled by the platform. | ✓ | ✗ | partial | ✗ | ✗ |
| **Auto-model-selection harness.** Scheduler picks chat / reasoning / code / throughput / edge model per workload. No redeploys to switch. | ✓ | ✗ | ✗ | ✗ | partial |
| **Gemma 4 support.** Google's Gemma 4 31B FP8 — multi-modal, high-throughput chat — available pinned today. | ✓ | — | ✗ | ✗ | ✗ |
| **Speculative decoding pair.** Large model paired with a small draft model for 2-3× faster generation with identical output quality. | ✓ | ✗ | ✗ | ✗ | partial |
| **Semantic response cache.** Embedding-matched cache layer returns prior completions for near-duplicate prompts — free latency win. | ✓ | ✗ | ✗ | ✗ | ✗ |
| **Data residency / private cluster.** Your data never leaves your tenant network. Dedicated cluster + BAA available on Enterprise. | ✓ | partial | ✓ | ✗ | partial |
✓ supported · partial = limited / add-on only · ✗ not offered · — not applicable. Feature set current as of April 2026.
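For readers who haven't met a semantic response cache: it embeds incoming prompts and returns a stored completion when a new prompt is a near-duplicate of a previous one. A toy sketch of the idea (illustrative only, not Nexlayer's implementation; `embed` stands in for any embedding model that returns unit-normalized vectors, such as an embeddings endpoint from the catalog above):

```python
import numpy as np

# Toy semantic cache: store (embedding, completion) pairs and serve a
# cached completion when a new prompt's embedding is close enough.
class SemanticCache:
    def __init__(self, embed, threshold: float = 0.95):
        self.embed = embed          # callable: str -> np.ndarray (unit-norm)
        self.threshold = threshold  # cosine-similarity cutoff
        self.entries: list[tuple[np.ndarray, str]] = []

    def get(self, prompt: str) -> str | None:
        v = self.embed(prompt)
        for e, completion in self.entries:
            # dot product of unit vectors == cosine similarity
            if float(np.dot(v, e)) >= self.threshold:
                return completion   # near-duplicate: skip the GPU entirely
        return None

    def put(self, prompt: str, completion: str) -> None:
        self.entries.append((self.embed(prompt), completion))
```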
Model hosting is a commodity. The platform is the product.
Any provider on this list can serve Llama 3.3 70B. Only Nexlayer also runs your Postgres, Redis, background workers, and vector DB on the same internal network, with the inference endpoint one hop away. That co-location is why your agent doesn't stall on egress, why RAG is a millisecond call instead of a cross-region round-trip, and why you're not reconciling five invoices at the end of the month. Raw-GPU providers leave CUDA, driver, weight, and server wrangling on your plate. Hosted-model APIs give you a URL but no way to run the rest of your app. Nexlayer ships the whole stack from one config.
You're in control. Always.
Credits run out? You decide what happens next.
Free
Apps pause at zero credits. No surprise charges. Restart anytime by adding credits.
Pro
Opt-in overages. Set spending caps. Buy credit packs ($10–$100) or enable auto-refill.
Scale & Enterprise
Overages on by default with spending caps. Agents warn you at 85% consumption.
Credit rate card
Transparent per-resource pricing. Deploys, domains, and agent operations are free.
| Resource | Rate | Approx. cost |
|---|---|---|
| CPU | 120 credits/hr | $0.12/hr |
| Storage | ~0.70 credits/day | ~$0.02/mo |
| Egress bandwidth | 10 credits/GB | $0.01/GB |
| GPU (shared) | 500 credits/hr | $0.50/hr |
| GPU (large pinned) | 1,250 credits/hr | $1.25/hr |
| GPU (dedicated) | 2,500 credits/hr | $2.50/hr |
| Deploys | Free | — |
| Custom domains | Free | — |
| Agent operations | Free | — |
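A worked example against this rate card, for a hypothetical small app (one always-on CPU service, a shared GPU two hours a day, 5GB of egress, a little storage):

```python
# Rough monthly bill for a hypothetical small app, from the rate card.
HOURS = 24 * 30
cpu     = 120 * HOURS          # 120 credits/hr, always on
gpu     = 500 * 2 * 30         # shared GPU, 2 hr/day
egress  = 10 * 5               # 10 credits/GB * 5 GB
storage = 0.70 * 30            # ~0.70 credits/day

total = cpu + gpu + egress + storage
print(f"{total:,.0f} credits ≈ ${total / 1000:,.2f}/mo")
# 86,400 + 30,000 + 50 + 21 = 116,471 credits ≈ $116.47/mo
# (before the volume discounts in the next table kick in)
```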
Volume discounts
Automatic tiered pricing as you scale.
| Monthly credits | Rate per 1K | Discount |
|---|---|---|
| Up to 100K | $1.00 | — |
| 100K – 1M | $0.95 | 5% |
| 1M+ | $0.85 | 15% |
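Assuming the tiers apply marginally (each band of credits billed at its own rate; the table doesn't say whether the discount is marginal or retroactive), the blended cost works out like this sketch:

```python
# Graduated-tier pricing sketch. Assumes marginal tiers; if the
# discount is retroactive, the whole volume would instead be billed
# at the highest tier reached.
TIERS = [(100_000, 1.00), (1_000_000, 0.95), (float("inf"), 0.85)]  # (upper bound, $/1K)

def monthly_cost(credits: int) -> float:
    cost, lower = 0.0, 0
    for upper, rate in TIERS:
        band = min(credits, upper) - lower  # credits falling in this band
        if band <= 0:
            break
        cost += band / 1000 * rate
        lower = upper
    return cost

print(monthly_cost(300_000))  # 100K @ $1.00 + 200K @ $0.95 = $290.00
```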
Ready to ship?
Deploy from Claude, Cursor, or CLI. Free to start. Agents handle the rest.
Get started for free