Simple, transparent pricing
Lovable builds it. Cursor codes it. Nexlayer runs it.
$1 = 1,000 credits
One unified credit pool for compute, storage, and GPU. No separate bills. Scale predictably.
Free
5,000 credits
Get started with AI deployment
- 5,000 starter credits
- Any stack — Docker, Next.js, Python, Go
- Blackwell 32GB (90-min sessions)
- Zero-config service discovery
- 24/7 agent monitoring
- Custom domains with HTTPS
Pro
30,000 credits/month
For builders shipping to production
- 30,000 credits/month
- Everything in Free
- Blackwell 48GB VRAM
- Agent debugging
- Spending caps
- Priority support
- Team collaboration
Scale
300,000 credits/month
Production workloads at scale
- 300,000 credits/month
- Everything in Pro
- Blackwell 48GB VRAM
- RTX PRO 6000 96GB GDDR7
- Dedicated optimization agents
- Reserved GPU pools
- 99.9% uptime SLA
Enterprise
3,000,000 credits/month
Dedicated GPU cluster + credits. Everything else à la carte.
- 3,000,000 credits/month
- Dedicated GPU cluster (NVIDIA RTX PRO 6000 96GB GDDR7)
- Blackwell 48GB + RTX PRO 6000 96GB available
- Custom / BYO models (Mode 1 dedicated)
- All Scale GPU models unlocked (70B+ class)
Add-ons (à la carte)
- Seats — contact sales
- SSO & audit logs — contact sales
- SOC 2 / BAA / HIPAA — contact sales
- Dedicated account manager — contact sales
- Private networking — contact sales
GPUs & Models
Add `gpu` to your deployment and get production LLMs instantly — no CUDA setup, no model downloads, no cold-start waits.
Nexlayer is an agentic infrastructure platform, not a GPU bin-rental service. Your model runs on the same network as your app, next to your Postgres, your Redis, your background workers. That co-location is the edge — the inference endpoint is one hop over the network, not a trip across the public internet to a rack in another region.
Your stack, your model, one config
You choose the model, Nexlayer launches your environment, and your GPU is already provisioned.
Co-located with your data
1ms RTT to Postgres, RAG without internet egress, agent tool-calls stay in-network.
Cost you can see
$0.50 / $1.25 / $2.50 per GPU-hour. Per-second metered. No hidden per-token multiplier.
Switch models in 30 seconds
Change your GPU model, redeploy. The scheduler pins the new one. No vendor migration.
Auto multi-model routing
Flip on auto mode and Nexlayer picks the right model per workload — Llama for chat, DeepSeek for reasoning, Qwen for code, Gemma for throughput, Phi for edge latency. No model picking, no redeploys.
Catalog includes Llama 3.3 70B, DeepSeek-R1 reasoning, Qwen 2.5 Coder, Gemma 4 31B, Phi 3.5 Mini, Nomic embeddings, plus bring-your-own-model on Enterprise.
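As a rough mental model, auto mode maps a workload class to a model from the catalog above. A toy sketch of that policy in Python (illustrative only: the model identifiers and function are hypothetical stand-ins, not Nexlayer's scheduler API):

```python
# Toy sketch of auto-mode routing: workload class -> catalog model.
# Illustrative only; the real scheduler also weighs load, VRAM
# packing, and latency targets. Model IDs are hypothetical slugs.
ROUTES = {
    "chat":       "llama-3.3-70b",                 # general chat / Q&A
    "reasoning":  "deepseek-r1-distill-llama-70b", # chain-of-thought
    "code":       "qwen-2.5-coder-7b",             # sub-50ms TTFT autocomplete
    "throughput": "gemma-4-31b",                   # high-throughput chat
    "edge":       "phi-3.5-mini",                  # lowest-latency small model
    "embedding":  "nomic-embed",
}

def route(workload: str) -> str:
    """Pick a model for a workload class; fall back to a small shared model."""
    return ROUTES.get(workload, "llama-3.2-3b")

assert route("code") == "qwen-2.5-coder-7b"
```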
Hardware
RTX PRO 6000 · 96GB GDDR7 · Blackwell · 240 TFLOPS FP16
Current production fleet. Every Mode 2/3 model runs here.
Pricing by mode
Mode 3 — Shared Pinned · 500 credits/hr ($0.50/hr)
Workhorse inference. Autocomplete, chat, embeddings.
Best-effort, multi-tenant card. Sub-50ms TTFT on small models.
Mode 2 — Large Pinned · 1,250 credits/hr ($1.25/hr)
70B-class reasoning + chat. GPT-4 replacement class.
Dedicated slot on a 96GB card. Consistent latency.
Mode 1 — Dedicated (Enterprise) · 2,500 credits/hr ($2.50/hr)
Raw card, BYO model server (vLLM, TGI, custom CUDA).
Full 96GB VRAM, exclusive access. You own the runtime.
All rates at coefficient 1.0 (1,000 credits = $1). Per-second metered via the k8s meter.
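To make the per-second arithmetic concrete, here is a minimal sketch using the rates on this page (the constant and function names are illustrative, not a platform API):

```python
# Per-second GPU metering at coefficient 1.0 (1,000 credits = $1).
# Rates from this page: Mode 3 shared $0.50/hr, Mode 2 large pinned
# $1.25/hr, Mode 1 dedicated $2.50/hr.
CREDITS_PER_HOUR = {"mode3": 500, "mode2": 1250, "mode1": 2500}

def gpu_cost(mode: str, seconds: int) -> tuple[float, float]:
    """Return (credits, dollars) for a metered GPU session."""
    credits = CREDITS_PER_HOUR[mode] * seconds / 3600
    return credits, credits / 1000  # 1,000 credits = $1

# A 90-minute Free-tier session on a shared card:
credits, dollars = gpu_cost("mode3", 90 * 60)
print(f"{credits:.0f} credits (~${dollars:.2f})")  # 750 credits (~$0.75)
```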
Model catalog
CHAT & CONTENT — GPT-4-REPLACEMENT CLASS
| Model | Use case | Mode | Access |
|---|---|---|---|
| Llama 3.3 70B | General chat, Q&A, content moderation. | Mode 2 · ~70GB | Scale, Enterprise |
| DeepSeek-R1 Distill Llama 70B | Top-tier reasoning at ~$0.60 / M tokens. | Mode 2 · ~70GB | Scale, Enterprise |
REASONING — CHAIN-OF-THOUGHT
| Model | Use case | Mode | Access |
|---|---|---|---|
| DeepSeek-R1 Distill Qwen 32B | Mid-tier reasoning, ~50% faster than 70B. | Mode 2 · ~34GB | Scale, Enterprise |
| DeepSeek-R1 Distill Qwen 14B | Cost-effective reasoning. | Mode 3 · ~16GB | Scale, Enterprise |
| DeepSeek-R1 Distill Qwen 7B | Budget reasoning, small footprint. | Mode 3 · ~8GB | Scale, Enterprise |
CODE — COMPLETION + REVIEW
| Model | Use case | Mode | Access |
|---|---|---|---|
| Qwen 2.5 Coder 7B | Autocomplete / completion, sub-50ms TTFT. | Mode 3 · ~8GB | Free (auto), Pro, Scale, Enterprise |
| Qwen 2.5 Coder 32B | Full code review / agent work. | Mode 2 · ~34GB | Scale, Enterprise |
HIGH-THROUGHPUT CHAT — SMALL, FAST, SHARED
| Model | Use case | Mode | Access |
|---|---|---|---|
| Gemma 4 31B | Multi-modal + high-throughput general chat, RedHat FP8. | Mode 2 · ~34GB | Scale, Enterprise |
| Llama 3.1 8B | Workhorse chat, packs well with siblings. | Mode 3 · ~9GB | Free (auto), Pro, Scale, Enterprise |
| Llama 3.2 3B | Low-latency edge inference. | Mode 3 · ~4GB | Free (auto), Pro, Scale, Enterprise |
| Phi 3.5 Mini | 25-40ms TTFT, Microsoft small-model family. | Mode 3 · ~4GB | Free (auto), Pro, Scale, Enterprise |
EMBEDDINGS
| Model | Use case | Mode | Access |
|---|---|---|---|
| Nomic Embed | 768-dim, MTEB-competitive. | Mode 3 · ~1GB | Free (auto), Pro, Scale, Enterprise |
BRING-YOUR-OWN MODEL — MODE 1
| Model | Use case | Mode | Access |
|---|---|---|---|
| custom | Any HF / custom weights; run vLLM / TGI / your own server. | Mode 1 · ≤96GB | Enterprise |
Free plan uses auto mode — the scheduler picks a small shared model for you. Upgrade to Pro+ to pin a specific one, or Scale+ to unlock 70B-class and Gemma 4.
Feature comparison
Price is the easy comparison. Features are the real one. Every provider in this table rents GPU cycles — only one gives you the platform that turns a model into an app.
| Feature | Nexlayer | Together | RunPod | Replicate | Fireworks |
|---|---|---|---|---|---|
| **Multi-model per card (Mode 3).** Multiple small models share one card, packed by the scheduler — sub-50ms TTFT at a fraction of dedicated-slot cost. | ✓ | ✗ | ✗ | ✗ | ✗ |
| **Full app hosting + GPU routing.** Deploy your app, database, workers, and the model from one config — routing handled by the platform. | ✓ | ✗ | partial | ✗ | ✗ |
| **Auto-model-selection harness.** Scheduler picks chat / reasoning / code / throughput / edge model per workload. No redeploys to switch. | ✓ | ✗ | ✗ | ✗ | partial |
| **Gemma 4 support.** Google's Gemma 4 31B FP8 — multi-modal, high-throughput chat — available pinned today. | ✓ | — | ✗ | ✗ | ✗ |
| **Speculative decoding pair.** Large model paired with a small draft model for 2-3× faster generation with identical output quality. | ✓ | ✗ | ✗ | ✗ | partial |
| **Semantic response cache.** Embedding-matched cache layer returns prior completions for near-duplicate prompts — free latency win. | ✓ | ✗ | ✗ | ✗ | ✗ |
| **Data residency / private cluster.** Your data never leaves your tenant network. Dedicated cluster + BAA available on Enterprise. | ✓ | partial | ✓ | ✗ | partial |
✓ supported · partial = limited / add-on only · ✗ not offered · — not applicable. Feature set current as of April 2026.
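For readers who haven't met a semantic response cache: it embeds incoming prompts and returns a stored completion when a new prompt is a near-duplicate of a previous one. A toy sketch of the idea (illustrative only, not Nexlayer's implementation; `embed` stands in for any embedding model that returns unit-normalized vectors, such as an embeddings endpoint from the catalog above):

```python
import numpy as np

# Toy semantic cache: store (embedding, completion) pairs and serve a
# cached completion when a new prompt's embedding is close enough.
class SemanticCache:
    def __init__(self, embed, threshold: float = 0.95):
        self.embed = embed          # callable: str -> np.ndarray (unit-norm)
        self.threshold = threshold  # cosine-similarity cutoff
        self.entries: list[tuple[np.ndarray, str]] = []

    def get(self, prompt: str) -> str | None:
        v = self.embed(prompt)
        for e, completion in self.entries:
            # dot product of unit vectors == cosine similarity
            if float(np.dot(v, e)) >= self.threshold:
                return completion   # near-duplicate: skip the GPU entirely
        return None

    def put(self, prompt: str, completion: str) -> None:
        self.entries.append((self.embed(prompt), completion))
```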
Model hosting is a commodity. The platform is the product.
Any provider on this list can serve Llama 3.3 70B. Only Nexlayer also runs your Postgres, Redis, background workers, and vector DB on the same internal network, with the inference endpoint one hop away. That co-location is why your agent doesn't stall on egress, why RAG is a millisecond call instead of a cross-region round-trip, and why you're not reconciling five invoices at the end of the month. Raw-GPU providers leave CUDA, driver, weight, and server wrangling on your plate. Hosted-model APIs give you a URL but no way to run the rest of your app. Nexlayer ships the whole stack from one config.
You're in control. Always.
Credits run out? You decide what happens next.
Free
Apps pause at zero credits. No surprise charges. Restart anytime by adding credits.
Pro
Opt-in overages. Set spending caps. Buy credit packs ($10–$100) or enable auto-refill.
Scale & Enterprise
Overages on by default with spending caps. Agents warn you at 85% consumption.
Credit rate card
Transparent per-resource pricing. Deploys, domains, and agent operations are free.
| Resource | Rate | Approx. cost |
|---|---|---|
| CPU | 120 credits/hr | $0.12/hr |
| Storage | ~0.70 credits/day | ~$0.02/mo |
| Egress bandwidth | 10 credits/GB | $0.01/GB |
| GPU (shared) | 500 credits/hr | $0.50/hr |
| GPU (large pinned) | 1,250 credits/hr | $1.25/hr |
| GPU (dedicated) | 2,500 credits/hr | $2.50/hr |
| Deploys | Free | — |
| Custom domains | Free | — |
| Agent operations | Free | — |
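A worked example against this rate card, for a hypothetical small app (one always-on CPU service, a shared GPU two hours a day, 5GB of egress, a little storage):

```python
# Rough monthly bill for a hypothetical small app, from the rate card.
HOURS = 24 * 30
cpu     = 120 * HOURS          # 120 credits/hr, always on
gpu     = 500 * 2 * 30         # shared GPU, 2 hr/day
egress  = 10 * 5               # 10 credits/GB * 5 GB
storage = 0.70 * 30            # ~0.70 credits/day

total = cpu + gpu + egress + storage
print(f"{total:,.0f} credits ≈ ${total / 1000:,.2f}/mo")
# 86,400 + 30,000 + 50 + 21 = 116,471 credits ≈ $116.47/mo
# (before the volume discounts in the next table kick in)
```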
Volume discounts
Automatic tiered pricing as you scale.
| Monthly credits | Rate per 1K | Discount |
|---|---|---|
| Up to 100K | $1.00 | — |
| 100K – 1M | $0.95 | 5% |
| 1M+ | $0.85 | 15% |
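Assuming the tiers apply marginally (each band of credits billed at its own rate; the table doesn't say whether the discount is marginal or retroactive), the blended cost works out like this sketch:

```python
# Graduated-tier pricing sketch. Assumes marginal tiers; if the
# discount is retroactive, the whole volume would instead be billed
# at the highest tier reached.
TIERS = [(100_000, 1.00), (1_000_000, 0.95), (float("inf"), 0.85)]  # (upper bound, $/1K)

def monthly_cost(credits: int) -> float:
    cost, lower = 0.0, 0
    for upper, rate in TIERS:
        band = min(credits, upper) - lower  # credits falling in this band
        if band <= 0:
            break
        cost += band / 1000 * rate
        lower = upper
    return cost

print(monthly_cost(300_000))  # 100K @ $1.00 + 200K @ $0.95 = $290.00
```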
Ready to ship?
Deploy from Claude, Cursor, or CLI. Free to start. Agents handle the rest.
Get started for free