Simple, transparent pricing

Lovable builds it. Cursor codes it. Nexlayer runs it.

$1 = 1,000 credits

One unified credit pool for compute, storage, and GPU. No separate bills. Scale predictably.

Free

$0 forever

5,000 credits

Get started with AI deployment

  • 5,000 starter credits
  • Any stack — Docker, Next.js, Python, Go
  • NVIDIA Blackwell 32GB (90-min sessions)
  • Zero-config service discovery
  • 24/7 agent monitoring
  • Custom domains with HTTPS
Start for free
Most popular

Pro

$29/month

30,000 credits/month

For builders shipping to production

  • 30,000 credits/month
  • Everything in Free
  • NVIDIA Blackwell 48GB VRAM
  • Agent debugging
  • Spending caps
  • Priority support
  • Team collaboration
Get started with Pro
For scaling companies

Scale

$299/month

300,000 credits/month

Production workloads at scale

  • 300,000 credits/month
  • Everything in Pro
  • NVIDIA Blackwell 48GB VRAM
  • NVIDIA RTX PRO 6000 96GB GDDR7
  • Dedicated optimization agents
  • Reserved GPU pools
  • 99.9% uptime SLA
Get started with Scale
Custom

Enterprise

Let's talk

3,000,000 credits/month

Dedicated GPU cluster + credits. Everything else à la carte.

  • 3,000,000 credits/month
  • Dedicated GPU cluster (NVIDIA RTX PRO 6000 96GB GDDR7)
  • NVIDIA Blackwell 48GB + RTX PRO 6000 96GB available
  • Custom / BYO models (Mode 1 dedicated)
  • All Scale GPU models unlocked (70B+ class)

Add-ons (à la carte)

  • +Seats — contact sales
  • +SSO & audit logs — contact sales
  • +SOC 2 / BAA / HIPAA — contact sales
  • +Dedicated account manager — contact sales
  • +Private networking — contact sales
Let's talk

GPUs & Models

Add `gpu` to your deployment and get production LLMs instantly — no CUDA setup, no model downloads, no cold-start waits.

Nexlayer is an agentic infrastructure platform, not a GPU bin-rental service. Your model runs on the same network as your app, next to your Postgres, your Redis, your background workers. That co-location is the edge — the inference endpoint is one hop over the network, not a trip across the public internet to a rack in another region.

Your stack, your model, one config

You choose the model, Nexlayer launches your environment, and your GPU is already set.

Co-located with your data

1ms RTT to Postgres, RAG without internet egress, agent tool-calls stay in-network.

Cost you can see

$0.50 / $1.25 / $2.50 per GPU-hour. Per-second metered. No hidden per-token multiplier.

Switch models in 30 seconds

Change your GPU model, redeploy. The scheduler pins the new one. No vendor migration.

Auto multi-model routing

Flip on auto mode and Nexlayer picks the right model per workload — Llama for chat, DeepSeek for reasoning, Qwen for code, Gemma for throughput, Phi for edge latency. No model picking, no redeploys.

Catalog includes Llama 3.3 70B, DeepSeek-R1 reasoning, Qwen 2.5 Coder, Gemma 4 31B, Phi 3.5 Mini, Nomic embeddings, plus bring-your-own-model on Enterprise.
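As a sketch, auto mode's per-workload selection can be pictured as a lookup from workload class to catalog model. The workload labels, the specific pairings, and the function below are illustrative assumptions, not Nexlayer's actual scheduler API:

```python
# Illustrative sketch of auto-mode routing: workload class -> catalog model.
# Labels and pairings are assumptions based on the catalog, not a real API.
AUTO_ROUTES = {
    "chat": "Llama 3.3 70B",                      # general conversation
    "reasoning": "DeepSeek-R1 Distill Qwen 32B",  # chain-of-thought
    "code": "Qwen 2.5 Coder 7B",                  # autocomplete, sub-50ms TTFT
    "throughput": "Gemma 4 31B",                  # high-volume chat
    "edge": "Phi 3.5 Mini",                       # latency-critical
    "embeddings": "Nomic Embed",
}

def pick_model(workload: str) -> str:
    """Return the catalog model auto mode would route this workload to."""
    # Fall back to a small shared workhorse model, as on the Free plan.
    return AUTO_ROUTES.get(workload, "Llama 3.1 8B")
```

The point of the table shape: switching models is a routing decision, not a redeploy.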

Hardware

NVIDIA RTX PRO 6000

96GB GDDR7 · Blackwell · 240 TFLOPS FP16

Current production fleet. Every Mode 2/3 model runs here.

Pricing by mode

Mode 3 — Shared Pinned

$0.50/hr · 500 credits/hr

Workhorse inference. Autocomplete, chat, embeddings.

Best-effort, multi-tenant card. Sub-50ms TTFT on small models.

Mode 2 — Large Pinned

$1.25/hr · 1,250 credits/hr

70B-class reasoning + chat. GPT-4 replacement class.

Dedicated slot on a 96GB card. Consistent latency.

Mode 1 — Dedicated (Enterprise)

$2.50/hr · 2,500 credits/hr

Raw card, BYO model server (vLLM, TGI, custom CUDA).

Full 96GB VRAM, exclusive access. You own the runtime.

All rates at coefficient 1.0 (1,000 credits = $1). Per-second metered via the k8s meter.
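The arithmetic behind per-second metering is simple: credits accrue at rate/3600 per second, and 1,000 credits convert to $1. A minimal sketch (the function name and shape are mine, not a Nexlayer API):

```python
# Credits per GPU-hour at coefficient 1.0, per the mode pricing above.
MODE_RATES = {3: 500, 2: 1250, 1: 2500}

def session_cost(mode: int, seconds: float) -> tuple[float, float]:
    """Credits and dollars for a GPU session, metered per second."""
    credits = MODE_RATES[mode] * seconds / 3600
    return credits, credits / 1000  # 1,000 credits = $1

# A 90-minute Mode 3 session (the Free-tier session length):
credits, dollars = session_cost(3, 90 * 60)  # 750 credits, $0.75
```

So a full Free-tier allotment of 5,000 credits covers a bit over six 90-minute shared-GPU sessions, ignoring CPU and egress.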

Model catalog

CHAT & CONTENT — GPT-4-REPLACEMENT CLASS

| Model | Use case | Mode | Access |
| --- | --- | --- | --- |
| Llama 3.3 70B | General chat, Q&A, content moderation | Mode 2 · ~70GB | Scale, Enterprise |
| DeepSeek-R1 Distill Llama 70B | Top-tier reasoning at ~$0.60 / M tokens | Mode 2 · ~70GB | Scale, Enterprise |

REASONING — CHAIN-OF-THOUGHT

| Model | Use case | Mode | Access |
| --- | --- | --- | --- |
| DeepSeek-R1 Distill Qwen 32B | Mid-tier reasoning, ~50% faster than 70B | Mode 2 · ~34GB | Scale, Enterprise |
| DeepSeek-R1 Distill Qwen 14B | Cost-effective reasoning | Mode 3 · ~16GB | Scale, Enterprise |
| DeepSeek-R1 Distill Qwen 7B | Budget reasoning, small footprint | Mode 3 · ~8GB | Scale, Enterprise |

CODE — COMPLETION + REVIEW

| Model | Use case | Mode | Access |
| --- | --- | --- | --- |
| Qwen 2.5 Coder 7B | Autocomplete / completion, sub-50ms TTFT | Mode 3 · ~8GB | Free (auto), Pro, Scale, Enterprise |
| Qwen 2.5 Coder 32B | Full code review / agent work | Mode 2 · ~34GB | Scale, Enterprise |

HIGH-THROUGHPUT CHAT — SMALL, FAST, SHARED

| Model | Use case | Mode | Access |
| --- | --- | --- | --- |
| Gemma 4 31B | Multi-modal + high-throughput general chat, RedHat FP8 | Mode 2 · ~34GB | Scale, Enterprise |
| Llama 3.1 8B | Workhorse chat, packs well with siblings | Mode 3 · ~9GB | Free (auto), Pro, Scale, Enterprise |
| Llama 3.2 3B | Low-latency edge inference | Mode 3 · ~4GB | Free (auto), Pro, Scale, Enterprise |
| Phi 3.5 Mini | 25-40ms TTFT, Microsoft small-model family | Mode 3 · ~4GB | Free (auto), Pro, Scale, Enterprise |

EMBEDDINGS

| Model | Use case | Mode | Access |
| --- | --- | --- | --- |
| Nomic Embed | 768-dim, MTEB-competitive | Mode 3 · ~1GB | Free (auto), Pro, Scale, Enterprise |

BRING-YOUR-OWN MODEL — MODE 1

| Model | Use case | Mode | Access |
| --- | --- | --- | --- |
| custom | Any HF / custom weights; run vLLM / TGI / your own server | Mode 1 · ≤96GB | Enterprise |

Free plan uses auto mode — the scheduler picks a small shared model for you. Upgrade to Pro+ to pin a specific one, or Scale+ to unlock 70B-class and Gemma 4.

Feature comparison

Price is the easy comparison. Features are the real one. Every provider in this table rents GPU cycles — only one gives you the platform that turns a model into an app.

Providers compared: Nexlayer, Together, RunPod, Replicate, Fireworks.

  • Multi-model per card (Mode 3): multiple small models share one card, packed by the scheduler, for sub-50ms TTFT at a fraction of dedicated-slot cost.
  • Full app hosting + GPU routing: deploy your app, database, workers, and the model from one config, with routing handled by the platform.
  • Auto-model-selection harness: the scheduler picks a chat / reasoning / code / throughput / edge model per workload. No redeploys to switch.
  • Gemma 4 support: Google's Gemma 4 31B FP8, multi-modal, high-throughput chat, available pinned today.
  • Speculative decoding pair: a large model paired with a small draft model for 2-3× faster generation with identical output quality.
  • Semantic response cache: an embedding-matched cache layer returns prior completions for near-duplicate prompts, a free latency win.
  • Data residency / private cluster: your data never leaves your tenant network. Dedicated cluster + BAA available on Enterprise.

Competitors cover some of these only partially or as paid add-ons. Feature set current April 2026.

Model hosting is a commodity. The platform is the product.

Any provider on this list can serve Llama 3.3 70B. Only Nexlayer also runs your Postgres, Redis, background workers, and vector DB on the same internal network, with the inference endpoint one hop away. That co-location is why your agent doesn't stall on egress, why RAG is a millisecond call instead of a cross-region round-trip, and why you're not reconciling five invoices at the end of the month. Raw-GPU providers leave CUDA, driver, weight, and server wrangling on your plate. Hosted-model APIs give you a URL but no way to run the rest of your app. Nexlayer ships the whole stack from one config.

You're in control. Always.

Credits run out? You decide what happens next.

Free

Apps pause at zero credits. No surprise charges. Restart anytime by adding credits.

Pro

Opt-in overages. Set spending caps. Buy credit packs ($10–$100) or enable auto-refill.

Scale & Enterprise

Overages on by default with spending caps. Agents warn you at 85% consumption.
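The cap-and-warn behavior above can be sketched as a simple threshold check. The function shape and names are hypothetical; only the 85% warning level and pause-at-cap behavior come from the plans described here:

```python
def check_spend(used: float, cap: float, warn_at: float = 0.85) -> str:
    """Classify credit consumption against a spending cap (hypothetical sketch)."""
    if used >= cap:
        return "paused"  # Free: apps pause when credits hit zero
    if used >= warn_at * cap:
        return "warn"    # Scale/Enterprise: agents warn at 85% consumption
    return "ok"
```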

Credit rate card

Transparent per-resource pricing. Deploys, domains, and agent operations are free.

| Resource | Rate | Approx. cost |
| --- | --- | --- |
| CPU | 120 credits/hr | $0.12/hr |
| Storage | ~0.70 credits/day | $0.02/mo |
| Egress bandwidth | 10 credits/GB | $0.01/GB |
| GPU (shared) | 500 credits/hr | $0.50/hr |
| GPU (dedicated) | 2,500 credits/hr | $2.50/hr |
| Deploys | Free | |
| Custom domains | Free | |
| Agent operations | Free | |
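As a worked example of the rate card, here is a rough monthly estimate for a small app. The usage numbers are invented for illustration:

```python
# Metered rates from the credit rate card (credits per unit).
RATES = {"cpu_hr": 120, "egress_gb": 10, "gpu_shared_hr": 500}

def monthly_credits(cpu_hours: float, egress_gb: float, gpu_hours: float) -> float:
    """Total credits for a month of metered usage (deploys/domains are free)."""
    return (cpu_hours * RATES["cpu_hr"]
            + egress_gb * RATES["egress_gb"]
            + gpu_hours * RATES["gpu_shared_hr"])

# Hypothetical month: 100 CPU-hours, 20 GB egress, 30 shared-GPU-hours.
credits = monthly_credits(100, 20, 30)  # 12,000 + 200 + 15,000 = 27,200
dollars = credits / 1000                # $27.20, within Pro's 30,000 credits
```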

Volume discounts

Automatic tiered pricing as you scale.

| Monthly credits | Rate per 1K | Discount |
| --- | --- | --- |
| Up to 100K | $1.00 | |
| 100K – 1M | $0.95 | 5% |
| 1M+ | $0.85 | 15% |
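Assuming the tiers apply marginally, i.e. each band of credits is priced at its own rate (my reading of "automatic tiered pricing", not confirmed by the source), the dollar cost of a month's credits works out as:

```python
# (upper bound of tier, dollars per 1K credits) -- marginal-tier assumption.
TIERS = [(100_000, 1.00), (1_000_000, 0.95), (float("inf"), 0.85)]

def monthly_cost(credits: int) -> float:
    """Dollar cost of a month's credits, each band priced at its own rate."""
    cost, lower = 0.0, 0
    for upper, rate in TIERS:
        band = min(credits, upper) - lower
        if band <= 0:
            break
        cost += band / 1000 * rate
        lower = upper
    return cost

# 1.5M credits: 100K @ $1.00 + 900K @ $0.95 + 500K @ $0.85 = $1,380
```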

Ready to ship?

Deploy from Claude, Cursor, or CLI. Free to start. Agents handle the rest.

Get started for free