RAG & embedding cost calculator

Rough monthly cost for a typical retrieval setup: keeping a text embedding index up to date, embedding user queries, and running your chat model over retrieved context plus prompts. Adjust the inputs to match your pipeline. The estimate excludes vector DB hosting and re-rankers.

Vector index & embeddings

Total tokens in vector index (millions)

Sum of embedded chunk tokens across your corpus.
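If you don't already track this number, you can sum per-chunk token counts over everything you embed. The sketch below assumes you have a token-counting function for your embedding model. The `rough_count` helper is a hypothetical placeholder that counts whitespace-separated words, which is not how real tokenizers count. Chunks that overlap are embedded separately, so the overlapping tokens are counted once per chunk.

```python
from typing import Callable, Iterable


def index_tokens_millions(chunks: Iterable[str],
                          count_tokens: Callable[[str], int]) -> float:
    """Sum token counts over all embedded chunks and return the total in millions."""
    return sum(count_tokens(chunk) for chunk in chunks) / 1_000_000


def rough_count(text: str) -> int:
    # Hypothetical placeholder: whitespace word count, NOT the embedding
    # model's real tokenizer. Swap in your model's tokenizer for real numbers.
    return len(text.split())


chunks = [
    "Refunds are processed within 5 business days.",
    "Contact support via the help center.",
]
print(index_tokens_millions(chunks, rough_count))  # 1.3e-05 (13 "tokens")
```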

% of index re-embedded per month

New documents, edits, or full re-embeds. This reflects your own ingestion work, not user traffic.
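The maintenance line item likely works out to the share of the index re-embedded each month, multiplied by the embedding price. Here is a minimal sketch under that assumption. The $0.02 per 1M tokens rate and the other inputs are hypothetical and are not the calculator's actual price table.

```python
def monthly_reembed_cost(index_tokens_millions: float,
                         pct_reembedded: float,
                         embed_price_per_million: float) -> float:
    """Monthly cost of re-embedding part of the vector index, in USD.

    index_tokens_millions: total tokens in the index, in millions
    pct_reembedded: share of the index re-embedded per month, in percent (0-100)
    embed_price_per_million: embedding price in USD per 1M tokens (assumed input)
    """
    tokens_millions = index_tokens_millions * pct_reembedded / 100
    return tokens_millions * embed_price_per_million


# Hypothetical inputs: 40M-token index, 10% re-embedded, $0.02 per 1M tokens.
print(round(monthly_reembed_cost(40, 10, 0.02), 4))  # 0.08
```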

Embedding model

Queries & generation

Queries per day

Tokens to embed per user query

Retrieved context tokens per query (into the LLM)

Other prompt tokens per query (instructions, history slice)

Output tokens per query

Chat / completion model
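The query-side inputs likely combine as follows: each query pays once to embed its text, then pays the chat model for input tokens (retrieved context plus other prompt tokens) and output tokens. This sketch assumes a 30-day month, separate input and output prices per 1M tokens, and hypothetical prices. The calculator may use a different month length or its own price tables.

```python
DAYS_PER_MONTH = 30  # assumption: the calculator's month length is not stated


def monthly_query_costs(queries_per_day: float,
                        query_embed_tokens: float,
                        context_tokens: float,
                        other_prompt_tokens: float,
                        output_tokens: float,
                        embed_price_per_m: float,
                        llm_input_price_per_m: float,
                        llm_output_price_per_m: float) -> dict[str, float]:
    """Monthly USD cost of query embeddings and chat completions."""
    queries = queries_per_day * DAYS_PER_MONTH
    # Each user query is embedded once for retrieval.
    embed = queries * query_embed_tokens / 1e6 * embed_price_per_m
    # The LLM sees retrieved context plus instructions/history as input.
    input_tokens = context_tokens + other_prompt_tokens
    chat = queries * (input_tokens * llm_input_price_per_m
                      + output_tokens * llm_output_price_per_m) / 1e6
    return {"query_embeddings": embed, "chat_completions": chat}


# Hypothetical inputs: 1,000 queries/day, 50 query tokens, 2,000 context tokens,
# 500 other prompt tokens, 300 output tokens; $0.02 embed, $0.50 in / $1.50 out per 1M.
costs = monthly_query_costs(1000, 50, 2000, 500, 300, 0.02, 0.50, 1.50)
print({k: round(v, 4) for k, v in costs.items()})
# {'query_embeddings': 0.03, 'chat_completions': 51.0}
```

With these inputs, the LLM accounts for nearly all of the cost, as in the breakdown. Retrieved context tokens usually dominate the input side.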

Estimated monthly API cost

Total

$211.64

Embedding and LLM API costs only. Embedding and LLM price tables as of April 2026.

Breakdown

Index re-embedding (maintenance): $0.016
Query embeddings: $0.120
Chat completions (RAG answers): $211.50

Embeddings subtotal: $0.136
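The subtotal and total are sums of the line items above. This check reconciles the displayed figures. Because the line items are themselves rounded for display, it confirms the display rather than the calculator's unrounded internal values.

```python
# Line items as displayed in the breakdown (USD per month).
breakdown = {
    "index_reembedding": 0.016,
    "query_embeddings": 0.120,
    "chat_completions": 211.50,
}

embeddings_subtotal = breakdown["index_reembedding"] + breakdown["query_embeddings"]
total = sum(breakdown.values())

print(f"Embeddings subtotal: ${embeddings_subtotal:.3f}")  # Embeddings subtotal: $0.136
print(f"Total: ${total:.2f}")  # Total: $211.64
```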

Not included

Vector DB hosting (Pinecone, pgvector on RDS, etc.), re-ranking models, OCR, or multimodal embedding. Add those separately.
