Voyage 3
Voyage AI
Retrieval embedding model for RAG.
- Developer
- Voyage AI
- Release date
- Sep 1, 2024
- Parameters
- Undisclosed
- Corpus size
- Undisclosed
- License
- Proprietary
- Context window
- 128K tokens
- Modalities
- text
Learn this model
A tutorial tailored to Voyage 3 covering cost, capabilities, API setup, and production patterns, based on this model's specs rather than generic copy shared by every LLM.
Cost & access
Voyage 3 is priced per million input tokens; unlike chat completions, there are no output tokens to pay for. Check Voyage AI's pricing page for current rates, output dimensions, and batch options. Even with a 128K-token context window, long inputs add tokens quickly, and re-embedding a whole corpus after a model change costs the full amount again. In production, chunk documents and embed only new or changed content.
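Because only input tokens are billed, a budget estimate is simple arithmetic. The sketch below assumes a hypothetical corpus size and uses a placeholder price, not a quoted Voyage AI rate; substitute the figure from the pricing page.

```python
def estimate_embedding_cost(total_tokens: int, usd_per_million_tokens: float) -> float:
    """Embedding cost: only input tokens are billed, so cost = tokens / 1M * price."""
    return total_tokens / 1_000_000 * usd_per_million_tokens

# Hypothetical corpus: 50,000 chunks averaging 400 tokens each.
corpus_tokens = 50_000 * 400  # 20,000,000 tokens
# PLACEHOLDER price; replace with the current rate from Voyage AI's pricing page.
price = 0.06
print(f"one full pass: ${estimate_embedding_cost(corpus_tokens, price):.2f}")  # → one full pass: $1.20
```

Each full re-embed (for example after switching models) repeats this cost, which is why incremental indexing matters.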
Functional understanding
- Retrieval embedding model for RAG.
- Modalities: text · License: Proprietary · Released 2024-09-01.
- Best-fit workflows for this model:
  - Semantic search, deduplication, and RAG retrieval: Voyage 3 outputs vectors, not chat prose.
  - Clustering support tickets, docs, or product catalogs by meaning.
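Semantic search and deduplication both reduce to comparing vectors. The following is a minimal pure-Python sketch using cosine similarity on toy 3-d vectors that stand in for real Voyage 3 embeddings; the 0.95 duplicate threshold is an illustrative assumption that you would tune on your own data. If your vectors are already unit-normalized, a plain dot product gives the same ranking.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 3) -> list[int]:
    """Indices of the k documents most similar to the query (semantic search)."""
    scored = sorted(
        ((cosine(query_vec, v), i) for i, v in enumerate(doc_vecs)), reverse=True
    )
    return [i for _, i in scored[:k]]

def near_duplicates(doc_vecs: list[list[float]], threshold: float = 0.95) -> list[tuple[int, int]]:
    """Index pairs whose similarity meets the threshold (O(n^2); fine for small sets)."""
    return [
        (i, j)
        for i in range(len(doc_vecs))
        for j in range(i + 1, len(doc_vecs))
        if cosine(doc_vecs[i], doc_vecs[j]) >= threshold
    ]

# Toy vectors standing in for real Voyage 3 embeddings.
docs = [[1.0, 0.0, 0.0], [0.99, 0.1, 0.0], [0.0, 1.0, 0.0]]
print(top_k([1.0, 0.0, 0.0], docs, k=2))  # → [0, 1]
print(near_duplicates(docs))              # → [(0, 1)]
```

At scale, a vector database replaces the brute-force loops, but the similarity logic is the same.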
Technical foundation
- Voyage AI does not disclose the parameter count or the training data.
- Context: 128K tokens. Open weights: no.
- Voyage 3 is positioned as an embedding model in the Voyage AI lineup.
First API call
Use Voyage AI's embeddings endpoint; pass voyage-3 as the model name and store the returned vectors in your database or vector index.
import voyageai
# voyageai.Client() reads the API key from the VOYAGE_API_KEY environment variable
vo = voyageai.Client()
result = vo.embed(
    ["Text to embed for Voyage 3"],
    model="voyage-3",
    input_type="document",  # use "query" when embedding search queries
)
print(result.embeddings[0][:5], "...")
Important technical topics
- Inputs to Voyage 3: there is no prompt or output format to control. Set input_type to "query" for search queries and "document" for indexed text so both sides of retrieval are embedded consistently; leaving it unset embeds the raw text as-is.
- No temperature: embedding calls have no sampling parameters. Retrieval quality is tuned through chunking, input_type, and top-k instead.
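- Tokens: Voyage 3 bills by input tokens only; there are no output tokens. A token is roughly ¾ of an English word, but tokenizers differ between models, so measure real counts when budgeting. The undisclosed parameter count says little about your bill, which is driven by how much text you embed and re-embed.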
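- Context window (128K tokens): the limit applies to each input text, not to a conversation. Over-length inputs are truncated by default (truncation=True in the Python client), so chunk long documents instead; one vector for a very long text tends to retrieve less precisely than vectors for focused chunks. Each request also caps the number of texts and total tokens per batch; check Voyage AI's docs for current limits.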
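Chunking is the usual answer to both the token bill and the per-input limit. Here is a minimal word-based chunker with overlap, assuming the rough "1 token ≈ 0.75 words" heuristic; the 512-token chunk size and 64-token overlap are illustrative defaults, not Voyage AI recommendations. Swap in exact token counts if you need hard limits.

```python
def chunk_words(text: str, max_tokens: int = 512, overlap_tokens: int = 64) -> list[str]:
    """Split text into overlapping chunks sized by an approximate token budget.

    Uses the rough 1 token ≈ 0.75 English words heuristic, so chunk sizes are
    estimates, not exact Voyage token counts.
    """
    max_words = max(1, int(max_tokens * 0.75))
    overlap_words = int(overlap_tokens * 0.75)
    if overlap_words >= max_words:
        raise ValueError("overlap must be smaller than the chunk size")
    words = text.split()
    step = max_words - overlap_words
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks

doc = " ".join(f"w{i}" for i in range(1000))
chunks = chunk_words(doc)
print(len(chunks), chunks[1].split()[0])  # → 3 w336
```

The overlap keeps a sentence that straddles a boundary retrievable from either neighboring chunk, at the cost of embedding some words twice.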
Real enterprise patterns
- Index docs with Voyage 3, then retrieve top-k chunks before calling a chat model.
- Hybrid search: combine keyword (BM25) + embeddings for better recall.
- Version embedding indexes when you change models: vectors from different models are not comparable, and dimensions may differ.
- Monitor drift: re-embed when Voyage AI ships a new embedding revision.
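The document does not specify how to merge keyword and embedding results for hybrid search. One common option is reciprocal rank fusion (RRF), which combines rankings without calibrating their raw scores. A minimal sketch, assuming each retriever returns an ordered list of document ids; the ids below are hypothetical, and k=60 is the constant conventionally used with RRF.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked id lists: each doc scores sum(1 / (k + rank)) across lists."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

bm25_ranking = ["doc3", "doc1", "doc7"]    # hypothetical keyword (BM25) results
vector_ranking = ["doc1", "doc5", "doc3"]  # hypothetical Voyage 3 nearest neighbors
print(reciprocal_rank_fusion([bm25_ranking, vector_ranking]))
# → ['doc1', 'doc3', 'doc5', 'doc7']
```

Documents found by both retrievers (doc1, doc3) rise to the top, which is the recall benefit hybrid search is after.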
Production & security
- Secrets: never commit Voyage AI API keys; store them in a secrets vault and rotate them per environment.
- PII: mask before embedding; log redacted inputs only.
- Observability: trace id per request; log model=voyage-3, input token count, batch size, and latency.
- Rate limits: handle Voyage AI 429/5xx with exponential backoff and circuit breakers.
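- Guardrails: before writing to the index, check that every vector has the expected dimension and that the vector count matches the input count; skip empty inputs. Downstream, schema-validate any JSON the chat model returns and cross-check numbers against the retrieved source docs.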
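For the rate-limit bullet, here is a minimal retry wrapper with capped exponential backoff and jitter. TransientError is a hypothetical stand-in: map your client's rate-limit and server-error exceptions to it (the exact classes depend on the client version). The delays are illustrative defaults, and a circuit breaker is not implemented here; it would sit around this wrapper.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class TransientError(Exception):
    """Hypothetical stand-in for a 429 or 5xx response from the embeddings API."""

def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on TransientError with capped exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller or a circuit breaker decide
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))  # jitter spreads out retries

# Usage with a fake call that fails twice, then succeeds; sleep is stubbed out.
attempts = []
def flaky() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("429")
    return "ok"

print(with_backoff(flaky, sleep=lambda s: None), len(attempts))  # → ok 3
```

Injecting `sleep` keeps the wrapper testable without real waits.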
Mini projects with this model
- Semantic FAQ: embed help-center articles with Voyage 3, answer from nearest neighbors.
- Duplicate ticket detector for support queues.
- Product recommendation by description similarity.
- Eval: NDCG@k on labeled query–doc pairs.
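For the evaluation project, NDCG@k scores a ranked list against graded relevance labels. This sketch uses linear gains (some definitions use 2^rel - 1 instead), and the labels are hypothetical; in practice you would average the score over all labeled queries.

```python
import math

def ndcg_at_k(ranked_ids: list[str], relevance: dict[str, float], k: int = 10) -> float:
    """NDCG@k with linear gains: sum of rel / log2(rank + 1), normalized by the ideal ranking."""
    dcg = sum(
        relevance.get(doc_id, 0.0) / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
    )
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 1) for rank, rel in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical labels for one query: graded relevance per document id.
labels = {"a": 3, "b": 2, "c": 0}
print(round(ndcg_at_k(["b", "a", "d"], labels, k=3), 3))  # → 0.913
```

Unlabeled documents (here "d") count as irrelevant, so gaps in your labels pull scores down.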
Suggested stack
- Language: Python 3.11+
- Embeddings: Voyage 3 (Voyage AI)
- Vector DB: Pinecone, Chroma, pgvector, or Weaviate
- Orchestration: LangChain or LlamaIndex for chunking
- UI: Streamlit or Next.js for internal tools
- APIs: FastAPI
Learning path
- Python basics
- HTTP/REST and environment variables
- Voyage AI embeddings API for Voyage 3
- Vector database fundamentals
- Chunking strategies
- RAG retrieval evaluation
- Deploy index refresh pipeline