Token Counting Explained: tiktoken, Anthropic, and Gemini (2025 Guide)

Token counting is how providers measure prompt and response size for pricing, context limits, and rate limits. Tokens are not characters or words; they are model-specific units produced by a tokenizer. Understanding how tokenization works—and how models differ—is key to budgeting costs and avoiding truncation.
How Tokenization Works
Most modern models use subword tokenization. Common approaches include Byte Pair Encoding (BPE) and SentencePiece-like unigram models. Tokenizers split text into byte sequences or subwords based on training statistics, so the same sentence may produce different token counts across providers.
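For intuition, here is a quick sketch using the js-tiktoken package (covered in the OpenAI section below) that counts the same sentence under three OpenAI encodings; the counts differ, and provider-specific tokenizers (Claude, Gemini, Mistral) would differ again.
// TypeScript (Node)
// npm i js-tiktoken
import { getEncoding } from "js-tiktoken";

const text = "Tokenizers split this sentence into different subword pieces.";
for (const name of ["p50k_base", "cl100k_base", "o200k_base"] as const) {
  const enc = getEncoding(name);
  console.log(name, enc.encode(text).length); // same text, different token counts
}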
Key Concepts
- Byte-level encodings: Robust with emojis and non‑ASCII, but can add overhead.
- Subwords vs. words: Short, common pieces tokenize cheaply; rare terms expand into many tokens.
- System + tool framing: Provider SDKs often add hidden tokens for roles, tools, and metadata.
FAQ: Anthropic Token Counting
How do I count tokens in Anthropic (Claude)?
Use the official messages.countTokens method in @anthropic-ai/sdk. Build your messages array exactly as you will send it (including system prompts and tool definitions) and pass the same model. This returns accurate counts that match billing.
Can I approximate Anthropic token counts without the API?
For quick estimates, you can use OpenAI’s tiktoken with the p50k_base encoding (MODEL_P50K_BASE). This is an approximation; use Anthropic’s API for ground‑truth numbers.
OpenAI: tiktoken Overview
OpenAI models are typically measured with tiktoken, an efficient BPE tokenizer. In JavaScript, use js-tiktoken to estimate tokens for prompts and responses.
// TypeScript (Node)
// npm i js-tiktoken
import { encodingForModel } from "js-tiktoken";

// Pick the closest model/encoding; if your js-tiktoken version does not know
// this model name, fall back to getEncoding("o200k_base").
const enc = encodingForModel("gpt-4o-mini");
const prompt = "Explain zero-copy streaming in Node.js and give examples.";
const ids = enc.encode(prompt);
console.log("tokens:", ids.length);
// js-tiktoken is pure JS, so no enc.free() is needed (that belongs to the WASM "tiktoken" package).
Note: Pick an encoding close to your target model. Exact counts may vary by release and provider framing (system prompts, tool calls).
Anthropic: Tokenizer and API (How to Count Tokens)
Anthropic models (Claude family) use a tokenizer distinct from tiktoken. Anthropic provides an official API and SDKs; for local estimation you can use their tokenizer utilities. See the official guide on token counting for Claude at docs.anthropic.com/en/docs/build-with-claude/token-counting.
Quick approximation: if you can’t call Anthropic’s countTokens (e.g., offline estimation), you can approximate Claude token counts using OpenAI’s tiktoken with the p50k_base encoding (a.k.a. MODEL_P50K_BASE; see the tiktoken encodings reference). This is only an estimate; always prefer Anthropic’s official counts for billing‑grade accuracy.
// TypeScript (Node)
// npm i js-tiktoken
import { getEncoding } from "js-tiktoken";

// Approximate Claude tokens locally (estimation only)
const enc = getEncoding("p50k_base"); // a.k.a. MODEL_P50K_BASE
const text = "Contrast streaming vs. non‑streaming responses in Claude.";
const approx = enc.encode(text).length;
console.log("approx anthropic tokens:", approx);
// No enc.free() needed with the pure-JS js-tiktoken package.
Quick steps: How to count tokens in Anthropic (Claude)
- Install @anthropic-ai/sdk and set ANTHROPIC_API_KEY.
- Build the request messages exactly as you plan to send them.
- Call client.messages.countTokens({ model, messages }) to get counts.
// TypeScript (Node)
// npm i @anthropic-ai/sdk
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY! });

const messages = [
  { role: "user" as const, content: "Contrast streaming vs. non-streaming responses in Claude." },
];

// Official token counting (recommended)
const count = await client.messages.countTokens({
  model: "claude-3-5-sonnet-20240620",
  messages,
});
console.log("anthropic input tokens:", count.input_tokens);

// Normal generation call (actual usage metrics come back on responses)
const msg = await client.messages.create({
  model: "claude-3-5-sonnet-20240620",
  max_tokens: 400,
  messages,
});
console.log("usage:", msg.usage.input_tokens, msg.usage.output_tokens);
Google Gemini: Token Counting API
Gemini exposes a dedicated countTokens endpoint to estimate usage before generating. This is available in both the Generative Language API and Vertex AI. See the API reference at ai.google.dev/api/tokens.
// TypeScript (Node)
// npm i @google/generative-ai
import { GoogleGenerativeAI } from "@google/generative-ai";
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-pro" });
const { totalTokens } = await model.countTokens({
  contents: [{ role: "user", parts: [{ text: "Explain embeddings vs. tokens" }] }],
});
console.log("gemini tokens:", totalTokens);
Mistral: Tokenizer and Counting
Mistral models (e.g., Mistral 7B, Mixtral) use a SentencePiece‑style tokenizer similar to LLaMA families. For local estimates, use the model’s tokenizer from Hugging Face; API responses also include usage fields you can log for ground‑truth counts. See Mistral’s tokenizer guide at docs.mistral.ai/guides/tokenization/.
# Python (Hugging Face Transformers)
# pip install transformers sentencepiece
from transformers import AutoTokenizer
# Choose the exact tokenizer for your target model
# Examples: "mistralai/Mistral-7B-Instruct-v0.2" or "mistralai/Mixtral-8x7B-Instruct-v0.1"
tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
text = "Explain rotary position embeddings and KV cache tradeoffs."
ids = tok(text).input_ids
print("mistral tokens:", len(ids))
Tip: Match the tokenizer to the exact checkpoint you are targeting. Small vocabulary changes can shift counts. When validating costs, prefer API usage fields from real calls.
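For billing‑grade numbers, log the usage block that comes back on real API calls. A minimal sketch, assuming Mistral’s standard chat completions REST endpoint and a MISTRAL_API_KEY environment variable (the model name is illustrative):
// TypeScript (Node)
// Read ground-truth token usage from a real Mistral API response
const res = await fetch("https://api.mistral.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.MISTRAL_API_KEY}`,
  },
  body: JSON.stringify({
    model: "mistral-small-latest",
    messages: [{ role: "user", content: "Explain rotary position embeddings briefly." }],
  }),
});
const data = await res.json();
// usage.prompt_tokens / completion_tokens / total_tokens are the billable counts
console.log("mistral usage:", data.usage);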
Why Counts Differ Across Models
- Different vocabularies: Token sets vary by provider and model family.
- Pre/post framing: System prompts, tool schemas, and safety wrappers add hidden tokens (see the sketch after this list).
- Multimodal inputs: Images, audio, and structured tool calls have provider‑specific counting rules.
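One way to see framing overhead directly is to count the same user message with and without a system prompt. A sketch using Anthropic’s countTokens (the model string matches the examples above):
// TypeScript (Node)
// npm i @anthropic-ai/sdk
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY! });
const model = "claude-3-5-sonnet-20240620";
const messages = [{ role: "user" as const, content: "Summarize this PR in one sentence." }];

const bare = await client.messages.countTokens({ model, messages });
const framed = await client.messages.countTokens({
  model,
  system: "You are a terse senior reviewer.",
  messages,
});
console.log("framing overhead (tokens):", framed.input_tokens - bare.input_tokens);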
Troubleshooting: Different Counts Across Providers
- Normalize whitespace and newlines before counting; hidden characters change results (a small normalization sketch follows this list).
- Replicate provider framing: include system prompts, tool schemas, and function args.
- Use provider tooling: tiktoken (OpenAI), the Anthropic tokenizer, Gemini countTokens, and the exact HF tokenizer for Mistral.
- Compare apples to apples: count the final request body you actually send over the wire.
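For the normalization point above, a minimal sketch; the exact rules are up to you, what matters is counting the same bytes you actually send:
// TypeScript (Node)
// Example-only normalization: unify line endings, strip zero-width characters and trailing whitespace
function normalizeForCounting(text: string): string {
  return text
    .replace(/\r\n/g, "\n") // unify line endings
    .replace(/[\u200B-\u200D\uFEFF]/g, "") // strip zero-width characters
    .replace(/[ \t]+$/gm, "") // drop trailing whitespace per line
    .trim();
}

console.log(normalizeForCounting("Hello\r\n world \u200B"));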
Cost Estimation Tips
- Budget both input and output tokens; responses often dominate cost (a small cost sketch follows this list).
- Cache prompts/tool schemas; avoid re-sending large system frames.
- Run Gemini’s countTokens as a preflight check; use local tokenizers elsewhere.
- Track merged LOC or accepted diff size, not just raw generation.
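To make the budgeting point concrete, a back-of-the-envelope sketch with placeholder per-million-token prices (the numbers below are made up; substitute your provider’s current rates):
// TypeScript (Node)
// Placeholder prices per 1M tokens (USD); replace with your provider's real rates
const PRICE_PER_M = { input: 3.0, output: 15.0 };

function estimateCostUSD(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * PRICE_PER_M.input +
    (outputTokens / 1_000_000) * PRICE_PER_M.output
  );
}

// e.g., a 2,000-token prompt with an 800-token response
console.log(estimateCostUSD(2000, 800).toFixed(4));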
See Also
- AI Code Reviews Guide: how token budgets interact with review depth and latency.
- How AI is Transforming Code Completion: practical tips to reduce prompt size while retaining context.
- Tuning Chat Completion Parameters in Mistral API: temperature vs. top_p, max_tokens, random_seed, streaming, and safety.
References
- OpenAI tiktoken: GitHub · js‑tiktoken
- tiktoken encoding names (e.g., p50k_base / MODEL_P50K_BASE): available encodings
- Anthropic token counting guide: docs.anthropic.com/en/docs/build-with-claude/token-counting
- Google Gemini token counting API: ai.google.dev/api/tokens
- Mistral tokenization guide: docs.mistral.ai/guides/tokenization/ · Hugging Face models