Token Counting Explained: tiktoken, Anthropic, and Gemini (2025 Guide)

Token counting is how providers measure prompt and response size for pricing, context limits, and rate limits. Tokens are not characters or words; they are model-specific units produced by a tokenizer. Understanding how tokenization works—and how models differ—is key to budgeting costs and avoiding truncation.
How Tokenization Works
Most modern models use subword tokenization. Common approaches include Byte Pair Encoding (BPE) and SentencePiece-like unigram models. Tokenizers split text into byte sequences or subwords based on training statistics, so the same sentence may produce different token counts across providers.
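For a concrete feel, the sketch below (assuming a recent js-tiktoken release that ships both encodings) encodes one sentence with two different OpenAI vocabularies, cl100k_base and o200k_base. The counts differ because each vocabulary merges bytes differently, and the gap is typically larger still across providers.
// TypeScript (Node)
// npm i js-tiktoken
// Same sentence, two vocabularies: counts are tokenizer-specific, not universal.
import { getEncoding } from "js-tiktoken";
const sentence = "Anthropomorphization of tokenizers obscures their byte-level mechanics.";
for (const name of ["cl100k_base", "o200k_base"] as const) {
  const enc = getEncoding(name);
  console.log(name, "->", enc.encode(sentence).length, "tokens");
}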
Key Concepts
- Byte-level encodings: Robust with emojis and non‑ASCII, but can add overhead.
- Subwords vs. words: Short, common pieces tokenize cheaply; rare terms expand into many tokens.
- System + tool framing: Provider SDKs often add hidden tokens for roles, tools, and metadata.
OpenAI: tiktoken Overview
OpenAI models are typically measured with tiktoken, an efficient BPE tokenizer. In JavaScript, use js-tiktoken to estimate tokens for prompts and responses.
// TypeScript (Node)
// npm i js-tiktoken
import { getEncoding } from "js-tiktoken";
const enc = getEncoding("o200k_base"); // o200k_base is used by GPT-4o-family models; pick the encoding closest to your target
const prompt = "Explain zero-copy streaming in Node.js and give examples.";
const ids = enc.encode(prompt);
console.log("tokens:", ids.length);
Note: Pick an encoding close to your target model. Exact counts may vary by release and provider framing (system prompts, tool calls).
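When you send chat messages rather than a bare string, each message also carries a few framing tokens on top of its content. A rough sketch of that overhead, using the per-message constants from the OpenAI cookbook heuristic (approximations that can change between model releases) and assuming the o200k_base encoding:
// TypeScript (Node)
// Rough chat-framing estimate; the +3 constants are a published heuristic, not exact.
import { getEncoding } from "js-tiktoken";
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };
function estimateChatTokens(messages: ChatMessage[]): number {
  const enc = getEncoding("o200k_base");
  let total = 3; // assumed tokens that prime the assistant's reply
  for (const m of messages) {
    total += 3 + enc.encode(m.role).length + enc.encode(m.content).length; // ~3 framing tokens per message
  }
  return total;
}
console.log("estimated chat tokens:", estimateChatTokens([
  { role: "system", content: "You are a concise assistant." },
  { role: "user", content: "Explain zero-copy streaming in Node.js and give examples." },
]));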
Anthropic: Tokenizer and API
Anthropic models (the Claude family) use a tokenizer distinct from tiktoken. Anthropic provides an official token counting endpoint and SDK support; newer Claude models do not ship a public local tokenizer, so prefer the API count. See the official token counting guide for Claude at docs.anthropic.com/en/docs/build-with-claude/token-counting.
// TypeScript (Node)
// npm i @anthropic-ai/sdk
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY! });
const messages = [
  { role: "user" as const, content: "Contrast streaming vs. non-streaming responses in Claude." },
];
// Official token counting (recommended)
const count = await client.messages.countTokens({
  model: "claude-3-5-sonnet-20240620",
  messages,
});
console.log("anthropic input tokens:", count.input_tokens);
// Normal generation call (actual usage metrics come back on responses)
const msg = await client.messages.create({
  model: "claude-3-5-sonnet-20240620",
  max_tokens: 400,
  messages,
});
console.log("actual usage:", msg.usage.input_tokens, "in /", msg.usage.output_tokens, "out");
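Because countTokens accepts the same framing fields as a real request, you can measure how much a system prompt (or tool schema) adds before you send it. A minimal sketch, reusing the client, messages, and count from above; the system string is illustrative:
// TypeScript (Node)
// Count the same messages with and without a system prompt to see framing overhead.
const withSystem = await client.messages.countTokens({
  model: "claude-3-5-sonnet-20240620",
  system: "You are a terse code reviewer. Answer in bullet points.",
  messages,
});
console.log("delta from system prompt:", withSystem.input_tokens - count.input_tokens);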
Google Gemini: Token Counting API
Gemini exposes a dedicated countTokens endpoint to estimate usage before generating. This is available in both the Generative Language API and Vertex AI. See the API reference at ai.google.dev/api/tokens.
// TypeScript (Node)
// npm i @google/generative-ai
import { GoogleGenerativeAI } from "@google/generative-ai";
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-pro" });
const { totalTokens } = await model.countTokens({
  contents: [{ role: "user", parts: [{ text: "Explain embeddings vs. tokens" }] }],
});
console.log("gemini tokens:", totalTokens);
Mistral: Tokenizer and Counting
Mistral models (e.g., Mistral 7B, Mixtral) use a SentencePiece‑style tokenizer similar to the LLaMA families. For local estimates, use the model's tokenizer from Hugging Face; API responses also include usage fields you can log for ground‑truth counts. See Mistral's tokenization guide at docs.mistral.ai/guides/tokenization/.
# Python (Hugging Face Transformers)
# pip install transformers sentencepiece
from transformers import AutoTokenizer
# Choose the exact tokenizer for your target model
# Examples: "mistralai/Mistral-7B-Instruct-v0.2" or "mistralai/Mixtral-8x7B-Instruct-v0.1"
tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
text = "Explain rotary position embeddings and KV cache tradeoffs."
ids = tok(text).input_ids  # includes the BOS token the tokenizer adds by default (add_special_tokens=True)
print("mistral tokens:", len(ids))
Tip: Match the tokenizer to the exact checkpoint you are targeting. Small vocabulary changes can shift counts. When validating costs, prefer API usage fields from real calls.
Why Counts Differ Across Models
- Different vocabularies: Token sets vary by provider and model family.
- Pre/post framing: System prompts, tool schemas, and safety wrappers add hidden tokens.
- Multimodal inputs: Images, audio, and structured tool calls have provider‑specific counting rules.
Troubleshooting: Different Counts Across Providers
- Normalize whitespace and newlines before counting; hidden characters change results (see the sketch after this list).
- Replicate provider framing: include system prompts, tool schemas, and function args.
- Use provider tooling: tiktoken (OpenAI), the Anthropic token counting API, Gemini countTokens, and the exact Hugging Face tokenizer for Mistral.
- Compare apples to apples: count the final request body you actually send over the wire.
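To see why normalization matters, the sketch below (assuming js-tiktoken as the local counter) compares a raw string carrying trailing whitespace and blank lines against its normalized form:
// TypeScript (Node)
// Hidden whitespace inflates counts; normalize before comparing tools.
import { getEncoding } from "js-tiktoken";
const enc = getEncoding("o200k_base");
const raw = "Explain embeddings  vs. tokens \n\n";
const normalized = raw.replace(/\s+/g, " ").trim();
console.log("raw:", enc.encode(raw).length, "normalized:", enc.encode(normalized).length);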
Cost Estimation Tips
- Budget both input and output tokens; responses often dominate cost (see the cost sketch after this list).
- Cache prompts/tool schemas; avoid re-sending large system frames.
- Use Gemini countTokens as a preflight and local tokenizers elsewhere.
- Track merged LOC or accepted diff size, not just raw generation.
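A simple way to turn counts into a budget is a per-million-token rate table. A minimal sketch; the rates below are placeholders, not real prices, so substitute the values from your provider's pricing page:
// TypeScript (Node)
// Hypothetical USD rates per million tokens; replace with your provider's current pricing.
const PRICE_PER_MTOK = { input: 1.0, output: 4.0 };
function estimateCostUSD(inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1e6) * PRICE_PER_MTOK.input + (outputTokens / 1e6) * PRICE_PER_MTOK.output;
}
console.log("estimated cost: $", estimateCostUSD(12_000, 3_500).toFixed(4));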
See Also
- AI Code Reviews Guide: how token budgets interact with review depth and latency.
- How AI is Transforming Code Completion: practical tips to reduce prompt size while retaining context.
References
- OpenAI tiktoken: github.com/openai/tiktoken · js-tiktoken on npm
- Anthropic token counting guide: docs.anthropic.com/en/docs/build-with-claude/token-counting
- Google Gemini token counting API: ai.google.dev/api/tokens
- Mistral tokenization guide: docs.mistral.ai/guides/tokenization/ · Hugging Face models