Token Counting Explained: tiktoken, Anthropic, and Gemini (2025 Guide)

Token counting is how providers measure prompt and response size for pricing, context limits, and rate limits. Tokens are not characters or words; they are model-specific units produced by a tokenizer. Understanding how tokenization works—and how models differ—is key to budgeting costs and avoiding truncation.
How Tokenization Works
Most modern models use subword tokenization. Common approaches include Byte Pair Encoding (BPE) and SentencePiece-like unigram models. Tokenizers split text into byte sequences or subwords based on training statistics, so the same sentence may produce different token counts across providers.
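For intuition, here is a quick sketch using the js-tiktoken package (covered in the OpenAI section below) that counts the same sentence under three OpenAI encodings; the counts differ, and provider-specific tokenizers (Claude, Gemini, Mistral) would differ again.
// TypeScript (Node)
// npm i js-tiktoken
import { getEncoding } from "js-tiktoken";

const text = "Tokenizers split this sentence into different subword pieces.";
for (const name of ["p50k_base", "cl100k_base", "o200k_base"] as const) {
  const enc = getEncoding(name);
  console.log(name, enc.encode(text).length); // same text, different token counts
}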
Key Concepts
- Byte-level encodings: Robust with emojis and non‑ASCII, but can add overhead.
- Subwords vs. words: Short, common pieces tokenize cheaply; rare terms expand into many tokens.
- System + tool framing: Provider SDKs often add hidden tokens for roles, tools, and metadata.
FAQ: Anthropic Token Counting
How do I count tokens in Anthropic (Claude)?
Use the official messages.countTokens method in @anthropic-ai/sdk. Build your messages array exactly as you will send it (including system prompts and tool definitions) and pass the same model. This returns accurate counts that match billing.
Can I approximate Anthropic token counts without the API?
For quick estimates, you can use OpenAI’s tiktoken with the p50k_base encoding (MODEL_P50K_BASE). This is an approximation; use Anthropic’s API for ground‑truth numbers.
OpenAI: tiktoken Overview
OpenAI models are typically measured with tiktoken, an efficient BPE tokenizer. In JavaScript, use js-tiktoken to estimate tokens for prompts and responses.
// TypeScript (Node)
// npm i js-tiktoken
import { encodingForModel } from "js-tiktoken";

// Pick the closest model/encoding; if your js-tiktoken version does not know
// this model name, fall back to getEncoding("o200k_base").
const enc = encodingForModel("gpt-4o-mini");
const prompt = "Explain zero-copy streaming in Node.js and give examples.";
const ids = enc.encode(prompt);
console.log("tokens:", ids.length);
// js-tiktoken is pure JS, so no enc.free() is needed (that belongs to the WASM "tiktoken" package).
Note: Pick an encoding close to your target model. Exact counts may vary by release and provider framing (system prompts, tool calls).
Anthropic: Tokenizer and API (How to Count Tokens)
Anthropic models (Claude family) use a tokenizer distinct from tiktoken. Anthropic provides an official API and SDKs; for local estimation you can use their tokenizer utilities. See the official guide on token counting for Claude at docs.anthropic.com/en/docs/build-with-claude/token-counting.
Quick approximation: if you can’t call Anthropic’s countTokens (e.g., offline estimation), you can approximate Claude token counts using OpenAI’s tiktoken with the p50k_base encoding (a.k.a. MODEL_P50K_BASE; see the tiktoken encodings reference). This is only an estimate; always prefer Anthropic’s official counts for billing‑grade accuracy.
// TypeScript (Node)
// npm i js-tiktoken
import { getEncoding } from "js-tiktoken";

// Approximate Claude tokens locally (estimation only)
const enc = getEncoding("p50k_base"); // a.k.a. MODEL_P50K_BASE
const text = "Contrast streaming vs. non‑streaming responses in Claude.";
const approx = enc.encode(text).length;
console.log("approx anthropic tokens:", approx);
// No enc.free() needed with the pure-JS js-tiktoken package.
Quick steps: How to count tokens in Anthropic (Claude)
- Install @anthropic-ai/sdk and set ANTHROPIC_API_KEY.
- Build the request messages exactly as you plan to send them.
- Call client.messages.countTokens({ model, messages }) to get counts.
// TypeScript (Node)
// npm i @anthropic-ai/sdk
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY! });

const messages = [
  { role: "user" as const, content: "Contrast streaming vs. non-streaming responses in Claude." },
];

// Official token counting (recommended)
const count = await client.messages.countTokens({
  model: "claude-3-5-sonnet-20240620",
  messages,
});
console.log("anthropic input tokens:", count.input_tokens);

// Normal generation call (actual usage metrics come back on responses)
const msg = await client.messages.create({
  model: "claude-3-5-sonnet-20240620",
  max_tokens: 400,
  messages,
});
console.log("usage:", msg.usage.input_tokens, msg.usage.output_tokens);
Google Gemini: Token Counting API
Gemini exposes a dedicated countTokens endpoint to estimate usage before generating. This is available in both the Generative Language API and Vertex AI. See the API reference at ai.google.dev/api/tokens.
// TypeScript (Node)
// npm i @google/generative-ai
import { GoogleGenerativeAI } from "@google/generative-ai";
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-pro" });
const { totalTokens } = await model.countTokens({
  contents: [{ role: "user", parts: [{ text: "Explain embeddings vs. tokens" }] }],
});
console.log("gemini tokens:", totalTokens);
Mistral: Tokenizer and Counting
Mistral models (e.g., Mistral 7B, Mixtral) use a SentencePiece‑style tokenizer similar to LLaMA families. For local estimates, use the model’s tokenizer from Hugging Face; API responses also include usage fields you can log for ground‑truth counts. See Mistral’s tokenizer guide at docs.mistral.ai/guides/tokenization/.
# Python (Hugging Face Transformers)
# pip install transformers sentencepiece
from transformers import AutoTokenizer
# Choose the exact tokenizer for your target model
# Examples: "mistralai/Mistral-7B-Instruct-v0.2" or "mistralai/Mixtral-8x7B-Instruct-v0.1"
tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
text = "Explain rotary position embeddings and KV cache tradeoffs."
ids = tok(text).input_ids
print("mistral tokens:", len(ids))
Tip: Match the tokenizer to the exact checkpoint you are targeting. Small vocabulary changes can shift counts. When validating costs, prefer API usage fields from real calls.
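For billing‑grade numbers, log the usage block that comes back on real API calls. A minimal sketch, assuming Mistral’s standard chat completions REST endpoint and a MISTRAL_API_KEY environment variable (the model name is illustrative):
// TypeScript (Node)
// Read ground-truth token usage from a real Mistral API response
const res = await fetch("https://api.mistral.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.MISTRAL_API_KEY}`,
  },
  body: JSON.stringify({
    model: "mistral-small-latest",
    messages: [{ role: "user", content: "Explain rotary position embeddings briefly." }],
  }),
});
const data = await res.json();
// usage.prompt_tokens / completion_tokens / total_tokens are the billable counts
console.log("mistral usage:", data.usage);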
Why Counts Differ Across Models
- Different vocabularies: Token sets vary by provider and model family.
- Pre/post framing: System prompts, tool schemas, and safety wrappers add hidden tokens (see the sketch after this list).
- Multimodal inputs: Images, audio, and structured tool calls have provider‑specific counting rules.
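One way to see framing overhead directly is to count the same user message with and without a system prompt. A sketch using Anthropic’s countTokens (the model string matches the examples above):
// TypeScript (Node)
// npm i @anthropic-ai/sdk
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY! });
const model = "claude-3-5-sonnet-20240620";
const messages = [{ role: "user" as const, content: "Summarize this PR in one sentence." }];

const bare = await client.messages.countTokens({ model, messages });
const framed = await client.messages.countTokens({
  model,
  system: "You are a terse senior reviewer.",
  messages,
});
console.log("framing overhead (tokens):", framed.input_tokens - bare.input_tokens);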
Troubleshooting: Different Counts Across Providers
- Normalize whitespace and newlines before counting; hidden characters change results (a small normalization sketch follows this list).
- Replicate provider framing: include system prompts, tool schemas, and function args.
- Use provider tooling: tiktoken (OpenAI), the Anthropic tokenizer, Gemini countTokens, and the exact HF tokenizer for Mistral.
- Compare apples to apples: count the final request body you actually send over the wire.
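For the normalization point above, a minimal sketch; the exact rules are up to you, what matters is counting the same bytes you actually send:
// TypeScript (Node)
// Example-only normalization: unify line endings, strip zero-width characters and trailing whitespace
function normalizeForCounting(text: string): string {
  return text
    .replace(/\r\n/g, "\n") // unify line endings
    .replace(/[\u200B-\u200D\uFEFF]/g, "") // strip zero-width characters
    .replace(/[ \t]+$/gm, "") // drop trailing whitespace per line
    .trim();
}

console.log(normalizeForCounting("Hello\r\n world \u200B"));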
Cost Estimation Tips
- Budget both input and output tokens; responses often dominate cost (a small cost sketch follows this list).
- Cache prompts/tool schemas; avoid re-sending large system frames.
- Run Gemini’s countTokens as a preflight check; use local tokenizers elsewhere.
- Track merged LOC or accepted diff size, not just raw generation.
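To make the budgeting point concrete, a back-of-the-envelope sketch with placeholder per-million-token prices (the numbers below are made up; substitute your provider’s current rates):
// TypeScript (Node)
// Placeholder prices per 1M tokens (USD); replace with your provider's real rates
const PRICE_PER_M = { input: 3.0, output: 15.0 };

function estimateCostUSD(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * PRICE_PER_M.input +
    (outputTokens / 1_000_000) * PRICE_PER_M.output
  );
}

// e.g., a 2,000-token prompt with an 800-token response
console.log(estimateCostUSD(2000, 800).toFixed(4));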
See Also
- AI Code Reviews Guide: how token budgets interact with review depth and latency.
- How AI is Transforming Code Completion: practical tips to reduce prompt size while retaining context.
- Tuning Chat Completion Parameters in Mistral API: temperature vs. top_p, max_tokens, random_seed, streaming, and safety.
References
- OpenAI tiktoken: GitHub · js‑tiktoken
- tiktoken encoding names (e.g., p50k_base / MODEL_P50K_BASE): available encodings
- Anthropic token counting guide: docs.anthropic.com/en/docs/build-with-claude/token-counting
- Google Gemini token counting API: ai.google.dev/api/tokens
- Mistral tokenization guide: docs.mistral.ai/guides/tokenization/ · Hugging Face models