Tuning Chat Completion Parameters in Mistral API (2025)

This guide shows how to tune the key parameters for Mistral’s Chat Completions API so you get the right balance of creativity, determinism, response length, and latency. Copy the snippets, adjust the values, and you’re production‑ready.
Quick Start
You can call the API directly with fetch or use the official SDK. The examples below use the HTTP endpoint documented at docs.mistral.ai.
// TypeScript (Node)
// env: MISTRAL_API_KEY
const resp = await fetch("https://api.mistral.ai/v1/chat/completions", {
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: "+Bearer " + process.env.MISTRAL_API_KEY,
},
body: JSON.stringify({
model: "mistral-small-latest",
messages: [{ role: "user", content: "Explain temperature vs. top_p" }],
temperature: 0.5,
top_p: 1,
max_tokens: 300,
}),
});
const data = await resp.json();
console.log(data.choices?.[0]?.message?.content);
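If you prefer the official TypeScript SDK to raw fetch, the equivalent call is roughly the sketch below. It assumes the 1.x @mistralai/mistralai client, which exposes a chat.complete method with camelCase parameter names; check the SDK README for the exact signatures.
// TypeScript (Node): SDK sketch, assuming @mistralai/mistralai 1.x
import { Mistral } from "@mistralai/mistralai";

const client = new Mistral({ apiKey: process.env.MISTRAL_API_KEY });

const result = await client.chat.complete({
  model: "mistral-small-latest",
  messages: [{ role: "user", content: "Explain temperature vs. top_p" }],
  temperature: 0.5,
  maxTokens: 300, // camelCase in the SDK (assumption); the HTTP API uses snake_case
});
console.log(result.choices?.[0]?.message?.content);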
Parameters That Matter
- temperature (0–2): controls randomness. Lower = focused and deterministic; higher = more creative. Start at 0.2 for tasks requiring precision and 0.7 for ideation.
- top_p (0–1): nucleus sampling. Lower values restrict sampling to the smallest set of tokens whose cumulative probability reaches top_p. Tune either temperature or top_p, rarely both.
- max_tokens: upper bound on the number of tokens generated. Use to keep latency/cost in check and to prevent overly long answers.
- stop: array of strings that will stop generation when encountered in the output. Useful for delimiting tool output or multi‑turn templates.
- random_seed: set for repeatable sampling. With the same prompt and parameters, the model produces stable outputs—great for tests and evals.
- stream: set to true to receive incremental tokens. Improves perceived latency for chat UIs.
- safe_prompt: enables provider safety prompt additions that reduce risky outputs but may nudge tone. Use when deploying end‑user‑facing features.
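Putting these together, a fully specified (non-streaming) request body looks like the example below; the values are illustrative, not recommendations.
// Illustrative request body combining the parameters above.
const body = {
  model: "mistral-small-latest",
  messages: [{ role: "user", content: "Draft a release note for v2.1" }],
  temperature: 0.3,   // lower = more deterministic
  top_p: 1,           // leave at 1 when tuning temperature
  max_tokens: 250,    // hard cap on generated tokens
  stop: ["###"],      // cut generation at this delimiter (example value)
  random_seed: 7,     // repeatable sampling for tests and evals
  stream: false,      // set true for incremental tokens (see Streaming below)
  safe_prompt: false, // set true for end-user-facing deployments
};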
Recipes
Deterministic Evals
For reproducible evaluations, pin temperature and top_p, and set a random_seed. Use concise, fully specified prompts.
const body = {
model: "mistral-small-latest",
messages: [{ role: "user", content: "Summarize: <>".replace("<>", doc) }],
temperature: 0,
top_p: 1,
random_seed: 42,
max_tokens: 200,
};
Short, On‑Brand Responses
Keep answers tight using max_tokens and stop. Nudge tone in the system message.
const body = {
model: "mistral-small-latest",
messages: [
{ role: "system", content: "You are concise and friendly. 1–2 sentences." },
{ role: "user", content: "Write a CTA for a pricing page" },
],
temperature: 0.4,
top_p: 0.95,
max_tokens: 60,
stop: ["
"],
};
Creative Brainstorming
Increase temperature (and keep top_p at or near 1) to broaden the search space. Provide structure in the prompt to keep ideas usable.
const body = {
model: "mistral-small-latest",
messages: [{ role: "user", content: "Give 7 creative taglines for a dev tool" }],
temperature: 0.9,
top_p: 1,
max_tokens: 200,
};
Streaming for Chat UIs
Turn on stream and handle server‑sent events (SSE) to show tokens as they arrive. Users perceive faster responses even when total latency is unchanged.
const resp = await fetch("https://api.mistral.ai/v1/chat/completions", {
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: "+Bearer " + process.env.MISTRAL_API_KEY,
},
body: JSON.stringify({
model: "mistral-small-latest",
messages: [{ role: "user", content: "Stream a haiku about latency" }],
stream: true,
temperature: 0.6,
}),
});
// Read SSE stream (implementation depends on your framework)
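A minimal reader for Node 18+ using the Web Streams API might look like the sketch below. It assumes the stream uses OpenAI-style "data: {...}" SSE lines ending with "data: [DONE]"; confirm the event format against the Mistral docs before relying on it.
// Sketch: decode SSE chunks and print content deltas as they arrive (Node 18+).
if (!resp.body) throw new Error("No response body");
const decoder = new TextDecoder();
let buffered = "";
for await (const chunk of resp.body) {
  buffered += decoder.decode(chunk, { stream: true });
  const lines = buffered.split("\n");
  buffered = lines.pop() ?? ""; // keep any partial line for the next chunk
  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice("data: ".length).trim();
    if (payload === "[DONE]") continue; // assumed stream terminator
    const delta = JSON.parse(payload).choices?.[0]?.delta?.content;
    if (delta) process.stdout.write(delta);
  }
}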
Safety‑Conscious Defaults
Enable safe_prompt for consumer‑facing features or when integrating with UGC. This adds provider safety scaffolding, which may slightly change tone.
const body = {
model: "mistral-small-latest",
messages: [{ role: "user", content: "Explain prompt injection briefly" }],
temperature: 0.3,
top_p: 0.9,
safe_prompt: true,
max_tokens: 180,
};
Tips
- Pick one: tune temperature or top_p, not both at once.
- Use random_seed for tests/evals and remove it for live chats.
- Bound outputs with max_tokens and consider adding stop.
- Stream responses to keep users engaged on long generations.
- Log usage from API responses to validate cost and latency (see the snippet below).
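For the last tip, non-streaming responses include a usage object you can log next to wall-clock latency. A minimal sketch, assuming the usual prompt_tokens / completion_tokens / total_tokens fields and reusing a request body from the recipes above:
// Sketch: capture latency and token usage for each call.
const started = Date.now();
const resp = await fetch("https://api.mistral.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: "Bearer " + process.env.MISTRAL_API_KEY,
  },
  body: JSON.stringify(body), // any request body from the recipes above
});
const data = await resp.json();
console.log({
  latencyMs: Date.now() - started,
  promptTokens: data.usage?.prompt_tokens,
  completionTokens: data.usage?.completion_tokens,
  totalTokens: data.usage?.total_tokens,
});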
See Also
- Token Counting Explained: tiktoken, Anthropic, and Gemini: how to measure and estimate usage across providers.
References
- Mistral Chat Completions API: docs.mistral.ai/api/#tag/chat