Tuning Chat Completion Parameters in Mistral API (2025)

This guide shows how to tune the key parameters for Mistral’s Chat Completions API so you get the right balance of creativity, determinism, response length, and latency. Copy the snippets, adjust the values, and you’re production‑ready.
Quick Start
You can call the API directly with fetch or use the official SDK. The examples below use the HTTP endpoint documented at docs.mistral.ai.
// TypeScript (Node)
// env: MISTRAL_API_KEY
const resp = await fetch("https://api.mistral.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: "+Bearer " + process.env.MISTRAL_API_KEY,
  },
  body: JSON.stringify({
    model: "mistral-small-latest",
    messages: [{ role: "user", content: "Explain temperature vs. top_p" }],
    temperature: 0.5,
    top_p: 1,
    max_tokens: 300,
  }),
});
const data = await resp.json();
console.log(data.choices?.[0]?.message?.content);
Parameters That Matter
- temperature (0–2): controls randomness. Lower = focused and deterministic; higher = more creative. Start at 0.2 for tasks requiring precision and 0.7 for ideation.
- top_p (0–1): nucleus sampling. The model samples only from the smallest set of tokens whose cumulative probability reaches top_p, so lower values narrow the candidate pool. Tune either temperature or top_p, rarely both.
- max_tokens: upper bound on the number of tokens generated. Use to keep latency/cost in check and to prevent overly long answers.
- stop: array of strings that stop generation when encountered in the output. Useful for delimiting tool output or multi‑turn templates (see the sketch after this list).
- random_seed: set for repeatable sampling. With the same prompt and parameters, the model produces stable outputs—great for tests and evals.
- stream: set to true to receive incremental tokens. Improves perceived latency for chat UIs.
- safe_prompt: enables provider safety prompt additions that reduce risky outputs but may nudge tone. Use when deploying end‑user‑facing features.
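To see how these knobs combine, here is a sketch of a single request body that caps length, stops at a delimiter the prompt asks for, and pins a seed. The values and the "###" delimiter are illustrative, not recommendations.
// Sketch: combining the parameters above in one body (values are illustrative).
const body = {
  model: "mistral-small-latest",
  messages: [
    { role: "system", content: "Answer briefly, then print ### on its own line." },
    { role: "user", content: "Name three HTTP caching headers" },
  ],
  temperature: 0.2,   // low randomness for a factual task
  max_tokens: 150,    // hard cap on output length
  stop: ["###"],      // cut generation at the delimiter requested above
  random_seed: 7,     // repeatable sampling across runs
};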
Recipes
Deterministic Evals
For reproducible evaluations, pin temperature and top_p, and set a random_seed. Use concise, fully specified prompts.
const body = {
  model: "mistral-small-latest",
  messages: [{ role: "user", content: "Summarize: <>".replace("<>", doc) }],
  temperature: 0,
  top_p: 1,
  random_seed: 42,
  max_tokens: 200,
};
Short, On‑Brand Responses
Keep answers tight using max_tokens and stop. Nudge tone in the system message.
const body = {
  model: "mistral-small-latest",
  messages: [
    { role: "system", content: "You are concise and friendly. 1–2 sentences." },
    { role: "user", content: "Write a CTA for a pricing page" },
  ],
  temperature: 0.4,
  top_p: 0.95,
  max_tokens: 60,
  stop: ["
"],
};
Creative Brainstorming
Increase temperature (and keep top_p at or near 1) to broaden the search space. Provide structure in the prompt to keep ideas usable.
const body = {
  model: "mistral-small-latest",
  messages: [{ role: "user", content: "Give 7 creative taglines for a dev tool" }],
  temperature: 0.9,
  top_p: 1,
  max_tokens: 200,
};
Streaming for Chat UIs
Turn on stream and handle server‑sent events (SSE) to show tokens as they arrive. Users perceive faster responses even when total latency is unchanged.
const resp = await fetch("https://api.mistral.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: "+Bearer " + process.env.MISTRAL_API_KEY,
  },
  body: JSON.stringify({
    model: "mistral-small-latest",
    messages: [{ role: "user", content: "Stream a haiku about latency" }],
    stream: true,
    temperature: 0.6,
  }),
});
// Read SSE stream (implementation depends on your framework)
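On Node 18+ or in the browser, resp.body is a web ReadableStream, so you can read it directly. The sketch below assumes the stream uses the common data: {...} / data: [DONE] SSE format with incremental text under choices[0].delta.content; verify the chunk shape against the API reference for your model.
// Sketch: read the SSE stream from resp above (format assumptions noted in the text).
if (!resp.body) throw new Error("No response body");
const reader = resp.body.getReader();
const decoder = new TextDecoder();
let buffer = "";
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop() ?? ""; // keep any partial line for the next chunk
  for (const line of lines) {
    if (!line.startsWith("data:")) continue;
    const payload = line.slice(5).trim();
    if (payload === "[DONE]") continue;
    const text = JSON.parse(payload).choices?.[0]?.delta?.content;
    if (text) process.stdout.write(text); // append to your chat UI instead
  }
}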
Safety‑Conscious Defaults
Enable safe_prompt for consumer‑facing features or when integrating with UGC. This adds provider safety scaffolding which may slightly change tone.
const body = {
  model: "mistral-small-latest",
  messages: [{ role: "user", content: "Explain prompt injection briefly" }],
  temperature: 0.3,
  top_p: 0.9,
  safe_prompt: true,
  max_tokens: 180,
};
Tips
- Pick one: tune temperature or top_p, not both at once.
- Use random_seed for tests/evals and remove it for live chats.
- Bound outputs with max_tokens and consider adding stop.
- Stream responses to keep users engaged on long generations.
- Log usage from API responses to validate cost and latency (see the sketch below).
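A quick sketch for that last tip, reusing the data object from the Quick Start snippet. The usage field names (prompt_tokens, completion_tokens, total_tokens) are assumed from the response JSON; confirm them against the API reference.
// Sketch: log usage and latency around the Quick Start request
// (usage field names assumed; see note above).
const startedAt = Date.now();
// ...perform the fetch + resp.json() call from Quick Start here...
console.log({
  latencyMs: Date.now() - startedAt,
  promptTokens: data.usage?.prompt_tokens,
  completionTokens: data.usage?.completion_tokens,
  totalTokens: data.usage?.total_tokens,
});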
See Also
- Token Counting Explained: tiktoken, Anthropic, and Gemini: how to measure and estimate usage across providers.
References
- Mistral Chat Completions API: docs.mistral.ai/api/#tag/chat