Model Specifications

Voxtral Model Family

Choose the right model for your use case. All models feature automatic language detection, built-in Q&A capabilities, and Apache 2.0 licensing.

Voxtral Large v1.0 (Recommended)

Our flagship model, with a 32K-token context window and state-of-the-art accuracy.

Accuracy: 95.2% (Common Voice)
Context: 32K tokens
Languages: 15+
Parameters: 7B

Key Features

  • 32K-token context window (40+ minutes of audio)
  • Built-in Q&A and summarization
  • Function calling from voice
  • Automatic language detection
  • Real-time processing support

Benchmark Results

Common Voice: 95.2% (vs 91.7% for Whisper large-v3)
FLEURS: 93.8% (vs 89.4% for Whisper large-v3)
LibriSpeech: 96.1% (vs 94.2% for Whisper large-v3)
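To put the benchmark gaps in perspective, the sketch below converts the Voxtral Large figures above into relative error reductions. It assumes that "accuracy" is the complement of an error rate (for example, 100% minus word error rate). The page does not define the metric, so treat this as an illustration rather than an official comparison. The names `BENCHMARKS` and `relative_error_reduction` are made up for this example.

```python
# Accuracy figures (percent) for Voxtral Large vs Whisper large-v3, as listed on this page.
BENCHMARKS = {
    "Common Voice": (95.2, 91.7),
    "FLEURS": (93.8, 89.4),
    "LibriSpeech": (96.1, 94.2),
}


def relative_error_reduction(accuracy, baseline_accuracy):
    """Percent reduction in error rate, ASSUMING error = 100 - accuracy."""
    error = 100.0 - accuracy
    baseline_error = 100.0 - baseline_accuracy
    return 100.0 * (baseline_error - error) / baseline_error


for name, (voxtral, whisper) in BENCHMARKS.items():
    reduction = relative_error_reduction(voxtral, whisper)
    print(f"{name}: {reduction:.1f}% fewer errors")
```

Under that assumption, a 3.5-point accuracy gap on Common Voice corresponds to roughly 42% fewer errors, because the baseline error rate is only 8.3%.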

Voxtral Base v1.0

Balanced performance and efficiency for most use cases.

Accuracy: 92.8% (Common Voice)
Context: 16K tokens
Languages: 15+
Parameters: 1.5B

Key Features

  • 16K-token context window (20+ minutes of audio)
  • Built-in Q&A and summarization
  • Automatic language detection
  • Optimized for edge deployment

Benchmark Results

Common Voice: 92.8% (vs 89.1% for Whisper base)
FLEURS: 90.5% (vs 86.2% for Whisper base)
LibriSpeech: 93.4% (vs 91.8% for Whisper base)

Voxtral Small v1.0

Lightweight model optimized for mobile and edge devices.

Accuracy: 89.4% (Common Voice)
Context: 8K tokens
Languages: 10+
Parameters: 244M

Key Features

  • 8K-token context window (10+ minutes of audio)
  • Automatic language detection
  • Mobile and edge optimized
  • Low memory footprint

Benchmark Results

Common Voice: 89.4% (vs 85.7% for Whisper small)
FLEURS: 86.2% (vs 82.1% for Whisper small)
LibriSpeech: 90.8% (vs 88.3% for Whisper small)
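The context windows listed for the three models all work out to the same ratio of about 800 tokens per minute of audio (taking "K" as 1,000). The sketch below uses that to pick the smallest model whose stated window covers a recording. It treats the "10+", "20+", and "40+" minute figures as conservative limits. The identifiers `voxtral-small` and `voxtral-base` are assumptions modeled on the `voxtral-large` name used in the Quick Start, and `smallest_model_for` is a hypothetical helper, not part of any SDK.

```python
# (model id, context tokens, minutes of audio) as listed on this page.
# Only "voxtral-large" appears in the page's examples; the other two ids are ASSUMED.
MODELS = [
    ("voxtral-small", 8_000, 10),
    ("voxtral-base", 16_000, 20),
    ("voxtral-large", 32_000, 40),
]


def tokens_per_minute(context_tokens, minutes):
    """Implied audio token rate for a model's stated context window."""
    return context_tokens / minutes


def smallest_model_for(duration_minutes):
    """Return the smallest model whose stated window covers the audio, else None."""
    for model_id, _tokens, max_minutes in MODELS:
        if duration_minutes <= max_minutes:
            return model_id
    return None  # longer than any stated window; split the audio into chunks


print(smallest_model_for(15))
for model_id, tokens, minutes in MODELS:
    print(model_id, tokens_per_minute(tokens, minutes))
```

Recordings longer than 40 minutes return `None` here. Because the page only states lower bounds, the real limits may be somewhat higher.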

Supported Languages

All models support automatic language detection across these languages:

English
Spanish
French
Portuguese
Hindi
German
Dutch
Italian
Chinese
Japanese
Korean
Russian
Arabic
Turkish
Polish
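If you need to validate a language before sending audio, the list above can be kept as a small lookup table. The sketch below maps each listed language to its ISO 639-1 code. The page does not say whether the API accepts or returns such codes, so the mapping is only a client-side convenience, and `SUPPORTED_LANGUAGES` and `is_listed` are names invented for this example.

```python
# ISO 639-1 codes for the 15 languages listed on this page.
SUPPORTED_LANGUAGES = {
    "en": "English", "es": "Spanish", "fr": "French", "pt": "Portuguese",
    "hi": "Hindi", "de": "German", "nl": "Dutch", "it": "Italian",
    "zh": "Chinese", "ja": "Japanese", "ko": "Korean", "ru": "Russian",
    "ar": "Arabic", "tr": "Turkish", "pl": "Polish",
}

# Lower-cased English names, precomputed for case-insensitive lookups.
_NAMES = {name.lower() for name in SUPPORTED_LANGUAGES.values()}


def is_listed(code_or_name):
    """Return True if an ISO 639-1 code or English name appears in the list."""
    key = code_or_name.strip().lower()
    return key in SUPPORTED_LANGUAGES or key in _NAMES


print(is_listed("de"), is_listed("Japanese"), is_listed("sv"))
```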

Quick Start

Get started with Voxtral models using our API or a self-hosted deployment.

API Usage

curl -X POST https://api.mistral.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@your_audio.wav" \
  -F "model=voxtral-large" \
  -F "response_format=verbose_json"
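For callers who prefer Python over curl, the same request can be sketched with the standard library alone. The function below builds, but does not send, a multipart/form-data POST with the three form fields from the curl example. The helper name `build_transcription_request` and the `application/octet-stream` part type are assumptions for illustration, and the response schema is not documented on this page.

```python
import io
import urllib.request
import uuid

# Endpoint taken from the curl example above.
API_URL = "https://api.mistral.ai/v1/audio/transcriptions"


def build_transcription_request(audio_bytes, filename, api_key,
                                model="voxtral-large",
                                response_format="verbose_json"):
    """Build (without sending) a multipart POST mirroring the curl call."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()

    def add_field(name, value):
        # Plain text form field, equivalent to curl's -F "name=value".
        body.write(f"--{boundary}\r\n".encode())
        body.write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        body.write(f"{value}\r\n".encode())

    # File part, equivalent to curl's -F "file=@your_audio.wav".
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    body.write(b"Content-Type: application/octet-stream\r\n\r\n")  # assumed part type
    body.write(audio_bytes)
    body.write(b"\r\n")

    add_field("model", model)
    add_field("response_format", response_format)
    body.write(f"--{boundary}--\r\n".encode())  # closing boundary

    return urllib.request.Request(
        API_URL,
        data=body.getvalue(),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            # The boundary must be part of the header so the server can split the parts.
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, read the audio file as bytes, pass them to `build_transcription_request`, and hand the result to `urllib.request.urlopen`. Parsing the body with `json.load` assumes that `verbose_json` returns JSON.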

Self-hosted Deployment

docker run -p 8000:8000 \
  -v "$(pwd)/models:/models" \
  voxtral/voxtral-server:latest \
  --model voxtral-large
