Whisper from OpenAI is a powerful tool for transcribing audio files into text. It supports multiple languages, including Norwegian. Below is a guide on how to use Whisper to transcribe Norwegian audio files.

whisper K01-L32.mp3 --language Norwegian

Alternatively, use the two-letter language code:

whisper K01-L32.mp3 --language no

The transcription is printed to the console, and Whisper also writes the following output files:

  • K01-L32.json JSON (JavaScript Object Notation). Contains the full text, the individual segments with their timestamps, and the language. Best for further processing in code.
  • K01-L32.srt Standard subtitle format for videos. Shows text with start and end timestamps in HH:MM:SS,mmm format (mmm = milliseconds). Can be loaded into most media players as subtitles.
  • K01-L32.tsv Tab-separated values format. Each row contains the start time, the end time (both in milliseconds), and the text for a segment. Can be opened in Excel, Google Sheets, or read into Python/R for analysis.
  • K01-L32.txt Plain text. Simple transcript without timestamps or metadata. Best for quickly reading or copying the entire text.
  • K01-L32.vtt Web Video Text Tracks. Subtitles/captions for web videos. Similar to SRT but designed for HTML5 <track> elements. Supports additional metadata like styling or speaker labels.
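As a rough illustration of reading the TSV output into Python, the sketch below parses the three columns described above. It assumes a header row named start, end, text and times stored as integer milliseconds. The function name read_whisper_tsv and the sample rows are made up for this example, and the function is not part of Whisper.

```python
import csv
import io

def read_whisper_tsv(handle):
    """Return a list of (start_seconds, end_seconds, text) tuples."""
    # QUOTE_NONE: treat quote characters in the transcript as ordinary text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    segments = []
    for row in reader:
        # Assumption: start/end are integer milliseconds.
        start = int(row["start"]) / 1000
        end = int(row["end"]) / 1000
        segments.append((start, end, row["text"]))
    return segments

# Made-up sample data standing in for a real K01-L32.tsv file.
sample = (
    "start\tend\ttext\n"
    "0\t2500\tHei, og velkommen.\n"
    "2500\t5000\tDette er en test.\n"
)
segments = read_whisper_tsv(io.StringIO(sample))
for start, end, text in segments:
    print(f"{start:6.2f} -> {end:6.2f}  {text}")
```

For a real file, pass open("K01-L32.tsv", encoding="utf-8", newline="") instead of the in-memory sample.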

If you want a specific output format, you can specify it using the --output_format option. For example, to get only the SRT file:

whisper K01-L32.mp3 --language Norwegian --output_format srt

--output_format: format of the output file; if not specified, all available formats are produced (default: all). Supported values: txt, vtt, srt, tsv, json, all.

  • If you want multiple formats, just use the default, all. If you run the command twice with different formats, Whisper re-transcribes the audio each time, which can be time-consuming.
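If only an SRT file was produced and a VTT file is needed later, one way to avoid a second transcription is to convert the existing file. The sketch below is a simple hand-written converter, not a Whisper feature. The function name srt_to_vtt and the sample text are made up, it only handles plain SRT cues, and its output may differ in small details from the VTT file Whisper writes itself.

```python
import re

# Matches SRT timestamps such as 00:00:02,500.
_SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")

def srt_to_vtt(srt_text):
    """Convert simple SRT subtitle text to WebVTT text."""
    blocks = re.split(r"\n\s*\n", srt_text.strip())
    cues = []
    for block in blocks:
        lines = block.splitlines()
        # Drop the numeric cue index that SRT puts before each timing line.
        if lines and lines[0].strip().isdigit():
            lines = lines[1:]
        if not lines:
            continue
        # WebVTT uses a dot, not a comma, before the milliseconds.
        lines[0] = _SRT_TIME.sub(r"\1.\2", lines[0])
        cues.append("\n".join(lines))
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"

# Made-up sample standing in for the contents of K01-L32.srt.
sample_srt = """1
00:00:00,000 --> 00:00:02,500
Hei, og velkommen.

2
00:00:02,500 --> 00:00:05,000
Dette er en test.
"""
vtt = srt_to_vtt(sample_srt)
print(vtt)
```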

Other options:

  • --output_dir: The directory where Whisper's output files are saved.
  • --model: The size of the Whisper model to use. The sizes are tiny, base, small, medium, and large. As models get larger, their accuracy improves but transcription becomes slower.
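The options above can be combined in a single call. The sketch below builds such a command in Python, using only flags mentioned in this guide. The helper name build_whisper_command and its default values (model small, directory transcripts) are arbitrary choices for illustration, not recommendations from Whisper.

```python
import shlex

def build_whisper_command(audio_path, language="no", model="small",
                          output_dir="transcripts", output_format="all"):
    """Return the argument list for one whisper CLI call."""
    return [
        "whisper", audio_path,
        "--language", language,
        "--model", model,            # hypothetical default, pick what fits your machine
        "--output_dir", output_dir,  # hypothetical directory name
        "--output_format", output_format,
    ]

command = build_whisper_command("K01-L32.mp3")
# Print the command as it would be typed in a shell.
print(shlex.join(command))
```

To run the command from Python rather than print it, pass the list to subprocess.run(command, check=True). This requires Whisper to be installed and available on the PATH.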


Run the following to view all available options:

whisper --help

Whisper GitHub repo: https://github.com/openai/whisper?tab=readme-ov-file


Q: Is Whisper free to use?
A: Yes, Whisper is free to use. It is an open-source project by OpenAI, and you can run it locally on your machine without any cost.

Q: When is Whisper NOT free?
A: When you use the OpenAI API. If you use the hosted Whisper service provided through the OpenAI API, you are charged a per-minute rate for transcription.

Key Distinction 

  • Open-source model vs. API service: The open-source version you download and run yourself is free, similar to using a calculator app that works offline. The API is a hosted service that costs money, like a subscription-based app.

Q: How do I show the transcript while playing the audio?
A: Play the MP3 file in VLC, then go to Audio > Visualizations > Spectrometer. The subtitles should now be displayed.

[Screenshot: use srt with audio.png]

Source: https://superuser.com/a/1784310

OpenAI Platform: https://platform.openai.com/docs/guides/speech-to-text