Prepare Your AI Environment

Reading time: approx. 9 min

What You Will Learn

To run complex AI models like KB-Whisper, we need a specific set of software libraries. To avoid conflicts with other programs on your computer, we install them in a "virtual environment". This lesson guides you through creating a project folder, setting up an isolated Python environment, installing all necessary packages, and preparing your audio file for the model.

The Basics: Virtual Environments (venv)

A virtual environment is like a separate, clean workspace for a specific Python project. Nothing you install here will affect the rest of your system, and vice versa. It is a standard method for keeping projects organized and reproducible.

How We Do It: Step-by-Step Guide

1. Create a Project Folder

We start by creating and navigating to a folder for our transcription project.

mkdir ~/sv-transkriptor
cd ~/sv-transkriptor

2. Create and Activate the Virtual Environment

Inside your project folder, run the following commands.

Create the environment (you only do this once):

python3 -m venv .venv

This creates a new folder, .venv, with its own Python interpreter and its own directory for installed packages.
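The same environment can also be created from Python with the standard-library venv module. The sketch below is only an illustration: make_env is a hypothetical helper name, while venv.create is the standard-library counterpart of the python3 -m venv command.

```python
import os
import venv


def make_env(path: str, with_pip: bool = True) -> str:
    """Create a virtual environment at `path` and return the path to its Python.

    make_env is a hypothetical helper for illustration; venv.create is the
    standard-library function behind `python3 -m venv`.
    """
    # python3 -m venv uses symlinks by default on Linux/macOS, copies on Windows
    venv.create(path, with_pip=with_pip, symlinks=(os.name != "nt"))
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return os.path.join(path, bin_dir, exe)
```

Calling make_env(".venv") from inside ~/sv-transkriptor should give the same result as the command above; you still activate the environment with source .venv/bin/activate.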

Activate the environment (you do this every time you work with the project):

source .venv/bin/activate

You will see (.venv) at the beginning of your terminal prompt, which indicates that the environment is active.
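If you want to confirm from Python itself that the environment is active, you can compare sys.prefix with sys.base_prefix: inside a venv they differ. A small sketch, where in_venv is our own hypothetical function name:

```python
import sys


def in_venv() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    # In a venv, sys.prefix points at the .venv folder, while sys.base_prefix
    # still points at the Python installation the venv was created from.
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_venv())
print("Environment prefix:", sys.prefix)
```

Saved as, for example, check_venv.py (a hypothetical file name) and run with python check_venv.py while the environment is active, it should print True and a path ending in .venv.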

3. Install Python Dependencies

While your venv is active, install the following libraries. This is the core of our AI process.

Install PyTorch (the foundation for most AI models):

pip install torch --index-url https://download.pytorch.org/whl/cu121

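Note: cu121 selects PyTorch builds for NVIDIA graphics cards (CUDA 12.1). If your computer has no NVIDIA GPU, install the CPU-only build instead:

pip install torch --index-url https://download.pytorch.org/whl/cpu

On macOS, a plain pip install torch is enough.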
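To see which build you got and whether PyTorch can actually use your GPU, a check along these lines may help. It is a sketch: pick_device is a hypothetical helper, while torch.__version__ and torch.cuda.is_available() are standard PyTorch attributes. A version string ending in +cu121 indicates the CUDA build.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" if PyTorch can see an NVIDIA GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        raise RuntimeError("PyTorch is not installed in the active environment")
    import torch  # imported lazily so the check above runs first

    print("PyTorch version:", torch.__version__)
    return "cuda" if torch.cuda.is_available() else "cpu"
```

Call pick_device() from a python prompt with the environment active. Getting "cpu" on a machine that does have an NVIDIA card may point to a missing driver or to the CPU build having been installed.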

Install Transformers (to download and use models like Whisper):

pip install transformers safetensors

Install tools for audio handling:

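pip install ffmpeg-python pydub

Note: both packages call the separate ffmpeg program; pip does not install it. If ffmpeg is not already on your system, install it with your package manager, for example sudo apt install ffmpeg (Debian/Ubuntu) or brew install ffmpeg (macOS).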
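ffmpeg-python runs the ffmpeg command-line program for conversions and ffprobe for ffmpeg.probe(), and pydub relies on ffmpeg to decode MP3 files. A minimal sketch to confirm both programs are on your PATH (locate_tools is a hypothetical name; shutil.which is standard library):

```python
import shutil
from typing import Dict, Optional, Sequence

# Programs the audio libraries shell out to
REQUIRED_TOOLS = ("ffmpeg", "ffprobe")


def locate_tools(names: Sequence[str] = REQUIRED_TOOLS) -> Dict[str, Optional[str]]:
    """Map each program name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


tools = locate_tools()
for name, path in tools.items():
    print(f"{name}: {path or 'NOT FOUND'}")

missing = [name for name, path in tools.items() if path is None]
if missing:
    print("Install ffmpeg (its packages normally include ffprobe) before continuing.")
```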

Install SentencePiece, a tokenizer library that some Hugging Face models need, for example models that restore punctuation (periods, commas, etc.):

pip install sentencepiece
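Before moving on, you can check that every package above is importable in the active environment. In this sketch, missing_packages is a hypothetical helper; note that the import name does not always match the pip name (ffmpeg-python is imported as ffmpeg).

```python
import importlib.util
from typing import Dict, List

# pip package name -> module name used in `import`
PACKAGES = {
    "torch": "torch",
    "transformers": "transformers",
    "safetensors": "safetensors",
    "ffmpeg-python": "ffmpeg",
    "pydub": "pydub",
    "sentencepiece": "sentencepiece",
}


def missing_packages(packages: Dict[str, str] = PACKAGES) -> List[str]:
    """Return the pip names whose modules cannot be found in this environment."""
    return [pip_name for pip_name, module in packages.items()
            if importlib.util.find_spec(module) is None]


missing = missing_packages()
print("All packages installed." if not missing else f"Missing: {', '.join(missing)}")
```

This only checks that a module with that name exists; an unrelated package that happens to provide a module called ffmpeg would also pass.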

4. Prepare the Audio File for the AI Model

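Whisper models, including KB-Whisper, process audio in 30-second windows, so a long recording cannot be fed to the model in one piece. We therefore split the MP3 file into 30-second chunks and convert each one to 16 kHz mono audio, the sampling format Whisper expects, saved as WAV.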
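The number of chunks is simply the duration divided by the chunk length, rounded up, which is what math.ceil(duration / CHUNK_SEC) computes in the script. The hypothetical helper below spells out the resulting (start, length) pairs; in the real script ffmpeg just stops at the end of the file, so the last chunk comes out shorter.

```python
import math
from typing import List, Tuple


def chunk_spans(duration: float, chunk_sec: float = 30) -> List[Tuple[float, float]]:
    """Return (start, length) pairs that cover `duration` seconds in chunk_sec pieces."""
    num_chunks = math.ceil(duration / chunk_sec)
    spans = []
    for i in range(num_chunks):
        start = i * chunk_sec
        # The last chunk is shorter when duration is not a multiple of chunk_sec
        spans.append((start, min(chunk_sec, duration - start)))
    return spans


print(chunk_spans(95.0))  # [(0, 30), (30, 30), (60, 30), (90, 5.0)]
```

A 95-second recording therefore gives four files, the last holding 5 seconds.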

Create a Python file: use a text editor to create preprocess.py in your sv-transkriptor folder and paste in the following code:

import math
import os

import ffmpeg  # provided by the ffmpeg-python package

# Put your MP3 file in the same folder and name it input.mp3
INPUT_FILE = "input.mp3"
OUTPUT_DIR = "chunks"
CHUNK_SEC = 30  # Split into 30-second chunks (Whisper's input window)

os.makedirs(OUTPUT_DIR, exist_ok=True)

# ffmpeg.probe runs ffprobe to read the total length of the recording in seconds
probe = ffmpeg.probe(INPUT_FILE)
duration = float(probe["format"]["duration"])
num_chunks = math.ceil(duration / CHUNK_SEC)

print(f"Splitting {INPUT_FILE} into {num_chunks} parts...")

for i in range(num_chunks):
    start = i * CHUNK_SEC
    out = os.path.join(OUTPUT_DIR, f"chunk_{i:03d}.wav")
    (
        ffmpeg
        .input(INPUT_FILE, ss=start, t=CHUNK_SEC)  # seek to start, read CHUNK_SEC seconds
        # Mono (ac=1) at 16 kHz (ar=16000): the audio format Whisper models expect
        .output(out, ac=1, ar=16000, format="wav", loglevel="error")
        .overwrite_output()
        .run()
    )
    print(f"Created: {out}")

print("Preprocessing complete!")

How to Run the Preprocessing

  1. Make sure you have an MP3 file named input.mp3 in your sv-transkriptor folder.
  2. Make sure your virtual environment is active (source .venv/bin/activate).
  3. Run the script:
    python preprocess.py
    

You should now have a new folder called chunks containing numbered WAV files (chunk_000.wav, chunk_001.wav, ...), each holding up to 30 seconds of 16 kHz mono audio.
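To double-check the result before transcribing, you could inspect the chunks with Python's built-in wave module. This is a sketch: check_chunks and its small length tolerance are our own assumptions, not part of the course code.

```python
import glob
import os
import wave


def check_chunks(folder: str = "chunks", rate: int = 16000, max_sec: float = 30.0) -> int:
    """Check that every chunk_*.wav in `folder` is mono, `rate` Hz and at most max_sec long.

    Returns the number of files checked; raises ValueError on the first bad file.
    """
    paths = sorted(glob.glob(os.path.join(folder, "chunk_*.wav")))
    for path in paths:
        with wave.open(path, "rb") as wav:
            if wav.getnchannels() != 1 or wav.getframerate() != rate:
                raise ValueError(f"{path}: expected mono {rate} Hz audio")
            seconds = wav.getnframes() / wav.getframerate()
            if seconds > max_sec + 0.1:  # small tolerance for encoder rounding
                raise ValueError(f"{path}: {seconds:.1f} s is longer than {max_sec} s")
    return len(paths)


print(f"{check_chunks()} chunk files look OK")
```

Run it from the sv-transkriptor folder with the environment active; a count of 0 means no chunk files were found.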

Next Step

The environment is ready and the audio is prepared. In the next lesson it is time for the magic: sending the audio chunks to KB-Whisper and getting text back.