Prepare Your AI Environment
Reading time: approx. 9 min
What You Will Learn
To run complex AI models like KB-Whisper, we need a specific set of software libraries. To avoid conflicts with other programs on your computer, we install them in a "virtual environment". This moment guides you through creating a project folder, setting up an isolated Python environment, and installing all the necessary packages.
The Basics: Virtual Environments (venv)
A virtual environment is like a separate, clean workspace for a specific Python project. Nothing you install here will affect the rest of your system, and vice versa. It is a standard method for keeping projects organized and reproducible.
How We Do It: Step-by-Step Guide
1. Create a Project Folder
We start by creating and navigating to a folder for our transcription project.
mkdir ~/sv-transkriptor
cd ~/sv-transkriptor
2. Create and Activate the Virtual Environment
Inside your project folder, run the following commands.
Create the environment (you only do this once):
python3 -m venv .venv
This creates a new folder, .venv, that contains its own Python interpreter and its own directory for installed packages.
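To see what that folder actually holds, the following minimal sketch creates a throwaway environment with Python's built-in venv module and checks for a few key files. The helper name create_and_inspect is hypothetical. It passes with_pip=False only to keep the example fast, whereas python3 -m venv installs pip by default.

```python
import os
import tempfile
import venv
from pathlib import Path


def create_and_inspect(env_dir: Path) -> dict:
    """Create a virtual environment in env_dir and report which key files exist."""
    # symlinks on POSIX mirrors what `python3 -m venv` does by default;
    # with_pip=False skips the (slow) pip bootstrap for this demo only.
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    activate = "activate.bat" if os.name == "nt" else "activate"
    return {
        "pyvenv.cfg": (env_dir / "pyvenv.cfg").is_file(),    # marks the folder as a venv
        "interpreter": (env_dir / bin_dir / exe).exists(),   # the venv's own python
        "activate script": (env_dir / bin_dir / activate).exists(),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name, present in create_and_inspect(Path(tmp) / ".venv").items():
            print(f"{name}: {'found' if present else 'missing'}")
```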
Activate the environment (you do this every time you work with the project):
source .venv/bin/activate
You will see (.venv) at the beginning of your terminal prompt, which indicates that the environment is active.
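If you want to double-check from inside Python itself, the sketch below compares sys.prefix with sys.base_prefix. Inside a venv, the first points at the .venv folder while the second still points at the base installation. The function name in_virtualenv is a hypothetical helper, not part of any library.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix is the venv folder; sys.base_prefix is the
    # Python installation the venv was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

Run it with python while (.venv) is shown in your prompt. The interpreter path should then end in .venv/bin/python.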
3. Install Python Dependencies
While your venv is active, install the following libraries. Together they form the core of our transcription pipeline.
Install PyTorch (the foundation for most AI models):
pip install torch --index-url https://download.pytorch.org/whl/cu121
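Note: cu121 selects PyTorch builds for NVIDIA graphics cards with CUDA 12.1. If you do not have an NVIDIA GPU, install the CPU build instead, for example with pip install torch --index-url https://download.pytorch.org/whl/cpu (on macOS, a plain pip install torch is enough).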
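To find out which build you ended up with, you can run a small check like the one below. It is a sketch that only uses documented torch calls (torch.cuda.is_available, torch.cuda.get_device_name, torch.version.cuda). The helper name describe_torch_device is hypothetical, and the lazy import lets the check report cleanly when torch is missing.

```python
import importlib.util


def describe_torch_device() -> str:
    """Report whether PyTorch is installed and whether it can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"
    import torch  # imported lazily so the check also works without torch

    if torch.cuda.is_available():
        # torch.version.cuda is the CUDA version this wheel was built against
        return f"CUDA {torch.version.cuda} available: {torch.cuda.get_device_name(0)}"
    return f"PyTorch {torch.__version__} will run on the CPU"


if __name__ == "__main__":
    print(describe_torch_device())
```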
Install Transformers (to download and use models like Whisper):
pip install transformers safetensors
Install tools for audio handling:
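pip install ffmpeg-python pydub
Note: ffmpeg-python and pydub only call the ffmpeg program, so ffmpeg itself must also be installed on your system, for example with your package manager (sudo apt install ffmpeg on Debian/Ubuntu, brew install ffmpeg on macOS).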
Install SentencePiece, a tokenizer library required by some models, such as those used to restore punctuation (periods, commas, etc.):
pip install sentencepiece
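Once everything is installed, a quick sanity check can save debugging time later. The sketch below looks up each package by its import name (ffmpeg-python is imported as ffmpeg) and checks that the ffmpeg program is on your PATH. The helper check_dependencies and the REQUIRED_MODULES list are assumptions based on the packages installed above.

```python
import importlib.util
import shutil

# Import names of the packages installed above (ffmpeg-python imports as "ffmpeg").
REQUIRED_MODULES = ("torch", "transformers", "safetensors", "ffmpeg", "pydub", "sentencepiece")


def check_dependencies(modules=REQUIRED_MODULES) -> dict:
    """Map each module name to True if it can be imported in this environment."""
    status = {name: importlib.util.find_spec(name) is not None for name in modules}
    # ffmpeg-python and pydub rely on the external ffmpeg program for decoding
    status["ffmpeg binary"] = shutil.which("ffmpeg") is not None
    return status


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{'OK     ' if ok else 'MISSING'} {name}")
```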
4. Prepare the Audio File for the AI Model
Whisper-based models such as KB-Whisper process audio in 30-second windows, so a long recording cannot be fed to them in one piece. We therefore split our MP3 file into shorter chunks.
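The arithmetic behind the split is simple: the number of chunks is the duration divided by the chunk length, rounded up, and only the last chunk can be shorter. The hypothetical helper chunk_plan below makes that explicit. The script that follows always asks ffmpeg for the full 30 seconds and lets it stop at the end of the file, which produces the same result.

```python
import math


def chunk_plan(duration: float, chunk_sec: float = 30) -> list:
    """Return (start, length) pairs that cover `duration` seconds in chunk_sec pieces."""
    num_chunks = math.ceil(duration / chunk_sec)
    plan = []
    for i in range(num_chunks):
        start = i * chunk_sec
        # Only the final chunk is shorter, unless duration is an exact multiple.
        plan.append((start, min(chunk_sec, duration - start)))
    return plan


print(chunk_plan(95))  # → [(0, 30), (30, 30), (60, 30), (90, 5)]
```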
Create the script preprocess.py: use a text editor to create a file named preprocess.py in your sv-transkriptor folder and paste in the following code:
import math
import os
import ffmpeg  # provided by the ffmpeg-python package
# Put your MP3 file in the same folder and name it input.mp3
INPUT_FILE = "input.mp3"
OUTPUT_DIR = "chunks"
CHUNK_SEC = 30  # Split into 30-second chunks
os.makedirs(OUTPUT_DIR, exist_ok=True)
# Ask ffprobe for the total length of the recording in seconds
probe = ffmpeg.probe(INPUT_FILE)
duration = float(probe["format"]["duration"])
num_chunks = math.ceil(duration / CHUNK_SEC)
print(f"Splitting {INPUT_FILE} into {num_chunks} parts...")
for i in range(num_chunks):
    start = i * CHUNK_SEC
    out = f"{OUTPUT_DIR}/chunk_{i:03d}.wav"
    (
        ffmpeg
        .input(INPUT_FILE, ss=start, t=CHUNK_SEC)  # seek to start, read CHUNK_SEC seconds
        .output(out, ac=1, ar=16000, format="wav", loglevel="error")  # mono, 16 kHz WAV
        .overwrite_output()  # replace chunks from an earlier run
        .run()
    )
    print(f"Created: {out}")
print("Preprocessing complete!")
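Under the hood, ffmpeg-python turns each loop iteration into an ffmpeg command line (its .compile() method returns that argument list without running it). For debugging, or if you prefer not to depend on the wrapper, the sketch below builds a roughly equivalent command by hand with only the standard library. The helper names ffmpeg_chunk_cmd and run_chunk are hypothetical, the flag order may differ from what ffmpeg-python generates, and run_chunk assumes the ffmpeg program is on your PATH.

```python
import subprocess


def ffmpeg_chunk_cmd(input_file: str, out: str, start: float, chunk_sec: float) -> list:
    """Build an ffmpeg command that cuts one mono 16 kHz WAV chunk from input_file."""
    return [
        "ffmpeg", "-y", "-loglevel", "error",      # overwrite output, print only errors
        "-ss", str(start), "-t", str(chunk_sec),   # seek and duration, applied to the input
        "-i", input_file,
        "-ac", "1", "-ar", "16000",                # 1 channel, 16,000 samples per second
        "-f", "wav", out,
    ]


def run_chunk(input_file: str, out: str, start: float, chunk_sec: float = 30) -> None:
    # check=True raises subprocess.CalledProcessError if ffmpeg fails
    subprocess.run(ffmpeg_chunk_cmd(input_file, out, start, chunk_sec), check=True)


if __name__ == "__main__":
    print(" ".join(ffmpeg_chunk_cmd("input.mp3", "chunks/chunk_001.wav", 30, 30)))
```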
How to Run the Preprocessing
- Make sure you have an MP3 file named input.mp3 in your sv-transkriptor folder.
- Make sure your virtual environment is active (source .venv/bin/activate).
- Run the script:
python preprocess.py
You will now have a new folder called chunks containing mono 16 kHz WAV files of up to 30 seconds each (chunk_000.wav, chunk_001.wav, and so on).
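Before moving on, you can verify the chunks with Python's built-in wave module. This minimal sketch assumes the chunk_*.wav naming used by preprocess.py and checks that every file is mono, 16 kHz and at most 30 seconds long. The helper name check_chunks and the 0.1-second tolerance are arbitrary choices for illustration.

```python
import wave
from pathlib import Path


def check_chunks(chunk_dir="chunks", expected_rate=16000, max_sec=30.0) -> list:
    """Return a list of problems found in the WAV chunks (an empty list means all good)."""
    files = sorted(Path(chunk_dir).glob("chunk_*.wav"))
    if not files:
        return [f"no chunk_*.wav files found in {chunk_dir}"]
    problems = []
    for path in files:
        with wave.open(str(path), "rb") as wf:
            channels = wf.getnchannels()
            rate = wf.getframerate()
            seconds = wf.getnframes() / rate
        if channels != 1:
            problems.append(f"{path.name}: {channels} channels, expected mono")
        if rate != expected_rate:
            problems.append(f"{path.name}: {rate} Hz, expected {expected_rate} Hz")
        if seconds > max_sec + 0.1:  # small tolerance for rounding
            problems.append(f"{path.name}: {seconds:.1f} s, expected at most {max_sec} s")
    return problems


if __name__ == "__main__":
    issues = check_chunks()
    print("All chunks look fine." if not issues else "\n".join(issues))
```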
Next Step
The environment is ready and the audio is prepared. In the next moment, it is time for the magic: sending the audio chunks to KB-Whisper and getting text back.

