Answered By: James Capobianco
Last Updated: Nov 28, 2023

A good tool that runs locally is OpenAI's Whisper. It produces high-quality transcripts fairly quickly. Installation and use depend on your operating system and on which version you install.

Important Note: The Whisper API, in which audio is sent to OpenAI for processing and the transcript is sent back (usually through a programming language such as Python), is NOT appropriate for sensitive data. Instead, download the model with tools such as those described in the rest of this FAQ, so that your audio stays on your local machine.

Whisper

Windows

To install and use Whisper on Windows, you will need some comfort with the command line (cmd) and administrative rights on your computer. The easiest version of Whisper to set up is whisper-standalone-win, which, though run from the command line, packages everything in one .exe file.

You'll need to download one of the Windows zipped files from the Assets on the Releases page.

Unzip it and put it somewhere on your computer you will remember. The easiest thing at this point is also to put the audio you need to transcribe in the same folder.

Find the Command Prompt (search for cmd), right-click on it, and then choose "Run as administrator."

Navigate to the folder with whisper-faster.exe, and use this form of command to run it:

whisper-faster.exe "[path-to-file-to-transcribe]" --language=English --model=medium
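If you have many recordings, typing the command above once per file gets tedious. The following is a minimal Python sketch, not part of whisper-standalone-win, that builds the same command for every audio file in a folder. It assumes Python is installed and that whisper-faster.exe sits in the same folder as the audio, as suggested earlier. The function names and the list of file extensions are hypothetical choices made for illustration; the executable name and the two flags are taken from the command above.

```python
import subprocess
from pathlib import Path

# Executable name and flags come from the command shown above.
WHISPER_EXE = "whisper-faster.exe"
# Hypothetical set of extensions; adjust it to the files you actually have.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a"}


def build_command(exe_path, audio_path, language="English", model="medium"):
    """Return the argument list for transcribing a single file."""
    return [
        str(exe_path),
        str(audio_path),
        f"--language={language}",
        f"--model={model}",
    ]


def transcribe_folder(folder, dry_run=True):
    """Build one command per audio file in `folder`; run them unless dry_run."""
    folder = Path(folder).resolve()
    exe_path = folder / WHISPER_EXE
    commands = []
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
            cmd = build_command(exe_path, path)
            commands.append(cmd)
            if not dry_run:
                # check=True stops the loop if a transcription fails.
                subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: only print the commands that would be executed.
    for cmd in transcribe_folder("."):
        print(" ".join(cmd))
```

With dry_run=True (the default) the script only lists the commands, so you can check them before transcribing anything. Calling transcribe_folder(".", dry_run=False) runs them one after another on your own machine.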

It can be somewhat challenging to get this working on Windows, so don't hesitate to contact us for help getting started.

macOS

There are many more options for the Mac, including apps that can be run without using the Terminal. 

One of the easiest to use is Aiko, which can be installed from the App Store. You can drag and drop audio onto the app to transcribe it, and then save the transcript in different formats, including ones with timestamps. By default, it auto-detects the language and transcribes in that language, but you can also set it to translate to English. You will likely need at least 16 GB of RAM to use this (though some users report that it works with less).

Other Options

Adobe Premiere Pro Speech to Text

You can also use Adobe Premiere Pro Speech to Text to generate captions for video and audio. To ensure that transcription happens only on your own device, use version 22.2 or later. As a Harvard affiliate, you have free access to the Adobe Creative Cloud, including Premiere Pro.