What can I use for secure transcription of sensitive audio or video?

Answered By: James Capobianco

Last Updated: Jul 16, 2025 Views: 11149

A good tool that runs locally is OpenAI's Whisper. It produces high-quality transcripts fairly quickly. Installation and use depend on your operating system and on which version you install.

Important Note: The Whisper API, where audio is sent to OpenAI for processing and the transcript is sent back (usually through a programming language like Python), is NOT appropriate for sensitive data. Instead, download the model with tools such as those described in the rest of this FAQ, so that your audio stays on your local machine.
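For readers comfortable with Python, the open-source Whisper package is another way to run the model on your own machine rather than through the API. The following is a minimal sketch, not an officially recommended workflow: it assumes you have installed the package with `pip install openai-whisper` and that `ffmpeg` is available on your system. The helper names (`format_timestamp`, `segments_to_text`, `transcribe_locally`) and the file name in the usage note are illustrative, not part of Whisper itself. The model weights are downloaded the first time a given model size is loaded; the audio itself is processed locally.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_text(segments) -> str:
    """Turn Whisper-style segments into one timestamped line per segment."""
    lines = []
    for seg in segments:
        lines.append(f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}")
    return "\n".join(lines)


def transcribe_locally(audio_path: str, model_size: str = "base") -> str:
    """Transcribe a local file with a locally loaded Whisper model.

    Assumes the open-source `openai-whisper` package and ffmpeg are installed.
    Smaller model sizes are faster but less accurate.
    """
    import whisper  # imported here so the helpers above work without it

    model = whisper.load_model(model_size)   # weights are fetched on first use
    result = model.transcribe(audio_path)    # runs on this machine
    return segments_to_text(result["segments"])
```

Calling `transcribe_locally("interview.mp3", model_size="small")` would return the transcript as timestamped lines, which you can then save to a file stored under whatever protections your data requires.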

Whisper

Windows

aTrain is a free app created by the University of Graz that not only uses Whisper to transcribe, but also distinguishes different speakers. It can be installed from the Microsoft Store or downloaded directly from aTrain's Download Page (MSIX format recommended).

You can choose the file you want to transcribe and the size of the model (smaller is faster but less accurate), and tell the app whether there are multiple speakers (specifying the number improves accuracy). If you have an NVIDIA graphics card (GPU), you can greatly reduce transcription time by indicating that in the Advanced Settings.

macOS

There are many options for using Whisper on the Mac, though none of the recommended apps are free.

One of the easiest to use is Aiko, which can be installed from the App Store. You can drag and drop audio onto the app to transcribe it, then export the transcript in different formats, including formats with timestamps. By default, it auto-detects the language and transcribes in that language, but you can also set it to translate to English. You will likely need at least 16 GB of RAM to use this (though some users report that it works with less). This is the cheaper of the two Mac options, but it does not provide speaker identification.

One of the most popular and regularly updated options is MacWhisper. A free version is available, but it lacks many important functions, such as the larger models that are most accurate. The Pro version (one-time purchase) includes speaker identification, as well as access to a newer transcription model from NVIDIA called Parakeet, which transcribes very quickly (though only in English).

Other Options

Adobe Premiere Pro Speech to Text

You can also use Adobe Premiere Pro Speech to Text to generate captions for video and audio. To ensure that transcription happens only on your own device, use version 22.2 or later. As a Harvard affiliate, you have free access to the Adobe Creative Cloud, including Premiere Pro.
