This is one of the most concrete uses of artificial intelligence: automatically transcribing an audio file into text. It is a revolution for students, doctors and journalists in particular, but also for businesses.
Over the past year, the number of AI-based automatic transcription tools, both software and hardware, has exploded. The starting point, to be precise, was September 2022 and the release of Whisper, a speech recognition model made available by OpenAI. That was two months before the public launch of the company's other creation, ChatGPT, which would attract even more attention. But alongside that global success, Whisper began to serve as the foundation for a multitude of audio-to-text conversion applications.
Take the example of a journalist who has conducted an interview and needs to transcribe the audio of the questions and answers: a tedious exercise that can take several hours. Now, thanks to AI, it is enough to import the audio file into an application to obtain, in a few seconds, the full transcription of the interview in text form.
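For readers curious about what such an application does behind the scenes, here is a minimal sketch in Python. It assumes the open-source `openai-whisper` package is installed, and the file name `interview.mp3` is purely hypothetical. It is an illustration of the principle, not the code of any particular app mentioned in this article.

```python
from typing import Iterable, Mapping


def format_timestamp(seconds: float) -> str:
    """Render a position in seconds as MM:SS (or H:MM:SS past one hour)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def format_transcript(segments: Iterable[Mapping]) -> str:
    """Turn Whisper-style segments into one timestamped line per segment."""
    lines = []
    for segment in segments:
        stamp = format_timestamp(segment["start"])
        lines.append(f"[{stamp}] {segment['text'].strip()}")
    return "\n".join(lines)


def transcribe_file(path: str, model_name: str = "base") -> str:
    """Transcribe an audio file locally with the open-source Whisper package."""
    # Assumption: installed with `pip install openai-whisper` (ffmpeg is also needed).
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(path)
    return format_transcript(result["segments"])


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    print(transcribe_file("interview.mp3"))
```

The timestamps make it easier to go back to the recording and correct the passages the AI got wrong.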
The fidelity of the text depends on the audio quality and on the AI model used: the larger the model, the more memory and computing power it needs and the longer the processing takes, but the more accurate the transcription. Fidelity also depends on the language: American AIs are trained less on French than on English. So yes, there are always errors to correct, but the time saved is spectacular in every case.
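The trade-off between model size and resources can be illustrated with a short Python sketch. The sizes and memory needs below are approximate figures recalled from the Whisper project's documentation and should be treated as assumptions rather than measurements; the function `pick_model` is a hypothetical helper, not part of Whisper.

```python
# Approximate figures, assumed from the openai/whisper documentation.
# (model name, parameters in millions, approximate memory needed in GB)
WHISPER_MODELS = [
    ("tiny", 39, 1),
    ("base", 74, 1),
    ("small", 244, 2),
    ("medium", 769, 5),
    ("large", 1550, 10),
]


def pick_model(available_memory_gb: float) -> str:
    """Return the largest model whose approximate memory need fits.

    Larger models are slower and heavier but tend to transcribe more
    accurately, so we keep the biggest one the machine can hold.
    """
    best = None
    for name, _params_millions, memory_gb in WHISPER_MODELS:
        if memory_gb <= available_memory_gb:
            best = name  # the list is ordered from smallest to largest
    if best is None:
        raise ValueError("not enough memory for even the smallest model")
    return best


if __name__ == "__main__":
    for memory in (1, 4, 16):
        print(f"{memory} GB available: use the '{pick_model(memory)}' model")
```

On a machine with little memory, the sketch falls back to a small model: faster, but with more errors to correct by hand.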
To take advantage of these automatic AI transcriptions, you can therefore use an application. There are dozens, relying either on Whisper or on other technologies such as IBM's Watson. For example: Google Speechnotes on Android; Transcribe on iPhone; the voice recognition built into Windows on PC; and MacWhisper on Mac. In this galaxy, you will find several business models, from free to paid, including subscriptions with a monthly quota of transcription minutes, and sometimes translation into other languages.
In all cases, prefer applications that transcribe locally, that is to say without using the cloud, such as Chuchotis, offered on Mac by Denis Delbecq, a former researcher and journalist for the Swiss daily newspaper Le Temps, a colleague who is very attentive to confidentiality and the protection of sensitive information.
And then there is an accessory, the Plaud Note, launched in Europe last week at the Viva Technology show in Paris. Imagine a revolutionary aluminum dictaphone, the size of a credit card and just as thin, held magnetically in its case on the back of your smartphone. Press a button and the Plaud Note records, via microphones and vibration sensors, either the sound around you or your telephone conversation. The switch to recording mode is confirmed by a haptic vibration and a red LED lighting up.
The mobile application then lets you obtain the transcription, and even a stunning summary, thanks to ChatGPT version 4, in several possible formats (conference, course, medical consultation, discussion, etc.). I tried it on a dissertation defense and the result was spectacular. The only downside: it relies on a cloud that is still nebulous. A future update could make it possible to choose a cloud hosted in France, for greater data security, and to select another AI, such as that of the French company Mistral AI.