Convert speech to text
Converting speech to text can be done manually or automatically. Manual transcription is still clearly superior to automatic transcription in terms of quality. However, automatic speech-to-text processing also has many advantages, the biggest being the speed with which speech is converted into writing.
Manually transcribing an interview can take a lot of time. Depending on typing speed, transcribing by hand takes about 3 to 7 times the length of the audio. With a suitable speech-to-text programme, the same job can be done within seconds to a few minutes without any effort.
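As a rough illustration of the 3 to 7 times rule of thumb, the following minimal sketch estimates the manual typing time for a recording. The function name and its parameters are hypothetical and chosen only for this example.

```python
def manual_transcription_minutes(audio_minutes, low_factor=3, high_factor=7):
    """Return (lowest, highest) estimated typing time in minutes.

    The default factors 3 and 7 come from the rule of thumb in the text;
    the actual factor depends on the typist and the recording.
    """
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    return audio_minutes * low_factor, audio_minutes * high_factor


# Example: a one-hour interview
low, high = manual_transcription_minutes(60)
print(f"Typing a 60-minute interview by hand takes roughly {low} to {high} minutes.")
```

For a 60-minute interview this gives a range of 180 to 420 minutes, i.e. three to seven hours of typing.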
Automatically convert speech to text
With the help of artificial intelligence (AI), audio files can be converted into text automatically. There are now many programmes that convert speech or audio files into text. The best-known providers include Google (Speech-to-Text), Apple (Siri), Amazon (Alexa) and Microsoft (Cortana); less prominent are Voicedocs and EML. Most of these programmes can process files in the common audio formats (MP3 and WAV).
Other file formats, and even video files, can be converted online or with special programmes (e.g. with the Online Audio Converter or the VLC Media Player). During automatic conversion from speech to text, the files are usually stored temporarily. With regard to data protection, you should therefore check with each provider in advance.
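Before uploading, it can help to check whether a file is already in one of the common formats. The sketch below does this by file extension only. The function name and the example file names are hypothetical, and individual providers may accept more or fewer formats than the two named in the text.

```python
from pathlib import Path

# The two formats named in the text as commonly accepted.
# Assumption: the extension reflects the real format of the file.
SUPPORTED_EXTENSIONS = {".mp3", ".wav"}


def needs_conversion(filename):
    """Return True if the file extension is not MP3 or WAV."""
    return Path(filename).suffix.lower() not in SUPPORTED_EXTENSIONS


for name in ["interview.mp3", "lecture.WAV", "meeting.mp4"]:
    status = "convert first" if needs_conversion(name) else "ready to upload"
    print(f"{name}: {status}")
```

Files flagged as "convert first" would then be converted with one of the tools mentioned above before being passed to a speech-to-text programme.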
Study on converting speech to text / German
In a detailed study, we tested the performance of programmes that convert German audio to text and compared the results of the six programmes mentioned above. For German, the lesser-known programmes from Voicedocs and EML in particular performed best in many categories. We are happy to provide this study on request.
The quality of the transcripts produced still depends heavily on the audio files, i.e. the number of speakers, the recording conditions (quiet or noisy environment), the vocabulary (simple or specialised) and deviations from standard speech (accents or dialects). Under perfect conditions, automatic speech recognition can already achieve acceptable results; with any complication (e.g. as few as two speakers), the quality of the conversion of speech into text drops significantly.
The quality of automatic speech-to-text programmes is highly variable for German. You should always run a test in advance.
To avoid unpleasant surprises, it is advisable to have a test transcript made first when transcribing automatically from speech to text. We offer this free of charge: without any obligation, you can receive a sample transcript of the first 2 minutes of your file. All you have to do is send us your file, and you will receive the sample transcript along with detailed information about the result.
All transcripts created with AI are checked manually by us. During the follow-up check, gross errors are corrected and the speech contributions are assigned to the individual speakers. In general, most speech-to-text programmes do not yet reliably assign speakers. This must be done manually. A subsequent correction is therefore necessary even with the best programme and the best quality.
All in all, automatic transcription from speech to text still involves considerable effort and can only be recommended where the material to be transcribed meets the requirements (good audio quality, preferably only one speaker, no dialect). Consistently good and reliable transcripts can therefore still only be achieved through manual transcription. Our manually created transcripts generally reach a quality level of at least 97% and are thus significantly more accurate than any transcript created using AI.
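The text does not say how the quality level is measured. One common way to put a number on transcript correctness is word-level accuracy: count the word substitutions, insertions and deletions needed to turn the transcript into a checked reference text, and divide by the length of the reference. The sketch below assumes that interpretation; the function names and the sample sentences are hypothetical.

```python
def word_errors(reference, hypothesis):
    """Return (edit distance in words, number of reference words)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Classic Levenshtein distance, computed row by row over words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1], len(ref)


def word_accuracy(reference, hypothesis):
    """Return word accuracy between 0.0 and 1.0 (1.0 means identical)."""
    errors, total = word_errors(reference, hypothesis)
    if total == 0:
        raise ValueError("reference must contain at least one word")
    return max(0.0, 1 - errors / total)


reference = "the quick brown fox jumps"
transcript = "the quick brown box jumps"
print(f"Word accuracy: {word_accuracy(reference, transcript):.0%}")
```

In this example one of five words is wrong, so the accuracy is 80%. Running the same comparison on a sample transcript against a manually checked version gives a simple way to compare programmes on your own material.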
Further questions and answers
There are basically two methods for converting speech to text:
With automatic speech recognition, a machine converts the spoken word into text. For recordings of one person without dialect or background noise, this already works reasonably well. With several speakers, the quality is currently mediocre at best.
With manual transcription, a human types out the voice recording. Manual transcription still achieves a much higher quality than machine transcription.
For the automatic conversion of speech to text, a number of providers offer speech recognition, in some cases free of charge. However, the quality is currently only mediocre for recordings with more than one speaker.
For manual transcription, there are a number of transcription services and typing agencies, such as the German market leader typist.com.
There is a whole range of providers, some of which offer free systems for automatic speech recognition.