# Speech to Text Apps

Related links:\
🔗 [Speech to Text Apps](/irosyadi/digitalmedia/speech-to-text.md)\
🔗 [Text to Speech Apps](/irosyadi/digitalmedia/text-to-speech.md)\
🔗 [Speech to Speech (Fake Voice Generator)](/irosyadi/digitalmedia/speech-to-speech.md)

#### Speech to Text

* [DeepSpeech](https://github.com/mozilla/DeepSpeech) : simpler, although generally weaker than the toolkits listed below.
* [Kaldi](https://kaldi-asr.org/) : supports hybrid NN-HMM and lattice-free MMI models. Widely used in both research and production.
* [Lingvo](https://github.com/tensorflow/lingvo) : the open-source version of Google's speech recognition toolkit, with support mostly for end-to-end models.
* [ESPnet](https://github.com/espnet/espnet) : good and well known, also for end-to-end models.
* [RASR](https://github.com/rwth-i6/rasr) + [RETURNN](https://github.com/rwth-i6/returnn) : very good as well, both for end-to-end models and hybrid NN-HMM, but free for non-commercial applications only; commercial use requires a commercial licence (disclaimer: I work at the university chair which develops these frameworks).
* [Parlatype](http://gkarsay.github.io/parlatype/)
* [pmTrans](https://github.com/juanerasmoe/pmTrans)
* [Transcribe audio (pythonbasics.org)](https://pythonbasics.org/transcribe-audio/)
* [Wav2Letter](https://github.com/facebookresearch/wav2letter) : Facebook's speech recognition tool.
* [Silero Models](https://github.com/snakers4/silero-models) : pre-trained speech-to-text, text-to-speech and text-enhancement models.
* [Coqui](https://coqui.ai/) ([GitHub](https://github.com/coqui-ai)) : STT and TTS.
* [voice2json - Command-line tools for speech and intent recognition on Linux](https://voice2json.org/#supported-languages)
* [VOSK Offline Speech Recognition API](https://alphacephei.com/vosk/)
* Datasets
  * English: TED-LIUM, LibriSpeech, etc.
  * [Zamia Speech](https://github.com/gooofy/zamia-speech)
  * [Mozilla Common Voice datasets](https://commonvoice.mozilla.org/en/datasets)
  * [OpenSLR resources](https://www.openslr.org/resources.php)

#### Speech to Text Indonesian Support

* [Voice Notebook](https://voicenotebook.com/)
* [Speech Texter](https://www.speechtexter.com/)
* [Voicenote](http://voicenote.in/)
* [Speechnotes](https://speechnotes.co/)
* [Dictation](https://dictation.io/speech)
* [Dictanote](https://dictanote.co/)
* [oTranscribe](https://otranscribe.com/)
* [Google Web Speech API](https://www.google.com/intl/en/chrome/demos/speech.html)
* [Google Docs: Type with your voice](https://support.google.com/docs/answer/4492226)
  * Open `Tools`, then choose `Voice typing`

#### Speech Recognition

* [Wav2vec: Semi and Unsupervised Speech Recognition - Vaclav Kosar’s Blog](https://vaclavkosar.com/ml/Wav2vec2-Semi-and-Unsupervised-Speech-Recognition)
* [The Illustrated Wav2vec - Jonathan Bgn](https://jonathanbgn.com/2021/06/29/illustrated-wav2vec.html)

#### Video Transcriber

* [Free Subtitles AI: Transcribe File](https://freesubtitles.ai/)
* [Simon Says: Edit Video Fast](https://www.simonsaysai.com/)
* [Scribie: Audio/Video Transcription](https://scribie.com/) (advertised as 99% accuracy, 12-hour turnaround)
* [Read AI: Transcription](https://www.read.ai/transcription)
* [AssemblyAI](https://www.assemblyai.com/) : API platform for AI models

### Whisper

* [Talk - GPT-2 meets Whisper in WebAssembly](https://whisper.ggerganov.com/talk/)
* [whisper.cpp : WASM example](https://whisper.ggerganov.com/)
* [Transcribe File](https://freesubtitles.ai/)