# Speech to Text Apps

### Speech to Text Apps

Related links:\
🔗 [Speech to Text Apps](/irosyadi/digitalmedia/speech-to-text.md)\
🔗 [Text to Speech Apps](/irosyadi/digitalmedia/text-to-speech.md)\
🔗 [Speech to Speech (Fake Voice Generator)](/irosyadi/digitalmedia/speech-to-speech.md)

#### Speech to Text

* [DeepSpeech](https://github.com/mozilla/DeepSpeech) : simpler to use, although reportedly less accurate than the toolkits below.
* [Kaldi](https://kaldi-asr.org/) : supports hybrid NN-HMM and lattice-free MMI models. It is widely used in both research and production.
* [Lingvo](https://github.com/tensorflow/lingvo) : the open-source version of Google's speech recognition toolkit, with support mostly for end-to-end models.
* [ESPnet](https://github.com/espnet/espnet) : well known and well regarded for end-to-end models.
* [RASR](https://github.com/rwth-i6/rasr) + [RETURNN](https://github.com/rwth-i6/returnn) are very good as well, both for end-to-end models and hybrid NN-HMM, but they are for non-commercial applications only (or you need a commercial licence) (disclaimer: I work at the university chair which develops these frameworks).
* [Parlatype](http://gkarsay.github.io/parlatype/)
* [pmTrans](https://github.com/juanerasmoe/pmTrans)
* [Transcribe audio (pythonbasics.org)](https://pythonbasics.org/transcribe-audio/)
* [Wav2Letter](https://github.com/facebookresearch/wav2letter), the tool by Facebook.
* [Coqui](https://coqui.ai/) ([GitHub](https://github.com/coqui-ai)) : STT and TTS
* [voice2json - Command-line tools for speech and intent recognition on Linux](https://voice2json.org/#supported-languages)
* [VOSK Offline Speech Recognition API](https://alphacephei.com/vosk/)
* Datasets
  * English: TED-LIUM, LibriSpeech, etc.
  * <https://github.com/gooofy/zamia-speech>
  * <https://commonvoice.mozilla.org/en/datasets>
  * <https://www.openslr.org/resources.php>
* [Silero Models](https://github.com/snakers4/silero-models) : pre-trained speech-to-text, text-to-speech, and text-enhancement models
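
As a rough illustration of how one of the offline engines above is typically driven, here is a minimal sketch that uses the VOSK Python bindings. It assumes the `vosk` package is installed and that a model has been downloaded and unpacked locally; the function names, the `model_dir` argument, and the 4000-frame chunk size are illustrative choices, not something this page prescribes. The format check uses only the standard library and reflects the mono 16-bit PCM WAV input that the VOSK examples work with.

```python
import json
import wave


def check_wav_format(path):
    """Return (ok, details) for a WAV file; ok is True for mono 16-bit PCM."""
    with wave.open(path, "rb") as wav:
        details = {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
        }
    ok = details["channels"] == 1 and details["sample_width_bytes"] == 2
    return ok, details


def transcribe_with_vosk(wav_path, model_dir):
    """Transcribe a mono 16-bit PCM WAV file offline with a local VOSK model.

    `model_dir` is a hypothetical path to an unpacked VOSK model directory.
    """
    # Imported lazily so the format check above works without VOSK installed.
    from vosk import KaldiRecognizer, Model

    ok, details = check_wav_format(wav_path)
    if not ok:
        raise ValueError(f"expected mono 16-bit PCM WAV, got {details}")

    recognizer = KaldiRecognizer(Model(model_dir), details["sample_rate_hz"])
    pieces = []
    with wave.open(wav_path, "rb") as wav:
        while True:
            chunk = wav.readframes(4000)
            if not chunk:
                break
            # AcceptWaveform returns True once an utterance is complete.
            if recognizer.AcceptWaveform(chunk):
                pieces.append(json.loads(recognizer.Result())["text"])
    # Flush whatever audio is still buffered in the recognizer.
    pieces.append(json.loads(recognizer.FinalResult())["text"])
    return " ".join(piece for piece in pieces if piece)
```

A call such as `transcribe_with_vosk("recording.wav", "model")` would return the recognized text as a single string; both arguments are placeholder paths.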

#### Speech to Text Indonesian Support

* [Voice Notebook](https://voicenotebook.com/)
* [Speech Texter](https://www.speechtexter.com/)
* [Voicenote](http://voicenote.in/)
* [Speechnotes](https://speechnotes.co/)
* [Dictation](https://dictation.io/speech)
* [Dictanote](https://dictanote.co/)
* [oTranscribe](https://otranscribe.com/)
* [Google Web Speech API](https://www.google.com/intl/en/chrome/demos/speech.html)
* [Google Docs: Type with your voice](https://support.google.com/docs/answer/4492226)
  * Open `Tools`, then select `Voice typing`

#### Speech Recognition

* [Wav2vec: Semi and Unsupervised Speech Recognition - Vaclav Kosar’s Blog](https://vaclavkosar.com/ml/Wav2vec2-Semi-and-Unsupervised-Speech-Recognition)
* [The Illustrated Wav2vec - Jonathan Bgn](https://jonathanbgn.com/2021/06/29/illustrated-wav2vec.html)

#### Video Transcriber

* [Transcribe File](https://freesubtitles.ai/)
* [Edit Video Fast | Simon Says](https://www.simonsaysai.com/)
* [Audio/Video Transcription | 99% Accuracy, 12-HR Turnaround](https://scribie.com/)
* [Transcription](https://www.read.ai/transcription)
* [AssemblyAI | #1 API Platform for AI Models](https://www.assemblyai.com/)

### Whisper

* [Talk - GPT-2 meets Whisper in WebAssembly](https://whisper.ggerganov.com/talk/)
* [whisper.cpp : WASM example](https://whisper.ggerganov.com/)
* [Transcribe File](https://freesubtitles.ai/)
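
For running Whisper locally rather than through the demos above, the following is a minimal sketch that turns a transcription into SRT subtitles. It assumes the `openai-whisper` Python package (and the `ffmpeg` binary it relies on), which is not among the links on this page; the helper names and the default `"base"` model are illustrative assumptions. The formatting helpers are plain Python and work on any list of segments with `start`, `end`, and `text` keys.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Turn dicts with 'start', 'end', and 'text' keys into SRT subtitle text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        times = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{times}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive subtitle entries.
    return "\n".join(blocks)


def transcribe_to_srt(media_path, model_name="base", language=None):
    """Transcribe an audio or video file with Whisper and return SRT text."""
    # Imported lazily so the helpers above work without Whisper installed.
    import whisper

    model = whisper.load_model(model_name)
    # language=None lets Whisper detect the language; "id" selects Indonesian.
    result = model.transcribe(media_path, language=language)
    return segments_to_srt(result["segments"])
```

For example, `transcribe_to_srt("talk.mp4", language="id")` would produce subtitles for an Indonesian recording; the file name is a placeholder.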