An application that use mozilla deepspeech to perform Speech-To-Text searches on a given corpus.
It uses mozilla DeepSpeech and the Node.JS / Electron.JS package with the Italian Model for speech-to-text extraction,
and Natural for text classification.
This project was carried out during the Mozilla Italia Developer Contest, the purpose of the app is to try to guess which Fabrizio De André song you are singing (or, better, reading).
The app streams the microphone audio from the browser to a NodeJS server (using socket.io) where DeepSpeech will read the buffer and a classifier will classify the DeepSpeech result.
You can find out more about the corpus here while here you can see how to train and load the classifier.
At the moment, for text extraction, only 2 (really simple) algorithms are supported : Levenshtein Distance (slower but more accurate) and Bayes Classification (faster but less accurate).
Live demo (may be offline) : https://deepspeech.czzncl.dev/
- Use Levenshtein Distance : wether using Levenshtein Distance or Bayes for text classification.
- Show Recognized Text : wether show or not deepSpeech Recognized text
- Show Debug Window : wether show or not a debug windows that contains some information about the app status and payload.
- Stop recording on recognition : wether stop speech-to-text and text classification once a sentence is recognized and classified
- Show matched sentence (instead that song title)* : Show the matched sentence (of the song) for the current recognized speech-to-text
*Only with Levenshtein
The easiest way to install the app is using docker :
git clone https://github.com/nicosh/faber-js.git
cd faber-js
docker build --tag "faberjs" .
docker run -p 3000:3000 faberjs
Please note that you need python and sox, sox need to be added to PATH (on windows) see https://github.com/JoFrhwld/FAVE/wiki/Sox-on-Windows and https://github.com/nodejs/node-gyp#installation
clone the repo :
git clone https://github.com/nicosh/faber-js.git
Also you need to manually download the models from https://github.com/MozillaItalia/DeepSpeech-Italian-Model/releases/download/2020.08.07/transfer_model_tensorflow_it.tar.xz
and move them inside app/DeepSpeech/models
Then:
cd faber-js/app
npm install
Build for production:
npm run build
npm run start
or run in dev mode
npm run dev
At the moment seems that voice recognition using .wav files have better performance compared to voice recognition using live streaming.
Some simple test using static files can be found here.
To compare the results is enough to run the app and try to pronounce the same sentences as the files above or just play the files and record the speakers with the microphone.
Some things to investigate :
- Maybe live streaming need a better configuration / fine tuning
- Microphone noise / surrounding noise affects too much speech-to-text extraction
- Live streaming performances with good quality microphones vs cheap / lowquality