Alpe6825/Unicornn

Unicornn

Demo image

Content

Training (Jupyter Notebook)

Requirements

pip install -r requirements.txt

Dataset

Download the speech_commands_v0.02 dataset (Warden, 2018) and unpack it into the Dataset folder.

Dataset
├── data-speech_commands_v0.02
    ├── _background_noise_
    ├── backward
    ...
    └── zero

Warden, P.: Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition (April 2018). https://arxiv.org/abs/1804.03209

We split the dataset into a training set and a test set at a ratio of 80:20.

DataBalance
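The 80:20 split can be sketched in a few lines of Python. This is a minimal illustration and not the notebook's actual code: the function name `split_dataset`, the fixed seed, the per-class shuffling, and the file names in the toy example are all assumptions.

```python
import random
from collections import defaultdict


def split_dataset(samples, train_ratio=0.8, seed=0):
    """Split (path, label) pairs into train and test lists, class by class.

    Hypothetical helper: the notebook may split the data differently.
    """
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        paths = sorted(by_label[label])
        rng.shuffle(paths)
        cut = int(len(paths) * train_ratio)
        train += [(p, label) for p in paths[:cut]]
        test += [(p, label) for p in paths[cut:]]
    return train, test


# Toy example with made-up file names, 10 clips per class.
samples = [(f"{label}/{i}.wav", label) for label in ("zero", "one") for i in range(10)]
train, test = split_dataset(samples)
print(len(train), len(test))  # 16 4
```

Splitting per class keeps the 80:20 ratio for every word, which matters when some words have fewer recordings than others.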

To train the model, we use mel spectrograms instead of raw audio for better feature detection.

Mel
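As a rough sketch of this preprocessing step, the snippet below converts one clip into a log-scaled mel spectrogram with librosa. The 16 kHz sample rate matches the Speech Commands recordings; the one-second padding, `n_mels=64`, and the function names are assumptions and not necessarily the values used in the notebook.

```python
SAMPLE_RATE = 16000  # Speech Commands clips are recorded at 16 kHz


def pad_or_trim(samples, length=SAMPLE_RATE):
    """Zero-pad or cut a 1-D signal to exactly `length` samples (assumed 1 s)."""
    samples = list(samples[:length])
    return samples + [0.0] * (length - len(samples))


def wav_to_log_mel(path, n_mels=64):
    """Load a .wav file and return its mel spectrogram in decibels."""
    # Imported here so that pad_or_trim can be used without these packages.
    import librosa
    import numpy as np

    signal, sr = librosa.load(path, sr=SAMPLE_RATE)
    signal = np.asarray(pad_or_trim(signal), dtype=np.float32)
    mel = librosa.feature.melspectrogram(y=signal, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)
```

Padding every clip to the same length gives all spectrograms the same shape, which image-style networks such as VGG or ResNet expect for batched input.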

Training Process

Run the Jupyter notebook train.ipynb. The "Params" section at the top of the notebook lets you configure settings such as the model type and the optimizer type. After the notebook has finished, the Weights folder contains the trained weights (.pth) and the model with trained weights in ONNX format.

Training Course

We compared VGG19BN and ResNet34, each trained with the SGD and Adam optimizers. All plots are based on the test dataset.

VGG19 + SGD-Optimizer VGG_SGD

VGG19 + Adam-Optimizer (best result) VGG_Adam

ResNet34 + SGD-Optimizer ResNet_SGD

ResNet34 + Adam-Optimizer ResNet_Adam

Training Result

ConfusionMatrix ConfusionMatrix ConfusionMatrix ConfusionMatrix

Pretrained Models

Download pretrained models from here.


Deployment (Unity3D)

[Verified for 2019.4.1f - Windows only]

  • See our video tutorial Unicornn - HowTo for a quick 10-minute introduction
  • The video contains the same information as the Readme.md below

Quick Start

  1. Start PlayMode and wait until the monitoring display says "Python running" and the circle is green
    • A background Python process is started; it requires (see requirements.txt):
      • librosa
      • numpy
      • datetime
      • pylab
      • PIL
      • numba 0.48
    • To see the process output, you can enable a console window by selecting useShell [x] in GameManager -> PythonInterface.cs
    • The process is terminated automatically after leaving PlayMode
  2. Start voice recording by pressing the Start button
  3. Say two words with about 1 second of silence in between
    • Check your mic level with the VU meter on the right side after the first recording
    • Check the threshold if there is background noise (GameManager -> MicrophoneInput.cs)
    • Possible words, e.g. Zero [...] Left:

    Objects   Actions
    Zero      Forward
    One       Backward
    Two       Left
    Three     Right
    Four      Up
    Five      Down
    Six
    Seven
    Eight
    Nine

  4. Stop recording with the Stop button and wait
  5. Word splitting and processing are done automatically
    • You can see the detected words and their probabilities next to our Unicornn
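
To illustrate how a pair of recognized words from the Objects and Actions columns could be turned into a scene command, here is a minimal Python sketch. It is not the code in SpeechCommands.cs (which is C#); the function name `to_command`, the object-then-action word order, and the probability cutoff are assumptions made for illustration.

```python
OBJECTS = ["zero", "one", "two", "three", "four",
           "five", "six", "seven", "eight", "nine"]
ACTIONS = {"forward", "backward", "left", "right", "up", "down"}


def to_command(predictions, min_probability=0.5):
    """Turn two (word, probability) predictions into (object_index, action).

    Returns None if a word is unknown, in the wrong slot, or too uncertain.
    """
    if len(predictions) != 2:
        return None
    (first, p_first), (second, p_second) = predictions
    first, second = first.lower(), second.lower()
    if min(p_first, p_second) < min_probability:
        return None
    if first not in OBJECTS or second not in ACTIONS:
        return None
    return OBJECTS.index(first), second


print(to_command([("Two", 0.93), ("Left", 0.88)]))  # (2, 'left')
print(to_command([("Left", 0.93), ("Two", 0.88)]))  # None
```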

Scene Overview

| Unicornn

  • | GameManager
    • | SpeechCommands.cs: translates predictions into actions in the scene
    • | MicrophoneInput.cs: processes microphone input, slices words, applies the threshold
    • | PythonInterface.cs: runs the background process (librosa) that creates the spectrograms
  • | Agent
    • | Agent.cs: takes an .onnx file as the model and the spectrograms of the sliced words as input, and finds the prediction
  • | SceneStuff
    • | UI elements, buttons and visual elements

Barracuda Setup

General Information

Models for Agent.cs

  • You can switch between our trained models using the button in the lower-left corner
    • If you want to use your own models, drag them to the corresponding field in Agent.cs
    • By default, the scene starts with the model that gave the best results (VGG + Adam)

Audio settings for MicrophoneInput.cs

  • By default, we chose a very low threshold to detect silence between words
  • In a louder environment, a "silent" moment may still be above our threshold
    • In that case, increase GameManager -> MicrophoneInput -> Threshold to a larger value
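
The effect of the threshold can be illustrated with a small sketch that cuts a recording into words wherever the amplitude stays at or below the threshold for long enough. This is a simplified Python illustration, not the logic of MicrophoneInput.cs; the function name `slice_words` and the parameters `threshold` and `min_silence` are hypothetical.

```python
def slice_words(samples, threshold=0.02, min_silence=3):
    """Return (start, end) index pairs of segments louder than `threshold`.

    A segment ends once `min_silence` consecutive samples are at or below
    the threshold. `end` is exclusive.
    """
    segments = []
    start = None  # index where the current word began
    quiet = 0     # consecutive quiet samples seen inside a word
    for i, value in enumerate(samples):
        if abs(value) > threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_silence:
                segments.append((start, i - quiet + 1))
                start, quiet = None, 0
    if start is not None:
        segments.append((start, len(samples) - quiet))
    return segments


# Two "words" separated by background noise of amplitude 0.05.
noisy = [0.05, 0.5, 0.6, 0.05, 0.05, 0.05, 0.4, 0.05]
print(slice_words(noisy, threshold=0.02))  # [(0, 8)]: the noise hides the pause
print(slice_words(noisy, threshold=0.1))   # [(1, 3), (6, 7)]
```

With the low threshold, the noise floor counts as speech and both words merge into one segment; raising the threshold above the noise level separates them again.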

Python Background Process

  • We process the audio input in Unity3D, save the sliced float arrays as .wav files, and use librosa to generate mel spectrograms
  • The background Python process polls the project folder for the expected file names and processes the files if they exist
  • After processing, the script deletes the .wav files to prepare for the next iteration
  • The script writes its own process ID (PID) to a text file because it is started via cmd.exe
    • To terminate all processes, we need the process IDs of all child processes
  • By checking whether the process has exited, we can monitor the status of our background process

process.HasExited
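
The polling loop described in the bullets above might look roughly like the following. This is a sketch under stated assumptions, not the project's actual script: the file name `pid.txt`, the `*.wav` pattern, the polling interval, and the `process` callback are all hypothetical.

```python
import os
import time
from pathlib import Path


def write_pid(folder):
    """Store this process's ID so the parent can terminate it later."""
    pid_file = Path(folder) / "pid.txt"  # hypothetical file name
    pid_file.write_text(str(os.getpid()))
    return pid_file


def process_pending(folder, process):
    """Process every .wav file in `folder`, then delete it. Returns the count."""
    handled = 0
    for wav in sorted(Path(folder).glob("*.wav")):
        process(wav)  # e.g. compute and save a mel spectrogram
        wav.unlink()  # remove the input so it is not processed twice
        handled += 1
    return handled


def watch(folder, process, interval=0.1):
    """Poll `folder` forever; the parent process is expected to kill us."""
    write_pid(folder)
    while True:
        process_pending(folder, process)
        time.sleep(interval)
```

Writing the PID to a file gives the parent a way to find the real Python process even when a shell sits in between, which is the situation the cmd.exe note describes.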