This repository has been archived by the owner on Jun 27, 2023. It is now read-only.
Hello all,
I've been playing with Pocketsphinx for a few days and was curious about the differences in behavior between the Python library and the available executables (pocketsphinx_continuous, pocketsphinx_batch).
I enabled the verbose flag for the Python version and adjusted the three settings that differed from the logs produced by these executables (vad_threshold, kws_threshold, allphone_ci). I expected the output of my Python code below to match at least one of the outputs generated by the bash scripts that call the executables, but it does not.
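To make the log comparison described above repeatable, here is a minimal sketch of how the configuration dumps could be diffed. It assumes the verbose log prints one option per row as `-name default value`, with the value in the last column and no spaces inside values; that layout is an assumption based on the logs I looked at, and `parse_config_dump` and `diff_configs` are names made up for this sketch.

```python
def parse_config_dump(log_text):
    """Return {option: value} from a verbose log (assumed row layout)."""
    options = {}
    for line in log_text.splitlines():
        parts = line.split()
        # Assumption: option rows start with "-name"; every other row is skipped.
        if not parts or not parts[0].startswith("-"):
            continue
        name = parts[0].lstrip("-")
        # Assumption: the last column is the effective value.
        # A row holding only a name has neither a default nor a value.
        options[name] = parts[-1] if len(parts) > 1 else ""
    return options


def normalize(value):
    """Compare numbers numerically so that 2.0 equals 2.000000e+00."""
    try:
        return float(value)
    except ValueError:
        return value


def diff_configs(log_a, log_b):
    """Return {option: (value_a, value_b)} for options whose values differ."""
    a, b = parse_config_dump(log_a), parse_config_dump(log_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if normalize(a.get(name, "")) != normalize(b.get(name, ""))
    }
```

Reading the contents of python_tuned.log.txt and batch.log.config.txt and passing both strings to `diff_configs` should list any remaining option differences, provided the layout assumption holds.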
Could you please give me some hints about what else differs and what causes these differences? All programs use the same audio files, and all of them are mono, 16 kHz, 16-bit signed little-endian.
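For reference, the audio format stated above can be double-checked with Python's standard `wave` module. This is only a sketch: `check_wav_format` is a name made up here, and it relies on the fact that `wave` reads uncompressed PCM RIFF files, in which 16-bit samples are signed little-endian.

```python
import wave


def check_wav_format(path, rate=16000):
    """Return a list of mismatches against mono, 16-bit PCM at `rate` Hz."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"channels={wav.getnchannels()}")
        if wav.getsampwidth() != 2:  # sample width is reported in bytes
            problems.append(f"sample_width_bytes={wav.getsampwidth()}")
        if wav.getframerate() != rate:
            problems.append(f"rate={wav.getframerate()}")
    return problems
```

An empty list means the file matches the expected format; running this over every .wav in the input directory rules out a format mismatch as the cause of the differences.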
(Regarding the ps.decode() arguments: setting no_search = True has no effect on the output, whereas full_utt = True produces no output at all. Where can I find what exactly these two flags mean?)
Below I'm attaching the code and the files with the corresponding transcription outputs and configuration logs.
import os
from os import path, listdir
from pocketsphinx import Pocketsphinx, get_model_path
import sox

model_path = get_model_path()

config = {
    # using the default values - see https://pypi.org/project/pocketsphinx/
    'hmm': os.path.join(model_path, 'en-us'),
    'lm': os.path.join(model_path, 'en-us.lm.bin'),
    'dict': os.path.join(model_path, 'cmudict-en-us.dict'),
    'sampling_rate': 16000,
    'verbose': True,
    # with following configs, the settings should exactly match what we can reach with the wrapped scripts
    'vad_threshold': 2.0,
    'kws_threshold': 1.0,
    'allphone_ci': False
}

ps = Pocketsphinx(**config)

# path to the directory where the .wav's are stored
directory = "../my_records/jindra/converted"
out_hyp_file_path = "./python_output_github.hyp"
out_hyp_file = open(out_hyp_file_path, "w")

file_list = os.listdir(directory)
# sort the list by alphabet (default order is "arbitrary") to obtain outputs diff-able with outputs of pocketsphinx_batch
file_list.sort()

for entry in file_list:
    entry_file = os.path.join(directory, entry)
    if (os.path.isfile(entry_file) and (entry[-4:] == ".wav")):
        ps.decode(audio_file=entry_file, buffer_size=2048, no_search=False, full_utt=False)
        hypothesis = ps.hypothesis()
        # format similar to outputs of pocketsphinx_batch
        out_hyp_file.write(hypothesis + " (" + entry[:-4] + ")\n")

out_hyp_file.close()
#!/bin/bash
# make sure you're running from .venv where your pocketsphinx is installed
model_dir=$(python3 -c "from pocketsphinx import get_model_path; print(get_model_path())")
curr_dir=$(pwd)
cd "$1"
out_file=output_continuous.hyp
if test -f "$out_file"; then
    rm "$out_file"
fi
for f in *.wav
do
    hyp=$(pocketsphinx_continuous -infile "$f" \
        -hmm "${model_dir}/en-us" \
        -lm "${model_dir}/en-us.lm.bin" \
        -dict "${model_dir}/cmudict-en-us.dict" \
        -samprate 16000)
    f_name=$(basename "$f" .wav)
    # this shall give similar output format as pocketsphinx_batch, so we can simply diff it
    echo "${hyp} (${f_name})" >> "$out_file"
done
cd "$curr_dir"
#!/bin/bash
# make sure you're running from .venv where your pocketsphinx is installed
model_dir=$(python3 -c "from pocketsphinx import get_model_path; print(get_model_path())")
curr_dir=$(pwd)
cd "$1"
ctl_filename="ctlfile.txt"
# there's no -q flag for rm, so do it this way?
if test -f "$ctl_filename"; then
    rm "$ctl_filename"
fi
for f in *.wav
do
    echo $(basename "$f" .wav) >> "$ctl_filename"
done
# The adcin seems to be important here
# https://cmusphinx.github.io/wiki/tutorialtuning/
pocketsphinx_batch -adcin yes \
    -cepdir . \
    -cepext .wav \
    -ctl "$ctl_filename" \
    -hmm "${model_dir}/en-us" \
    -lm "${model_dir}/en-us.lm.bin" \
    -dict "${model_dir}/cmudict-en-us.dict" \
    -samprate 16000 \
    -hyp output_batch.hyp
cd "$curr_dir"
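Since all three programs write lines of the form `hypothesis (utterance_id)`, the outputs can also be compared per utterance instead of with a plain diff. The sketch below assumes that line shape and additionally tolerates one extra token (such as a score) after the ID inside the parentheses, in case the batch tool writes one; that tolerance is an assumption, and `parse_hyp` and `compare_hyps` are names made up for this sketch.

```python
import re

# Assumed line shape: "<hypothesis> (<utt_id>)" or "<hypothesis> (<utt_id> <score>)".
HYP_LINE = re.compile(r"^(?P<text>.*)\((?P<utt>[^()\s]+)(?:\s+[^()\s]+)?\)\s*$")


def parse_hyp(text):
    """Map utterance ID to hypothesis text, with whitespace normalized."""
    result = {}
    for line in text.splitlines():
        match = HYP_LINE.match(line)
        if match:
            result[match.group("utt")] = " ".join(match.group("text").split())
    return result


def compare_hyps(a_text, b_text):
    """Return {utt_id: (hyp_a, hyp_b)} for utterances whose hypotheses differ."""
    a, b = parse_hyp(a_text), parse_hyp(b_text)
    return {
        utt: (a.get(utt), b.get(utt))
        for utt in sorted(set(a) | set(b))
        if a.get(utt) != b.get(utt)
    }
```

An utterance that is missing from one of the files shows up with `None` on that side, which makes dropped files easy to spot.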
Attachments corresponding to the three listings above:
Python code: python_output_tuned.hyp.txt, python_tuned.log.txt
Pocketsphinx_cont_wrapper.sh: output_continuous.hyp.txt, continuous.log.config.txt
pocketsphinx_batch_wrapper.sh: output_batch.hyp.txt, batch.log.config.txt