
MemoryError in extracting binary ark file #52

Open
asadullah797 opened this issue Aug 16, 2018 · 9 comments
@asadullah797

I have an scp file with a corresponding ark file. Whenever I use the class in kaldi_feat.py and call its read_next_utt() function, I get a MemoryError. I traced the error back to this line:

tmp_mat = numpy.frombuffer(ark_read_buffer.read(rows * cols * 4), dtype=numpy.float32)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

Please note that the values of rows and cols are (1254238158, 5784298), so their product is an enormous number.

@MaigoAkisame
Collaborator

It looks like the format of the scp or ark file is corrupted: the program has read very large numbers of rows and columns.

You may need to provide the scp and ark file for me to debug it...
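One way to confirm this kind of corruption is to compare the payload size implied by the header against the bytes actually left in the file, instead of letting numpy try the allocation. A minimal sketch of such a guard (the function name and error message are mine, not PDNN's; it assumes the buffer is positioned just after the "\0BFM " header, where each dimension is stored as a '\x04' size byte plus a little-endian int32):

```python
import io
import struct

def read_dims_checked(buf):
    """Read (rows, cols) of a Kaldi binary float matrix from `buf`,
    positioned just after the '\\0BFM ' header, and verify that the
    float32 payload they imply actually fits in the remaining bytes."""
    # Each dimension is '\x04' (the int size) followed by an int32.
    _, rows = struct.unpack('<bi', buf.read(5))
    _, cols = struct.unpack('<bi', buf.read(5))
    pos = buf.tell()
    buf.seek(0, io.SEEK_END)
    remaining = buf.tell() - pos
    buf.seek(pos)
    if rows < 0 or cols < 0 or rows * cols * 4 > remaining:
        raise ValueError(
            f"implausible matrix shape ({rows}, {cols}): the ark entry "
            "is corrupted or not an uncompressed binary float matrix")
    return rows, cols

# Demo: a header claiming a 1254238158 x 5784298 matrix with no payload
# raises a clear error instead of a MemoryError.
bad = io.BytesIO(struct.pack('<bibi', 4, 1254238158, 4, 5784298))
try:
    read_dims_checked(bad)
except ValueError as e:
    print("caught:", e)
```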

@asadullah797
Author

Please find attached the scp and ark files.
I have tried two scp files and two ark files, but both give these huge values for rows and columns.

Untitled Folder.tar.gz
Untitled Folder.tar.gz

Thanks

@MaigoAkisame
Collaborator

I see that your ark files are in the compressed format, which is not supported by PDNN.
I would suggest that you dump the scp and ark files in uncompressed binary format, and try again.

@asadullah797
Author

No no, actually I couldn't attach the files to this GitHub thread as they were, so I compressed them; for the experiment itself I am using the original uncompressed files.

@MaigoAkisame
Collaborator

MaigoAkisame commented Aug 17, 2018

I did decompress your zip archives. What I mean is, in your .ark files, the matrices are stored in a compressed format, which is not recognized by PDNN.

Here's the content of one of your .ark files:
[screenshot: hex dump of the ark file, with the four matrix-header bytes highlighted]

The highlighted four bytes are the header of a matrix.
"\0BCM" means "binary compressed matrix".
The format accepted by PDNN is "\0BFM", which stands for "binary float matrix".
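This four-byte marker can be checked from Python before handing a file to PDNN. A small sketch (the helper name is mine; it assumes the entry starts with the usual "key " prefix of an ark written by copy-feats, followed by the binary header described above):

```python
def ark_entry_format(data: bytes) -> str:
    """Return the matrix format token of the first entry in raw ark
    bytes: 'BFM' (binary float matrix), 'BCM' (binary compressed
    matrix), or 'unknown'."""
    # An ark entry is "<key> " followed by the binary header '\0B' + token.
    key, sep, rest = data.partition(b' ')
    if rest.startswith(b'\x00BFM'):
        return 'BFM'
    if rest.startswith(b'\x00BCM'):
        return 'BCM'
    return 'unknown'

# Demo on a fabricated entry header:
print(ark_entry_format(b'utt1 \x00BCM ...'))  # -> BCM
```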

@asadullah797
Author

Alright, you are right.
But could you please suggest how I can export Kaldi features to Python (and eventually TensorFlow)?
As far as I know, PDNN converts Kaldi features to Python, and these files are exactly Kaldi MFCC features.
If any changes are required, could you please advise?
Many thanks

@MaigoAkisame
Collaborator

MaigoAkisame commented Aug 17, 2018

You can use either the kaldi_feat.py script in PDNN, or the readArk / readScp functions in the following script:
https://github.com/MaigoAkisame/fileutils/blob/master/kaldi.py
to load Kaldi features into Python.

Afterwards, you can arrange them into any shapes required by Tensorflow.

Unfortunately neither script supports the compressed matrix format, so you'll have to dump the features into the BFM format first.
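For reference, once an ark is in the "\0BFM" format, each entry can be parsed with plain struct/numpy. A minimal single-entry reader, sketched from the byte layout described above (this mirrors that layout, not PDNN's actual code, and omits error handling):

```python
import io
import struct

import numpy as np

def read_bfm_entry(buf):
    """Read one (key, matrix) pair from an uncompressed binary ark:
    '<key> ' + '\\0B' + 'FM ' + '\\x04' + rows + '\\x04' + cols + data."""
    key = b''
    while True:
        c = buf.read(1)
        if c == b' ':
            break
        key += c
    header = buf.read(5)
    assert header == b'\x00BFM ', "not a binary float matrix entry"
    _, rows = struct.unpack('<bi', buf.read(5))
    _, cols = struct.unpack('<bi', buf.read(5))
    data = np.frombuffer(buf.read(rows * cols * 4), dtype=np.float32)
    return key.decode(), data.reshape(rows, cols)

# Demo: round-trip a 2x3 matrix through the same byte layout.
mat = np.arange(6, dtype=np.float32).reshape(2, 3)
blob = (b'utt1 \x00BFM ' + struct.pack('<bi', 4, 2)
        + struct.pack('<bi', 4, 3) + mat.tobytes())
key, out = read_bfm_entry(io.BytesIO(blob))
print(key, out.shape)  # -> utt1 (2, 3)
```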

@asadullah797
Author

Thanks for the information.
Usually,
copy-feats ark:abc.ark ark,t:a.txt
is used to convert a binary ark file to the corresponding text format, but I don't know how to convert a text ark back to the binary float matrix format so that I can use it with kaldi_feat.py (PDNN).

@MaigoAkisame
Collaborator

copy-feats scp:1.scp ark,scp:2.ark,2.scp

This should do the job. If you don't request the text format (ark,t:), the output will be in uncompressed binary format by default.
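After this conversion, each line of 2.scp has the form "key path:offset", where the offset points at the binary header of that utterance inside 2.ark. A small sketch for splitting such a line (the helper name is mine):

```python
def parse_scp_line(line):
    """Split one Kaldi scp line 'key path:offset' into its parts,
    so the ark can be opened and seek()ed to the entry's header."""
    key, rxspec = line.strip().split(None, 1)
    path, offset = rxspec.rsplit(':', 1)
    return key, path, int(offset)

print(parse_scp_line('utt1 /data/2.ark:13'))  # -> ('utt1', '/data/2.ark', 13)
```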
