
memory error #22

Open
anarucu opened this issue Jul 28, 2015 · 2 comments


anarucu commented Jul 28, 2015

Hi everyone,
I used Kaldi's copy-feats binary to convert my ASCII features into .ark and .scp files.
Then I concatenated all the individual .scp files into a single one, which I called SmallSet0.scp:

SESS0003BLOCKA_06 /home/ana/DB/SmallSet0/feat/SESS0003BLOCKA_06.ark:18
SESS0003BLOCKA_07 /home/ana/DB/SmallSet0/feat/SESS0003BLOCKA_07.ark:18
SESS0003BLOCKA_08 /home/ana/DB/SmallSet0/feat/SESS0003BLOCKA_08.ark:18
SESS0003BLOCKA_09 /home/ana/DB/SmallSet0/feat/SESS0003BLOCKA_09.ark:18
SESS0003BLOCKA_10 /home/ana/DB/SmallSet0/feat/SESS0003BLOCKA_10.ark:18
SESS0003BLOCKA_11 /home/ana/DB/SmallSet0/feat/SESS0003BLOCKA_11.ark:18
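Each line of such an .scp file maps an utterance ID to an `ark_path:byte_offset` pair. A minimal sketch of splitting one line, assuming a single space between the ID and the location; `parse_scp_line` is a hypothetical helper for illustration, not part of PDNN or Kaldi:

```python
def parse_scp_line(line):
    """Split 'utt_id /path/to/file.ark:offset' into (utt_id, path, offset).

    Hypothetical helper; assumes one space separates the utterance ID
    from the location, and that the last ':' precedes the byte offset.
    """
    utt_id, location = line.strip().split(" ", 1)
    # rsplit so only the final ':' is treated as the offset separator
    path, offset = location.rsplit(":", 1)
    return utt_id, path, int(offset)


line = "SESS0003BLOCKA_06 /home/ana/DB/SmallSet0/feat/SESS0003BLOCKA_06.ark:18"
utt_id, path, offset = parse_scp_line(line)
# With one utterance per .ark file, the offset should equal the key
# length plus the separating space, i.e. the start of the binary record.
print(utt_id, path, offset, offset == len(utt_id) + 1)
```

Here the 17-character key plus one space gives exactly the offset 18 seen in every line above, which suggests the offsets themselves point where a reader would expect the record to begin.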

Then I tried to train 4 stacked RBMs using run_RBM.py and got the following memory error:

ana@ana-HP-EliteBook-Folio-9470m:~/PDNN/pdnn$ python /home/ana/PDNN/pdnn/cmds/run_RBM.py --train-data "/home/ana/DB/SmallSet0/feat/SmallSet0.scp,partition=600m,stream=true,random=true" --nnet-spec "215:1024:1024:43:1024" --wdir ./ --ptr-layer-number 4 --epoch-number 10 --batch-size 128 --learning-rate 0.08 --gbrbm-learning-rate 0.005 --momentum 0.5:0.9:5 --first_layer_type gb --param-output-file /home/ana/PDNN/Working_dir/rbm.mdl
[2015-07-28 23:06:57.528732] > ... initializing the model
Traceback (most recent call last):
  File "/home/ana/PDNN/pdnn/cmds/run_RBM.py", line 62, in <module>
    cfg.init_data_reading(train_data_spec)
  File "/home/ana/PDNN/pdnn/utils/rbm_config.py", line 65, in init_data_reading
    self.train_sets, self.train_xy, self.train_x, self.train_y = read_dataset(train_dataset, train_dataset_args)
  File "/home/ana/PDNN/pdnn/io_func/data_io.py", line 92, in read_dataset
    data_reader.initialize_read(first_time_reading = True)
  File "/home/ana/PDNN/pdnn/io_func/kaldi_io.py", line 102, in initialize_read
    utt_id, utt_mat = self.read_next_utt()
  File "/home/ana/PDNN/pdnn/io_func/kaldi_io.py", line 89, in read_next_utt
    tmp_mat = numpy.frombuffer(ark_read_buffer.read(rows * cols * 4), dtype=numpy.float32)
MemoryError

What did I do wrong?
Best regards,
ana


vipular commented Oct 12, 2015

Hi,
I am also getting the same memory error.
Here is a minimal Python snippet that reproduces it:

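from io_func.kaldi_feat import KaldiReadIn   # PDNN's Kaldi feature reader

in_scp_file = '/data/raw_mfcc_test.1.scp'
kaldiread = KaldiReadIn(in_scp_file)
utt_number = 0
while True:
    uttid, in_matrix = kaldiread.read_next_utt()
    # An empty utterance ID marks the end of the .scp file
    if uttid == '':
        break
    utt_number += 1   # count successfully read utterances
print(utt_number)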

While debugging, I found that in kaldi_feat.py the following lines:

m, rows = struct.unpack('<bi', ark_read_buffer.read(5))
n, cols = struct.unpack('<bi', ark_read_buffer.read(5))

return extremely large values for rows and cols.
The next line reads rows*cols*4 bytes to build a numpy array, which is what raises the MemoryError.

I am using Mac OS X Yosemite 10.10.3.

Sincerely,
-Vipul
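For reference, those two `struct.unpack('<bi', ...)` calls only make sense for an uncompressed single-precision Kaldi matrix. In Kaldi's binary format such a record is `\0B`, the token `FM `, then a size byte (4) followed by a little-endian int32 row count, the same for the column count, and then rows*cols float32 values. Records of other types, such as compressed (`CM `) or double-precision (`DM `) matrices, use different headers, so reading them this way yields meaningless rows and cols. Whether that is the cause here is not confirmed in this thread. The sketch below builds an `FM` record in memory and reads it back with an explicit type check; `write_fm_record` and `read_fm_record` are hypothetical helpers, not PDNN's code, and assume little-endian data.

```python
import io
import struct

import numpy as np


def write_fm_record(buf, mat):
    """Write a float32 matrix in Kaldi's binary 'FM' layout (sketch)."""
    rows, cols = mat.shape
    buf.write(b"\0B")                       # binary-mode marker
    buf.write(b"FM ")                       # float-matrix token plus space
    buf.write(struct.pack("<bi", 4, rows))  # size byte 4, then int32 rows
    buf.write(struct.pack("<bi", 4, cols))  # size byte 4, then int32 cols
    buf.write(mat.astype("<f4").tobytes())


def read_fm_record(buf):
    """Read one matrix, refusing anything that is not an uncompressed 'FM'."""
    marker = buf.read(2)
    if marker != b"\0B":
        raise ValueError("not a binary Kaldi record: %r" % marker)
    token = buf.read(3)
    if token != b"FM ":
        # e.g. b"CM " (compressed) or b"DM " (double): the following bytes
        # are not (size byte, int32) pairs, so rows/cols would be garbage.
        raise ValueError("unsupported matrix type token: %r" % token)
    size_r, rows = struct.unpack("<bi", buf.read(5))
    size_c, cols = struct.unpack("<bi", buf.read(5))
    if size_r != 4 or size_c != 4 or rows < 0 or cols < 0:
        raise ValueError("unexpected dimension header")
    raw = buf.read(rows * cols * 4)
    if len(raw) != rows * cols * 4:
        raise ValueError("truncated matrix data")
    return np.frombuffer(raw, dtype="<f4").reshape(rows, cols)


buf = io.BytesIO()
write_fm_record(buf, np.arange(6, dtype=np.float32).reshape(2, 3))
buf.seek(0)
print(read_fm_record(buf))
```

Checking the type token before trusting rows and cols turns a confusing MemoryError into an explicit error that names the matrix type actually stored in the file.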


a00achild1 commented Oct 31, 2016

Hi,
I also ran into this problem while trying to train a simple digit speech recognizer with a DNN.
After extracting MFCC features with Kaldi in .scp format, I ran the command below:

run_DNN.py --train-data "./mfcc/raw_mfcc_train.1.scp,partition=600m,random=true" \
           --valid-data "./mfcc/raw_mfcc_test.1.scp,partition=600m,random=true" \
           --nnet-spec "250:1024:1024:1024:1024:1024:10" --wdir ./ \
           --output-format kaldi \
           --lrate "D:0.08:0.5:0.05,0.05:15" \
           --output-file dnn.nnet >& dnn.training.log

But I got this error in the log file:

Traceback (most recent call last):
  File "/home/cssp/pdnn-master/cmds/run_DNN.py", line 56, in <module>
    cfg.init_data_reading(train_data_spec, valid_data_spec)
  File "/home/cssp/pdnn-master/utils/network_config.py", line 94, in init_data_reading
    self.train_sets, self.train_xy, self.train_x, self.train_y = read_dataset(train_dataset, train_dataset_args)
  File "/home/cssp/pdnn-master/io_func/data_io.py", line 92, in read_dataset
    data_reader.initialize_read(first_time_reading = True)
  File "/home/cssp/pdnn-master/io_func/kaldi_io.py", line 102, in initialize_read
    utt_id, utt_mat = self.read_next_utt()
  File "/home/cssp/pdnn-master/io_func/kaldi_io.py", line 89, in read_next_utt
    tmp_mat = numpy.frombuffer(ark_read_buffer.read(rows * cols * 4), dtype=numpy.float32)
MemoryError

Does anyone have a solution for this?
Thanks,
-a00a
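A quick pre-flight check along these lines could show which matrix type each .scp entry actually points at before launching run_DNN.py. This is a hypothetical helper (`scan_scp_tokens` is not part of PDNN or Kaldi); it assumes each .scp line is `utt_id path:offset` with the offset pointing at the binary `\0B` marker, and simply tallies the type token that follows it.

```python
import collections
import os
import struct
import tempfile


def scan_scp_tokens(scp_path):
    """Count the matrix type token found at each .scp offset (sketch).

    A result like {b'FM': 120} means every entry is an uncompressed float
    matrix; any other token (e.g. b'CM') marks records that a reader
    expecting 'FM' would misparse.
    """
    counts = collections.Counter()
    with open(scp_path) as scp:
        for line in scp:
            if not line.strip():
                continue
            _utt_id, location = line.strip().split(" ", 1)
            path, offset = location.rsplit(":", 1)
            with open(path, "rb") as ark:
                ark.seek(int(offset))
                head = ark.read(5)  # e.g. b"\0BFM "
            if head[:2] != b"\0B":
                counts[b"<not binary>"] += 1
            else:
                counts[head[2:].rstrip()] += 1
    return counts


# Usage: a one-utterance archive holding a 1x1 'FM' matrix.
with tempfile.TemporaryDirectory() as tmp:
    ark_path = os.path.join(tmp, "utt1.ark")
    with open(ark_path, "wb") as ark:
        ark.write(b"utt1 \0BFM " + struct.pack("<bibi", 4, 1, 4, 1)
                  + struct.pack("<f", 0.5))
    scp_path = os.path.join(tmp, "feats.scp")
    with open(scp_path, "w") as scp:
        scp.write("utt1 %s:5\n" % ark_path)  # "utt1 " is 5 bytes long
    result = scan_scp_tokens(scp_path)
print(result)
```

Running it on the real raw_mfcc_*.scp files would show whether anything other than `FM` is present.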
