IndexError when processing (old ?) GENEActiv .bin files #120

Open
cWam-zz opened this issue Jun 2, 2024 · 9 comments

@cWam-zz

cWam-zz commented Jun 2, 2024

Hello,
I am using Python 3.8.19 on 64-bit Windows 10.

Running the following command to process GENEActiv .bin files raises an error, but only for some files: stepcount "E:\file\directory\GENEActiv_file.bin" -o "E:\output\directory"

Here is the output message:
java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
    at GENEActivReader.parseBinFileHeader(GENEActivReader.java:221)
    at GENEActivReader.main(GENEActivReader.java:75)
Reading file... Done! (0.16s)
Error: C:\Users\***\AppData\Local\Temp\tmphxr_w9yo\data.npy - The process cannot access the file because it is being used by another process.
Traceback (most recent call last):
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\***\Anaconda3\envs\stepcount\Scripts\stepcount.exe\__main__.py", line 7, in <module>
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\stepcount\stepcount.py", line 58, in main
    data, info = read(
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\stepcount\stepcount.py", line 730, in read
    data, info = actipy.read_device(
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\actipy\reader.py", line 50, in read_device
    data, info = _read_device(input_file, verbose)
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\actipy\reader.py", line 220, in _read_device
    info['StartTime'] = t.iloc[0].strftime(strftime)
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\pandas\core\indexing.py", line 1103, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\pandas\core\indexing.py", line 1656, in _getitem_axis
    self._validate_integer(key, axis)
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\pandas\core\indexing.py", line 1589, in _validate_integer
    raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds

The first error (the Java exception) does not matter: it appears every time, even for files that are processed successfully. The IndexError, however, stops the process.
I noticed that the error does not appear for recent files (collected in 2023) but does appear for old files (collected in 2018), even though the same devices were used to record the data in both years.

@chanshing
Member

@cWam-zz Hi. Any chance the file was empty? Do you know the size of the file? Is there a way you can share the file for me to debug?
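
The question about an empty file follows from the last frames of the traceback: the failing call is `t.iloc[0]`, and pandas raises exactly this IndexError when positional indexing is applied to an empty Series. A minimal sketch of that behaviour, assuming (this is not confirmed in the thread) that the reader returned zero rows for the 2018 files; `first_timestamp` is a hypothetical helper, not part of actipy:

```python
import pandas as pd

# Stand-in for a timestamp column from a reader that produced zero rows.
# (Assumption for illustration; the thread does not confirm this is what happens.)
t = pd.Series([], dtype="datetime64[ns]")


def first_timestamp(series):
    """Hypothetical guard: return the first value, or None if the series is empty."""
    if series.empty:
        return None
    return series.iloc[0]


try:
    t.iloc[0]
except IndexError as err:
    # Same message as in the traceback above.
    message = str(err)
    print(message)

print(first_timestamp(t))
```

So a non-empty .bin file can still trigger the error if parsing yields no samples, which is why sharing a failing file is useful for debugging.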

@cWam-zz
Author

cWam-zz commented Jun 11, 2024

@chanshing Thank you for your message.
No, the files are not empty. File sizes range from 250 MB to 780 MB.
You can find an example of such a file on Zenodo. It is a 260 MB file.
Thank you.

@chanshing
Member

@cWam-zz Thanks! We will investigate and get back to you. In the meantime, a workaround for you could be to first convert your file to a CSV (using GENEActiv's own parser) then use our tool. Sorry for the inconvenience!

@cWam-zz
Author

cWam-zz commented Jun 11, 2024

@chanshing Thank you for your suggestion.
I forgot to mention (I don't know if this helps) that I have already read these files with the GENEAread R package without any problems.

@chanshing
Member

Thank you @cWam-zz, maybe you can try exporting your file to CSV using the GENEAread package.

@cWam-zz
Author

cWam-zz commented Nov 25, 2024

Hello,
Sorry to come back to this issue.
I converted a .bin file into a .csv file (1-second epochs) with the following column names: time, x, y, z. But I am still facing a problem (probably because of the time format?).
I ran the following command: stepcount "E:\***\XXXXXX_left wrist_047218_2018-10-18 11-10-57.csv"

Here is the full output message:

Gravity calibration... Done! (0.08s)
Nonwear detection... Done! (0.21s)
Traceback (most recent call last):
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\***\Anaconda3\envs\stepcount\Scripts\stepcount.exe\__main__.py", line 7, in <module>
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\stepcount\stepcount.py", line 58, in main
    data, info = read(
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\stepcount\stepcount.py", line 711, in read
    data, info = actipy.process(
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\actipy\reader.py", line 153, in process
    data, info_resample = P.resample(data, resample_hz)
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\actipy\processing.py", line 34, in resample
    pd.Timedelta(pd.infer_freq(data.index)).total_seconds(),
  File "pandas\_libs\tslibs\timedeltas.pyx", line 1766, in pandas._libs.tslibs.timedeltas.Timedelta.__new__
  File "pandas\_libs\tslibs\timedeltas.pyx", line 649, in pandas._libs.tslibs.timedeltas.parse_timedelta_string
ValueError: unit abbreviation w/o a number

I uploaded the .csv files I used to the same Zenodo repository.
As explained there, for the time column I tried 1) a string and 2) an R object of class "POSIXct" "POSIXt" before saving to a csv file.
Am I missing or forgetting something about the time column?

@chanshing
Member

Hi @cWam-zz. It seems that the file only has second-level summaries, which makes it impossible for our models to work: they require a sampling frequency of at least 15 Hz, ideally more, and yours is 1 Hz.
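
A quick way to check a CSV export against the 15 Hz minimum before running stepcount is to estimate the sampling rate from the gaps between consecutive timestamps. The sketch below uses only the standard library; `parse_timestamp` and `estimate_sample_rate` are hypothetical helpers, not part of stepcount or actipy, and the 1-second timestamps are illustrative values rather than rows from the uploaded file:

```python
from datetime import datetime
from statistics import median

MIN_RATE_HZ = 15  # minimum sampling frequency mentioned in this thread


def parse_timestamp(ts):
    """Parse 'YYYY-MM-DD HH:MM:SS' with or without fractional seconds."""
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in ts else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(ts, fmt)


def estimate_sample_rate(timestamps):
    """Estimate the sampling rate in Hz from the median gap between rows."""
    times = [parse_timestamp(ts) for ts in timestamps]
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return 1.0 / median(gaps)


# Illustrative 1-second epochs (not taken from the uploaded file).
one_second_epochs = [
    "2018-10-18 11:10:57",
    "2018-10-18 11:10:58",
    "2018-10-18 11:10:59",
]
# Timestamps with roughly 50 ms spacing.
raw_samples = [
    "2018-07-05 09:24:51.000",
    "2018-07-05 09:24:51.049",
    "2018-07-05 09:24:51.099",
    "2018-07-05 09:24:51.150",
    "2018-07-05 09:24:51.200",
]

for name, stamps in [("epochs", one_second_epochs), ("raw", raw_samples)]:
    rate = estimate_sample_rate(stamps)
    print(name, round(rate, 1), "Hz, sufficient:", rate >= MIN_RATE_HZ)
```

The median gap is used rather than the mean so that a few irregular rows do not distort the estimate.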

@cWam-zz
Author

cWam-zz commented Dec 5, 2024

Hi @chanshing
Thank you for your clarification.
I have tested with 20 Hz data and it seems to work well.

I did notice, however, that every row needs a timestamp with 2 or 3 decimal digits for the seconds. The accepted format can be like this, with 2-digit precision:

"time","x","y","z","id"
2018-07-05 09:24:51.00,0.249348574720771,-0.245585232050615,-1.00636620318577
2018-07-05 09:24:51.04,0.1982820704194,-0.206876760234386,-0.923792019604709
2018-07-05 09:24:51.09,0.156381348941352,-0.2442504571604,-1.08361302008417
2018-07-05 09:24:51.15,0.207447853242723,-0.202872435563742,-0.949097011347291
2018-07-05 09:24:51.20,0.267680140367417,-0.193529011332239,-0.902482552874114

Or like this, with 3-digit precision:
"time","x","y","z"
"2018-07-05 09:24:51.000",0.249348574720771,-0.245585232050615,-1.00636620318577
"2018-07-05 09:24:51.049",0.1982820704194,-0.206876760234386,-0.923792019604709
"2018-07-05 09:24:51.099",0.156381348941352,-0.2442504571604,-1.08361302008417
"2018-07-05 09:24:51.150",0.207447853242723,-0.202872435563742,-0.949097011347291
"2018-07-05 09:24:51.200",0.267680140367417,-0.193529011332239,-0.902482552874114

But I got an error with file contents like:
"time","x","y","z","id"
2018-07-05 09:24:51,0.249348574720771,-0.245585232050615,-1.00636620318577
2018-07-05 09:24:51.05,0.1982820704194,-0.206876760234386,-0.923792019604709
2018-07-05 09:24:51.1,0.156381348941352,-0.2442504571604,-1.08361302008417
2018-07-05 09:24:51.15,0.207447853242723,-0.202872435563742,-0.949097011347291
2018-07-05 09:24:51.2,0.267680140367417,-0.193529011332239,-0.902482552874114

Here is the encountered error with this type of file:
Traceback (most recent call last):
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\***\Anaconda3\envs\stepcount\Scripts\stepcount.exe\__main__.py", line 7, in <module>
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\stepcount\stepcount.py", line 58, in main
    data, info = read(
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\stepcount\stepcount.py", line 705, in read
    freq = infer_freq(data.index)
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\stepcount\stepcount.py", line 750, in infer_freq
    tdiff = t.to_series().diff()
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\pandas\core\series.py", line 2870, in diff
    result = algorithms.diff(self._values, periods)
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\pandas\core\algorithms.py", line 1454, in diff
    out_arr[res_indexer] = op(arr[res_indexer], arr[lag_indexer])
TypeError: unsupported operand type(s) for -: 'str' and 'str'

Thanks again for your help.
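
The TypeError shows the time column being subtracted as strings, which suggests it was never parsed into datetimes. To spot a file of this kind before running stepcount, one option is to count how many fractional-second digits each timestamp carries. This is a rough standard-library sketch; `fraction_digits` and `precision_summary` are hypothetical helper names:

```python
from collections import Counter


def fraction_digits(timestamp):
    """Count the digits after the decimal point of the seconds field."""
    _, dot, fraction = timestamp.strip().strip('"').partition(".")
    return len(fraction) if dot else 0


def precision_summary(timestamps):
    """Map 'number of fractional digits' to 'number of rows with that many'."""
    return Counter(fraction_digits(ts) for ts in timestamps)


# Time column of the failing example above.
failing = [
    "2018-07-05 09:24:51",
    "2018-07-05 09:24:51.05",
    "2018-07-05 09:24:51.1",
    "2018-07-05 09:24:51.15",
    "2018-07-05 09:24:51.2",
]

summary = precision_summary(failing)
first_row_has_fraction = fraction_digits(failing[0]) > 0

print("rows per digit count:", dict(summary))
print("first row has fractional seconds:", first_row_has_fraction)
```

More than one key in the summary, or a first row without fractional seconds, marks a file worth normalising before processing.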

@chanshing
Member

Thanks @cWam-zz that's a good diagnosis.
Yes, decimals will be needed for >1Hz data (to show the milliseconds).
For the 3rd scenario, I think the problem is that the very first timestamp does not have decimals, so the parser infers that all remaining rows will have no decimals.

The following modification should probably work (note that I added .00 to the first timestamp):

"time","x","y","z","id"
2018-07-05 09:24:51.00,0.249348574720771,-0.245585232050615,-1.00636620318577
2018-07-05 09:24:51.05,0.1982820704194,-0.206876760234386,-0.923792019604709
2018-07-05 09:24:51.1,0.156381348941352,-0.2442504571604,-1.08361302008417
2018-07-05 09:24:51.15,0.207447853242723,-0.202872435563742,-0.949097011347291
2018-07-05 09:24:51.2,0.267680140367417,-0.193529011332239,-0.902482552874114
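
Rather than editing the first row by hand, the whole time column can be rewritten so that every timestamp has the same millisecond precision. The sketch below uses only the standard library and assumes the column is named `time` and uses the `YYYY-MM-DD HH:MM:SS[.fff]` layout shown above; `normalize_timestamp` and `normalize_csv` are hypothetical helpers, not part of stepcount:

```python
import csv
import io
from datetime import datetime


def normalize_timestamp(ts):
    """Return the timestamp with exactly three fractional-second digits."""
    ts = ts.strip()
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in ts else "%Y-%m-%d %H:%M:%S"
    parsed = datetime.strptime(ts, fmt)
    # %f prints six digits (microseconds); drop the last three for milliseconds.
    return parsed.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]


def normalize_csv(src, dst, time_column="time"):
    """Copy CSV rows from src to dst, rewriting the time column."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[time_column] = normalize_timestamp(row[time_column])
        writer.writerow(row)


# First rows of the failing example, held in memory for the demonstration.
raw = (
    "time,x,y,z\n"
    "2018-07-05 09:24:51,0.249348574720771,-0.245585232050615,-1.00636620318577\n"
    "2018-07-05 09:24:51.05,0.1982820704194,-0.206876760234386,-0.923792019604709\n"
    "2018-07-05 09:24:51.1,0.156381348941352,-0.2442504571604,-1.08361302008417\n"
)
out = io.StringIO()
normalize_csv(io.StringIO(raw), out)
print(out.getvalue())
```

For a real file, the two `io.StringIO` objects would be replaced by file handles opened with `newline=""`, as the `csv` module documentation recommends.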
