parse_gctx Error: segfault from C stack overflow #65

Open
FogatoHub opened this issue Feb 9, 2021 · 13 comments

@FogatoHub

FogatoHub commented Feb 9, 2021

Hi, I need help using the parsing function of cmapR.

I was testing the cmapR library by following the tutorial, and I run into problems with the function parse_gctx when I try to parse the small 77 KB "modzs_n25x50.gctx" file provided with the cmapR library.

ds_path <- system.file("extdata", "modzs_n25x50.gctx", package="cmapR")
my_ds <- parse_gctx(ds_path)
reading /home/usr/R/x86_64-pc-linux-gnu-library/4.0/cmapR/extdata/modzs_n25x50.gctx
Error: segfault from C stack overflow

I get the same error if I try to parse a subset of the file as described in the tutorial:
my_ds_10_columns <- parse_gctx(ds_path, cid=1:10)

I checked my resource limits: address space and data size are set to infinity, and the current stack limit is 8388608 bytes:

> library(unix)
> rlimit_all() 
$cur
      as     core      cpu     data    fsize  memlock   nofile    nproc    stack 
     Inf        0      Inf      Inf      Inf 67108864     8192    63355  8388608 

$max
      as     core      cpu     data    fsize  memlock   nofile    nproc    stack 
     Inf      Inf      Inf      Inf      Inf 67108864  1048576    63355      Inf 
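The C stack size that the R process itself sees can also be inspected from base R, without the unix package. A minimal sketch using Cstack_info(), which is part of base R; the helper name stack_size_mb is hypothetical and only there for illustration:

```r
# Report the C stack size R is running with, in megabytes.
# Cstack_info() is a base R function; stack_size_mb is a hypothetical helper.
stack_size_mb <- function(info = Cstack_info()) {
  # "size" is the stack size in bytes (NA if R cannot determine it)
  info[["size"]] / 1024^2
}

print(Cstack_info())    # named vector: size, current, direction, eval_depth
print(stack_size_mb())  # stack size as seen by this R session, in MB
```

If the reported size is in line with the shell limit, the overflow is more likely caused by runaway recursion or a library conflict than by a stack that is genuinely too small.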

My operating system is Ubuntu 20.04.
R version 4.0.3 (2020-10-10) -- "Bunny-Wunnies Freak Out"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

Thank you

@innesbre

Same problem here:

> library(cmapR)
> lvl4_data <- parse_gctx("~/Data_LINCS/2020beta/level4_beta_trt_misc_n26428x12328.gctx")
reading ~/Data_LINCS/2020beta/level4_beta_trt_misc_n26428x12328.gctx
Error: segfault from C stack overflow
> R.version
               _                           
platform       x86_64-pc-linux-gnu         
arch           x86_64                      
os             linux-gnu                   
system         x86_64, linux-gnu           
status                                     
major          4                           
minor          0.3                         
year           2020                        
month          10                          
day            10                          
svn rev        79318                       
language       R                           
version.string R version 4.0.3 (2020-10-10)
nickname       Bunny-Wunnies Freak Out 
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8    
 [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8   
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] cmapR_1.2.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6                  XVector_0.30.0             
 [3] GenomicRanges_1.42.0        BiocGenerics_0.36.0        
 [5] zlibbioc_1.36.0             IRanges_2.24.1             
 [7] flowCore_2.2.0              lattice_0.20-41            
 [9] GenomeInfoDb_1.26.2         tools_4.0.3                
[11] SummarizedExperiment_1.20.0 parallel_4.0.3             
[13] grid_4.0.3                  rhdf5_2.34.0               
[15] Biobase_2.50.0              matrixStats_0.58.0         
[17] RcppParallel_5.0.2          Matrix_1.3-2               
[19] GenomeInfoDbData_1.2.4      Rhdf5lib_1.12.1            
[21] cytolib_2.2.1               RProtoBufLib_2.2.0         
[23] rhdf5filters_1.2.0          S4Vectors_0.28.1           
[25] bitops_1.0-6                RCurl_1.98-1.2             
[27] DelayedArray_0.16.1         compiler_4.0.3             
[29] MatrixGenerics_1.2.1        stats4_4.0.3 

@innesbre

I used cmapR::parse.gctx() just fine with R 4.0.2 last spring, so it must be a relatively new bug.

@RussBainer

I'm also hitting this bug. Any ideas about workarounds?

> ds_10_columns <- parse_gctx(ds_path, cid=1:10, rid= 1:10)
reading GSE92742_Broad_LINCS_Level5_COMPZ.MODZ_n473647x12328.gctx
Error: segfault from C stack overflow

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] cmapR_1.2.1         ckanr_0.6.0         DBI_1.1.1
[4] BiocManager_1.30.10

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6                  cytolib_2.2.1
 [3] XVector_0.30.0              pillar_1.5.1
 [5] compiler_4.0.3              dbplyr_2.1.0
 [7] GenomeInfoDb_1.26.4         rhdf5filters_1.2.0
 [9] zlibbioc_1.36.0             MatrixGenerics_1.2.1
[11] bitops_1.0-6                tools_4.0.3
[13] rhdf5_2.34.0                lattice_0.20-41
[15] jsonlite_1.7.2              lifecycle_1.0.0
[17] tibble_3.1.0                debugme_1.1.0
[19] pkgconfig_2.0.3             rlang_0.4.10
[21] Matrix_1.3-2                DelayedArray_0.16.2
[23] crul_1.1.0                  curl_4.3
[25] parallel_4.0.3              GenomeInfoDbData_1.2.4
[27] dplyr_1.0.5                 generics_0.1.0
[29] vctrs_0.3.6                 S4Vectors_0.28.1
[31] IRanges_2.24.1              grid_4.0.3
[33] stats4_4.0.3                tidyselect_1.1.0
[35] Biobase_2.50.0              glue_1.4.2
[37] httpcode_0.3.0              R6_2.5.0
[39] fansi_0.4.2                 Rhdf5lib_1.12.1
[41] RProtoBufLib_2.2.0          purrr_0.3.4
[43] magrittr_2.0.1              ellipsis_0.3.1
[45] matrixStats_0.58.0          BiocGenerics_0.36.0
[47] GenomicRanges_1.42.0        assertthat_0.2.1
[49] SummarizedExperiment_1.20.0 flowCore_2.2.0
[51] utf8_1.2.1                  RcppParallel_5.0.3
[53] RCurl_1.98-1.2              crayon_1.4.1

@rajivnarayan

I ran into similar segfaults with R 4.0.3 and the latest cmapR package. I noticed that loading the rhdf5 library before cmapR seems to work for me, i.e.:

library(rhdf5)
library(cmapR)
gctx_file <- system.file('extdata', 'modzs_n25x50.gctx', package='cmapR') 
x <- parse_gctx(gctx_file)

Equivalently, rebuilding the cmapR package after listing rhdf5 under Depends instead of Imports in the DESCRIPTION file also works.

Version info:
R-4.0.3
cmapR-1.2.1
rhdf5 2.34.4
rhdf5lib 1.10.1

@tnat1031
Contributor

tnat1031 commented Mar 16, 2021 via email

@RussBainer

RussBainer commented Mar 16, 2021

Following up, I tried the rhdf5 workaround suggested by @rajivnarayan and I continue to get the segfault, but I notice that I am getting a slightly older version from Bioconductor (2.34.0 vs 2.34.4).

Sadly, the current GitHub build of rhdf5 errors out on install, so I haven't been able to test further.
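For anyone comparing their own setup against the versions mentioned in this thread, the installed rhdf5 version can be checked against the one reported to work (2.34.4). A rough sketch; older_than is a hypothetical helper name, and the commented-out call assumes rhdf5 is installed:

```r
# Compare two package version strings using base R's package_version().
# older_than is a hypothetical helper, not part of cmapR or rhdf5.
older_than <- function(installed, reference) {
  package_version(installed) < package_version(reference)
}

# The two rhdf5 versions mentioned in this thread:
print(older_than("2.34.0", "2.34.4"))  # TRUE

# Against the locally installed package (requires rhdf5 to be installed):
# older_than(as.character(packageVersion("rhdf5")), "2.34.4")
```

This only tells you whether the versions differ; it does not establish that the version difference is what causes the segfault.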

@tnat1031
Contributor

tnat1031 commented Mar 17, 2021 via email

@RussBainer

Hi Ted,

Here's the error I get from a devtools::install_github() call:

[Many lines of compiler output]

** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
Error: package or namespace load failed for ‘rhdf5’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/usr/local/lib/R/site-library/00LOCK-rhdf5/00new/rhdf5/libs/rhdf5.so':
  /usr/local/lib/R/site-library/00LOCK-rhdf5/00new/rhdf5/libs/rhdf5.so: undefined symbol: H5Scombine_select
Error: loading failed
Execution halted
ERROR: loading failed
* removing ‘/usr/local/lib/R/site-library/rhdf5’
Error: Failed to install 'rhdf5' from GitHub:
  (converted from warning) installation of package ‘/tmp/RtmpscBSTb/file261c4a49497f/rhdf5_2.35.2.tar.gz’ had non-zero exit status

Will try installing into the docker container that you suggest as a workaround. Thanks!

@tnat1031
Contributor

tnat1031 commented Mar 18, 2021 via email

@RussBainer

RussBainer commented Mar 18, 2021 via email

@tnat1031
Contributor

tnat1031 commented Mar 18, 2021 via email

@RussBainer

Can confirm that the install and run works as promised in the specified docker container; this is a fine workaround IMHO.

Not sure how interested you are in tracking down architecture-specific bugs, but I'm running this on an AWS EC2 instance:

uname -a
Linux 4.15.0-1045-aws #47-Ubuntu SMP Fri Aug 2 13:50:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

@tnat1031
Contributor

tnat1031 commented Mar 22, 2021 via email
