Question

oligo rma() background normalization crash in basicRMA()

Steffen Moeller ▴ 90

@steffen-moeller-2412

Germany

Hello,

I ran into a problem with the background correction of Clariom D arrays:

> dat <- read.celfiles(list.celfiles())
> dat
HTAFeatureSet (storageMode: lockedEnvironment)
assayData: 6892960 features, 18 samples 
  element names: exprs 
protocolData
  rowNames: P_ADMSC_C1.CEL P_ADMSC_C2.CEL ... P_FB.CEL (18 total)
  varLabels: exprs dates
  varMetadata: labelDescription channel
phenoData
  rowNames: P_ADMSC_C1.CEL P_ADMSC_C2.CEL ... P_FB.CEL (18 total)
  varLabels: index
  varMetadata: labelDescription channel
featureData: none
experimentData: use 'experimentData(object)'
Annotation: pd.clariom.d.human

Without background correction, everything is fine:

> 
> eset
ExpressionSet (storageMode: lockedEnvironment)
assayData: 138745 features, 18 samples 
  element names: exprs 
protocolData
  rowNames: P_ADMSC_C1.CEL P_ADMSC_C2.CEL ... P_FB.CEL (18 total)
  varLabels: exprs dates
  varMetadata: labelDescription channel
phenoData
  rowNames: P_ADMSC_C1.CEL P_ADMSC_C2.CEL ... P_FB.CEL (18 total)
  varLabels: index
  varMetadata: labelDescription channel
featureData: none
experimentData: use 'experimentData(object)'
Annotation: pd.clariom.d.human

but when correcting the background, I get this crash:

> eset.bg.normalized <- rma(dat,target="core",background=T,normalize=T)
Background correcting

 *** caught segfault ***
address 0x562729e52000, cause 'memory not mapped'

Traceback:
 1: basicRMA(pms, pnVec, normalize, background)
 2: .local(object, ...)
 3: rma(dat, target = "core", background = T, normalize = T)
 4: rma(dat, target = "core", background = T, normalize = T)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 1
R is aborting now ...
Segmentation fault (core dumped)

Bisecting through the input data, I have now found the culprit .CEL file. With that one excluded, the remaining 17 files go through background correction and normalization without error, and I can trigger the crash with that single .CEL file alone.
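A search like this can be automated by running each file in its own R process, since a segfault takes the whole session down with it. The following is only a sketch: `rma_survives` is a hypothetical helper name, and it assumes that `Rscript` sits next to the running R binary and that oligo is installed for that R.

```r
# Hypothetical helper: TRUE if rma() finishes for one CEL file when run in a
# separate R process, FALSE if the child process fails for any reason
# (segfault, missing file, missing package, ...).
rma_survives <- function(cel_file) {
  cel_file <- normalizePath(cel_file, mustWork = FALSE)
  script <- tempfile(fileext = ".R")
  on.exit(unlink(script))
  writeLines(c(
    "suppressMessages(library(oligo))",
    sprintf("dat <- read.celfiles(%s)", deparse(cel_file)),
    "invisible(rma(dat, target = \"core\"))"
  ), script)
  # With stdout/stderr not captured, system2() returns the exit status.
  status <- system2(file.path(R.home("bin"), "Rscript"), shQuote(script),
                    stdout = FALSE, stderr = FALSE)
  status == 0L
}

# Usage, assuming the CEL files are in the working directory:
# library(oligo)
# files <- list.celfiles()
# ok <- vapply(files, rma_survives, logical(1))
# files[!ok]   # candidates for the problematic array
```

A `FALSE` only says that the child process did not exit cleanly, so any flagged file still needs to be rerun by hand to see the actual error.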

What should I do to track this down (or to help others track it down)?

There are no infinite values (tested as suggested in the thread "rma of oligo feature set crashes R").
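For reference, a check of this kind might look like the sketch below. `summarize_intensities` is a hypothetical helper, and the toy matrix only stands in for the real intensities, which one would obtain with `exprs(dat)`. Counting missing and non-positive values in addition to infinite ones is an assumption about what else might be worth looking at, not something established in this thread.

```r
# Hypothetical helper: per-array (per-column) counts of suspicious values.
summarize_intensities <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x))
  data.frame(
    sample     = if (is.null(colnames(x))) seq_len(ncol(x)) else colnames(x),
    n_na       = colSums(is.na(x)),
    n_infinite = colSums(is.infinite(x)),
    n_nonpos   = colSums(!is.na(x) & x <= 0),
    min        = apply(x, 2, min, na.rm = TRUE),
    max        = apply(x, 2, max, na.rm = TRUE),
    row.names  = NULL
  )
}

# Toy stand-in for exprs(dat): the second "array" contains an Inf and a zero.
toy <- cbind(ok.CEL  = c(120, 85, 4300, 61),
             odd.CEL = c(98, Inf, 0, 77))
res <- summarize_intensities(toy)
print(res)
```

Comparing the row for the crashing array against the rows for the other 17 would show whether its raw intensities stand out at all.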

Many thanks! Steffen

My session:


R version 4.2.2 (2022-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /home/sm718/miniconda3/lib/libmkl_rt.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] pd.clariom.d.human_3.14.1 DBI_1.1.3
 [3] RSQLite_2.2.20            oligo_1.62.2
 [5] Biostrings_2.66.0         GenomeInfoDb_1.34.8
 [7] XVector_0.38.0            IRanges_2.32.0
 [9] S4Vectors_0.36.0          Biobase_2.58.0
[11] oligoClasses_1.60.0       BiocGenerics_0.44.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.10                 compiler_4.2.2
 [3] BiocManager_1.30.19         MatrixGenerics_1.10.0
 [5] bitops_1.0-7                iterators_1.0.14
 [7] tools_4.2.2                 zlibbioc_1.44.0
 [9] bit_4.0.5                   memoise_2.0.1
[11] preprocessCore_1.60.2       lattice_0.20-45
[13] ff_4.0.9                    pkgconfig_2.0.3
[15] rlang_1.0.6                 Matrix_1.5-3
[17] foreach_1.5.2               cli_3.6.0
[19] DelayedArray_0.24.0         fastmap_1.1.0
[21] GenomeInfoDbData_1.2.9      affxparser_1.70.0
[23] vctrs_0.5.2                 bit64_4.0.5
[25] grid_4.2.2                  blob_1.2.3
[27] codetools_0.2-19            matrixStats_0.63.0
[29] GenomicRanges_1.50.0        splines_4.2.2
[31] SummarizedExperiment_1.28.0 RCurl_1.98-1.10
[33] cachem_1.0.6                crayon_1.5.2
[35] affyio_1.68.0

basicRMA crash rma oligo • 1.9k views

ADD COMMENT • link 2.7 years ago Steffen Moeller ▴ 90

score 0 · Answer 1 · 2023-02-24

James W. MacDonald 68k

@james-w-macdonald-5106

United States

My guess would be that you have one or more bad files. But without access to them, I am not sure how anybody can help. If I just get some random files from GEO, it works for me.

> getGEOSuppFiles("GSE213056")
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE213nnn/GSE213056/suppl//GSE213056_RAW.tar?tool=geoquery'
Content type 'application/x-tar' length 515471360 bytes (491.6 MB)
downloaded 491.6 MB

> setwd("GSE213056")
> untar(dir())
> library(oligo)

> dat <- read.celfiles(list.celfiles(listGzipped = TRUE))
Loading required package: pd.clariom.d.human
Loading required package: RSQLite
Loading required package: DBI
Platform design info loaded.
Reading in : GSM6570825_091219PD12.CEL.gz
Reading in : GSM6570826_091219PD11.CEL.gz
Reading in : GSM6570827_091219PD10.CEL.gz
Reading in : GSM6570828_091219PD09.CEL.gz
Reading in : GSM6570829_091219PD08.CEL.gz
Reading in : GSM6570830_091219PD07.CEL.gz
Reading in : GSM6570831_092519PD_24.CEL.gz
Reading in : GSM6570832_092519PD_23.CEL.gz
Reading in : GSM6570833_092519PD_22.CEL.gz
Reading in : GSM6570834_092519PD_20.CEL.gz
Reading in : GSM6570835_092519PD_19.CEL.gz
Reading in : GSM6570836_091219PD06.CEL.gz
Reading in : GSM6570837_091219PD05.CEL.gz
Reading in : GSM6570838_091219PD04.CEL.gz
Reading in : GSM6570839_091219PD03.CEL.gz
Reading in : GSM6570840_091219PD02.CEL.gz
Reading in : GSM6570841_091219PD01.CEL.gz
Reading in : GSM6570842_092519PD_18.CEL.gz
Reading in : GSM6570843_092519PD_17.CEL.gz
Reading in : GSM6570844_092519PD_16.CEL.gz
Reading in : GSM6570845_092519PD_14.CEL.gz
Reading in : GSM6570846_092519PD_13.CEL.gz
> eset <- rma(dat)
Background correcting
Normalizing
Calculating Expression
>

ADD COMMENT • link 2.7 years ago James W. MacDonald 68k

I forgot this...

> sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] pd.clariom.d.human_3.14.1 DBI_1.1.3                
 [3] RSQLite_2.2.20            oligo_1.62.2             
 [5] Biostrings_2.66.0         GenomeInfoDb_1.34.9      
 [7] XVector_0.38.0            IRanges_2.32.0           
 [9] S4Vectors_0.36.1          oligoClasses_1.60.0      
[11] GEOquery_2.66.0           Biobase_2.58.0           
[13] BiocGenerics_0.44.0      

loaded via a namespace (and not attached):
 [1] SummarizedExperiment_1.28.0 tidyselect_1.2.0           
 [3] purrr_1.0.1                 splines_4.2.2              
 [5] lattice_0.20-45             vctrs_0.5.2                
 [7] generics_0.1.3              utf8_1.2.2                 
 [9] blob_1.2.3                  rlang_1.0.6                
[11] pillar_1.8.1                glue_1.6.2                 
[13] bit64_4.0.5                 affyio_1.68.0              
[15] matrixStats_0.63.0          GenomeInfoDbData_1.2.9     
[17] foreach_1.5.2               lifecycle_1.0.3            
[19] zlibbioc_1.44.0             MatrixGenerics_1.10.0      
[21] memoise_2.0.1               codetools_0.2-18           
[23] ff_4.0.9                    fastmap_1.1.0              
[25] tzdb_0.3.0                  curl_5.0.0                 
[27] fansi_1.0.4                 preprocessCore_1.60.2      
[29] Rcpp_1.0.10                 readr_2.1.3                
[31] BiocManager_1.30.19         cachem_1.0.6               
[33] limma_3.54.1                DelayedArray_0.23.2        
[35] affxparser_1.70.0           bit_4.0.5                  
[37] hms_1.1.2                   dplyr_1.1.0                
[39] GenomicRanges_1.50.2        grid_4.2.2                 
[41] cli_3.6.0                   tools_4.2.2                
[43] bitops_1.0-7                magrittr_2.0.3             
[45] RCurl_1.98-1.10             tibble_3.1.8               
[47] crayon_1.5.2                tidyr_1.3.0                
[49] pkgconfig_2.0.3             ellipsis_0.3.2             
[51] Matrix_1.5-1                data.table_1.14.6          
[53] xml2_1.3.3                  iterators_1.0.14           
[55] R6_2.5.1                    compiler_4.2.2             
>

ADD REPLY • link 2.7 years ago James W. MacDonald 68k

I tried the same code on another local project of ours (with 52 files) and it worked flawlessly, just as it did for the remaining 17 files of this project. I obviously cannot rule out a problem with that one file, but I do not have another one and would happily send it to you, if you allow (24M gzipped). The normalization without background correction worked, so I have some hope left that it is not a complete disaster.

Thank you tons.

Steffen

================================================================================
Welcome to oligo version 1.62.2
================================================================================
> dat <- read.celfiles("P_DFSC_ctrl.CEL")
Loading required package: pd.clariom.d.human
Loading required package: RSQLite
Loading required package: DBI
Platform design info loaded.
Reading in : P_DFSC_ctrl.CEL
> eset <- rma(dat)
Background correcting

 *** caught segfault ***
address 0x147000000, cause 'invalid permissions'

Traceback:
 1: basicRMA(pms, pnVec, normalize, background)
 2: .local(object, ...)
 3: rma(dat)
 4: rma(dat)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

ADD REPLY • link 2.7 years ago Steffen Moeller ▴ 90

I don't think it's a bug (the code for processing the arrays was written back in the early 2000s by Ben Bolstad and has been used approximately a gazillion times since then), so there are really only two choices:

1. Process without using background correction (something that Ben actually argued for, way back in the day).
2. Remove that sample.

Or alternatively, I suppose you could use bgversion = 1 and see if that helps. Trying to diagnose the problem for one array is probably not useful - it's obviously that one array, and it's not clear you could fix it anyway - so I would make a choice and go forward.
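The two choices could be sketched roughly as follows. `drop_cel` is a hypothetical helper, the file pattern is an assumption about how the CEL files are named, and the oligo calls only run if the package and some CEL files are actually present. The `bgversion = 1` variant is left out, since this thread does not confirm that `rma()` in oligo accepts that argument.

```r
# Hypothetical helper: drop named CEL files from a vector of file paths.
drop_cel <- function(files, bad) files[!(basename(files) %in% bad)]

cel_files <- list.files(pattern = "\\.CEL(\\.gz)?$", ignore.case = TRUE)
bad <- "P_DFSC_ctrl.CEL"   # the array that crashed on its own earlier in the thread

if (length(cel_files) > 0 && requireNamespace("oligo", quietly = TRUE)) {
  # Choice 1: keep all arrays, skip background correction.
  eset_nobg <- oligo::rma(oligo::read.celfiles(cel_files),
                          target = "core", background = FALSE, normalize = TRUE)

  # Choice 2: remove the sample, keep the default background correction.
  eset_clean <- oligo::rma(oligo::read.celfiles(drop_cel(cel_files, bad)),
                           target = "core")
}
```

Filtering the file list before `read.celfiles()` avoids loading the problematic array at all, which seemed simpler than subsetting the feature set afterwards.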