justRMA and wrong customCDFs causing segfault
Entering edit mode
Last seen 5.9 years ago

Dear developers of justRMA,

thank you for your great work.

We are using your tool in our open source graphical microarray and NGS data analysis environment (called Chipster).

Some time ago we received a help request from a customer who was trying to analyse old CEL-files with newer, wrong CDF-files. We instructed the user to choose the correct CDF-file, but since the error message was a bit curious, I tracked down the problem.  I tested this wrong combination -situation using old HG-133A-files and hugene10sthsentrezgcdf as the CDF. 

 I noticed that this justRMA caused the error (segfault):

custom_cdf = “hugene10sthsentrezgcdf"
dat2 <- justRMA(filenames=list.celfiles(), cdfname=custom_cdf)

The error was repeated when using different .CEL-files and incorrect CDF-files (downloaded and installed from Brainarray). 

I tried debugging (debug()), and was able to track the error inside read.probematrix() and read.celfile.probeintensity.matrices() to this command:

.Call("read_probeintensities", filenames, rm.mask, rm.outliers, 
        rm.extra, ref.cdfName, dim.intensity, verbose, cdfInfo, 
        which, PACKAGE = "affyio")

I tried to look at the parameter values given to the .Call-function. Filenames are ok, but for some reason the ref.cdfName is "HG-U133A" even though I tried to say it is "hugene10sthsentrezgcdf". I don't know at what point the given CDF-file input gets ignored, and I am not sure if the case is happening only when you use arrays with PM and MM -values or when you use wrong combination in general, but it seems to me that the disagreement between these inputs is causing the segfault in .Call. 

Browse[4]> ref.cdfName
[1] "HG-U133A"

Browse[4]> dim.intensity
[1] 712 712

Did I miss something? Maybe some sanity check step + error message would be nice to avoid the segfault?

Thank you in advance!

With best regards,

justrma customCDF segfault • 765 views
Entering edit mode
Last seen 13 minutes ago
United States

Thanks for the report. The problem isn't due to the ref.cdfName, which is extracted from the celfiles directly, the problem arises because the data extracted from the cdf package doesn't match up with the expected dimensions of the array. In other words, if you tell justRMA() that you are using a HuGene 1.0 ST array when in fact it is a HG-U133A array, then the expectation should be that it will fail because these are two different arrays.

However, you are correct that a simple comparison of the array dimensions from the chip and the cdf package could trigger a stoppage along with an error message, which I will add to the package.

Entering edit mode
Daniel ▴ 10
Last seen 22 months ago

It looks that even now this bug is not fixed yet. In my case justRMA still fails with segfault with custom CDF files (even that it works just fine with each individual CEL files).

Entering edit mode

Well, in my case it works fine using the R-3.5.1 and BioC3.8... Using some (9) mouse mogene1.1 arrays in combination with a custom CDF:

> install.packages("http://mbni.org/customcdf/23.0.0/entrezg.download/mogene11stmmentrezgcdf_23.0.0.zip", repos=NULL)
trying URL 'http://mbni.org/customcdf/23.0.0/entrezg.download/mogene11stmmentrezgcdf_23.0.0.zip'
Content type 'application/zip' length 2362544 bytes (2.3 MB)
downloaded 2.3 MB

> library(affy)

> setwd("E:\\location\\of\\myCELfiles")
> data<-justRMA(cdfname = 'mogene11stmmentrezg')
> data
ExpressionSet (storageMode: lockedEnvironment)
assayData: 22135 features, 9 samples 
  element names: exprs, se.exprs 
  sampleNames: 70_RV_620_MNC.CEL 71_RV_624_MNC.CEL ...
    78_RV_619_MNMNI.CEL (9 total)
  varLabels: ScanDate
  varMetadata: labelDescription
  sampleNames: 70_RV_620_MNC.CEL 71_RV_624_MNC.CEL ...
    78_RV_619_MNMNI.CEL (9 total)
  varLabels: sample
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: mogene11stmmentrezg 
> head(exprs(data))
             70_RV_620_MNC.CEL 71_RV_624_MNC.CEL 72_RV_625_MNC.CEL
100009600_at          3.342443          3.700844          3.339561
100009609_at          1.867085          1.972196          1.974900
100009614_at          4.413828          3.952424          3.720021
100012_at             2.230327          2.116131          2.449895
100017_at             7.190408          6.989166          7.277308
100019_at             5.812048          5.938302          6.062205
             73_RV_640_MNMC.CEL 74_RV_641_MNMC.CEL 75_RV_644_MNMC.CEL
100009600_at           3.363132           3.386511           3.530121
100009609_at           1.949646           1.931293           1.949646
100009614_at           3.543611           3.824592           3.684467
100012_at              2.094011           2.080392           1.992252
100017_at              7.603444           7.591256           7.462142
100019_at              6.199659           6.079642           6.062205
             76_RV_615_MNMNI.CEL 77_RV_617_MNMNI.CEL 78_RV_619_MNMNI.CEL
100009600_at            3.386511            3.386511            3.394084
100009609_at            1.949646            2.035164            1.949646
100009614_at            3.603248            3.720021            3.387863
100012_at               2.104808            2.116131            2.303867
100017_at               7.446402            7.662751            7.534519
100019_at               6.062205            5.995572            6.170697
> sessionInfo()
R version 3.5.1 Patched (2018-11-24 r75665)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] mogene11stmmentrezgcdf_23.0.0 AnnotationDbi_1.44.0         
[3] IRanges_2.16.0                S4Vectors_0.20.1             
[5] affy_1.60.0                   Biobase_2.42.0               
[7] BiocGenerics_0.28.0          

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0            digest_0.6.18         DBI_1.0.0            
 [4] RSQLite_2.1.1         zlibbioc_1.28.0       affyio_1.52.0        
 [7] blob_1.1.1            preprocessCore_1.44.0 tools_3.5.1          
[10] bit64_0.9-7           bit_1.1-14            compiler_3.5.1       
[13] BiocManager_1.30.4    memoise_1.1.0        

Login before adding your answer.

Traffic: 496 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6