justRMA and wrong customCDFs causing segfault
2
0
Entering edit mode
@marialehtivaara-9103
Last seen 9.1 years ago
Finland

Dear developers of justRMA,

thank you for your great work.

We are using your tool in our open source graphical microarray and NGS data analysis environment (called Chipster).


Some time ago we received a help request from a customer who was trying to analyse old CEL-files with newer, wrong CDF-files. We instructed the user to choose the correct CDF-file, but since the error message was a bit curious, I tracked down the problem.  I tested this wrong combination -situation using old HG-133A-files and hugene10sthsentrezgcdf as the CDF. 

 I noticed that this justRMA caused the error (segfault):

custom_cdf = “hugene10sthsentrezgcdf"
dat2 <- justRMA(filenames=list.celfiles(), cdfname=custom_cdf)


The error was repeated when using different .CEL-files and incorrect CDF-files (downloaded and installed from Brainarray). 

I tried debugging (debug()), and was able to track the error inside read.probematrix() and read.celfile.probeintensity.matrices() to this command:

.Call("read_probeintensities", filenames, rm.mask, rm.outliers, 
        rm.extra, ref.cdfName, dim.intensity, verbose, cdfInfo, 
        which, PACKAGE = "affyio")


I tried to look at the parameter values given to the .Call-function. Filenames are ok, but for some reason the ref.cdfName is "HG-U133A" even though I tried to say it is "hugene10sthsentrezgcdf". I don't know at what point the given CDF-file input gets ignored, and I am not sure if the case is happening only when you use arrays with PM and MM -values or when you use wrong combination in general, but it seems to me that the disagreement between these inputs is causing the segfault in .Call. 

Browse[4]> ref.cdfName
[1] "HG-U133A"

Browse[4]> dim.intensity
[1] 712 712

Did I miss something? Maybe some sanity check step + error message would be nice to avoid the segfault?

Thank you in advance!

With best regards,
Maria 

justrma customCDF segfault • 1.4k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 4 minutes ago
United States

Thanks for the report. The problem isn't due to the ref.cdfName, which is extracted from the celfiles directly, the problem arises because the data extracted from the cdf package doesn't match up with the expected dimensions of the array. In other words, if you tell justRMA() that you are using a HuGene 1.0 ST array when in fact it is a HG-U133A array, then the expectation should be that it will fail because these are two different arrays.

However, you are correct that a simple comparison of the array dimensions from the chip and the cdf package could trigger a stoppage along with an error message, which I will add to the package.

ADD COMMENT
0
Entering edit mode
Daniel ▴ 10
@daniel-6619
Last seen 15 months ago
Finland

It looks that even now this bug is not fixed yet. In my case justRMA still fails with segfault with custom CDF files (even that it works just fine with each individual CEL files).

ADD COMMENT
0
Entering edit mode

Well, in my case it works fine using the R-3.5.1 and BioC3.8... Using some (9) mouse mogene1.1 arrays in combination with a custom CDF:

> install.packages("http://mbni.org/customcdf/23.0.0/entrezg.download/mogene11stmmentrezgcdf_23.0.0.zip", repos=NULL)
trying URL 'http://mbni.org/customcdf/23.0.0/entrezg.download/mogene11stmmentrezgcdf_23.0.0.zip'
Content type 'application/zip' length 2362544 bytes (2.3 MB)
downloaded 2.3 MB

> library(affy)
<<snip>>

> setwd("E:\\location\\of\\myCELfiles")
>
> data<-justRMA(cdfname = 'mogene11stmmentrezg')
>
<<snip>>
> 
> data
ExpressionSet (storageMode: lockedEnvironment)
assayData: 22135 features, 9 samples 
  element names: exprs, se.exprs 
protocolData
  sampleNames: 70_RV_620_MNC.CEL 71_RV_624_MNC.CEL ...
    78_RV_619_MNMNI.CEL (9 total)
  varLabels: ScanDate
  varMetadata: labelDescription
phenoData
  sampleNames: 70_RV_620_MNC.CEL 71_RV_624_MNC.CEL ...
    78_RV_619_MNMNI.CEL (9 total)
  varLabels: sample
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: mogene11stmmentrezg 
> head(exprs(data))
             70_RV_620_MNC.CEL 71_RV_624_MNC.CEL 72_RV_625_MNC.CEL
100009600_at          3.342443          3.700844          3.339561
100009609_at          1.867085          1.972196          1.974900
100009614_at          4.413828          3.952424          3.720021
100012_at             2.230327          2.116131          2.449895
100017_at             7.190408          6.989166          7.277308
100019_at             5.812048          5.938302          6.062205
             73_RV_640_MNMC.CEL 74_RV_641_MNMC.CEL 75_RV_644_MNMC.CEL
100009600_at           3.363132           3.386511           3.530121
100009609_at           1.949646           1.931293           1.949646
100009614_at           3.543611           3.824592           3.684467
100012_at              2.094011           2.080392           1.992252
100017_at              7.603444           7.591256           7.462142
100019_at              6.199659           6.079642           6.062205
             76_RV_615_MNMNI.CEL 77_RV_617_MNMNI.CEL 78_RV_619_MNMNI.CEL
100009600_at            3.386511            3.386511            3.394084
100009609_at            1.949646            2.035164            1.949646
100009614_at            3.603248            3.720021            3.387863
100012_at               2.104808            2.116131            2.303867
100017_at               7.446402            7.662751            7.534519
100019_at               6.062205            5.995572            6.170697
> 
> sessionInfo()
R version 3.5.1 Patched (2018-11-24 r75665)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] mogene11stmmentrezgcdf_23.0.0 AnnotationDbi_1.44.0         
[3] IRanges_2.16.0                S4Vectors_0.20.1             
[5] affy_1.60.0                   Biobase_2.42.0               
[7] BiocGenerics_0.28.0          

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0            digest_0.6.18         DBI_1.0.0            
 [4] RSQLite_2.1.1         zlibbioc_1.28.0       affyio_1.52.0        
 [7] blob_1.1.1            preprocessCore_1.44.0 tools_3.5.1          
[10] bit64_0.9-7           bit_1.1-14            compiler_3.5.1       
[13] BiocManager_1.30.4    memoise_1.1.0        
> 
>
ADD REPLY

Login before adding your answer.

Traffic: 1454 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6