I am attempting to use the mas5calls function to get present/absent calls on a large collection of microarray data that I am attempting to analyze. I will admit I am new to this type of analysis and am looking for some direction to correct the error I am receiving. My error is produced as follows:
> callsESet <- mas5calls(affy_liver_qc)
Getting probe level data...
Computing p-values
Error in `colnames<-`(`*tmp*`, value = c("95686.CEL", "95703.CEL", "95746.CEL", :
length of 'dimnames' [2] not equal to array extent
What does this error signify and how can I correct it? What is the package telling me I need to fix?
The most likely answer is that the files you are trying to analyze are PM-only arrays, in which case it isn't possible to compute conventional MAS5.0 calls. I believe the xps package will allow you to estimate something similar for PM-only arrays, and both xps and oligo offer dabg and psdabg calls (paCalls() in oligo), which are supposed to be equivalent measures for PM-only arrays.
Thanks for your answer, James. My data all comes from the Affymetrix rat genome 230 2.0 array, and multiple papers have indicated that the arrays have been examined for present, marginal, or absent calls. Is the method I am using not suited to these PMA arrays?
in my separate project for testing, I have imported my affy_liver_qc affybatch value. I was able to run
mas5calls(affy_liver_qc)
and received no errors. I switched to my working directory project, ran:
> callsESet<-mas5calls(affy_liver_qc)
Getting probe level data...
Creating a generic function for ‘nchar’ from package ‘base’ in package ‘S4Vectors’
Computing p-values
Error in `colnames<-`(`*tmp*`, value = c("95686.CEL", "95703.CEL", "95746.CEL", :
length of 'dimnames' [2] not equal to array extent
You probably have a .Rdata file in your working directory that has something in it that is messing things up, so maybe you should try starting R using --vanilla so you don't load that in. In addition, it seems like you have the wrong versions of Bioconductor packages for that version of R. You should upgrade, following the instructions here.
Thanks for your answer, James. My data all comes from the Affymetrix rat genome 230 2.0 array, and multiple papers have indicated that the arrays have been examined for present, marginal, or absent calls. Is the method I am using not suited to these PMA arrays?
No, those arrays are PM/MM arrays, so it should work. Is 'affy_liver_qc' an AffyBatch?
In environment - values in RStudio, 'affy_liver_qc' is annotated as a 'Large AffyBatch (1.2 Mb)
I can't reproduce your error:
> library(GEOquery) > getGEOSuppFiles("GSE31668") ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE31nnn/GSE31668/suppl/ trying URL 'ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE31nnn/GSE31668/suppl//GSE31668_RAW.tar' downloaded 17.1 MB trying URL 'ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE31nnn/GSE31668/suppl//filelist.txt' downloaded 452 bytes > setwd("GSE31668/") > untar("GSE31668_RAW.tar") > library(affy) > abatch <- ReadAffy() > abatch AffyBatch object size of arrays=834x834 features (19 kb) cdf=Rat230_2 (31099 affyids) number of samples=6 number of genes=31099 annotation=rat2302 notes= > mcalls <- mas5calls(abatch) Getting probe level data... Computing p-values Making P/M/A Calls > mcalls ExpressionSet (storageMode: lockedEnvironment) assayData: 31099 features, 6 samples element names: exprs, se.exprs protocolData sampleNames: GSM786182_DAPT1.CEL.gz GSM786183_DAPT2.CEL.gz ... GSM786187_DMSO3.CEL.gz (6 total) varLabels: ScanDate varMetadata: labelDescription phenoData sampleNames: GSM786182_DAPT1.CEL.gz GSM786183_DAPT2.CEL.gz ... GSM786187_DMSO3.CEL.gz (6 total) varLabels: sample varMetadata: labelDescription featureData: none experimentData: use 'experimentData(object)' Annotation: rat2302 > sessionInfo() R version 3.2.2 (2015-08-14) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 8 x64 (build 9200) locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] rat2302cdf_2.18.0 affy_1.48.0 GEOquery_2.36.0 [4] Biobase_2.30.0 BiocGenerics_0.16.1 loaded via a namespace (and not attached): [1] IRanges_2.4.3 XML_3.98-1.3 bitops_1.0-6 [4] DBI_0.3.1 stats4_3.2.2 RSQLite_1.0.0 [7] BiocInstaller_1.20.1 zlibbioc_1.16.0 affyio_1.40.0 [10] S4Vectors_0.8.3 preprocessCore_1.32.0 tools_3.2.2 [13] RCurl_1.95-4.7 compiler_3.2.2 AnnotationDbi_1.32.0 >Can you get what I just did to work?
I replicated your testing above and it appears to work.
> library(GEOquery) Setting options('download.file.method.GEOquery'='curl') > getGEOSuppFiles("GSE31668") ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE31nnn/GSE31668/suppl/ % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 17.0M 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 0 17.0M 0 9576 0 0 22679 0 0:13:09 --:--:-- 0:13:09 116k 46 17.0M 46 8080k 0 0 5765k 0 0:00:03 0:00:01 0:00:02 7630k 93 17.0M 93 15.9M 0 0 6788k 0 0:00:02 0:00:02 --:--:-- 7914k100 17.0M 100 17.0M 0 0 6886k 0 0:00:02 0:00:02 --:--:-- 7957k % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 452 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0113 452 113 452 0 0 1155 0 --:--:-- --:--:-- --:--:-- 9416 > setwd("~/GSE31668") > untar("GSE31668_RAW.tar") > library(affy) >abatch AffyBatch object size of arrays=834x834 features (19 kb) cdf=Rat230_2 (31099 affyids) number of samples=6 number of genes=31099 annotation=rat2302 notes= abatch <- ReadAffy() > mcalls <- mas5calls(abatch) Getting probe level data... Computing p-values Making P/M/A Calls > mcalls ExpressionSet (storageMode: lockedEnvironment) assayData: 31099 features, 6 samples element names: exprs, se.exprs protocolData sampleNames: GSM786182_DAPT1.CEL.gz GSM786183_DAPT2.CEL.gz ... GSM786187_DMSO3.CEL.gz (6 total) varLabels: ScanDate varMetadata: labelDescription phenoData sampleNames: GSM786182_DAPT1.CEL.gz GSM786183_DAPT2.CEL.gz ... GSM786187_DMSO3.CEL.gz (6 total) varLabels: sample varMetadata: labelDescription featureData: none experimentData: use 'experimentData(object)' Annotation: rat2302 > sessionInfo() R version 3.2.2 (2015-08-14) Platform: x86_64-redhat-linux-gnu (64-bit) Running under: Red Hat Enterprise Linux Server release 6.7 (Santiago) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods base other attached packages: [1] rat2302cdf_2.16.0 GEOquery_2.34.0 BiocInstaller_1.18.5 affy_1.46.1 [5] Biobase_2.28.0 BiocGenerics_0.14.0 loaded via a namespace (and not attached): [1] IRanges_2.2.9 XML_3.98-1.3 bitops_1.0-6 GenomeInfoDb_1.4.3 [5] DBI_0.3.1 stats4_3.2.2 RSQLite_1.0.0 zlibbioc_1.14.0 [9] affyio_1.36.0 S4Vectors_0.6.6 preprocessCore_1.30.0 tools_3.2.2 [13] RCurl_1.95-4.7 AnnotationDbi_1.30.1I'm able to reproduce what you accomplished above.
in my separate project for testing, I have imported my affy_liver_qc affybatch value. I was able to run
and received no errors. I switched to my working directory project, ran:
> callsESet<-mas5calls(affy_liver_qc) Getting probe level data... Creating a generic function for ‘nchar’ from package ‘base’ in package ‘S4Vectors’ Computing p-values Error in `colnames<-`(`*tmp*`, value = c("95686.CEL", "95703.CEL", "95746.CEL", : length of 'dimnames' [2] not equal to array extentI am unsure as to how to proceed.
You probably have a .Rdata file in your working directory that has something in it that is messing things up, so maybe you should try starting R using --vanilla so you don't load that in. In addition, it seems like you have the wrong versions of Bioconductor packages for that version of R. You should upgrade, following the instructions here.