Question

Making microarray probeset presence/absence calls

mat149 ▴ 70


Hello,

I have a question about using the paCalls function of the oligo package. Running paCalls(CELdat, "PSDABG") gives this error message:

Computing DABG calls... Error in 0:max(counts) : result would be too long a vector
In addition: Warning message:
In max(counts) : no non-missing arguments to max; returning -Inf

I am unsure how to interpret this message; obviously the resulting vector is really long but how long is too long? Can anyone provide some insight as to what's going on here, and how I might fix it? Thanks in advance. Best,

- Matt

library(oligo)
setwd("C:\\...\\microarray")
CELlist <- list.celfiles("C:\\...\\microarray\\CEL", full.names=TRUE, pattern=NULL, all.file=FALSE)
pdat<-read.AnnotatedDataFrame("C:\\...\\x.txt",header=TRUE,row.name="Filename",sep="\t")
CELdat <- read.celfiles(filenames = CELlist,experimentData=TRUE,phenoData=pdat,verbose=TRUE)

> CELdat
GeneFeatureSet (storageMode: lockedEnvironment)
assayData: 1416100 features, 16 samples 
  element names: exprs 
protocolData
  rowNames: WT1 WT2 ... RS4 (16 total)
  varLabels: exprs dates
  varMetadata: labelDescription channel
phenoData
  rowNames: WT1 WT2 ... RS4 (16 total)
  varLabels: Sample Treatment ... BATCH_GROUP (6 total)
  varMetadata: labelDescription channel
featureData: none
experimentData: use 'experimentData(object)'
Annotation: pd.zebgene.1.1.st


> paCalls(CELdat, "PSDABG")
Computing DABG calls... Error in 0:max(counts) : result would be too long a vector
In addition: Warning message:
In max(counts) : no non-missing arguments to max; returning -Inf

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 16299)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] pd.zebgene.1.1.st_3.12.0 DBI_0.7                  RSQLite_2.0             
 [4] oligo_1.40.2             Biostrings_2.44.2        XVector_0.16.0          
 [7] IRanges_2.10.2           S4Vectors_0.14.3         Biobase_2.36.2          
[10] oligoClasses_1.38.0      BiocGenerics_0.22.0     

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.12               compiler_3.4.1             BiocInstaller_1.26.1      
 [4] GenomeInfoDb_1.12.2        bitops_1.0-6               iterators_1.0.8           
 [7] tools_3.4.1                zlibbioc_1.22.0            digest_0.6.12             
[10] bit_1.1-12                 memoise_1.1.0              tibble_1.3.3              
[13] preprocessCore_1.38.1      lattice_0.20-35            ff_2.2-13                 
[16] pkgconfig_2.0.1            rlang_0.1.1                Matrix_1.2-10             
[19] foreach_1.4.3              DelayedArray_0.2.7         GenomeInfoDbData_0.99.0   
[22] affxparser_1.48.0          bit64_0.9-7                grid_3.4.1                
[25] blob_1.1.0                 splines_3.4.1              codetools_0.2-15          
[28] matrixStats_0.52.2         GenomicRanges_1.28.4       SummarizedExperiment_1.6.3
[31] RCurl_1.95-4.8             affyio_1.46.0

Updated 6.2 years ago by James W. MacDonald 65k • written 6.2 years ago by mat149 ▴ 70

score 0 · Answer 1 · 2018-02-20

First, a note about using R. If you set the working directory to the directory that contains your CEL files, you no longer need to tell R where they are.
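
As a minimal sketch of the working-directory point: the snippet below uses a throwaway directory and base R's list.files() as a stand-in, on the assumption (per the oligoClasses help page) that list.celfiles() passes its arguments on to list.files(). The directory and file names are hypothetical.

```r
# Hypothetical stand-in for a CEL directory: empty files, named like CEL files.
cel_dir <- file.path(tempdir(), "CEL_demo")
dir.create(cel_dir, showWarnings = FALSE)
invisible(file.create(file.path(cel_dir, c("WT1.CEL", "WT2.CEL", "notes.txt"))))

old_wd <- setwd(cel_dir)  # setwd() invisibly returns the previous directory

# No path argument needed: the search starts in the working directory.
# With oligo loaded, list.celfiles() would be called the same way.
cel_files <- list.files(pattern = "\\.CEL$", ignore.case = TRUE)
print(cel_files)

setwd(old_wd)  # restore the original working directory
```

With real data the same pattern would be a setwd() to the CEL directory followed by read.celfiles(list.celfiles()).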

Second, the experimentData argument is supposed to be a MIAME instance, not TRUE as in your call. From ?ExpressionSet (which, like GeneFeatureSet, builds on the eSet class and shares this slot):

experimentData: An optional 'MIAME' instance with meta-data (e.g., the
          lab and resulting publications from the analysis) about the
          experiment.
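
For illustration, a MIAME instance could be built roughly like this. It is a sketch only: the field values are hypothetical placeholders, and the block is guarded so it does nothing if Biobase is not installed.

```r
# Guarded so the sketch still runs where Biobase is not installed.
if (requireNamespace("Biobase", quietly = TRUE)) {
  # All field values below are hypothetical placeholders.
  miame <- Biobase::MIAME(
    name  = "A. Researcher",
    lab   = "Example Lab",
    title = "Zebrafish Gene 1.1 ST experiment"
  )
  print(miame)

  # It would then replace experimentData = TRUE in the original call:
  # CELdat <- read.celfiles(filenames = CELlist, phenoData = pdat,
  #                         experimentData = miame)
}
```

Leaving experimentData out altogether is also fine, since the argument is optional.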

Anyway, to your question. Neither of the zebgene arrays is particularly suited to DABG or PSDABG calls, as these require probesets of the control->bgp->antigenomic type. For the zebgene1.0 array there are just 23 such probesets:

> con <- db(pd.zebgene.1.0.st)
> dim(dbGetQuery(con, "select * from featureSet where type=7;"))
[1] 23 10

and for the 1.1 array there are NO such probesets:

> conn <- db(pd.zebgene.1.1.st)
> dim(dbGetQuery(conn, "select * from featureSet where type=7;"))
[1]  0 10
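
The same query can be turned into a quick check to run before attempting paCalls. The helper name count_bg_probesets is hypothetical, and the demo runs against a mock in-memory table rather than a real pd package, so treat it as a sketch under those assumptions.

```r
# Hypothetical helper: number of antigenomic background probesets (type = 7)
# in a platform-design database reachable through the DBI connection `con`.
count_bg_probesets <- function(con) {
  res <- DBI::dbGetQuery(con, "select count(*) as n from featureSet where type = 7;")
  res$n
}

# Demo on a mock in-memory table, skipped if DBI or RSQLite is unavailable.
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  mock <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  # Mock featureSet table with no rows of type 7 (column names are assumptions).
  DBI::dbWriteTable(mock, "featureSet",
                    data.frame(fsetid = 1:3, type = c(1L, 1L, 2L)))
  n_bg <- count_bg_probesets(mock)
  DBI::dbDisconnect(mock)
  if (n_bg == 0) message("No antigenomic background probesets: skip paCalls().")
}
```

Against the real annotation package the call would be count_bg_probesets(db(pd.zebgene.1.1.st)).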

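If there are no such probesets, you cannot run paCalls. That fits the error you saw: the warning says max(counts) was called on an empty vector and returned -Inf, and the sequence 0:-Inf cannot be built, hence "result would be too long a vector".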
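
The same failure can be reproduced in base R without oligo. This is only a sketch: counts here is a stand-in empty vector, on the assumption that paCalls ended up with nothing to count, and it does not reproduce oligo's internals.

```r
counts <- integer(0)                    # assumed: nothing to count
upper <- suppressWarnings(max(counts))  # max() of an empty vector is -Inf

# 0:upper cannot be built, so capture the error message instead of stopping.
msg <- tryCatch({
  0:upper
  NA_character_
}, error = function(e) conditionMessage(e))

print(upper)
print(msg)
```

So the message is not about your data being too large; it is what R reports when asked for a sequence ending at -Inf.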