Question: Trouble annotating HTA microarray
giroudpaul (France) wrote 3.1 years ago:

Dear bioconductor members,

I finally got my hands on results of HTA 2.0 microarray experiments, and I started processing them using the standard methods.

Reading the CEL files and running RMA poses no problems. For RMA I used oligo this way:

> data.rma = oligo::rma(data, background=TRUE, normalize=TRUE, subset=NULL, target="core")

I then tried to annotate the dataset with two different methods, but neither works.

First, using the pdInfo package:

> data.ann <- annotateEset(data.rma, pd.hta.2.0, type = "core")
Error: There appears to be a mismatch between the ExpressionSet and the annotation data.
Please ensure that the summarization level for the ExpressionSet and the 'type' argument are the same.
 See ?annotateEset for more information on the type argument.

Then, using the ChipDb package:

> data.ann <- annotateEset(data.rma, hta20sttranscriptcluster.db, columns = c("PROBEID", "ENTREZID", "SYMBOL", "ENSEMBL", "GENENAME"))
Error: cannot allocate vector of size 37.1 Gb
In addition: Warning messages:
1: In unique(.Internal(unlist(lapply(x, levels), recursive, FALSE))) :
  Reached total allocation of 8089Mb: see help(memory.size)
2: In unique(.Internal(unlist(lapply(x, levels), recursive, FALSE))) :
  Reached total allocation of 8089Mb: see help(memory.size)
3: In unique(.Internal(unlist(lapply(x, levels), recursive, FALSE))) :
  Reached total allocation of 8089Mb: see help(memory.size)
4: In unique(.Internal(unlist(lapply(x, levels), recursive, FALSE))) :
  Reached total allocation of 8089Mb: see help(memory.size)

I don't understand the problem with the pdInfo package, since I used the same summarization level (i.e. "core") in both commands. The second error is simply my computer being unable to process that much data. For the moment I don't have access to a bioinformatics server; I will see whether that is possible, but is there really no way to annotate HTA arrays with 8 GB of RAM?

Details:

Computer: Windows 10 64-bit, i5-2410M CPU (dual core, 2.3 GHz), 8 GB RAM, using R with RStudio

Session info:

R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252
[4] LC_NUMERIC=C                   LC_TIME=French_France.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] hta20sttranscriptcluster.db_8.3.1 org.Hs.eg.db_3.3.0               
 [3] AnnotationDbi_1.34.4              NMF_0.23.3                       
 [5] cluster_2.0.4                     rngtools_1.2.4                   
 [7] pkgmaker_0.25.10                  registry_0.3                     
 [9] limma_3.28.14                     pd.hta.2.0_3.12.1                
[11] RSQLite_1.0.0                     DBI_0.4-1                        
[13] oligo_1.36.1                      Biostrings_2.40.2                
[15] XVector_0.12.0                    IRanges_2.6.1                    
[17] S4Vectors_0.10.2                  genefilter_1.54.2                
[19] affycoretools_1.44.2              BiocInstaller_1.22.3             
[21] Biobase_2.32.0                    BiocGenerics_0.18.0              
[23] ggplot2_2.1.0                     rpart_4.1-10                     
[25] Matrix_1.2-6                      lattice_0.20-33                  
[27] oligoClasses_1.34.0              

loaded via a namespace (and not attached):
  [1] colorspace_1.2-6              hwriter_1.3.2                 class_7.3-14                 
  [4] modeltools_0.2-21             mclust_5.2                    biovizBase_1.20.0            
  [7] GenomicRanges_1.24.2          dichromat_2.0-0               affyio_1.42.0                
 [10] flexmix_2.3-13                mvtnorm_1.0-5                 interactiveDisplayBase_1.10.3
 [13] codetools_0.2-14              splines_3.3.0                 R.methodsS3_1.7.1            
 [16] ggbio_1.20.1                  doParallel_1.0.10             robustbase_0.92-6            
 [19] geneplotter_1.50.0            knitr_1.13                    Formula_1.2-1                
 [22] Rsamtools_1.24.0              gridBase_0.4-7                annotate_1.50.0              
 [25] kernlab_0.9-24                GO.db_3.3.0                   R.oo_1.20.0                  
 [28] graph_1.50.0                  shiny_0.13.2                  httr_1.2.1                   
 [31] GOstats_2.38.1                acepack_1.3-3.3               htmltools_0.3.5              
 [34] tools_3.3.0                   gtable_0.2.0                  affy_1.50.0                  
 [37] Category_2.38.0               reshape2_1.4.1                affxparser_1.44.0            
 [40] Rcpp_0.12.5                   trimcluster_0.1-2             gdata_2.17.0                 
 [43] preprocessCore_1.34.0         rtracklayer_1.32.1            fpc_2.1-10                   
 [46] iterators_1.0.8               stringr_1.0.0                 mime_0.5                     
 [49] ensembldb_1.4.7               gtools_3.5.0                  XML_3.98-1.4                 
 [52] dendextend_1.2.0              DEoptimR_1.0-6                AnnotationHub_2.4.2          
 [55] edgeR_3.14.0                  MASS_7.3-45                   zlibbioc_1.18.0              
 [58] scales_0.4.0                  BSgenome_1.40.1               VariantAnnotation_1.18.5     
 [61] SummarizedExperiment_1.2.3    RBGL_1.48.1                   RColorBrewer_1.1-2           
 [64] gridExtra_2.2.1               biomaRt_2.28.0                reshape_0.8.5                
 [67] latticeExtra_0.6-28           stringi_1.1.1                 gcrma_2.44.0                 
 [70] foreach_1.4.3                 GenomicFeatures_1.24.4        caTools_1.17.1               
 [73] BiocParallel_1.6.2            chron_2.3-47                  GenomeInfoDb_1.8.3           
 [76] prabclus_2.2-6                ReportingTools_2.12.2         bitops_1.0-6                 
 [79] GenomicAlignments_1.8.4       bit_1.1-12                    GSEABase_1.34.0              
 [82] AnnotationForge_1.14.2        GGally_1.2.0                  plyr_1.8.4                   
 [85] magrittr_1.5                  DESeq2_1.12.3                 R6_2.1.2                     
 [88] gplots_3.0.1                  Hmisc_3.17-4                  whisker_0.3-2                
 [91] foreign_0.8-66                survival_2.39-5               RCurl_1.95-4.8               
 [94] nnet_7.3-12                   KernSmooth_2.23-15            OrganismDbi_1.14.1           
 [97] PFAM.db_3.3.0                 locfit_1.5-9.1                grid_3.3.0                   
[100] data.table_1.9.6              diptest_0.75-7                digest_0.6.9                 
[103] xtable_1.8-2                  ff_2.2-13                     httpuv_1.3.3                 
[106] R.utils_2.3.0                 munsell_0.4.3     
Answer: Trouble annotating HTA microarray
James W. MacDonald (United States) wrote 3.1 years ago:

There is a bug in the code for annotateEset when using the pdInfo package that I will have to fix. And I will also have to put some error checking in for annotateEset when using the chipDb packages. It turns out that you can't do something like

> z <- mapIds(hta20sttranscriptcluster.db, featureNames(eset), "PROBEID","PROBEID")
Error in FUN(X[[i]], ...) : long vectors not supported yet: memory.c:1652

Which is what you are asking for when you do

data.ann <- annotateEset(data.rma, hta20sttranscriptcluster.db, columns = c("PROBEID", "ENTREZID", "SYMBOL", "ENSEMBL", "GENENAME"))

Because annotateEset calls mapIds in turn on each of the columns you listed there. Since you ALREADY get the PROBEID back by default, asking for it again is redundant and is not going to work.
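To make the per-column lookup concrete, here is a rough sketch of what "calling mapIds on each column" looks like. It is an illustration only, not the actual affycoretools source: `lookup_columns` is a hypothetical helper, and the object name `eset` is assumed from the example above.

```r
# Hypothetical helper: look up each requested column separately, keyed on
# probeset ID. `lookup` defaults to AnnotationDbi::mapIds but can be swapped.
lookup_columns <- function(db, ids, cols, lookup = AnnotationDbi::mapIds) {
  out <- lapply(cols, function(col) {
    lookup(db, keys = ids, column = col, keytype = "PROBEID")
  })
  names(out) <- cols
  out
}

# Only runs when an ExpressionSet called `eset` and the packages are available.
if (exists("eset") &&
    requireNamespace("AnnotationDbi", quietly = TRUE) &&
    requireNamespace("hta20sttranscriptcluster.db", quietly = TRUE)) {
  db <- hta20sttranscriptcluster.db::hta20sttranscriptcluster.db
  anno <- lookup_columns(db, Biobase::featureNames(eset), c("ENTREZID", "SYMBOL"))
  str(anno)
}
```

Each element of the result is one lookup keyed on the probeset IDs, which is why adding "PROBEID" to the list triggers a PROBEID-to-PROBEID lookup like the failing one shown above.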

For now, if you just do

data.ann <- annotateEset(data.rma, hta20sttranscriptcluster.db, columns = c("ENTREZID", "SYMBOL", "ENSEMBL", "GENENAME"))

it will work correctly, and downstream packages like limma will still show the probeset ID in the topTable results.
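If the column list is built elsewhere in a script, the redundant key can be dropped programmatically before the call. This is a minimal sketch, assuming the RMA result is called `data.rma` as in the question; the variable names `requested` and `cols` are made up for the example.

```r
# Columns as originally requested; "PROBEID" is returned by default anyway.
requested <- c("PROBEID", "ENTREZID", "SYMBOL", "ENSEMBL", "GENENAME")

# Drop the redundant key column; setdiff() keeps the remaining order.
cols <- setdiff(requested, "PROBEID")
print(cols)

# The annotation step only runs when the data and packages are available.
if (exists("data.rma") &&
    requireNamespace("affycoretools", quietly = TRUE) &&
    requireNamespace("hta20sttranscriptcluster.db", quietly = TRUE)) {
  data.ann <- affycoretools::annotateEset(
    data.rma,
    hta20sttranscriptcluster.db::hta20sttranscriptcluster.db,
    columns = cols
  )
  # The annotation ends up in the featureData of the ExpressionSet.
  print(head(Biobase::fData(data.ann)))
}
```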


OK, I have fixed the bugs:

> eset <- rma(read.celfiles(list.celfiles()))
Loading required package: pd.hta.2.0
Loading required package: RSQLite
Loading required package: DBI
Platform design info loaded.
Background correcting
Normalizing
Calculating Expression
> library(affycoretools)

> eset2 <- annotateEset(eset, pd.hta.2.0)

> eset3 <- annotateEset(eset, hta20sttranscriptcluster.db)
'select()' returned 1:many mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
'select()' returned 1:many mapping between keys and columns

> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] hta20sttranscriptcluster.db_8.3.1 org.Hs.eg.db_3.3.0               
 [3] AnnotationDbi_1.34.4              affycoretools_1.44.3             
 [5] pd.hta.2.0_3.12.1                 RSQLite_1.0.0                    
 [7] DBI_0.4-1                         oligo_1.36.1                     
 [9] Biostrings_2.40.2                 XVector_0.12.0                   
[11] IRanges_2.6.1                     S4Vectors_0.10.2                 
[13] Biobase_2.32.0                    oligoClasses_1.34.0              
[15] BiocGenerics_0.18.0              

 

The fix should progress through the build machines within a day or two; you are looking for affycoretools version 1.44.3.
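One way to check whether the fixed version has reached your library is to compare the installed version against 1.44.3. This is a small sketch using base R only; `has_fix` is a hypothetical helper name, not part of any package.

```r
# Version that contains the fix, as stated above.
fixed_version <- "1.44.3"

# TRUE if the installed package is at least `minimum`, FALSE if it is older,
# NA if the package is not installed at all.
has_fix <- function(pkg = "affycoretools", minimum = fixed_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(NA)
  utils::packageVersion(pkg) >= minimum
}

print(has_fix())
```

If this prints FALSE, the older version is still installed and the package needs updating once the new build is available.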

 

Comment written 3.1 years ago by James W. MacDonald

Thanks, it's nice to see such rapid support and such a quick fix! I will look out for it.

Reply written 3.1 years ago by giroudpaul