Question: affycoretools annotateEset problem using Clariom D arrays
willj wrote (16 days ago):

I'm running into an annotation mismatch error for probesets when I use annotateEset from the affycoretools package after running rma from the oligo package. The following commands work fine:

> rma.genes <- oligo::rma(rawData, target="core")
Background correcting
Calculating Expression
> rma.genes <- annotateEset(rma.genes, annotation(rma.genes), type='core')
> featureData(rma.genes)
An object of class 'AnnotatedDataFrame'
  rowNames: AFFX-BkGr-GC03_st AFFX-BkGr-GC04_st ... TSUnmapped00001002.hg.1 (138745 total)
  varMetadata: labelDescription

But the following gives a mismatch error, as shown, and the featureData remains empty:

> rma.probesets <- oligo::rma(rawData, target="probeset")
Background correcting
Calculating Expression

> rma.probesets <- annotateEset(rma.probesets, annotation(rma.probesets), type='probeset')
Error: There appears to be a mismatch between the ExpressionSet and the annotation data.
Please ensure that the summarization level for the ExpressionSet and the 'type' argument are the same.
See ?annotateEset for more information on the type argument.

> featureData(rma.probesets)
An object of class 'AnnotatedDataFrame': none


Am I right that this should work? I think I'm correctly following the advice given in the thread "Alternate expression of splice isoforms on Affy Clariom D assay" (and also some here).


These are Clariom D arrays:

> rawData <- read.celfiles(celFiles)
Loading required package: pd.clariom.d.human


Many many thanks for any help,




> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                 

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pd.clariom.d.human_3.14.1 DBI_0.5-1                 RSQLite_1.1-2             affycoretools_1.46.5     
 [5] oligo_1.38.0              Biostrings_2.42.1         XVector_0.14.0            IRanges_2.8.1            
 [9] S4Vectors_0.12.1          Biobase_2.34.0            oligoClasses_1.36.0       BiocGenerics_0.20.0      

loaded via a namespace (and not attached):
  [1] colorspace_1.3-2              hwriter_1.3.2                 biovizBase_1.22.0            
  [4] htmlTable_1.8                 GenomicRanges_1.26.2          base64enc_0.1-3              
  [7] dichromat_2.0-0               affyio_1.44.0                 interactiveDisplayBase_1.12.0
 [10] AnnotationDbi_1.36.2          codetools_0.2-15              splines_3.3.2                
 [13] R.methodsS3_1.7.1             ggbio_1.22.4                  geneplotter_1.52.0           
 [16] knitr_1.15.1                  Formula_1.2-1                 Rsamtools_1.26.1             
 [19] annotate_1.52.1               cluster_2.0.5                 GO.db_3.4.0                  
 [22] R.oo_1.21.0                   graph_1.52.0                  shiny_1.0.0                  
 [25] httr_1.2.1                    GOstats_2.40.0                backports_1.0.4              
 [28] assertthat_0.1                Matrix_1.2-7.1                lazyeval_0.2.0               
 [31] limma_3.30.8                  acepack_1.4.1                 htmltools_0.3.5              
 [34] tools_3.3.2                   gtable_0.2.0                  affy_1.52.0                  
 [37] Category_2.40.0               reshape2_1.4.2                affxparser_1.46.0            
 [40] Rcpp_0.12.9                   gdata_2.18.0                  preprocessCore_1.36.0        
 [43] rtracklayer_1.34.1            iterators_1.0.8               stringr_1.1.0                
 [46] mime_0.5                      ensembldb_1.6.2               gtools_3.5.0                 
 [49] XML_3.98-1.5                  AnnotationHub_2.6.4           edgeR_3.16.5                 
 [52] zlibbioc_1.20.0               scales_0.4.1                  BSgenome_1.42.0              
 [55] VariantAnnotation_1.20.2      BiocInstaller_1.24.0          SummarizedExperiment_1.4.0   
 [58] RBGL_1.50.0                   RColorBrewer_1.1-2            yaml_2.1.14                  
 [61] memoise_1.0.0                 gridExtra_2.2.1               ggplot2_2.2.1                
 [64] biomaRt_2.30.0                rpart_4.1-10                  reshape_0.8.6                
 [67] latticeExtra_0.6-28           stringi_1.1.2                 gcrma_2.46.0                 
 [70] genefilter_1.56.0             foreach_1.4.3                 checkmate_1.8.2              
 [73] caTools_1.17.1                GenomicFeatures_1.26.2        BiocParallel_1.8.1           
 [76] GenomeInfoDb_1.10.2           ReportingTools_2.14.0         bitops_1.0-6                 
 [79] lattice_0.20-34               GenomicAlignments_1.10.0      bit_1.1-12                   
 [82] GSEABase_1.36.0               AnnotationForge_1.16.1        GGally_1.3.2                 
 [85] plyr_1.8.4                    magrittr_1.5                  DESeq2_1.14.1                
 [88] R6_2.2.0                      gplots_3.0.1                  Hmisc_4.0-2                  
 [91] foreign_0.8-67                survival_2.40-1               RCurl_1.95-4.8               
 [94] nnet_7.3-12                   tibble_1.2                    KernSmooth_2.23-15           
 [97] OrganismDbi_1.16.0            PFAM.db_3.4.0                 locfit_1.5-9.1               
[100] grid_3.3.2                    data.table_1.10.0             digest_0.6.11                
[103] xtable_1.8-2                  ff_2.2-13                     httpuv_1.3.3                 
[106] R.utils_2.5.0                 munsell_0.4.3   


(Question written 16 days ago by willj; modified 15 days ago by James W. MacDonald)

James W. MacDonald (United States) wrote (15 days ago):

When you use annotateEset like that, you are reading in the raw annotation data (from the Affymetrix annotation csv) that comes packaged with the pdInfo package. Apparently that annotation data is borked somehow. The test is for at least 95% overlap between the probeset IDs in the annotation csv and the probeset IDs in the ExpressionSet you are trying to annotate, and here the overlap is 0%. It appears that the probeset-level annotation file in this package is actually the transcript-level annotation file (again), which is why you get the error you see.
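To make that check concrete, here is a minimal base-R sketch of an ID-overlap test of that kind. The helper name `id_overlap` and its exact definition (share of ExpressionSet row names found among the annotation IDs) are assumptions for illustration, not affycoretools' internal code. The IDs are copied from this thread: probeset-level row names from James's ChipDb example, and core/transcript-level row names from the question, standing in for a "probeset" csv that really contains transcript annotation.

```r
# Hypothetical helper, NOT affycoretools' internal code: the share of
# ExpressionSet row names that also appear among the annotation-file IDs.
id_overlap <- function(eset_ids, annot_ids) {
  mean(eset_ids %in% annot_ids)
}

# Probeset-level row names (from the ChipDb-annotated featureData in this thread)
probeset_ids <- c("24561160", "24561161")

# Transcript-level IDs (from the core-level featureData in the question),
# playing the role of a probeset csv that actually holds transcript annotation
annot_ids <- c("AFFX-BkGr-GC03_st", "AFFX-BkGr-GC04_st", "TSUnmapped00001002.hg.1")

overlap <- id_overlap(probeset_ids, annot_ids)

# Assumed 95% threshold, as described in the answer
if (overlap < 0.95) {
  message(sprintf("Only %.0f%% of IDs overlap; summarization level and annotation disagree",
                  100 * overlap))
}
```

With none of the probeset IDs present in the transcript-level list, the overlap is 0, which mirrors the situation described above.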

There is an alternative way to annotate your data (IMO the 'main' way to do such things), which is to use the ChipDb package that we supply, here clariomdhumanprobeset.db.

> library(clariomdhumanprobeset.db)
Loading required package: AnnotationDbi
Loading required package:

> eset <- annotateEset(eset, clariomdhumanprobeset.db)
'select()' returned 1:many mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
> featureData(eset)
An object of class 'AnnotatedDataFrame'
  rowNames: 24561160 24561161 ... rat-RPTR-XXU09476-1_st (1562457
  varMetadata: labelDescription

> apply(fData(eset), 2, function(x) sum(!is.na(x))/length(x))
1.0000000 0.4449582 0.4449582 0.4449582
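The output above appears to be the fraction of rows with a non-missing value in each featureData column, i.e. about 44% of the probesets get a gene-level annotation. Here is a base-R sketch of that per-column calculation on a tiny made-up data frame standing in for `fData(eset)`. The column names are assumed to mirror annotateEset's default ChipDb columns, and the probeset-to-gene assignments are invented purely for illustration.

```r
# Made-up stand-in for fData(eset); column names are an assumption
fd <- data.frame(
  PROBEID  = c("24561160", "24561161", "24561162", "24561163"),
  ENTREZID = c("1", NA, "10", NA),
  SYMBOL   = c("A1BG", NA, "NAT2", NA),
  GENENAME = c("alpha-1-B glycoprotein", NA, "N-acetyltransferase 2", NA),
  stringsAsFactors = FALSE
)

# is.na() on a data frame gives a logical matrix; colMeans() of that
# matrix is the proportion of non-missing entries in each column
annotated_frac <- colMeans(!is.na(fd))
print(annotated_frac)
```

Every row has a PROBEID, so that column is always 1, while the gene-level columns drop below 1 wherever the ChipDb has no mapping, which is the same pattern as in the real output.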

Thanks a lot, James. By the way, is there a resource or document that would have pointed me to your ChipDb package without my having to post a question here, i.e. something I should keep up to date with for future reference?

(Reply written 15 days ago by willj)

You mean other than the help page? Here is the first section:

annotateEset           package:affycoretools           R Documentation

Method to annotate ExpressionSets automatically


     This function fills the featureData slot of the ExpressionSet
     automatically, which is then available to downstream methods to
     provide annotated output. Annotating results is tedious, and can
     be surprisingly difficult to get right. By annotating the data
     automatically, we remove the tedium and add an extra layer of
     security since the resulting ExpressionSet will be tested for
     validity automatically (e.g., annotation data match up correctly
     with the expression data). Current choices for the annotation data
     are a ChipDb object (e.g., hugene10sttranscriptcluster.db) or an
     AffyGenePDInfo object (e.g., In the latter
     case, we use the parsed Affymetrix annotation csv file to get
     data. This is only intended for those situations where the ChipDb
     package is not available.


(Reply written 14 days ago by James W. MacDonald)

Great, thanks; I should have thought to check there. For anyone else struggling with this, there is also some explanation of ChipDb objects in an AnnotationDbi vignette.

(Reply written 13 days ago by willj)