I am having trouble figuring out how to convert the probeset IDs from the Affymetrix-assigned names to Entrez IDs (heck, I would even settle for gene symbols). My session info is listed at the bottom of this post. I run into this problem whether I use the custom CDF that Affymetrix gave me or build a pd package with pdInfoMaker as described in the thread "Clariom D Human Microarray CDF file to package". Just as there is no publicly available CDF, there is no annotation .db package available.
I have tried the AffyCompatible package to fetch the NetAffx resource; however, it returns a data frame with all the information jumbled into a single column.
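One possible cause is the block of "#%" metadata lines that NetAffx CSV files typically start with. This is an assumption, not a diagnosis of what AffyCompatible does internally. If that is the cause, you can download the CSV and strip those lines before parsing, so that each field lands in its own column. Below is a minimal base-R sketch. The helper name `read_netaffx_csv` and the tiny example file are hypothetical, and the column names only mimic the NetAffx layout.

```r
# Hypothetical helper: read a NetAffx-style annotation CSV whose first lines
# are "#%"-prefixed metadata (assumed layout), returning one column per field.
read_netaffx_csv <- function(path) {
  lines <- readLines(path, warn = FALSE)
  # Drop the "#%" metadata header lines before handing the rest to read.csv().
  body <- lines[!startsWith(lines, "#%")]
  read.csv(text = body, stringsAsFactors = FALSE, check.names = FALSE)
}

# Tiny hypothetical example file with two metadata lines.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  '#%create_date=hypothetical',
  '#%chip_type=hypothetical',
  '"transcript_cluster_id","seqname","gene_assignment"',
  '"TC_A","chr1","NM_0001 // GENEA // gene A // 1p36 // 111"',
  '"TC_B","chr2","---"'
), tmp)

annot <- read_netaffx_csv(tmp)
str(annot)
```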
I have also tried using the .db package for the predecessor array (HTA 2.0), just to be able to match the Affymetrix IDs to Entrez IDs, but that returns an error:
```
> library(hta20sttranscriptcluster.db)
Loading required package: org.Hs.eg.db
> affyid <- rownames(eset)
> egids2 <- hta20sttranscriptclusterENTREZID[affyid]
Error in .checkKeys(value, Lkeys(x), x@ifnotfound) :
  value for "AFFX-BkGr-GC03_st" not found
```
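The error appears to come from subsetting the Bimap with `[`, which rejects any key that is missing from the map. Control probesets such as AFFX-BkGr-GC03_st have no gene mapping, so they trigger it. Below is a minimal base-R sketch of a more tolerant approach: drop the AFFX controls first and let unmatched IDs become NA instead of raising an error. The toy `id2entrez` vector and the `TC_*` IDs are hypothetical. With a real annotation package, `AnnotationDbi::mapIds()` would normally play the role of the named-vector lookup.

```r
# Hypothetical toy mapping from transcript-cluster IDs to Entrez IDs; in real
# code this would come from an annotation package (e.g. via AnnotationDbi::mapIds).
id2entrez <- c("TC_A" = "111", "TC_B" = "222")

# Hypothetical probeset IDs, including an Affymetrix control probeset.
probe_ids <- c("TC_A", "AFFX-BkGr-GC03_st", "TC_C", "TC_B")

# Control probesets (prefix "AFFX") have no gene mapping, so drop them.
gene_ids <- probe_ids[!startsWith(probe_ids, "AFFX")]

# Indexing a named vector by a name it lacks returns NA rather than an error,
# so IDs without a mapping simply become NA.
entrez <- unname(id2entrez[gene_ids])
names(entrez) <- gene_ids
entrez
```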
My session info. Can you tell I have been trying a ton of different things?
```
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.6 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] hta20sttranscriptcluster.db_8.3.1 org.Hs.eg.db_3.3.0
 [3] AnnotationDbi_1.35.4              AffyCompatible_1.33.0
 [5] RCurl_1.95-4.8                    bitops_1.0-6
 [7] XML_3.98-1.4                      biomaRt_2.29.2
 [9] BiocInstaller_1.23.9              genefilter_1.55.2
[11] pd.clariom.d.human_0.0.1          RSQLite_1.0.0
[13] DBI_0.5-1                         oligo_1.37.2
[15] Biostrings_2.41.4                 XVector_0.13.7
[17] IRanges_2.7.15                    S4Vectors_0.11.14
[19] oligoClasses_1.35.0               limma_3.29.21
[21] Biobase_2.33.3                    BiocGenerics_0.19.2

loaded via a namespace (and not attached):
 [1] GenomeInfoDb_1.9.10         iterators_1.0.8
 [3] tools_3.3.1                 zlibbioc_1.19.0
 [5] bit_1.1-12                  annotate_1.51.0
 [7] preprocessCore_1.35.0       lattice_0.20-34
 [9] ff_2.2-13                   Matrix_1.2-7.1
[11] foreach_1.4.3               affxparser_1.45.0
[13] grid_3.3.1                  survival_2.39-5
[15] codetools_0.2-14            GenomicRanges_1.25.94
[17] splines_3.3.1               SummarizedExperiment_1.3.82
[19] xtable_1.8-2                affyio_1.43.0
```
Hey Jayme, let me know if you figured out how to annotate the Clariom D data. I tried a method mentioned here that uses the Bioconductor package clariomdhumantranscriptcluster.db, but it doesn't seem to work for me.
Thanks
Did you read this full thread, including James's comments below?
"The easiest thing to do is to use annotateEset in my affycoretools package."

Some relevant points:
- you can summarize the probes at the level of known transcripts (= core; the default) or at the level of probe set regions, which are intended to measure portions of an exon or exon-exon junctions. See, e.g., James's remarks in the thread "Transcript to gene in clariom d human affymetrix data" for much more on this.
- you can use the annotation provided by Affymetrix (as available on their NetAffx site), which is contained in the Platform Design (pd) package pd.clariom.d.human.
- you can also use the annotation that is assembled by the BioC core team using data from public repositories, either at the level of probe set regions (clariomdhumanprobeset.db) or transcripts (clariomdhumantranscriptcluster.db).

Some example code:
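Below is a minimal base-R sketch of one step on the Affymetrix route. It is not the code from the original reply. The NetAffx `gene_assignment` field packs accession, symbol, description, cytoband and Entrez ID into a single string. The parser assumes records separated by " /// " and fields separated by " // ", in that order. That matches the usual NetAffx layout but should be checked against the actual file. The helper name `parse_gene_assignment` and the example strings are hypothetical.

```r
# Hypothetical helper: extract SYMBOL and ENTREZID from NetAffx-style
# gene_assignment strings. Assumed layout: records separated by " /// ",
# fields ordered accession // symbol // description // cytoband // Entrez ID.
parse_gene_assignment <- function(x) {
  # Keep only the first record of each string (a simplification).
  first_record <- vapply(strsplit(x, " /// ", fixed = TRUE),
                         function(r) if (length(r)) r[1] else NA_character_,
                         character(1))
  fields <- strsplit(first_record, " // ", fixed = TRUE)
  # Return field i, or NA when the record is "---" or otherwise too short.
  get_field <- function(f, i) if (length(f) >= i) trimws(f[i]) else NA_character_
  data.frame(
    SYMBOL = vapply(fields, get_field, character(1), i = 2),
    ENTREZID = vapply(fields, get_field, character(1), i = 5),
    stringsAsFactors = FALSE
  )
}

# Hypothetical gene_assignment values, including an unassigned "---" entry.
ga <- c("NM_0001 // GENEA // gene A // 1p36 // 111 /// NR_0002 // GENEA // gene A // 1p36 // 111",
        "NM_0003 // GENEB // gene B // 2q11 // 222",
        "---")
res <- parse_gene_assignment(ga)
res
```

Keeping only the first record is a simplification. Transcript clusters that map to several genes would need all records kept, for example as a list column.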
An excerpt of the annotated transcript-level data (notice the subtle differences...):