Affymetrix HuGene 2.0 ST annotation


Guest User ★ 13k


Dear List,

I recently came across this post, which helped me analyse data from this array:
https://stat.ethz.ch/pipermail/bioconductor/2014-May/059408.html

However, I am concerned about the annotation and wonder whether what I get is usual for this kind of array.

Code:

    eset_mat <- as.matrix(Eset)
    dim(eset_mat)  # 53617 6

    library(annotate)
    library(hugene20sttranscriptcluster.db)

    annodb <- "hugene20sttranscriptcluster.db"
    ID <- featureNames(Eset)
    Symbol <- as.character(lookUp(ID, annodb, "SYMBOL"))
    Name <- as.character(lookUp(ID, annodb, "GENENAME"))
    Entrez <- as.character(lookUp(ID, annodb, "ENTREZID"))
    Ensembl <- as.character(lookUp(ID, annodb, "ENSEMBL"))

    annot <- data.frame("ID"=ID, "Symbol"=Symbol, "Description"=Name, "EntrezID"=Entrez, "EnsemblID"=Ensembl)

    length(which(Symbol != "NA"))  # 23672 =====> is this normal?
    length(Symbol)                 # 53617

Is it normal to get <50% annotation? (At present I have not done any filtering before limma; I used all 53K+ probesets for DE.)

Many thanks,
Natasha

Annotation limma

updated 11.4 years ago by James W. MacDonald 68k • written 11.4 years ago by Guest User ★ 13k


James W. MacDonald 68k


Hi Natasha,

On 6/4/2014 10:20 AM, Natasha [guest] wrote:
> Is it normal to get <50% annotation?

Sort of. You are using an old method of annotating data that still exists for backwards compatibility, but it is not really how you should be doing things these days. Note also that this old method masks any probesets with a one-to-many annotation.

If we toggle this masking off, you get about 7000 more symbols:

    > z <- toggleProbes(hugene20sttranscriptclusterSYMBOL, "all")
    > symbol2 <- unlist(mget(ID, z))
    > symbol2 <- symbol2[!is.na(symbol2)]
    > sum(!duplicated(names(symbol2)))
    [1] 30769

The old method also hides the fact that a given probeset might interrogate lots of things:

    > z <- select(hugene20sttranscriptcluster.db, keys(hugene20sttranscriptcluster.db), c("SYMBOL","GENENAME","ENTREZID","ENSEMBL"))
    Warning message:
    In .generateExtraRows(tab, keys, jointype) :
      'select' resulted in 1:many mapping between keys and return rows
    > dim(z)
    [1] 80172     5
    > zlst <- split(z, z[,1])
    > zlst[sapply(zlst, nrow) > 5][5]
    $`16659407`
          PROBEID   SYMBOL               GENENAME ENTREZID         ENSEMBL
    3821 16659407  PRAMEF5  PRAME family member 5   343068 ENSG00000204502
    3822 16659407  PRAMEF5  PRAME family member 5   343068 ENSG00000232423
    3823 16659407 PRAMEF23 PRAME family member 23   729368 ENSG00000232423
    3824 16659407  PRAMEF6  PRAME family member 6   440561 ENSG00000232423
    3825 16659407 PRAMEF15 PRAME family member 15   653619 ENSG00000157358
    3826 16659407 PRAMEF15 PRAME family member 15   653619 ENSG00000204501
    3827 16659407  PRAMEF9  PRAME family member 9   343070 ENSG00000204501
    3828 16659407  PRAMEF9  PRAME family member 9   343070 ENSG00000157358
    3829 16659407 PRAMEF11 PRAME family member 11   440560 ENSG00000204513
    3830 16659407  PRAMEF4  PRAME family member 4   400735 ENSG00000243073

And how you deal with these one-to-many mappings is not trivial.

Best,
Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
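To make the one-to-many problem concrete, here is a minimal base-R sketch of two common ways to reduce such a table to one row per probeset. It is not something from the answer itself, and neither choice is "the" right one. The first four rows are copied from the select() output for probeset 16659407; the probeset `99999999` with symbol `GENE_X` and the helper `collapse_unique` are made up purely for illustration.

```r
# Toy one-to-many annotation table. The first four rows come from the
# select() output shown for probeset 16659407; the last row is a
# hypothetical single-mapping probeset added for contrast.
annot <- data.frame(
  PROBEID  = c("16659407", "16659407", "16659407", "16659407", "99999999"),
  SYMBOL   = c("PRAMEF5", "PRAMEF5", "PRAMEF23", "PRAMEF6", "GENE_X"),
  ENTREZID = c("343068", "343068", "729368", "440561", "0"),
  stringsAsFactors = FALSE
)

# Strategy 1: keep only the first row per probeset.
# Simple, but the choice of "first" is arbitrary.
first_hit <- annot[!duplicated(annot$PROBEID), ]

# Strategy 2: keep every distinct symbol, collapsed into one string
# per probeset, so that no mapping is silently dropped.
collapse_unique <- function(x) paste(unique(x), collapse = "; ")
collapsed <- vapply(split(annot$SYMBOL, annot$PROBEID),
                    collapse_unique, character(1))

# Number of distinct symbols per probeset, useful for flagging
# ambiguous probesets before or after the limma analysis.
n_symbols <- vapply(split(annot$SYMBOL, annot$PROBEID),
                    function(x) length(unique(x)), integer(1))

print(first_hit)
print(collapsed)
print(n_symbols)
```

Keeping the first hit gives a tidy table but hides the ambiguity, whereas collapsing keeps all the information at the cost of messier symbols. Which one is appropriate depends on what you plan to do downstream.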
