Question: probe to entrezID mapping with aafLocusLink
Merja Heinaniemi wrote:
Hi!

I was mapping probe IDs from 133plus2 arrays to Entrez IDs using aafLocusLink, first some months ago with an earlier version of the package, and now with the current annaffy and hgu133plus2 packages. I compared my results and some probes no longer get mapped with the new package version, e.g. POU5F1. The gene does have probes on the array; all just happen to be x_at probes. So I thought maybe all those less specific probes lack Entrez mappings, but another gene with an x_at probe does have a matching Entrez ID. So why is e.g. POU5F1 missing one? I include below the R code that can be used to reproduce my problem (even the first part, if any hgu133Plus2 arrays are read in); sessionInfo is given at the end.

And more importantly, how do I get such probes mapped to an Entrez ID using Bioconductor? I was assuming the hgu133plus2 package contains all manufacturer annotations so I should find a match, or am I wrong?

Thanks in advance!

Merja

    ## R commands:
    # affybatch = read.affybatch(filenames = Filenames)
    # eset = rma(affybatch)
    # grep("208286_x_at", featureNames(eset))
    # [1] 17711

    library(annaffy)
    library(hgu133plus2.db)
    probeID1 = "208286_x_at"  ## this is POU5F1, Entrez ID 5460
    probeID2 = "215600_x_at"  ## this is FBXW12, Entrez ID 285231
    entrezID1 = aafLocusLink(probeID1, "hgu133plus2.db")
    entrezID1
    # integer()
    entrezID2 = aafLocusLink(probeID2, "hgu133plus2.db")
    entrezID2
    # [1] 285231

    x <- hgu133plus2ENTREZID
    ## Get the probe identifiers that are mapped to an ENTREZ Gene ID
    mapped_probes <- mappedkeys(x)
    ## Convert to a list
    xx <- as.list(x[mapped_probes])
    xx[xx == "5460"]
    # list()
    xx[xx == "285231"]
    # $1564138_at
    # [1] "285231"
    # $215600_x_at
    # [1] "285231"

    > sessionInfo()
    # R version 2.10.0 (2009-10-26)
    # i386-apple-darwin9.8.0
    # locale:
    # [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
    # attached base packages:
    # [1] stats graphics grDevices utils datasets methods base
    # other attached packages:
    #  [1] hgu133plus2cdf_2.5.0 hgu133plus2.db_2.3.5 org.Hs.eg.db_2.3.6 annaffy_1.18.0 KEGG.db_2.3.5 GO.db_2.3.5
    #  [7] RSQLite_0.7-3 DBI_0.2-4 AnnotationDbi_1.8.1 affy_1.24.2 Biobase_2.6.0
    # loaded via a namespace (and not attached):
    # [1] affyio_1.14.0 preprocessCore_1.8.0 tools_2.10.0
Answer: probe to entrezID mapping with aafLocusLink
James W. MacDonald wrote:
Hi Merja,

As you note, this symbol is no longer mapped to 208286_x_at in the current hgu133plus2.db package. I don't know why; NetAffx still claims this mapping. Perhaps Marc Carlson can shed some light.

You could map the Affy IDs to Entrez Gene using biomaRt as well, and that mapping still exists:

    > getBM("entrezgene", "affy_hg_u133_plus_2", "208286_x_at", mart)
      entrezgene
    1       5460
    2       5462

I assume you are using aafLocusLink() because you are creating HTML or text tables for your output. Or perhaps you don't know that you can simply do

    > mget(c("208286_x_at", "215600_x_at"), hgu133plus2ENTREZID)
    $208286_x_at
    [1] NA

    $215600_x_at
    [1] "285231"

to do the mapping?

<self promotion>

If you are trying to create tables and would like to do the mappings via biomaRt, you could use either limma2biomaRt() or probes2tableBM() in the affycoretools package, which will output HTML or text tables with links to various databases, like you get with annaffy (but without the sweet CSS candy that colors the expression values according to the expression level).

</self promotion>

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician, Douglas Lab
University of Michigan Department of Human Genetics
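The getBM() call in this answer relies on a `mart` object that is never defined in the thread. The following is a minimal sketch of one way it might be set up, assuming the main Ensembl mart and its human gene dataset. The helper name `probes_to_entrez_biomart` is hypothetical and not part of biomaRt. The thread uses the attribute name "entrezgene"; more recent Ensembl releases appear to call it "entrezgene_id", so the name is left as a parameter and is best checked with listAttributes(mart).

```r
# Sketch only: map Affymetrix HG-U133 Plus 2 probe IDs to Entrez Gene IDs
# through Ensembl BioMart. Needs the biomaRt package and network access.
# `probes_to_entrez_biomart` is a hypothetical helper, not a biomaRt function.
probes_to_entrez_biomart <- function(probes,
                                     entrez_attr = "entrezgene_id",
                                     mart = NULL) {
  if (!requireNamespace("biomaRt", quietly = TRUE)) {
    stop("Please install the Bioconductor package 'biomaRt' first.")
  }
  if (is.null(mart)) {
    # Assumption: the human gene dataset of the main Ensembl mart.
    mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  }
  # Return the probe ID next to each Entrez ID so that probes matching
  # several genes show up as several rows.
  biomaRt::getBM(
    attributes = c("affy_hg_u133_plus_2", entrez_attr),
    filters    = "affy_hg_u133_plus_2",
    values     = probes,
    mart       = mart
  )
}

# Example call (not run here because it queries a remote server):
# probes_to_entrez_biomart(c("208286_x_at", "215600_x_at"))
```

Asking for the probe ID as an attribute as well keeps the result readable when several probes are looked up at once, since getBM() returns one row per probe and gene pair.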
Hi Merja,

The probe that you are wondering about actually maps to 4 different genes! You can see this result if you take advantage of the newest annotation packages, by using toggleProbes() as follows:

    EGMap = toggleProbes(hgu133plus2ENTREZID, "all")
    get(probeID1, EGMap)

The reason that you won't see this with normal usage is that the annotation packages will by default try to hide such probes from you (in this case, returning nothing instead of the 4 genes). You have to use the toggleProbes() method to uncover the mappings that belong to cross-hybridizing probes. This was done because 1) most people want to avoid probes that cross-hybridize like this and 2) legacy code would probably have broken all over the place if we had just unleashed this change on everyone as the default behavior.

The reason that you are seeing so many "_x_" probes that have this problem is that those are probes that Affy knows tend to be cross-hybridizers. So the fact that these probes map to multiple things is not much of a surprise.

Please let us know if you still have questions.

Marc
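To make the difference between the default map and the toggled map concrete, here is a small sketch that looks up the same probes in both and flags those that match more than one gene. It assumes hgu133plus2.db and AnnotationDbi are installed; `entrez_for_probes` is a hypothetical helper name used only for illustration, and the actual IDs returned depend on the annotation package version.

```r
# Sketch only: compare the default probe-to-Entrez map with the map that also
# exposes probes matching several genes. Needs hgu133plus2.db (and therefore
# AnnotationDbi). `entrez_for_probes` is a hypothetical helper name.
entrez_for_probes <- function(probes) {
  if (!requireNamespace("hgu133plus2.db", quietly = TRUE)) {
    stop("Please install the Bioconductor package 'hgu133plus2.db' first.")
  }
  default_map <- hgu133plus2.db::hgu133plus2ENTREZID
  # "all" exposes both single-gene and multi-gene probes.
  all_map <- AnnotationDbi::toggleProbes(default_map, "all")

  # ifnotfound = NA avoids an error for IDs that are not on the array.
  default_ids <- AnnotationDbi::mget(probes, default_map, ifnotfound = NA)
  all_ids     <- AnnotationDbi::mget(probes, all_map, ifnotfound = NA)

  # Count the non-missing Entrez IDs found for each probe in the full map.
  n_genes <- vapply(all_ids, function(ids) sum(!is.na(ids)), integer(1))

  list(default      = default_ids,
       all          = all_ids,
       multi_mapped = names(n_genes)[n_genes > 1])
}

# Example call (needs the annotation package):
# res <- entrez_for_probes(c("208286_x_at", "215600_x_at"))
# res$multi_mapped   # probes that match more than one gene
```

Keeping the multi-mapped probes in a separate element makes it easy to decide later whether to drop them, which is what the default behavior does, or to report all candidate genes.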