confused about mget with u133x3p.db and u133x3pALIAS2PROBE
Andrew Yee ▴ 350
@andrew-yee-2667
Last seen 7.3 years ago
I was trying to figure out why I was getting this output with the u133x3p.db package. Specifically, if I enter:

    mget("MYC", env=u133x3pALIAS2PROBE)

it returns:

    $MYC
    [1] "4854714C_3p_s_at" "Hs.300470.0.A1_3p_at" "Hs2.372887.1.S1_3p_at" "g12962934_3p_a_at" "g3126906_3p_a_at"

However, if you try to convert the first probe set ID returned,

    mget("4854714C_3p_s_at", env=u133x3pSYMBOL)

it returns:

    $4854714C_3p_s_at
    [1] "NOL3"

I'm puzzled why the output from u133x3pALIAS2PROBE doesn't exactly match up with u133x3pSYMBOL.

Thanks,
Andrew
probe convert • 483 views
Marc Carlson ★ 7.2k
@marc-carlson-2264
Last seen 5.3 years ago
United States
Hi Andrew,

I think that what is confusing you is that the "alias" and "symbol" mappings have different meanings. The "alias" mapping is for mapping all possible gene symbols (as used by scientists) and the "symbol" mapping only returns one (the "official" one according to NCBI) symbol per probeset.

If you were to look at the reverse map of the alias map, you could see this demonstrated here:

    mget("4854714C_3p_s_at", env=revmap(u133x3pALIAS2PROBE))

This returns all the symbols associated with this probeset, including the "official" one of NOL3:

    $4854714C_3p_s_at
    [1] "ARC" "CARD2" "MYC" "MYP" "NOL3" "NOP" "NOP30"

In contrast, the u133x3pSYMBOL mapping only maps to the "official" gene symbol for a probeset, so all you get there is NOL3.

Marc

Thanks for your help. And I should have looked at the entry on NCBI more carefully for the aliases of NOL3.

In the future, I should just use the revmap() to restrict the query to official symbols.

    mget("MYC", env=revmap(u133x3pSYMBOL))

Thanks,
Andrew
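The difference between the two mappings can be sketched in base R without the annotation package. This is only a toy model: the two lists below are stand-ins built from the values quoted in this thread, and `is_alias_only()` is a hypothetical helper, not part of u133x3p.db or AnnotationDbi.

```r
# Toy stand-ins for the two maps, filled only with values quoted in the thread.
probe <- "4854714C_3p_s_at"

# probe set -> the single "official" symbol (plays the role of u133x3pSYMBOL)
symbol_map <- list("4854714C_3p_s_at" = "NOL3")

# probe set -> every known alias (plays the role of revmap(u133x3pALIAS2PROBE))
alias_map <- list(
  "4854714C_3p_s_at" = c("ARC", "CARD2", "MYC", "MYP", "NOL3", "NOP", "NOP30")
)

# Hypothetical helper: TRUE when `name` reaches the probe set only as an alias,
# i.e. it is listed among the aliases but is not the official symbol.
is_alias_only <- function(name, probe) {
  name %in% alias_map[[probe]] && !identical(symbol_map[[probe]], name)
}

is_alias_only("MYC", probe)   # MYC is an alias of NOL3 here, not the official symbol
is_alias_only("NOL3", probe)  # NOL3 is the official symbol
```

In this model "MYC" reaches the probe set through the alias list only, which is why the alias lookup returns a probe set whose official symbol is NOL3.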
Hi Andrew,

Another thing your example points out that is worth mentioning is that gene symbols are AWFUL as identifiers, because not only are there many symbols for a given gene, but a given symbol is not even guaranteed to be unique. Because of these problems, the best thing to do is to avoid using gene symbols to identify genes at all. Usually you only want to provide them as additional information about a gene, and never as a primary identifier.

The alias mapping is therefore really just for those times when your only option is to use a gene symbol and you need to go fishing for what something might be, starting from that symbol. But be careful when using it. Sometimes a symbol might map to more than one gene, and there is not really much we can do about that. We simply have no way to know which gene you will want in those cases. That actually happened in this case as well, which is why it's a good idea to start with the symbol mapping first when you are forced to go fishing for a gene's identity like that.

Marc
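The "symbol mapping first, alias mapping as a fallback" advice could be sketched as follows. Everything here is an assumption made for illustration: the gene names, probe IDs, and the `lookup_probes()` helper are made up, and plain lists stand in for the real annotation maps.

```r
# Hypothetical toy maps; names and probe IDs are invented for illustration.
# Official symbol -> probe sets (like a reversed SYMBOL map)
symbol2probe <- list(GENE1 = "probe_a", GENE2 = "probe_b")

# Any symbol or alias -> probe sets (like an ALIAS2PROBE map).
# "GENE1" is also an alias of GENE2, so the alias map returns two probe sets.
alias2probe <- list(
  GENE1   = c("probe_a", "probe_b"),
  GENE2   = "probe_b",
  OLDNAME = "probe_b"
)

# Try the official-symbol map first; fall back to aliases only if that fails,
# and report which map produced the answer.
lookup_probes <- function(sym) {
  hit <- symbol2probe[[sym]]
  if (!is.null(hit)) return(list(source = "symbol", probes = hit))
  hit <- alias2probe[[sym]]
  if (!is.null(hit)) return(list(source = "alias", probes = hit))
  list(source = "none", probes = character(0))
}

lookup_probes("GENE1")    # answered by the symbol map: only probe_a
lookup_probes("OLDNAME")  # answered by the alias map: treat with caution
```

Returning the source alongside the probe sets makes it easy to flag alias-derived hits for manual checking, since those are the ones that may belong to a different gene.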