How does an annotation package handle ambiguous probe set ID mappings?
Andrew Yee
@andrew-yee-2667
Apologies if this has been asked before, but how does an annotation package handle an ambiguous probe set ID mapping?

Take, for example, the Affymetrix chip U133X3P. When I use the annotation package for this chip with probe set ID 1552641_3p_s_at, it returns only one match:

> library('u133x3p.db')
> mget('1552641_3p_s_at', env=u133x3pSYMBOL)
$`1552641_3p_s_at`
[1] "ATAD3B"

> mget('1552641_3p_s_at', env=u133x3pENTREZID)
$`1552641_3p_s_at`
[1] "83858"

However, when I search Affymetrix with:

https://www.affymetrix.com/analysis/netaffx/fullrecord.affx?pk=U133_X3P:1552641_3P_S_AT

it states that the probe set ambiguously maps to three gene symbols: ATAD3A, ATAD3B, and LOC732419.

How does the annotation package determine which gene symbol it should map to?

Thanks,
Andrew
Annotation u133x3p probe
@james-w-macdonald-5106
Hi Andrew,

In the past we just used the first probe set ==> Entrez Gene ID mapping. However, in the soon-to-be-released BioC 2.5 annotation packages, all the mappings are included (thanks to Marc Carlson).

> tmp <- toggleProbes(u133x3pENTREZID, "all")
> get('1552641_3p_s_at', tmp)
[1] "55210"  "732419" "83858"
> tmp2 <- toggleProbes(u133x3pSYMBOL, "all")
> get('1552641_3p_s_at', tmp2)
[1] "ATAD3A"    "LOC732419" "ATAD3B"

Oddly enough, this probe set isn't mapped in the 'regular' mappings:

> get('1552641_3p_s_at', u133x3pENTREZID)
[1] NA
> get('1552641_3p_s_at', u133x3pSYMBOL)
[1] NA

Marc?

> sessionInfo()
R version 2.10.0 Under development (unstable) (2009-09-21 r49780)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
[1] u133x3p.db_2.3.5     org.Hs.eg.db_2.3.4   RSQLite_0.7-2
[4] DBI_0.2-4            AnnotationDbi_1.7.17 Biobase_2.5.6

loaded via a namespace (and not attached):
[1] tools_2.10.0

Best,
Jim

> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
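A minimal base-R sketch of the difference between the old "first mapping" behavior and the new "all mappings" view may help. It does not call u133x3p.db or AnnotationDbi; it assumes a plain data frame of probe-to-gene pairs built from the three Entrez Gene IDs returned by toggleProbes(u133x3pENTREZID, "all") above, plus one invented probe. The helpers first_mapping() and all_mappings() are hypothetical names for illustration, not package functions. Which gene counts as "first" here depends only on the arbitrary row order of the toy table, so it need not match the ATAD3B result Andrew saw.

```r
# Hypothetical probe-to-gene table (NOT read from u133x3p.db).
# The three IDs for 1552641_3p_s_at are the ones toggleProbes() returned above;
# "made_up_probe_at" / "GENE_X" are invented to show an unambiguous probe.
probe_map <- data.frame(
  probe_id = c("1552641_3p_s_at", "1552641_3p_s_at", "1552641_3p_s_at",
               "made_up_probe_at"),
  gene_id  = c("55210", "732419", "83858", "GENE_X"),
  stringsAsFactors = FALSE
)

# Hypothetical legacy-style lookup: keep only the first row seen for each probe.
# The result depends entirely on the row order of the table.
first_mapping <- function(map) {
  keep <- !duplicated(map$probe_id)
  setNames(map$gene_id[keep], map$probe_id[keep])
}

# Hypothetical "all" lookup: every gene assigned to each probe, as a named list.
all_mappings <- function(map) {
  split(map$gene_id, map$probe_id)
}

first_mapping(probe_map)
all_mappings(probe_map)[["1552641_3p_s_at"]]
```

Collapsing to the first row keeps every lookup single-valued, which is what legacy code expects, but it silently discards the other candidate genes.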
Thank you, Jim.

The probe set is not displayed in the regular mappings precisely because it maps to multiple things. Because its assignment is ambiguous, most people will probably want to avoid it most of the time. Also, legacy code that depends on getting one value back from such operations needs to keep working. However, as you have so carefully illustrated, you can now get the complete data for any mapping by using toggleProbes().

There are three settings for any mapping that can be set using toggleProbes(): "all", which gives you every mapping regardless of what it is; "multiple", which exposes only the probes that map to multiple genes; and "single", the default, which exposes only those probe IDs that the manufacturer has assigned to a single gene. So unless you use "all", the troublesome, ambiguously assigned probes will not be represented in the mapping (i.e., you will get an NA).

Most of the time, probes are assigned to a single gene, so most things are represented just fine by "single". This default has the side benefit of shielding you from probes whose identity the manufacturer is less than certain about. But where multiple genes are assigned to a probe or probe set, you can now get all the assignments out if you wish (or just the troublesome ones, if you want to focus on them) so that you can judge for yourself which one you have actually measured.

Marc
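The three toggleProbes() settings can be sketched in base R as filters over a probe-to-gene table. This is only an illustration under stated assumptions: view_probes() and lookup() are hypothetical helpers, not the AnnotationDbi implementation; the table is a plain data frame rather than a package mapping; and "multiple" is taken to mean probes assigned to more than one gene (the "troublesome ones" Marc mentions).

```r
# Hypothetical probe-to-gene table; only the 1552641_3p_s_at IDs come from the
# thread, the other probe and gene IDs are invented for illustration.
probe_map <- data.frame(
  probe_id = c("1552641_3p_s_at", "1552641_3p_s_at", "1552641_3p_s_at",
               "made_up_probe_at"),
  gene_id  = c("55210", "732419", "83858", "GENE_X"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the three settings:
#   "single"   - only probes assigned to exactly one gene (the default)
#   "multiple" - only probes assigned to more than one gene
#   "all"      - every probe with all of its genes
view_probes <- function(map, value = c("single", "multiple", "all")) {
  value <- match.arg(value)
  # Assumes each probe/gene pair appears once, so rows per probe = genes per probe.
  genes_per_probe <- table(map$probe_id)
  ambiguous <- names(genes_per_probe)[genes_per_probe > 1]
  keep <- switch(value,
    single   = !(map$probe_id %in% ambiguous),
    multiple = map$probe_id %in% ambiguous,
    all      = rep(TRUE, nrow(map))
  )
  split(map$gene_id[keep], map$probe_id[keep])
}

# Hypothetical lookup: returns NA when the probe is hidden by the current view,
# mirroring the NA returned by the default mapping in the thread.
lookup <- function(view, probe) {
  if (probe %in% names(view)) view[[probe]] else NA_character_
}
```

For example, lookup(view_probes(probe_map), "1552641_3p_s_at") returns NA under the default "single" view, while lookup(view_probes(probe_map, "all"), "1552641_3p_s_at") returns all three Entrez Gene IDs, the same pattern Jim observed.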
Thanks Marc and James, this discussion has been very informative!

Andrew

On Mon, Oct 19, 2009 at 1:53 PM, Marc Carlson <mcarlson@fhcrc.org> wrote:
> Thank you Jim,
>
> The probeset is not displayed in the regular mappings precisely because
> it maps to multiple things.