Fwd: How to handle the case of an Affymetrix probe set ID mapped to multiple genes?
Levi Waldron ★ 1.1k
@levi-waldron-3429
CUNY Graduate School of Public Health a…
On Tue, Jul 30, 2013 at 9:14 AM, Feng Tian <fengtian@bu.edu> wrote:
> Hi Levi,
>
> Thank you very much for your reply.
> My purpose is to do GSEA analysis. So is there a general way to handle
> these "_x" probes?
>
> Regards,
> Feng

After mapping, I would just drop anything with "///" for GSEA analysis. I suppose you could also choose one representative, or, if you are using the Broad's tool, provide probe sets and let it deal with the mapping (although I don't know how it deals with non-specific probe sets). I doubt such probe sets will have much effect on GSEA results, since most of those genes will have a more specific probe set available. E.g.:

    > library(hgu133plus2.db)
    > x = as.character(hgu133plus2SYMBOL)
    > length(x)
    [1] 41293    # probe sets
    > length(unique(x))
    [1] 19944    # gene symbols
    > ind = grep("_x", names(x))
    > summary(x[ind] %in% x[-ind])
       Mode   FALSE    TRUE    NA's
    logical     623    2469       0

So for hgu133plus2 you would lose at most 623 of 19944 genes (623 "_x" probe sets map to a gene that no other probe set covers). IMO, if that changes your GSEA in an important way, it probably wasn't a robust result anyway.

> On Tue, Jul 30, 2013 at 9:00 AM, Levi Waldron <lwaldron.research@gmail.com> wrote:
>
>> Hi Feng,
>>
>> Probe sets labelled with "_x" cross-hybridize to multiple genes:
>>
>> http://www.affymetrix.com/support/help/faqs/mouse_430/faq_8.jsp
>>
>> Genecards gives more detail for this probe set:
>>
>> http://genecards.weizmann.ac.il/cgi-bin/geneannot/GA_search.pl?keyword_type=probe_set_id&array=HG-U133&target=genecards&keyword=200012_x_at
>>
>> How to handle such a case depends on how interested you are in that probe
>> set; at the extremes you could ignore it, or follow up with PCR to
>> establish which transcript you are observing.
>>
>> -Levi
>>
>> On Mon, Jul 29, 2013 at 6:06 PM, Feng Tian <fengtian@bu.edu> wrote:
>>
>> > Dear all,
>> >
>> > In the Affymetrix annotation file, I find that some probe set IDs are
>> > mapped to multiple genes separated by '///'; for example, 200012_x_at
>> > is mapped to RPL21P16///RPL21P119///RPL21. How should I handle this case?
>> >
>> > Thank you!
>> >
>> > Feng
>> >
>> > _______________________________________________
>> > Bioconductor mailing list
>> > Bioconductor@r-project.org
>> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> > Search the archives:
>> > http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>> --
>> Levi Waldron
>> Post-doctoral fellow
>> Department of Biostatistics, Harvard School of Public Health
>> Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute
>> Building 1, room 412C
>> 655 Huntington Avenue
>> Boston, Massachusetts 02115
>> mobile: (617) 851-6849
>> fax: (617) 432-5619
>> http://www.hsph.harvard.edu/research/levi-waldron/
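The suggestion in this post to drop anything annotated with "///" might be sketched roughly as follows. This is only an illustration in base R: the data frame `annot` and its column names `probe_id` and `symbol` are hypothetical stand-ins for however the annotation file has been read in, and the rows are toy data (only the 200012_x_at mapping comes from the question).

```r
# Toy annotation table. The column names and the three single-gene rows are
# illustrative assumptions, not taken from a real annotation file.
annot <- data.frame(
  probe_id = c("200012_x_at", "1053_at", "117_at", "121_at"),
  symbol   = c("RPL21P16///RPL21P119///RPL21", "RFC2", "HSPA6", "PAX8"),
  stringsAsFactors = FALSE
)

# TRUE for probe sets annotated to more than one gene.
# fixed = TRUE matches the literal string "///" instead of a regular expression.
is_multi <- grepl("///", annot$symbol, fixed = TRUE)

# Keep only the probe sets that map to a single gene.
annot_unique <- annot[!is_multi, ]
print(annot_unique)
```

The same logical index could be applied to the rows of an expression matrix, provided its row names are in the same order as `annot$probe_id`.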
Annotation hgu133plus2 probe • 1.7k views
Yuan Hao ▴ 240
@yuan-hao-3658
United States
GSEA mostly uses Entrez Gene IDs during testing. Most "_x" probe sets end up with no corresponding Entrez ID, so they are automatically excluded before the test and shouldn't be a problem for you.

Cheers,
Yuan

On Jul 30, 2013, at 9:43 AM, Levi Waldron <lwaldron.research@gmail.com> wrote:
> After mapping, I would just drop anything with "///" for GSEA analysis.
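The exclusion step described in this reply could look something like the sketch below. It is a toy illustration in base R, not the behaviour of any particular GSEA implementation: the vector `entrez` and the probe set ID "9999_x_at" are made up, and `NA` is assumed to mark a probe set with no Entrez ID.

```r
# Hypothetical probe-set-to-Entrez mapping; NA marks a probe set without an ID.
entrez <- c("1053_at" = "5982", "117_at" = "3310",
            "9999_x_at" = NA, "121_at" = "7849")

# Drop probe sets that lack an Entrez ID before building the gene list.
mapped <- entrez[!is.na(entrez)]
print(names(mapped))
```

Whether a given "_x" probe set is actually unmapped depends on the annotation package, so it is worth checking the counts on the real data rather than assuming.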
On Tue, Jul 30, 2013 at 11:07 AM, Yuan Hao <yuan.x.hao@gmail.com> wrote:
> GSEA mostly uses Entrez Gene IDs during testing. Most "_x" probe sets
> end up with no corresponding Entrez ID, so they are automatically
> excluded before the test and shouldn't be a problem for you.
>
> Cheers,
> Yuan

I think the number of excluded genes is the same whether you use symbols or Entrez IDs:

    > library(hgu133plus2.db)
    > x = as.character(hgu133plus2ENTREZID)
    > length(x)
    [1] 41293
    > length(unique(x))
    [1] 19944
    > ind = grep("_x", names(x))
    > summary(x[ind] %in% x[-ind])
       Mode   FALSE    TRUE    NA's
    logical     623    2469       0
    > head(x)
      1053_at    117_at    121_at 1255_g_at   1294_at   1316_at
       "5982"    "3310"    "7849"    "2978"    "7318"    "7067"
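To make the `x[ind] %in% x[-ind]` check in this thread easier to follow, here is a small self-contained sketch on a toy vector. The probe set IDs and gene labels are hypothetical; the real `x` comes from hgu133plus2.db as in the session quoted in the post.

```r
# Toy probe-set-to-gene vector (hypothetical IDs and gene labels).
x <- c("1_at" = "A", "2_x_at" = "A", "3_x_at" = "B", "4_at" = "C")

# Positions of the "_x" probe sets.
ind <- grep("_x", names(x), fixed = TRUE)

# For each "_x" probe set: is its gene also covered by a non-"_x" probe set?
covered <- x[ind] %in% x[-ind]
print(table(covered))

# Distinct genes that would disappear if every "_x" probe set were dropped.
lost <- setdiff(unique(x[ind]), x[-ind])
print(lost)
```

Note that `covered` counts probe sets while `lost` counts distinct genes, so the number of FALSE values is an upper bound on the number of genes lost.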