Which genes are in the GO count column?
5
0
Entering edit mode
@ingrid-h-g-stensen-1971
Last seen 10.2 years ago
An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070820/ e8ca235e/attachment.pl
• 1.2k views
ADD COMMENT
0
Entering edit mode
Seth Falcon ★ 7.4k
@seth-falcon-992
Last seen 10.2 years ago
"James W. MacDonald" <jmacdon at="" med.umich.edu=""> writes: > Hi Ingrid, > > Ingrid H. G. ?stensen wrote: >> Hi >> >> I am testing for GO in my dataset and I am able to make html pages >> that contains different type of information. But I was wondering if >> there is some way to find out which genes are in the Count column? It >> might say 2, but not which 2 genes. > > See probeSetSummary() in GOstats and hyperG2annaffy() in affycoretools. > Note that for probeSetSummary() to work correctly you have to pass in a > *named* vector of Entrez Gene IDs, which you can get by using unlist(): > > my.named.probeids <- unlist(mget(probeID.vector, > "chip.annotation.package.name")) So assuming the OP is using GOstats, R-2.5.x, and the latest available version installed using biocLite... Please try help("HyperGResult-accessors") + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center BioC: http://bioconductor.org/ Blog: http://userprimary.net/user/
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 9 hours ago
United States
Hi Ingrid, Ingrid H. G. ?stensen wrote: > Hi > > I am testing for GO in my dataset and I am able to make html pages > that contains different type of information. But I was wondering if > there is some way to find out which genes are in the Count column? It > might say 2, but not which 2 genes. See probeSetSummary() in GOstats and hyperG2annaffy() in affycoretools. Note that for probeSetSummary() to work correctly you have to pass in a *named* vector of Entrez Gene IDs, which you can get by using unlist(): my.named.probeids <- unlist(mget(probeID.vector, "chip.annotation.package.name")) Best, Jim > > Regards, Ingrid > > [[alternative HTML version deleted]] > > _______________________________________________ Bioconductor mailing > list Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor Search the > archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician Affymetrix and cDNA Microarray Core University of Michigan Cancer Center 1500 E. Medical Center Drive 7410 CCGC Ann Arbor MI 48109 734-647-5623
ADD COMMENT
0
Entering edit mode
@ingrid-h-g-stensen-1971
Last seen 10.2 years ago
An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070821/ 6614c836/attachment.pl
ADD COMMENT
0
Entering edit mode
Hi Ingrid, Ingrid H. G. ?stensen wrote: > Hi > > Thanks for tips but I still a bit lost. This is what I have done: > > > # Lots of QC and limma things. > :-) > > probe <- top2[,1] #top2 is from topTable > > sigLL <- unique(unlist(mget(probe, env=illuminaHumanv2ENTREZID, > ifnotfound=NA))) It appears that unique() strips off the names, so you should probably substitute something like this: sigLL <- unlist(mget(probe, illuminaHumanv2ENTREZID)) sigLL <- sigLL[!duplicated(sigLL)] > sigLL <- as.character(sigLL[!is.na(sigLL)]) > > params <- new("GOHyperGParams", geneIds= sigLL, > annotation="illuminaHumanv2", ontology="CC", pvalueCutoff= 0.05, > conditional=FALSE, testDirection="under") > hgOver <- hyperGTest(params) > res_filNavn <- paste(1, "_GO_summary_CC_under.html", sep = "") > htmlReport(hgOver,file=res_filNavn) > > > summary(hgOver) > GOCCID Pvalue OddsRatio ExpCount Count > Size Term > GO:0044422 GO:0044422 0.002652398 0.6469686 65.993365 46 > 2345 organelle part > GO:0044446 GO:0044446 0.002652398 0.6469686 65.993365 46 2345 > intracellular organelle part > GO:0005634 GO:0005634 0.002722435 0.7098244 112.259076 88 > 3989 nucleus > GO:0030529 GO:0030529 0.002771273 0.2913123 13.114246 4 466 > ribonucleoprotein complex > GO:0044428 GO:0044428 0.008533045 0.5194381 23.920836 13 > 850 nuclear part > GO:0005840 GO:0005840 0.028727650 0.2791575 6.922971 2 > 246 ribosome > GO:0005623 GO:0005623 0.037820314 0.7152750 340.717129 331 > 12107 cell > GO:0044464 GO:0044464 0.038306007 0.7160755 340.688987 331 > 12106 cell part > GO:0031981 GO:0031981 0.046628778 0.5403759 14.352502 8 > 510 nuclear lumen > > > # Find the ID in the count colunm > probeSetSummary(hgOver) > > # This gives me all the genes (some entrez id are dublicatet because > of their linkage to different probes) but I get a > warning message: > > Warning message: > The vector of geneIds used to create the GOHyperGParams object was > not a named vector. > If you want to know the probesets that contributed to this result > you need to pass a named vector for geneIds. > > > > > I have tried to make a named vector but apparently I do not understand > what it is, how can I make it work? > And how can I get the probeSetSummary into a file? Any suggestions? Sure. As I mentioned in my first email, you can use hyperG2annaffy() in affycoretools. Alternatively you can always use write.table(). Best, Jim > > Regards, > Ingrid > > > "James W. MacDonald" <jmacdon at="" med.umich.edu=""> writes: > > > Hi Ingrid, > > > > Ingrid H. G. ?stensen wrote: > >> Hi > >> > >> I am testing for GO in my dataset and I am able to make html pages > >> that contains different type of information. But I was wondering if > >> there is some way to find out which genes are in the Count column? It > >> might say 2, but not which 2 genes. > > > > See probeSetSummary() in GOstats and hyperG2annaffy() in affycoretools. > > Note that for probeSetSummary() to work correctly you have to pass in a > > *named* vector of Entrez Gene IDs, which you can get by using unlist(): > > > > my.named.probeids <- unlist(mget(probeID.vector, > > "chip.annotation.package.name")) > > So assuming the OP is using GOstats, R-2.5.x, and the latest available > version installed using biocLite... > > Please try > > help("HyperGResult-accessors") > > + seth > > -- > Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center > BioC: http://bioconductor.org/ > Blog: http://userprimary.net/user/ > -- James W. MacDonald, M.S. Biostatistician Affymetrix and cDNA Microarray Core University of Michigan Cancer Center 1500 E. Medical Center Drive 7410 CCGC Ann Arbor MI 48109 734-647-5623
ADD REPLY
0
Entering edit mode
Hi Ingrid, You're loosing the "named" part of your list when you use unique here: sigLL <- unique(unlist(mget(probe, env=illuminaHumanv2ENTREZID, ifnotfound=NA))) try looking at the object (head(sigLL) maybe) with and without the unique applied and you'll see what the named vector looks like. how about something like this instead: uniqGenes <- findLargest(top2$ID, top2$t, "illuminaHumanv2") top2 <- top2[top2$ID %in% uniqGenes,] check ?findLargest which is part of the genefilter package or maybe think on something on the lines of: myEntrez <- unlist(mget(top2$ID, env=illuminaHumanv2ENTREZID)) myEntrez <- myEntrez[!duplicated(myEntrez)] which you can now use as an input for your GOHyperGParams object In either case I would probably subset the top2 table first by removing any probe that doesn't have a ENTREZID: entrezIds <- mget(top2$ID, env=illuminaHumanv2ENTREZID,ifnotfound=NA) withEntrez <- top2$ID[!is.na(entrezIds)] top2 <- top2[top2$ID %in% withEntrez,] Hope it helps (since I didn't test the code), Cei Ingrid H. G. ?stensen wrote: > Hi > > Thanks for tips but I still a bit lost. This is what I have done: > > > # Lots of QC and limma things. :-) > > probe <- top2[,1] #top2 is from topTable > > sigLL <- unique(unlist(mget(probe, env=illuminaHumanv2ENTREZID, ifnotfound=NA))) > sigLL <- as.character(sigLL[!is.na(sigLL)]) > > params <- new("GOHyperGParams", geneIds= sigLL, annotation="illuminaHumanv2", ontology="CC", pvalueCutoff= 0.05, > conditional=FALSE, testDirection="under") > hgOver <- hyperGTest(params) > res_filNavn <- paste(1, "_GO_summary_CC_under.html", sep = "") > htmlReport(hgOver,file=res_filNavn) > > > summary(hgOver) > GOCCID Pvalue OddsRatio ExpCount Count Size Term > GO:0044422 GO:0044422 0.002652398 0.6469686 65.993365 46 2345 organelle part > GO:0044446 GO:0044446 0.002652398 0.6469686 65.993365 46 2345 intracellular organelle part > GO:0005634 GO:0005634 0.002722435 0.7098244 112.259076 88 3989 nucleus > GO:0030529 GO:0030529 0.002771273 0.2913123 13.114246 4 466 ribonucleoprotein complex > GO:0044428 GO:0044428 0.008533045 0.5194381 23.920836 13 850 nuclear part > GO:0005840 GO:0005840 0.028727650 0.2791575 6.922971 2 246 ribosome > GO:0005623 GO:0005623 0.037820314 0.7152750 340.717129 331 12107 cell > GO:0044464 GO:0044464 0.038306007 0.7160755 340.688987 331 12106 cell part > GO:0031981 GO:0031981 0.046628778 0.5403759 14.352502 8 510 nuclear lumen > > > # Find the ID in the count colunm > probeSetSummary(hgOver) > > # This gives me all the genes (some entrez id are dublicatet because of their linkage to different probes) but I get a > warning message: > > Warning message: > The vector of geneIds used to create the GOHyperGParams object was not a named vector. > If you want to know the probesets that contributed to this result > you need to pass a named vector for geneIds. > > > > > I have tried to make a named vector but apparently I do not understand what it is, how can I make it work? > And how can I get the probeSetSummary into a file? Any suggestions? > > Regards, > Ingrid > > > "James W. MacDonald" <jmacdon at="" med.umich.edu=""> writes: > > >> Hi Ingrid, >> >> Ingrid H. G. ?stensen wrote: >> >>> Hi >>> >>> I am testing for GO in my dataset and I am able to make html pages >>> that contains different type of information. But I was wondering if >>> there is some way to find out which genes are in the Count column? It >>> might say 2, but not which 2 genes. >>> >> See probeSetSummary() in GOstats and hyperG2annaffy() in affycoretools. >> Note that for probeSetSummary() to work correctly you have to pass in a >> *named* vector of Entrez Gene IDs, which you can get by using unlist(): >> >> my.named.probeids <- unlist(mget(probeID.vector, >> "chip.annotation.package.name")) >> > > So assuming the OP is using GOstats, R-2.5.x, and the latest available > version installed using biocLite... > > Please try > > help("HyperGResult-accessors") > > + seth > > > -------------------------------------------------------------------- ---- > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Cei Abreu-Goodger, PhD Wellcome Trust Sanger Institute Computational and Functional Genomics Wellcome Trust Genome Campus Hinxton, Cambridge, CB10 1SA, UK -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
ADD REPLY
0
Entering edit mode
@ingrid-h-g-stensen-1971
Last seen 10.2 years ago
An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070821/ 86e38511/attachment.pl
ADD COMMENT
0
Entering edit mode
Hi Ingrid, Ingrid H. G. ?stensen wrote: > Hi > > I tried: > > sigLL <- unlist(mget(probe, illuminaHumanv2ENTREZID)) > sigLL <- sigLL[!duplicated(sigLL)] > > insted of: > > sigLL <- unique(unlist(mget(probe, > env=illuminaHumanv2ENTREZID,ifnotfound=NA))) > > and then: > > probeSetSummary(hgOver, sigLL) > > Now I get (all?) the GO, not only the ones found in summary(hgOver), and > I still get the error message regrading the named vector. Did you read the help page for probeSetSummary()? If so, you should have seen this: Value: A 'list' of 'data.frame'. Each element of the list corresponds to one of the GO terms (the term is provides as the name of the element). Each 'data.frame' has three columns: the Entrez Gene ID ('EntrezID'), the probe set ID ('ProbeSetID'), and a 0/1 indicator of whether the probe set ID was provided as part of the initial input ('selected') Which should have helped explain the output you get. Anyway, I don't see the same problems you do. Are you using a current version of R/BioC (the posting guide _does_ ask you to give the output of sessionInfo()). > sel <- unlist(mget(ls(illuminaHumanv2ENTREZID)[sample(1:48000, 300)],illuminaHumanv2ENTREZID, ifnotfound=NA)) > sel <- sel[!is.na(sel)] > sel <- sel[!duplicated(sel)] > univ <- unique(getLL(ls(illuminaHumanv2ENTREZID), "illuminaHumanv2")) > p <- new("GOHyperGParams", geneIds=sel, universeGeneIds=univ, ontology="BP", annotation="illuminaHumanv2", conditional=T) > hyp <- hyperGTest(p) > ps <- probeSetSummary(hyp, categorySize=10) > ps[[1]] EntrezID ProbeSetID selected ILMN_2110 841 ILMN_2110 0 ILMN_29186 841 ILMN_29186 1 ILMN_29639 841 ILMN_29639 0 ILMN_5213 133396 ILMN_5213 1 > sessionInfo() R version 2.5.1 (2007-06-27) i386-pc-mingw32 locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 attached base packages: [1] "splines" "tools" "stats" [4] "graphics" "grDevices" "datasets" [7] "utils" "methods" "base" other attached packages: illuminaHumanv2 hgu133plus2 GOstats "1.2.0" "1.16.0" "2.2.6" Category Matrix lattice "2.2.3" "0.999375-0" "0.15-11" genefilter survival KEGG "1.14.1" "2.32" "1.16.0" RBGL annotate Biobase "1.12.0" "1.14.1" "1.14.0" GO graph rcompgen "1.16.0" "1.14.2" "0.1-13" Best, Jim > > And writing probeSetSummary object to file with write or write.table > does not work. > > Maybe I have to look into the affycoretool package to morrow. :-) > > Regards, > Ingrid > Hi Ingrid, > > Ingrid H. G. ?stensen wrote: > > Hi > > > > Thanks for tips but I still a bit lost. This is what I have done: > > > > > > # Lots of QC and limma things. > > > :-) > > > > probe <- top2[,1] #top2 is from topTable > > > > sigLL <- unique(unlist(mget(probe, env=illuminaHumanv2ENTREZID, > > ifnotfound=NA))) > > It appears that unique() strips off the names, so you should probably > substitute something like this: > > sigLL <- unlist(mget(probe, illuminaHumanv2ENTREZID)) > sigLL <- sigLL[!duplicated(sigLL)] > > > sigLL <- as.character(sigLL[!is.na(sigLL)]) > > > > params <- new("GOHyperGParams", geneIds= sigLL, > > annotation="illuminaHumanv2", ontology="CC", pvalueCutoff= 0.05, > > conditional=FALSE, testDirection="under") > > hgOver <- hyperGTest(params) > > res_filNavn <- paste(1, "_GO_summary_CC_under.html", sep = "") > > htmlReport(hgOver,file=res_filNavn) > > > > > > summary(hgOver) > > GOCCID Pvalue OddsRatio ExpCount Count > > Size Term > > GO:0044422 GO:0044422 0.002652398 0.6469686 65.993365 46 > > 2345 organelle part > > GO:0044446 GO:0044446 0.002652398 0.6469686 65.993365 46 2345 > > intracellular organelle part > > GO:0005634 GO:0005634 0.002722435 0.7098244 112.259076 88 > > 3989 nucleus > > GO:0030529 GO:0030529 0.002771273 0.2913123 13.114246 4 466 > > ribonucleoprotein complex > > GO:0044428 GO:0044428 0.008533045 0.5194381 23.920836 13 > > 850 nuclear part > > GO:0005840 GO:0005840 0.028727650 0.2791575 6.922971 2 > > 246 ribosome > > GO:0005623 GO:0005623 0.037820314 0.7152750 340.717129 331 > > 12107 cell > > GO:0044464 GO:0044464 0.038306007 0.7160755 340.688987 331 > > 12106 cell part > > GO:0031981 GO:0031981 0.046628778 0.5403759 14.352502 8 > > 510 nuclear lumen > > > > > > # Find the ID in the count colunm > > probeSetSummary(hgOver) > > > > # This gives me all the genes (some entrez id are dublicatet because > > of their linkage to different probes) but I get a > > warning message: > > > > Warning message: > > The vector of geneIds used to create the GOHyperGParams object was > > not a named vector. > > If you want to know the probesets that contributed to this result > > you need to pass a named vector for geneIds. > > > > > > > > > > I have tried to make a named vector but apparently I do not understand > > what it is, how can I make it work? > > And how can I get the probeSetSummary into a file? Any suggestions? > > Sure. As I mentioned in my first email, you can use hyperG2annaffy() in > affycoretools. Alternatively you can always use write.table(). > > Best, > > Jim > > > > > > Regards, > > Ingrid > > > > > > "James W. MacDonald" <jmacdon at="" med.umich.edu=""> writes: > > > > > Hi Ingrid, > > > > > > Ingrid H. G. ?stensen wrote: > > >> Hi > > >> > > >> I am testing for GO in my dataset and I am able to make html pages > > >> that contains different type of information. But I was wondering if > > >> there is some way to find out which genes are in the Count column? It > > >> might say 2, but not which 2 genes. > > > > > > See probeSetSummary() in GOstats and hyperG2annaffy() in > affycoretools. > > > Note that for probeSetSummary() to work correctly you have to pass > in a > > > *named* vector of Entrez Gene IDs, which you can get by using > unlist(): > > > > > > my.named.probeids <- unlist(mget(probeID.vector, > > > "chip.annotation.package.name")) > > > > So assuming the OP is using GOstats, R-2.5.x, and the latest available > > version installed using biocLite... > > > > Please try > > > > help("HyperGResult-accessors") > > > > + seth > > > > -- > > Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research > Center > > BioC: http://bioconductor.org/ > > Blog: http://userprimary.net/user/ > > > > -- > James W. MacDonald, M.S. > Biostatistician > Affymetrix and cDNA Microarray Core > University of Michigan Cancer Center > 1500 E. Medical Center Drive > 7410 CCGC > Ann Arbor MI 48109 > 734-647-5623 > -- James W. MacDonald, M.S. Biostatistician Affymetrix and cDNA Microarray Core University of Michigan Cancer Center 1500 E. Medical Center Drive 7410 CCGC Ann Arbor MI 48109 734-647-5623
ADD REPLY
0
Entering edit mode
@ingrid-h-g-stensen-1971
Last seen 10.2 years ago
An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070824/ 5463a377/attachment.pl
ADD COMMENT

Login before adding your answer.

Traffic: 815 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6