Which genes are in the GO count column?

0

Entering edit mode

Ingrid H. G. Østensen ▴ 700

@ingrid-h-g-stensen-1971

Last seen 11.5 years ago

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070820/ e8ca235e/attachment.pl

• 1.5k views

ADD COMMENT • link 18.5 years ago Ingrid H. G. Østensen ▴ 700

0

Entering edit mode

Seth Falcon ★ 7.4k

@seth-falcon-992

Last seen 11.5 years ago

"James W. MacDonald" <jmacdon at="" med.umich.edu=""> writes: > Hi Ingrid, > > Ingrid H. G. ?stensen wrote: >> Hi >> >> I am testing for GO in my dataset and I am able to make html pages >> that contains different type of information. But I was wondering if >> there is some way to find out which genes are in the Count column? It >> might say 2, but not which 2 genes. > > See probeSetSummary() in GOstats and hyperG2annaffy() in affycoretools. > Note that for probeSetSummary() to work correctly you have to pass in a > *named* vector of Entrez Gene IDs, which you can get by using unlist(): > > my.named.probeids <- unlist(mget(probeID.vector, > "chip.annotation.package.name")) So assuming the OP is using GOstats, R-2.5.x, and the latest available version installed using biocLite... Please try help("HyperGResult-accessors") + seth -- Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center BioC: http://bioconductor.org/ Blog: http://userprimary.net/user/

ADD COMMENT • link 18.5 years ago Seth Falcon ★ 7.4k

0

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 2 days ago

United States

Hi Ingrid, Ingrid H. G. ?stensen wrote: > Hi > > I am testing for GO in my dataset and I am able to make html pages > that contains different type of information. But I was wondering if > there is some way to find out which genes are in the Count column? It > might say 2, but not which 2 genes. See probeSetSummary() in GOstats and hyperG2annaffy() in affycoretools. Note that for probeSetSummary() to work correctly you have to pass in a *named* vector of Entrez Gene IDs, which you can get by using unlist(): my.named.probeids <- unlist(mget(probeID.vector, "chip.annotation.package.name")) Best, Jim > > Regards, Ingrid > > [[alternative HTML version deleted]] > > _______________________________________________ Bioconductor mailing > list Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor Search the > archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician Affymetrix and cDNA Microarray Core University of Michigan Cancer Center 1500 E. Medical Center Drive 7410 CCGC Ann Arbor MI 48109 734-647-5623

ADD COMMENT • link 18.5 years ago James W. MacDonald 68k

0

Entering edit mode

Ingrid H. G. Østensen ▴ 700

@ingrid-h-g-stensen-1971

Last seen 11.5 years ago

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070821/ 6614c836/attachment.pl

ADD COMMENT • link 18.5 years ago Ingrid H. G. Østensen ▴ 700

0

Entering edit mode

Hi Ingrid, Ingrid H. G. ?stensen wrote: > Hi > > Thanks for tips but I still a bit lost. This is what I have done: > > > # Lots of QC and limma things. > :-) > > probe <- top2[,1] #top2 is from topTable > > sigLL <- unique(unlist(mget(probe, env=illuminaHumanv2ENTREZID, > ifnotfound=NA))) It appears that unique() strips off the names, so you should probably substitute something like this: sigLL <- unlist(mget(probe, illuminaHumanv2ENTREZID)) sigLL <- sigLL[!duplicated(sigLL)] > sigLL <- as.character(sigLL[!is.na(sigLL)]) > > params <- new("GOHyperGParams", geneIds= sigLL, > annotation="illuminaHumanv2", ontology="CC", pvalueCutoff= 0.05, > conditional=FALSE, testDirection="under") > hgOver <- hyperGTest(params) > res_filNavn <- paste(1, "_GO_summary_CC_under.html", sep = "") > htmlReport(hgOver,file=res_filNavn) > > > summary(hgOver) > GOCCID Pvalue OddsRatio ExpCount Count > Size Term > GO:0044422 GO:0044422 0.002652398 0.6469686 65.993365 46 > 2345 organelle part > GO:0044446 GO:0044446 0.002652398 0.6469686 65.993365 46 2345 > intracellular organelle part > GO:0005634 GO:0005634 0.002722435 0.7098244 112.259076 88 > 3989 nucleus > GO:0030529 GO:0030529 0.002771273 0.2913123 13.114246 4 466 > ribonucleoprotein complex > GO:0044428 GO:0044428 0.008533045 0.5194381 23.920836 13 > 850 nuclear part > GO:0005840 GO:0005840 0.028727650 0.2791575 6.922971 2 > 246 ribosome > GO:0005623 GO:0005623 0.037820314 0.7152750 340.717129 331 > 12107 cell > GO:0044464 GO:0044464 0.038306007 0.7160755 340.688987 331 > 12106 cell part > GO:0031981 GO:0031981 0.046628778 0.5403759 14.352502 8 > 510 nuclear lumen > > > # Find the ID in the count colunm > probeSetSummary(hgOver) > > # This gives me all the genes (some entrez id are dublicatet because > of their linkage to different probes) but I get a > warning message: > > Warning message: > The vector of geneIds used to create the GOHyperGParams object was > not a named vector. > If you want to know the probesets that contributed to this result > you need to pass a named vector for geneIds. > > > > > I have tried to make a named vector but apparently I do not understand > what it is, how can I make it work? > And how can I get the probeSetSummary into a file? Any suggestions? Sure. As I mentioned in my first email, you can use hyperG2annaffy() in affycoretools. Alternatively you can always use write.table(). Best, Jim > > Regards, > Ingrid > > > "James W. MacDonald" <jmacdon at="" med.umich.edu=""> writes: > > > Hi Ingrid, > > > > Ingrid H. G. ?stensen wrote: > >> Hi > >> > >> I am testing for GO in my dataset and I am able to make html pages > >> that contains different type of information. But I was wondering if > >> there is some way to find out which genes are in the Count column? It > >> might say 2, but not which 2 genes. > > > > See probeSetSummary() in GOstats and hyperG2annaffy() in affycoretools. > > Note that for probeSetSummary() to work correctly you have to pass in a > > *named* vector of Entrez Gene IDs, which you can get by using unlist(): > > > > my.named.probeids <- unlist(mget(probeID.vector, > > "chip.annotation.package.name")) > > So assuming the OP is using GOstats, R-2.5.x, and the latest available > version installed using biocLite... > > Please try > > help("HyperGResult-accessors") > > + seth > > -- > Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center > BioC: http://bioconductor.org/ > Blog: http://userprimary.net/user/ > -- James W. MacDonald, M.S. Biostatistician Affymetrix and cDNA Microarray Core University of Michigan Cancer Center 1500 E. Medical Center Drive 7410 CCGC Ann Arbor MI 48109 734-647-5623

ADD REPLY • link 18.5 years ago James W. MacDonald 68k

0

Entering edit mode

Hi Ingrid, You're loosing the "named" part of your list when you use unique here: sigLL <- unique(unlist(mget(probe, env=illuminaHumanv2ENTREZID, ifnotfound=NA))) try looking at the object (head(sigLL) maybe) with and without the unique applied and you'll see what the named vector looks like. how about something like this instead: uniqGenes <- findLargest(top2$ID, top2$t, "illuminaHumanv2") top2 <- top2[top2$ID %in% uniqGenes,] check ?findLargest which is part of the genefilter package or maybe think on something on the lines of: myEntrez <- unlist(mget(top2$ID, env=illuminaHumanv2ENTREZID)) myEntrez <- myEntrez[!duplicated(myEntrez)] which you can now use as an input for your GOHyperGParams object In either case I would probably subset the top2 table first by removing any probe that doesn't have a ENTREZID: entrezIds <- mget(top2$ID, env=illuminaHumanv2ENTREZID,ifnotfound=NA) withEntrez <- top2$ID[!is.na(entrezIds)] top2 <- top2[top2$ID %in% withEntrez,] Hope it helps (since I didn't test the code), Cei Ingrid H. G. ?stensen wrote: > Hi > > Thanks for tips but I still a bit lost. This is what I have done: > > > # Lots of QC and limma things. :-) > > probe <- top2[,1] #top2 is from topTable > > sigLL <- unique(unlist(mget(probe, env=illuminaHumanv2ENTREZID, ifnotfound=NA))) > sigLL <- as.character(sigLL[!is.na(sigLL)]) > > params <- new("GOHyperGParams", geneIds= sigLL, annotation="illuminaHumanv2", ontology="CC", pvalueCutoff= 0.05, > conditional=FALSE, testDirection="under") > hgOver <- hyperGTest(params) > res_filNavn <- paste(1, "_GO_summary_CC_under.html", sep = "") > htmlReport(hgOver,file=res_filNavn) > > > summary(hgOver) > GOCCID Pvalue OddsRatio ExpCount Count Size Term > GO:0044422 GO:0044422 0.002652398 0.6469686 65.993365 46 2345 organelle part > GO:0044446 GO:0044446 0.002652398 0.6469686 65.993365 46 2345 intracellular organelle part > GO:0005634 GO:0005634 0.002722435 0.7098244 112.259076 88 3989 nucleus > GO:0030529 GO:0030529 0.002771273 0.2913123 13.114246 4 466 ribonucleoprotein complex > GO:0044428 GO:0044428 0.008533045 0.5194381 23.920836 13 850 nuclear part > GO:0005840 GO:0005840 0.028727650 0.2791575 6.922971 2 246 ribosome > GO:0005623 GO:0005623 0.037820314 0.7152750 340.717129 331 12107 cell > GO:0044464 GO:0044464 0.038306007 0.7160755 340.688987 331 12106 cell part > GO:0031981 GO:0031981 0.046628778 0.5403759 14.352502 8 510 nuclear lumen > > > # Find the ID in the count colunm > probeSetSummary(hgOver) > > # This gives me all the genes (some entrez id are dublicatet because of their linkage to different probes) but I get a > warning message: > > Warning message: > The vector of geneIds used to create the GOHyperGParams object was not a named vector. > If you want to know the probesets that contributed to this result > you need to pass a named vector for geneIds. > > > > > I have tried to make a named vector but apparently I do not understand what it is, how can I make it work? > And how can I get the probeSetSummary into a file? Any suggestions? > > Regards, > Ingrid > > > "James W. MacDonald" <jmacdon at="" med.umich.edu=""> writes: > > >> Hi Ingrid, >> >> Ingrid H. G. ?stensen wrote: >> >>> Hi >>> >>> I am testing for GO in my dataset and I am able to make html pages >>> that contains different type of information. But I was wondering if >>> there is some way to find out which genes are in the Count column? It >>> might say 2, but not which 2 genes. >>> >> See probeSetSummary() in GOstats and hyperG2annaffy() in affycoretools. >> Note that for probeSetSummary() to work correctly you have to pass in a >> *named* vector of Entrez Gene IDs, which you can get by using unlist(): >> >> my.named.probeids <- unlist(mget(probeID.vector, >> "chip.annotation.package.name")) >> > > So assuming the OP is using GOstats, R-2.5.x, and the latest available > version installed using biocLite... > > Please try > > help("HyperGResult-accessors") > > + seth > > > -------------------------------------------------------------------- ---- > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Cei Abreu-Goodger, PhD Wellcome Trust Sanger Institute Computational and Functional Genomics Wellcome Trust Genome Campus Hinxton, Cambridge, CB10 1SA, UK -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

ADD REPLY • link 18.5 years ago Cei Abreu-Goodger ▴ 830

0

Entering edit mode

Ingrid H. G. Østensen ▴ 700

@ingrid-h-g-stensen-1971

Last seen 11.5 years ago

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070821/ 86e38511/attachment.pl

ADD COMMENT • link 18.5 years ago Ingrid H. G. Østensen ▴ 700

0

Entering edit mode

Hi Ingrid, Ingrid H. G. ?stensen wrote: > Hi > > I tried: > > sigLL <- unlist(mget(probe, illuminaHumanv2ENTREZID)) > sigLL <- sigLL[!duplicated(sigLL)] > > insted of: > > sigLL <- unique(unlist(mget(probe, > env=illuminaHumanv2ENTREZID,ifnotfound=NA))) > > and then: > > probeSetSummary(hgOver, sigLL) > > Now I get (all?) the GO, not only the ones found in summary(hgOver), and > I still get the error message regrading the named vector. Did you read the help page for probeSetSummary()? If so, you should have seen this: Value: A 'list' of 'data.frame'. Each element of the list corresponds to one of the GO terms (the term is provides as the name of the element). Each 'data.frame' has three columns: the Entrez Gene ID ('EntrezID'), the probe set ID ('ProbeSetID'), and a 0/1 indicator of whether the probe set ID was provided as part of the initial input ('selected') Which should have helped explain the output you get. Anyway, I don't see the same problems you do. Are you using a current version of R/BioC (the posting guide _does_ ask you to give the output of sessionInfo()). > sel <- unlist(mget(ls(illuminaHumanv2ENTREZID)[sample(1:48000, 300)],illuminaHumanv2ENTREZID, ifnotfound=NA)) > sel <- sel[!is.na(sel)] > sel <- sel[!duplicated(sel)] > univ <- unique(getLL(ls(illuminaHumanv2ENTREZID), "illuminaHumanv2")) > p <- new("GOHyperGParams", geneIds=sel, universeGeneIds=univ, ontology="BP", annotation="illuminaHumanv2", conditional=T) > hyp <- hyperGTest(p) > ps <- probeSetSummary(hyp, categorySize=10) > ps[[1]] EntrezID ProbeSetID selected ILMN_2110 841 ILMN_2110 0 ILMN_29186 841 ILMN_29186 1 ILMN_29639 841 ILMN_29639 0 ILMN_5213 133396 ILMN_5213 1 > sessionInfo() R version 2.5.1 (2007-06-27) i386-pc-mingw32 locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 attached base packages: [1] "splines" "tools" "stats" [4] "graphics" "grDevices" "datasets" [7] "utils" "methods" "base" other attached packages: illuminaHumanv2 hgu133plus2 GOstats "1.2.0" "1.16.0" "2.2.6" Category Matrix lattice "2.2.3" "0.999375-0" "0.15-11" genefilter survival KEGG "1.14.1" "2.32" "1.16.0" RBGL annotate Biobase "1.12.0" "1.14.1" "1.14.0" GO graph rcompgen "1.16.0" "1.14.2" "0.1-13" Best, Jim > > And writing probeSetSummary object to file with write or write.table > does not work. > > Maybe I have to look into the affycoretool package to morrow. :-) > > Regards, > Ingrid > Hi Ingrid, > > Ingrid H. G. ?stensen wrote: > > Hi > > > > Thanks for tips but I still a bit lost. This is what I have done: > > > > > > # Lots of QC and limma things. > > > :-) > > > > probe <- top2[,1] #top2 is from topTable > > > > sigLL <- unique(unlist(mget(probe, env=illuminaHumanv2ENTREZID, > > ifnotfound=NA))) > > It appears that unique() strips off the names, so you should probably > substitute something like this: > > sigLL <- unlist(mget(probe, illuminaHumanv2ENTREZID)) > sigLL <- sigLL[!duplicated(sigLL)] > > > sigLL <- as.character(sigLL[!is.na(sigLL)]) > > > > params <- new("GOHyperGParams", geneIds= sigLL, > > annotation="illuminaHumanv2", ontology="CC", pvalueCutoff= 0.05, > > conditional=FALSE, testDirection="under") > > hgOver <- hyperGTest(params) > > res_filNavn <- paste(1, "_GO_summary_CC_under.html", sep = "") > > htmlReport(hgOver,file=res_filNavn) > > > > > > summary(hgOver) > > GOCCID Pvalue OddsRatio ExpCount Count > > Size Term > > GO:0044422 GO:0044422 0.002652398 0.6469686 65.993365 46 > > 2345 organelle part > > GO:0044446 GO:0044446 0.002652398 0.6469686 65.993365 46 2345 > > intracellular organelle part > > GO:0005634 GO:0005634 0.002722435 0.7098244 112.259076 88 > > 3989 nucleus > > GO:0030529 GO:0030529 0.002771273 0.2913123 13.114246 4 466 > > ribonucleoprotein complex > > GO:0044428 GO:0044428 0.008533045 0.5194381 23.920836 13 > > 850 nuclear part > > GO:0005840 GO:0005840 0.028727650 0.2791575 6.922971 2 > > 246 ribosome > > GO:0005623 GO:0005623 0.037820314 0.7152750 340.717129 331 > > 12107 cell > > GO:0044464 GO:0044464 0.038306007 0.7160755 340.688987 331 > > 12106 cell part > > GO:0031981 GO:0031981 0.046628778 0.5403759 14.352502 8 > > 510 nuclear lumen > > > > > > # Find the ID in the count colunm > > probeSetSummary(hgOver) > > > > # This gives me all the genes (some entrez id are dublicatet because > > of their linkage to different probes) but I get a > > warning message: > > > > Warning message: > > The vector of geneIds used to create the GOHyperGParams object was > > not a named vector. > > If you want to know the probesets that contributed to this result > > you need to pass a named vector for geneIds. > > > > > > > > > > I have tried to make a named vector but apparently I do not understand > > what it is, how can I make it work? > > And how can I get the probeSetSummary into a file? Any suggestions? > > Sure. As I mentioned in my first email, you can use hyperG2annaffy() in > affycoretools. Alternatively you can always use write.table(). > > Best, > > Jim > > > > > > Regards, > > Ingrid > > > > > > "James W. MacDonald" <jmacdon at="" med.umich.edu=""> writes: > > > > > Hi Ingrid, > > > > > > Ingrid H. G. ?stensen wrote: > > >> Hi > > >> > > >> I am testing for GO in my dataset and I am able to make html pages > > >> that contains different type of information. But I was wondering if > > >> there is some way to find out which genes are in the Count column? It > > >> might say 2, but not which 2 genes. > > > > > > See probeSetSummary() in GOstats and hyperG2annaffy() in > affycoretools. > > > Note that for probeSetSummary() to work correctly you have to pass > in a > > > *named* vector of Entrez Gene IDs, which you can get by using > unlist(): > > > > > > my.named.probeids <- unlist(mget(probeID.vector, > > > "chip.annotation.package.name")) > > > > So assuming the OP is using GOstats, R-2.5.x, and the latest available > > version installed using biocLite... > > > > Please try > > > > help("HyperGResult-accessors") > > > > + seth > > > > -- > > Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research > Center > > BioC: http://bioconductor.org/ > > Blog: http://userprimary.net/user/ > > > > -- > James W. MacDonald, M.S. > Biostatistician > Affymetrix and cDNA Microarray Core > University of Michigan Cancer Center > 1500 E. Medical Center Drive > 7410 CCGC > Ann Arbor MI 48109 > 734-647-5623 > -- James W. MacDonald, M.S. Biostatistician Affymetrix and cDNA Microarray Core University of Michigan Cancer Center 1500 E. Medical Center Drive 7410 CCGC Ann Arbor MI 48109 734-647-5623

ADD REPLY • link 18.5 years ago James W. MacDonald 68k

0

Entering edit mode

Ingrid H. G. Østensen ▴ 700

@ingrid-h-g-stensen-1971

Last seen 11.5 years ago

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070824/ 5463a377/attachment.pl

ADD COMMENT • link 18.5 years ago Ingrid H. G. Østensen ▴ 700

Login before adding your answer.