Quick start to linking GO terms and microarray data
@michael-watson-iah-c-378
Hi,

I want to investigate the GO terms associated with my microarray data (normally, a list of genes from topTable() in limma).

I have read the vignettes for goTools and GOstats and, to be honest, I am still a little unclear about the overall process, particularly when working with a custom array rather than with affy or operon.

Let's say, for example, that I have my array data in a data.frame containing gene names. In a separate data frame I have a link between my gene names and LocusLink IDs. How do I:

1) Find the GO terms associated with subsets of my genes? (I realise I can use merge() to link my array data to the LocusLink IDs, but what do I do then?)

2) Find out whether a particular GO term is statistically over-represented in a particular group?

Finally, is the only way to link into GO through LocusLink identifiers?

Many thanks
Mick
Microarray GO affy PROcess goTools GOstats
@sean-davis-490
On 3/1/06 6:20 AM, "michael watson (IAH-C)" wrote:

> 1) Find the GO terms associated with subsets of my genes? (I realise I
> can use merge() to link my array data to the LocusLink IDs, but what do
> I do then?)
>
> 2) Find out whether a particular GO term is statistically over-represented
> in a particular group?

Hi, Mick.

I would take the LocusLink IDs for your genes and dump two lists to a text file:

1) All LocusLink IDs on your array.
2) All LocusLink IDs in your gene list.

Then use an external program or web tool such as DAVID/EASE to do the analysis.

That said, there was some discussion about using LocusLink IDs directly (rather than requiring a metadata package) in GOHyperG. I don't know where that conversion stands.

As to your question about linking genes to GO, that is actually done at the transcript/protein level. Merging to Entrez Gene (LocusLink) happens after the fact. Using various data sources, you can link by RefSeq, LocusLink, Ensembl IDs, UCSC known genes, H-Invitational IDs (human), and probably several others in species other than human.

Sean
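For anyone who wants to see what comparing Sean's two lists amounts to, here is a minimal sketch in base R of a one-sided hypergeometric test for a single GO term. The IDs and the set of genes annotated to the term are made-up toy values, and this is not necessarily how DAVID/EASE or GOHyperG computes its statistics; it only illustrates the basic idea.

```r
# Toy inputs (hypothetical): list 1 is every ID on the array,
# list 2 is the selected gene list.
universe <- as.character(1:20)
selected <- c("1", "2", "3", "4", "5")

# Hypothetical annotation: the array genes that carry the GO term of interest.
in_term <- c("1", "2", "3", "4", "6", "7")

m <- length(intersect(in_term, universe))   # annotated genes on the array
n <- length(universe) - m                   # unannotated genes on the array
k <- length(selected)                       # size of the gene list
x <- length(intersect(selected, in_term))   # annotated genes in the gene list

# P(X >= x): probability of seeing at least x annotated genes
# when drawing k genes at random from the array.
p_value <- phyper(x - 1, m, n, k, lower.tail = FALSE)
print(p_value)
```

With many GO terms tested at once, the resulting p-values would normally be corrected for multiple testing, for example with p.adjust().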
@michael-watson-iah-c-378
Thanks Sean, but I really wanted to demonstrate this in Bioconductor :-S

I tried running the vignettes in goTools. The first time it froze my PC for about 30 minutes and then gave a cryptic message about coercing x to a list; the second time it froze my PC and then R crashed with no warning :-S

As far as I can tell, GOstats doesn't have any clear examples of simple mapping of microarray data to GO terms.

Given that one of the major, fundamental tasks biologists want to do is find functional information for significantly differentially expressed genes, shouldn't this be a little easier, and a little more transparent, in Bioconductor?

Again, I ask: does anyone have any simple examples of going from a list of LocusLink IDs to a list of GO terms (i.e. GO identifiers and the biological function/term associated with those identifiers)?

Many thanks
Mick
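As a rough illustration of the merge() route Mick describes, the sketch below chains two merges in base R: gene names to LocusLink IDs, then LocusLink IDs to GO terms. All three tables are toy data with placeholder IDs (LL1, LL2, LL3), and the gene-to-term assignments are invented for the example; in practice the last table would come from an annotation source such as the GO package or biomaRt.

```r
# Genes of interest, e.g. names taken from a topTable() result (toy values).
top_genes <- data.frame(gene = c("geneA", "geneB"), stringsAsFactors = FALSE)

# Custom-array annotation: gene name -> LocusLink ID (placeholder IDs).
gene2ll <- data.frame(
  gene      = c("geneA", "geneB", "geneC"),
  locuslink = c("LL1", "LL2", "LL3"),
  stringsAsFactors = FALSE
)

# LocusLink ID -> GO identifier and term (toy assignments, not real annotation).
ll2go <- data.frame(
  locuslink = c("LL1", "LL1", "LL2", "LL3"),
  go_id     = c("GO:0004000", "GO:0016787", "GO:0006955", "GO:0005737"),
  term      = c("adenosine deaminase activity", "hydrolase activity",
                "immune response", "cytoplasm"),
  stringsAsFactors = FALSE
)

# Step 1: attach LocusLink IDs to the gene list.
# Step 2: attach GO identifiers and terms via the LocusLink IDs.
annotated <- merge(merge(top_genes, gene2ll, by = "gene"), ll2go, by = "locuslink")

# One character vector of GO IDs per gene.
go_by_gene <- split(annotated$go_id, annotated$gene)
print(annotated)
```

Genes without a LocusLink ID or without any GO annotation drop out of the inner joins; passing all.x = TRUE to merge() would keep them with NA values instead.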
Hi Michael,

> Again, I ask, does anyone have any simple examples of going from a list
> of LocusLink IDs to a list of GO Terms? (i.e. GO identifiers and the
> biological function/term associated with those identifiers)

I think the environment "GOLOCUSID2GO" in the GO package is what you are looking for. Use ?GOLOCUSID2GO to see the examples.

HTH,
Ting

Ting-Yuan Liu
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
Seattle, WA, USA
Hi Mike,

As Wolfgang already suggested, you can do this with the biomaRt package. Here is how you should do this:

> library(biomaRt)
Loading required package: XML
Loading required package: RCurl
> mart = useMart("ensembl",dataset="hsapiens_gene_ensembl")
Checking attributes and filters ... ok
> getGO(id=c(100,620),type="entrezgene",mart=mart)
   go_id      go_description                                     evidence_code
1  GO:0004000 adenosine deaminase activity                       TAS
2  GO:0016787 hydrolase activity                                 IEA
3  GO:0009117 nucleotide metabolism                              IEA
4  GO:0009168 purine ribonucleoside monophosphate biosynthesis   IEA
5  GO:0019735 antimicrobial humoral response (sensu Vertebrata)  TAS
6  GO:0006955 immune response                                    IMP
7  GO:0006955 immune response                                    IEA
8  GO:0006163 purine nucleotide metabolism                       IMP
9  GO:0006163 purine nucleotide metabolism                       IEA
10 GO:0005737 cytoplasm                                          IDA
11 GO:0005737 cytoplasm                                          IEA
   ensembl_gene_id ensembl_transcript_id
1  ENSG00000196839 ENST00000359372
2  ENSG00000196839 ENST00000359372
3  ENSG00000196839 ENST00000359372
4  ENSG00000196839 ENST00000359372
5  ENSG00000196839 ENST00000359372
6  ENSG00000196839 ENST00000359372
7  ENSG00000196839 ENST00000359372
8  ENSG00000196839 ENST00000359372
9  ENSG00000196839 ENST00000359372
10 ENSG00000196839 ENST00000359372
11 ENSG00000196839 ENST00000359372

best,
Steffen
@wolfgang-huber-3550
Hi Michael,

regarding your last question, you can also try the "biomaRt" package (please use the development version 1.5.10), which lets you start from whatever kind of ID you have.

For task 2, I like the "Category" package, in particular the cateGOry and applyByCategory functions. See their man pages and the vignette of the Category package. The Category package asks a related but slightly different question from GOstats: not whether a certain GO category is over-represented in a set of genes, but rather whether a score (e.g. t statistic, mean expression level difference) tends to be higher in a GO category of genes than in all genes.

Cheers
Wolfgang

-------------------------------------
Wolfgang Huber
European Bioinformatics Institute
European Molecular Biology Laboratory
Cambridge CB10 1SD
England
Http: www.ebi.ac.uk/huber
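To make the difference between the two questions concrete, here is a small base-R sketch of the score-based variant Wolfgang describes. It does not use the Category package or its functions; the scores, gene names and category memberships are toy values, and comparing each category against the remaining genes with a one-sided Welch t-test is just one assumed way of asking whether scores tend to be higher inside a category.

```r
# Toy per-gene scores (e.g. t statistics); values are made up.
scores <- c(g1 = 3.1, g2 = 2.7, g3 = 2.9, g4 = -0.2,
            g5 = 0.4, g6 = -0.5, g7 = 0.1, g8 = 0.3)

# Hypothetical GO category memberships.
categories <- list(
  "GO:0006955" = c("g1", "g2", "g3"),
  "GO:0005737" = c("g4", "g5", "g6")
)

# One-sided test: are scores inside the category higher than in the other genes?
category_test <- function(genes, scores) {
  inside  <- scores[names(scores) %in% genes]
  outside <- scores[!names(scores) %in% genes]
  t.test(inside, outside, alternative = "greater")$p.value
}

p_values <- sapply(categories, category_test, scores = scores)
print(p_values)
```

Unlike an over-representation test, no cutoff is needed to define a "significant" gene list first; every gene contributes its score.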
@michael-watson-iah-c-378
Hi Steffen, Wolfgang

Thanks a lot, the biomaRt package looks wonderful for the species that are in Ensembl... Are there any functions within it to annotate other species (e.g. bacteria, plants)?

Many thanks
Mick
Hi,

Besides Ensembl, biomaRt currently includes Wormbase, VEGA, Uniprot and msd. I expect plants to be represented soon as well, via the Gramene database (http://www.gramene.org).

Best,
Steffen
Hello Steffen,

how do I connect to Wormbase?

thanks
Giovanni

> listMarts()
[1] "ensembl_mart_37" "vega_mart_37"    "snp_mart_37"     "msd_mart_4"
[5] "uniprot_mart_17"
> sessionInfo()
R version 2.2.1, 2005-12-20, powerpc-apple-darwin7.9.0

other attached packages:
 biomaRt      XML   RMySQL      DBI
 "1.4.0" "0.99-6"  "0.5-7" "0.1-10"
@michael-watson-iah-c-378
Hi Steffen

Sorry if I am confused, but getGO() seems to require a connection to an ensembl database. If I have identifiers for a species that is not in ensembl, can I still use biomaRt to retrieve GO (and other) annotations?

If so, it is a little unclear from the vignettes how to do this :-S

Thank you for the help

Mick

-----Original Message-----
From: Steffen Durinck [mailto:sdurinck@ebi.ac.uk]
Sent: 01 March 2006 13:43
To: michael watson (IAH-C)
Cc: Bioconductor
Subject: Re: [BioC] Quick start to linking GO terms and microarray data

Hi,

Next to Ensembl, biomaRt currently includes Wormbase, VEGA, Uniprot and msd. Soon I expect plants to be represented as well via the Gramene database (http://www.gramene.org).

Best,
Steffen

michael watson (IAH-C) wrote:

> Hi Steffen, Wolfgang
>
> Thanks a lot, the biomaRt package looks wonderful for the species that
> are in ensembl... Are there any functions within it to annotate other
> species? (Eg bacteria, plants etc)
>
> Many thanks
> Mick
>
> -----Original Message-----
> From: Steffen Durinck [mailto:sdurinck at ebi.ac.uk]
> Sent: 01 March 2006 13:24
> To: michael watson (IAH-C)
> Cc: Sean Davis; Bioconductor
> Subject: Re: [BioC] Quick start to linking GO terms and microarray data
>
> Hi Mike,
>
> As Wolfgang already suggested, you can do this with the biomaRt package.
> Here is how you should do this:
>
> > library(biomaRt)
> Loading required package: XML
> Loading required package: RCurl
> > mart = useMart("ensembl",dataset="hsapiens_gene_ensembl")
> Checking attributes and filters ... ok
> > getGO(id=c(100,620),type="entrezgene",mart=mart)
>
>         go_id                                    go_description evidence_code
> 1  GO:0004000                      adenosine deaminase activity           TAS
> 2  GO:0016787                                hydrolase activity           IEA
> 3  GO:0009117                             nucleotide metabolism           IEA
> 4  GO:0009168  purine ribonucleoside monophosphate biosynthesis           IEA
> 5  GO:0019735 antimicrobial humoral response (sensu Vertebrata)           TAS
> 6  GO:0006955                                   immune response           IMP
> 7  GO:0006955                                   immune response           IEA
> 8  GO:0006163                      purine nucleotide metabolism           IMP
> 9  GO:0006163                      purine nucleotide metabolism           IEA
> 10 GO:0005737                                         cytoplasm           IDA
> 11 GO:0005737                                         cytoplasm           IEA
>    ensembl_gene_id ensembl_transcript_id
> 1  ENSG00000196839       ENST00000359372
> 2  ENSG00000196839       ENST00000359372
> 3  ENSG00000196839       ENST00000359372
> 4  ENSG00000196839       ENST00000359372
> 5  ENSG00000196839       ENST00000359372
> 6  ENSG00000196839       ENST00000359372
> 7  ENSG00000196839       ENST00000359372
> 8  ENSG00000196839       ENST00000359372
> 9  ENSG00000196839       ENST00000359372
> 10 ENSG00000196839       ENST00000359372
> 11 ENSG00000196839       ENST00000359372
>
> best,
> Steffen
>
> michael watson (IAH-C) wrote:
>
>> Thanks Sean, but I really wanted to demonstrate this in Bioconductor :-S
>>
>> I tried running the vignettes in goTools, the first time it froze up my
>> PC for about 30 minutes and then gave out a cryptic message about
>> coercing x to a list, the second time it froze up my PC and then R
>> crashed with no warning :-S
>>
>> As far as I can tell, GOStats doesn't have any clear examples of simple
>> mapping of microarray data to GO terms.
>>
>> Given that one of the major, fundamental tasks biologists want to do is
>> find out functional information for significantly differentially
>> expressed genes, shouldn't this be a little easier, and a little more
>> transparent, in bioconductor?
>>
>> Again, I ask, does anyone have any simple examples of going from a list
>> of LocusLink IDs to a list of GO Terms? (i.e. GO identifiers and the
>> biological function/term associated with those identifiers)
>>
>> Many thanks
>> Mick
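For the request quoted above (going from a list of LocusLink IDs to GO terms), the merge() route from the original question might look roughly like the sketch below. It uses base R only. Every gene name, LocusLink ID and gene-to-term assignment here is made up for illustration, and the three data frames (`array_genes`, `gene2ll`, `ll2go`) are hypothetical stand-ins for your own tables; the GO annotation table could come from any of the sources discussed in this thread.

```r
# Toy data: all names, IDs and annotations below are invented for illustration.

# Array results, e.g. a few columns of a topTable() result
array_genes <- data.frame(
  gene  = c("geneA", "geneB", "geneC"),
  logFC = c(2.1, -1.4, 0.9),
  stringsAsFactors = FALSE
)

# Mapping from gene names to LocusLink (Entrez Gene) IDs
gene2ll <- data.frame(
  gene      = c("geneA", "geneB", "geneC"),
  locuslink = c(1001L, 1002L, 1003L),
  stringsAsFactors = FALSE
)

# GO annotation keyed by LocusLink ID, one row per gene/term pair
ll2go <- data.frame(
  locuslink = c(1001L, 1001L, 1002L),
  go_id     = c("GO:0016787", "GO:0005737", "GO:0006955"),
  go_term   = c("hydrolase activity", "cytoplasm", "immune response"),
  stringsAsFactors = FALSE
)

# Step 1: attach LocusLink IDs to the array results
annotated <- merge(array_genes, gene2ll, by = "gene")

# Step 2: attach GO terms; all.x = TRUE keeps genes without any GO annotation
annotated_go <- merge(annotated, ll2go, by = "locuslink", all.x = TRUE)

# GO terms for a subset of genes, here the up-regulated ones
up <- annotated_go[annotated_go$logFC > 0, c("gene", "go_id", "go_term")]
print(up)
```

Because one gene can carry several GO terms, the merged table has one row per gene/term pair rather than one row per gene; genes with no annotation appear once with NA in the GO columns.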
michael watson (IAH-C) wrote:

> Hi Steffen, Wolfgang
>
> Thanks a lot, the biomaRt package looks wonderful for the species that
> are in ensembl... Are there any functions within it to annotate other
> species? (Eg bacteria, plants etc)

Mick,

This is a quick-and-dirty solution that will get you whatever NCBI has available for gene ontology, including arabidopsis, for example. Hope this gets you another few species. The species IDs included are:

> unique(gene2go$taxID)
 [1]   3702   4932   6239   7227   7955   9031   9606  10090  10116  36329
[11]  39947  83333 185431 195099 198094 211586 214684 223283 243164 243231
[21] 243233 246200 265669 284812

Hope this helps.

Sean

> download.file('ftp://ftp.ncbi.nih.gov/gene/DATA/gene2go.gz', destfile='gene2go.gz')
trying URL 'ftp://ftp.ncbi.nih.gov/gene/DATA/gene2go.gz'
ftp data connection made, file length 5541317 bytes
opened URL
==================================================
downloaded 5411Kb

> gene2go <- read.table(gzfile('gene2go.gz'),sep="\t",header=FALSE,quote="")
> colnames(gene2go) <- c('taxID', 'geneID', 'goID', 'evidence', 'qualifier', 'goTerm', 'pubmedlist')
> gene2go[match(1:10,gene2go$geneID),]
       taxID geneID       goID evidence qualifier
272227  9606      1 GO:0000004       ND
272230  9606      2 GO:0004867      IEA
NA        NA     NA       <NA>     <NA>      <NA>
NA.1      NA     NA       <NA>     <NA>      <NA>
NA.2      NA     NA       <NA>     <NA>      <NA>
NA.3      NA     NA       <NA>     <NA>      <NA>
NA.4      NA     NA       <NA>     <NA>      <NA>
NA.5      NA     NA       <NA>     <NA>      <NA>
272240  9606      9 GO:0004060      TAS
272244  9606     10 GO:0004060      TAS
                                             goTerm pubmedlist
272227                   biological process unknown          -
272230 serine-type endopeptidase inhibitor activity          -
NA                                             <NA>       <NA>
NA.1                                           <NA>       <NA>
NA.2                                           <NA>       <NA>
NA.3                                           <NA>       <NA>
NA.4                                           <NA>       <NA>
NA.5                                           <NA>       <NA>
272240       arylamine N-acetyltransferase activity   10908296
272244       arylamine N-acetyltransferase activity    2340091

# and an example from A. thaliana
# the GO for A. thaliana is from TAIR
> gene2go[match(819280,gene2go$geneID),]
      taxID geneID       goID evidence qualifier                        goTerm
12430  3702 819280 GO:0003700      ISS           transcription factor activity
      pubmedlist
12430    7948864
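To try the same steps without downloading the full file, a small offline sketch can help. It assumes the seven-column, tab-separated layout used in the post above and reuses a few of the rows shown in its output; the "-" in the qualifier column is a placeholder rather than real data, and `my_ids` is a hypothetical list of Entrez Gene IDs of interest.

```r
# A few tab-separated rows in the gene2go layout, copied from the output above.
# The "-" qualifier values are placeholders, not real annotation.
rows <- c(
  paste("9606", "1", "GO:0000004", "ND", "-", "biological process unknown", "-", sep = "\t"),
  paste("9606", "2", "GO:0004867", "IEA", "-", "serine-type endopeptidase inhibitor activity", "-", sep = "\t"),
  paste("9606", "9", "GO:0004060", "TAS", "-", "arylamine N-acetyltransferase activity", "10908296", sep = "\t"),
  paste("9606", "10", "GO:0004060", "TAS", "-", "arylamine N-acetyltransferase activity", "2340091", sep = "\t"),
  paste("3702", "819280", "GO:0003700", "ISS", "-", "transcription factor activity", "7948864", sep = "\t")
)

# Same read.table() settings as in the post, reading from text instead of a file
gene2go <- read.table(text = rows, sep = "\t", header = FALSE, quote = "",
                      stringsAsFactors = FALSE)
colnames(gene2go) <- c("taxID", "geneID", "goID", "evidence",
                       "qualifier", "goTerm", "pubmedlist")

# Restrict to one species (taxID 3702 is A. thaliana)
ath <- gene2go[gene2go$taxID == 3702, ]

# GO annotations for a hypothetical set of Entrez Gene IDs
my_ids <- c(9, 10, 819280)
hits <- gene2go[gene2go$geneID %in% my_ids, c("geneID", "goID", "goTerm")]

# One list element per gene, holding all of its GO IDs
per_gene <- split(hits$goID, hits$geneID)
print(per_gene)
```

Filtering on taxID first keeps the table small when only one species is needed, and split() gives a per-gene list that is convenient for later counting.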
On 3/1/06 8:28 AM, "Sean Davis" <sdavis2 at mail.nih.gov> wrote:

>> gene2go <- read.table(gzfile('gene2go.gz'),sep="\t",header=FALSE,quote="")
>> colnames(gene2go) <- c('taxID', 'geneID', 'goID', 'evidence', 'qualifier',
> 'goTerm', 'pubmedlist')
>> gene2go[match(1:10,gene2go$geneID),]

This should be:

gene2go[gene2go$geneID %in% 1:10,]

>> gene2go[match(819280,gene2go$geneID),]

And this should be:

gene2go[gene2go$geneID %in% 819280,]

Sorry about that.

Sean
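The reason for the correction is that match() returns only the position of the first hit for each ID (and NA when an ID is absent), whereas %in% builds a logical mask over the table, so every annotation row for the requested IDs is kept. A minimal sketch on a toy table shows the difference; the gene IDs and their term assignments are made up for illustration.

```r
# Toy annotation table: gene 1 has two GO terms, gene 3 has one, gene 2 has none.
# (IDs and assignments are invented for illustration.)
ann <- data.frame(
  geneID = c(1L, 1L, 3L),
  goID   = c("GO:0016787", "GO:0005737", "GO:0006955"),
  stringsAsFactors = FALSE
)

ids <- 1:3

# match(): one row per requested id, first hit only, all-NA row when absent
first_hit <- ann[match(ids, ann$geneID), ]
print(first_hit)

# %in%: every row of ann whose geneID is among the requested ids
all_hits <- ann[ann$geneID %in% ids, ]
print(all_hits)
```

With match(), the second GO term of gene 1 is silently dropped and gene 2 shows up as a row of NAs; with %in%, both terms of gene 1 are returned and gene 2 simply does not appear.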
Hi Mick,

The biomaRt package can retrieve data from BioMart data management systems (see: http://www.biomart.org). Any database that provides such a BioMart implementation can thus be queried. Ensembl and Wormbase, for example, provide this and are queried in real time through biomaRt. For species that are not in these systems, the biomaRt package cannot provide help unless a local BioMart for the species is set up, or you can try to convince the database of interest to include a BioMart system. I expect plants and fly to be included soon but have no information on other species.

Best,
Steffen

michael watson (IAH-C) wrote:

> Hi Steffen
>
> Sorry if I am confused, but getGO() seems to require a connection to an
> ensembl database. If I have identifiers for a species that is not in
> ensembl, can I still use biomaRt to retrieve GO (and other) annotations?
>
> If so, it is a little unclear how to do this from the vignettes :-S
>
> Thank you for the help
>
> Mick
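The second part of the original question, whether a GO term is over-represented in a gene list, was not answered with code in this thread. One common approach is a hypergeometric test on the two lists suggested earlier (all IDs on the array, and the IDs in the gene list). The sketch below uses base R only; the four counts are invented, and this is an illustration of the test rather than what GOHyperG or DAVID/EASE necessarily compute.

```r
# Hypothetical counts for a single GO term (made-up numbers)
universe_size    <- 5000  # all LocusLink IDs on the array
term_in_universe <- 100   # of those, annotated to the GO term
list_size        <- 200   # IDs in the gene list
term_in_list     <- 12    # of those, annotated to the GO term

# P(X >= term_in_list) when drawing list_size genes at random from the array
p_hyper <- phyper(term_in_list - 1,
                  term_in_universe,
                  universe_size - term_in_universe,
                  list_size,
                  lower.tail = FALSE)

# The same test as a one-sided Fisher's exact test on the 2x2 table
# rows: annotated to term / not annotated; columns: in list / not in list
tab <- matrix(c(term_in_list,
                list_size - term_in_list,
                term_in_universe - term_in_list,
                universe_size - term_in_universe - (list_size - term_in_list)),
              nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(c(hypergeometric = p_hyper, fisher = p_fisher))
```

When many GO terms are tested this way, the resulting p-values would normally be corrected for multiple testing, for example with p.adjust().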