How to match Locus IDs with Gene Ontology IDs?

Glynn, Earl ▴ 170

@glynn-earl-952

Last seen 9.6 years ago

I looked at several Bioconductor packages that deal with Gene Ontology (GO, goTools, ontoTools), and I can't find functionality that does the following:

Given Locus ID NM_001533, I can go to NCBI

http://www.ncbi.nlm.nih.gov/

and search "Nucleotide" for "NM_001533".

I can click on the NM_001533 hit returned, and about two-thirds of the way down the page, under the CDS section, the go_component, go_function, and go_process subsections give Gene Ontology info for NM_001533.

Likewise, if I do the same thing with Locus ID BC001721, I see a hit and a CDS section, but no gene ontology information. That's OK; I'm not expecting everything to have GO information. (E.g., of the 45,101 probesets on the Mouse430_2 Affy chip, only about 4693 have GO biological process information, 2573 have cellular component info, and 4875 have molecular function info. I'm not working with Affy data; the point is that many IDs won't have GO info, but some will.)

If I have a long list of Locus IDs, e.g., NM_001533, BC001721, ..., are there any Bioconductor packages that "connect" these identifiers to Gene Ontology identifiers, or perhaps to some other identifier (say LocusLink, aka Entrez Gene) that is mapped to the Gene Ontology information?

Thanks for any suggestions on how this might be automated using Bioconductor and R.

Earl F. Glynn
Scientific Programmer
Bioinformatics Department
Stowers Institute for Medical Research

GO affy • 2.7k views

updated 18.4 years ago by rgentleman ★ 5.5k • written 18.4 years ago by Glynn, Earl ▴ 170

rgentleman ★ 5.5k

@rgentleman-7725

Last seen 9.0 years ago

United States

Hi,

Earl F. Glynn wrote:
> I looked at several Bioconductor packages that deal with Gene Ontology (GO,
> goTools, ontoTools), and I don't seem to find functionality that does the
> following:
>
> Given Locus ID NM_001533 I can go to NCBI

I think that is a RefSeq ID, and I am also pretty sure that LocusLink has been retired in favor of Entrez Gene (although we are a bit slow in moving).

> http://www.ncbi.nlm.nih.gov/
>
> and search "Nucleotide" for "NM_001533"
>
> I can click on the NM_001533 hit returned, and about 2/3rds of the way down
> the page under the CDS section, the go_component, go_function, and
> go_process subsections give Gene Ontology info for NM_001533.

biomaRt might be your best choice.

> Likewise, if I do the same thing with Locus ID BC001721, I see a hit and a
> CDS section, but no gene ontology information. That's OK, I'm not expecting
> everything to have GO information. (E.g., of the 45,101 probesets on the
> Mouse430_2 Affy chip, only about 4693 have GO Biological process
> information, 2573 have cellular info, and 4875 have molecular function
> info. I'm not working with Affy data, but I know many IDs won't have GO
> info, but some will.)

Again, I do not believe that BC001721 is an Entrez Gene ID, and it does matter a bit.

You can of course always use AnnBuilder to build your own annotation for a microarray (if that is what you are working from).

Robert

--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
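Since the identifier type determines which lookup key to use, a first pass over a mixed list can sort the IDs by shape. The following is only a rough sketch in base R: `classify_id` is a hypothetical helper, not a function from any Bioconductor package, and the patterns are assumptions based on the common accession layouts (two letters plus an underscore for RefSeq, digits only for Entrez Gene, one or two letters followed by five or six digits for GenBank). The value "3191" is just an example of an all-digit ID.

```r
# Hypothetical helper (not from any package): guess the identifier type
# from its shape. The patterns are approximations, not an official grammar.
classify_id <- function(ids) {
  ifelse(grepl("^[A-Z]{2}_[0-9]+(\\.[0-9]+)?$", ids), "refseq",
  ifelse(grepl("^[0-9]+$", ids), "entrez_gene",
  ifelse(grepl("^[A-Z]{1,2}[0-9]{5,6}(\\.[0-9]+)?$", ids), "genbank",
         "unknown")))
}

ids <- c("NM_001533", "BC001721", "3191", "foo")
types <- classify_id(ids)
print(data.frame(id = ids, type = types))
```

Splitting the list this way makes it possible to send each group to a query that filters on the matching identifier type.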

written 18.4 years ago by rgentleman ★ 5.5k

Hi,

There are two ways to do this in biomaRt. The result will be the same, but the second way is very general and will let you extract anything that is available from the BioMart databases.

1) Using the simple getGO function:

    > mart=martConnect()
    connected to:  ensembl_mart_35

    # Now retrieve the GO annotation for your ID; note that you can also give a vector of IDs
    > getGO(id="NM_001533",type="refseq",species="hsapiens",mart=mart)
             id       GOID                                     description evidence          martID
    1 NM_001533 GO:0000166                              nucleotide binding      IEA ENSG00000104824
    2 NM_001533 GO:0003723                                     RNA binding      TAS ENSG00000104824
    3 NM_001533 GO:0006397                                 mRNA processing      IEA ENSG00000104824
    4 NM_001533 GO:0005654                                     nucleoplasm      TAS ENSG00000104824
    5 NM_001533 GO:0030530 heterogeneous nuclear ribonucleoprotein complex      TAS ENSG00000104824
    6 NM_001533 GO:0005634                                         nucleus      IEA ENSG00000104824

    > martDisconnect(mart)

2) Using the more powerful getBM function:

    # List the available BioMart databases
    > listMarts()
    [1] "ensembl_mart_35" "vega_mart_35"    "snp_mart_35"     "msd_mart_4"      "uniprot_mart_17"

    # Select a BioMart to use with the function useMart() and verify the database connection
    > mart=useMart("ensembl_mart_35")
    connected to:  ensembl_mart_35

    # List the available datasets in the selected BioMart database
    > listDatasets(mart)
                          dataset    version
    1    ptroglodytes_gene_ensembl     CHIMP1
    2         ggallus_gene_ensembl    WASHUC1
    3     rnorvegicus_gene_ensembl    RGSC3.4
    4     scerevisiae_gene_ensembl       SGD1
    5  tnigroviridis_gene_ensembl TETRAODON7
    6    xtropicalis_gene_ensembl       JGI3
    7       frubripes_gene_ensembl      FUGU2
    8  cintestinalis_gene_ensembl   CINT1.95
    9        agambiae_gene_ensembl      MOZ2a
    10     amellifera_gene_ensembl    AMEL2.0
    11        btaurus_gene_ensembl      BDGP4
    12       celegans_gene_ensembl     CEL140
    13      mmusculus_gene_ensembl    NCBIM34
    14    cfamiliaris_gene_ensembl    BROADD1
    15  dmelanogaster_gene_ensembl      BDGP4
    16         drerio_gene_ensembl     ZFISH5
    17       hsapiens_gene_ensembl     NCBI35
    18     mdomestica_gene_ensembl       JGI3

    # Update the mart object by selecting a dataset
    > mart = useDataset(dataset = "hsapiens_gene_ensembl",mart = mart)
    Reading database configuration of: hsapiens_gene_ensembl
    Checking attributes ... ok
    Checking filters ... ok
    Checking main tables ... ok

    # List the available attributes for this dataset, i.e. everything that can be retrieved
    > listAttributes(mart)
    [1] "chr_name"
    [2] "chrom_start"
    [3] "chrom_end"
    [4] "chrom_strand"
    ...

    # List the available filters for this dataset, i.e. everything that can be used to restrict a query
    > listFilters(mart)
    [1] "chr_name"
    [2] "gene_chrom_start"
    [3] "gene_chrom_end"
    [4] "in_encode"
    ...

Now you can run a query, e.g. get the GO ID, description, and evidence code, using the RefSeq ID as the filter with the value NM_001533. Note that you can also give a vector of identifiers as values.

    > getBM(attributes=c("go_id", "go_description","evidence_code"), filter="refseq_dna",values="NM_001533",mart=mart)
      refseq_dna      go_id                                  go_description evidence_code
    1  NM_001533 GO:0000166                              nucleotide binding           IEA
    2  NM_001533 GO:0003723                                     RNA binding           TAS
    3  NM_001533 GO:0006397                                 mRNA processing           IEA
    4  NM_001533 GO:0005654                                     nucleoplasm           TAS
    5  NM_001533 GO:0030530 heterogeneous nuclear ribonucleoprotein complex           TAS
    6  NM_001533 GO:0005634                                         nucleus           IEA

best,
Steffen
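For a long list of identifiers, it can help to reshape the returned table into one entry per submitted ID, so that IDs without GO annotation are visible rather than silently missing. The sketch below uses base R only; the data frame is typed in by hand from the getBM() output shown above, and it assumes (this is not confirmed in the thread) that an identifier with no annotation, such as BC001721, simply contributes no rows to the result.

```r
# Rows copied by hand from the getBM() output shown above.
go <- data.frame(
  refseq_dna = "NM_001533",
  go_id = c("GO:0000166", "GO:0003723", "GO:0006397",
            "GO:0005654", "GO:0030530", "GO:0005634"),
  evidence_code = c("IEA", "TAS", "IEA", "TAS", "TAS", "IEA"),
  stringsAsFactors = FALSE
)

# The identifiers that were submitted; BC001721 is assumed to return no rows.
query_ids <- c("NM_001533", "BC001721")

# One character vector of GO IDs per submitted identifier (empty if none found).
go_by_id <- lapply(setNames(query_ids, query_ids),
                   function(id) go$go_id[go$refseq_dna == id])

# Identifiers for which nothing came back.
no_go <- names(go_by_id)[lengths(go_by_id) == 0]

print(lengths(go_by_id))
print(no_go)
```

Keeping the original query vector as the set of names is what makes the unannotated IDs show up with a count of zero.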

written 18.4 years ago by Steffen Durinck ▴ 420

Hi,

I think I found a small bug in decideTests when the heirarchical or nestedF methods are selected in conjunction with adjust.method="BH" (the new default value). In both cases there is a problem with the 'switch' command.

In adjust heirarchical:

    a <- switch(adjust.method,
        none=1,
        bonferroni=1/n,
        holm=1/(n-i+1),
        BH,fdr=i/n        # <------- should be: BH=i/n,fdr=i/n
    )

In adjust nestedF:

    a <- switch(adjust.method,
        none=1,
        bonferroni=1/n,
        holm=1/(n-i+1),
        fdr=i/n           # <------- should be: BH=i/n,fdr=i/n
    )

By the way, the p.adjust function of the stats package makes no distinction between the BH and fdr methods, so why introduce such a distinction in limma? (Or am I missing something?)

Regards, and thanks for an excellent tool!

Ariel./

--
Ariel Chernomoretz, Ph.D.
Centre de recherche du CHUL
2705 Blv Laurier, bloc T-367
Sainte-Foy, Qc G1V 4G2
(418)-525-4444 ext 46339
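The behaviour described here can be reproduced in base R without limma. This is a minimal sketch, not limma's actual code: `adjust_factor` is a hypothetical stand-in for the switch() statements quoted above, with `i` and `n` taken to be the rank and the number of tests. It uses R's empty-argument fall-through as one possible way to give "BH" and "fdr" the same value, and shows that a method name with no matching entry makes switch() return NULL.

```r
# Hypothetical stand-in for the switch() quoted above (not limma's source).
adjust_factor <- function(adjust.method, i, n) {
  switch(adjust.method,
         none = 1,
         bonferroni = 1 / n,
         holm = 1 / (n - i + 1),
         BH = ,            # empty argument: falls through to the next value
         fdr = i / n)
}

bh_value  <- adjust_factor("BH",  i = 2, n = 10)
fdr_value <- adjust_factor("fdr", i = 2, n = 10)
print(c(BH = bh_value, fdr = fdr_value))

# With no "BH" entry and no default, switch() quietly returns NULL.
missing_bh <- switch("BH", none = 1, fdr = 0.2)
print(is.null(missing_bh))

# stats::p.adjust accepts both names and gives the same adjusted p-values.
p <- c(0.01, 0.02, 0.03, 0.5)
same_adjustment <- identical(p.adjust(p, "BH"), p.adjust(p, "fdr"))
print(same_adjustment)
```

A NULL result of this kind tends to surface only later, when the value is used in arithmetic, which may be why the problem shows up only for particular method and adjust.method combinations.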

written 18.4 years ago by Ariel Chernomoretz ▴ 380
