Incomplete EntrezID annotations for the Mouse 430 v2.0 probe-set
@anjan-purkayastha-4273
Last seen 9.7 years ago
Hi,

I have run into the following problem. I created a probe ID to Entrez ID mapping for the Affy Mouse 430 v2.0 array from the corresponding annotation package, mouse4302.db. Unfortunately, about 10000 genes do not have a corresponding Entrez ID. Many of these are genes with known functions. If I cannot map an Entrez ID to these, then I cannot retrieve GO annotations, and consequently I cannot do a gene set enrichment analysis using GOstats.

Does anyone have an updated annotation file?

Many thanks in advance,
Anjan

--
anjan purkayastha, phd.
research associate
fas center for systems biology,
harvard university
52 oxford street
cambridge ma 02138
phone-703.740.6939
Annotation GO affy GOstats • 1.1k views
@martin-morgan-1513
Last seen 12 days ago
United States
On 11/02/2010 11:14 AM, ANJAN PURKAYASTHA wrote:
> Unfortunately about 10000 genes do not have corresponding EntrezID.
> Does anyone have an update annotation file?

Hi Anjan,

What is your sessionInfo() (otherwise, how could we know what an 'updated' annotation file is?), and how did you perform the mapping (short, hopefully reproducible, code)?

Martin

--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
Hi Martin,

Session info:

R version 2.11.1 (2010-05-31)
i386-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] affy_1.26.1 GOstats_2.14.0 graph_1.28.0 Category_2.14.0 mouse4302.db_2.4.1 org.Mm.eg.db_2.4.1 RSQLite_0.9-2
[8] DBI_0.2-5 AnnotationDbi_1.10.2 Biobase_2.8.0

loaded via a namespace (and not attached):
[1] affyio_1.16.0 annotate_1.26.1 genefilter_1.30.0 GO.db_2.4.1 GSEABase_1.10.0 preprocessCore_1.10.0
[7] RBGL_1.26.0 splines_2.11.1 survival_2.35-8 tools_2.11.1 XML_3.1-1 xtable_1.5-6

Commands used to create the mapping:

  library(mouse4302.db)
  id <- rownames(allMtb.rma.data.frame)
  map <- mouse4302ENTREZID
  probe_entrezid <- unlist(mget(id, map))
  p <- as.data.frame(probe_entrezid)

p now holds the probe ID to Entrez ID mappings.

Thanks,
Anjan
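A note on the mapping commands above: as far as I know, `mget()` on an annotation map returns one list element per probe ID, with `NA` for probes that have no Entrez ID, so the unmapped probes can be counted directly from the result. The following is a minimal base-R sketch of that bookkeeping. `probe_list` is toy data standing in for the result of `mget(id, mouse4302ENTREZID, ifnotfound = NA)`; the probe IDs, Entrez IDs, and the helper name `summarize_mapping` are all hypothetical.

```r
# Toy stand-in for: probe_list <- mget(id, mouse4302ENTREZID, ifnotfound = NA)
# (probe IDs and Entrez IDs below are illustrative, not a real lookup)
probe_list <- list(
  "probe_a_at" = "1001",
  "probe_b_at" = NA_character_,
  "probe_c_at" = "1002",
  "probe_d_at" = NA_character_
)

summarize_mapping <- function(probe_list) {
  # Take the first ID per probe so the result stays one row per probe
  entrez <- vapply(probe_list, function(x) as.character(x[1]), character(1))
  mapping <- data.frame(
    probe_id = names(probe_list),
    entrez_id = unname(entrez),
    stringsAsFactors = FALSE
  )
  list(
    mapping = mapping,
    n_probes = nrow(mapping),
    n_mapped = sum(!is.na(mapping$entrez_id)),
    unmapped = mapping$probe_id[is.na(mapping$entrez_id)]
  )
}

result <- summarize_mapping(probe_list)
result$n_mapped  # number of probes with an Entrez ID
result$unmapped  # probe IDs with no Entrez ID
```

Keeping the unmapped probe IDs in a separate vector makes it easy to inspect them, or to annotate them by hand later, instead of silently dropping them.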
On 11/02/2010 11:20 AM, ANJAN PURKAYASTHA wrote:
> Commands used to create the mapping:
> library(mouse4302.db)
> id <- rownames(allMtb.rma.data.frame)
> map <- mouse4302ENTREZID
> probe_entrezid <- unlist(mget(id, map))
> p <- as.data.frame(probe_entrezid)
> p now has the probeID_entrezID mappings

With R-2-11 I see

  > mouse4302()
  [...snip...]
  mouse4302ENTREZID has 37316 mapped keys (of 45101 keys)
  [...snip...]
  Date for NCBI data: 2010-Mar1

The current version of R / Bioconductor is R-2-12, where there are 37413 mapped probes from NCBI data of 2010-Sep7. Using biomaRt I get

  > library(biomaRt)
  > mart = useMart("ensembl", "mmusculus_gene_ensembl")
  > attrs = listAttributes(mart)
  > attrs[grep("(Entrez|Affy mouse)", attrs[[2]]),]
                 name      description
  47       entrezgene    EntrezGene ID
  95  affy_mouse430_2  Affy mouse430 2
  96 affy_mouse430a_2 Affy mouse430a 2
  > filts = listFilters(mart)
  > filts[grep("(Entrez|Affy mouse)", filts[[2]]),]
                  name                              description
  52   with_entrezgene                    with EntrezGene ID(s)
  84        entrezgene        EntrezGene ID(s) [e.g. 100287163]
  121  affy_mouse430_2  Affy mouse430 2 ID(s) [e.g. 1426088_at]
  122 affy_mouse430a_2 Affy mouse430a 2 ID(s) [e.g. 1426088_at]
  > res = getBM(c("affy_mouse430_2","entrezgene"), "with_entrezgene", TRUE, mart)
  > head(res)
    affy_mouse430_2 entrezgene
  1                     338371
  2                     238944
  3                     208431
  4      1430582_at     268281
  5      1458594_at     268281
  6    1455882_x_at     319922
  > head(table(table(res[[1]])))

      1     2     3     4     5     6
  24627  1746   374    96    62    34

which tells me there are 24627 uniquely mapping probes, and some more that could be retrieved with some work. (I haven't checked my biomaRt work very carefully here, so I could have made mistakes, and I don't know biomaRt well enough to get the provenance of the probes I have identified, unlike with mouse4302.db, where ?mouse4302ENTREZID is helpful.)

I could also remap the probes using chromosome coordinates from the mouse4302 package and BSgenome / Biostrings, and then use org.Mm.eg.db to map coordinates to genes.

So I think the best you can do easily is the ~37,000 probes that are mapped.

Martin
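The tabulation above counts every value in the first column of `res`, including rows where the probe ID is empty. Here is a minimal base-R sketch of one way to separate uniquely mapping probes from multiply mapping ones, assuming a data frame shaped like the `getBM()` result. The first six rows echo `head(res)` from the post; the seventh row (Entrez ID 999999) is made up so that one probe maps to two genes.

```r
# Toy data frame shaped like the getBM() result in the post above.
# Rows 1-6 echo head(res); row 7 is hypothetical, added for illustration.
res <- data.frame(
  affy_mouse430_2 = c("", "", "", "1430582_at", "1458594_at",
                      "1455882_x_at", "1455882_x_at"),
  entrezgene = c(338371, 238944, 208431, 268281, 268281, 319922, 999999),
  stringsAsFactors = FALSE
)

# Drop rows with no probe ID, and any exact duplicate rows
has_probe <- !is.na(res$affy_mouse430_2) & nzchar(res$affy_mouse430_2)
pairs <- unique(res[has_probe, ])

# Number of distinct Entrez IDs per probe
n_genes <- table(pairs$affy_mouse430_2)

# Probes that map to exactly one Entrez ID
unique_probes <- names(n_genes)[n_genes == 1]
unique_map <- pairs[pairs$affy_mouse430_2 %in% unique_probes, ]

# Distribution of mapping multiplicity, comparable to table(table(res[[1]]))
table(n_genes)
```

Filtering out the empty probe IDs first matters because those rows are genes with an Entrez ID but no probe on this array, and they would otherwise be counted as if they were a single probe.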
@vincent-j-carey-jr-4
Last seen 7 weeks ago
United States
On Tue, Nov 2, 2010 at 2:14 PM, ANJAN PURKAYASTHA <anjan.purkayastha at gmail.com> wrote:
> Hi,
> I have run into the following problem. I created a probeID-EntrezID mapping
> for the Affy mouse array from the cognate annotation file Mouse4302.db.
> Unfortunately about 10000 genes do not have corresponding EntrezID.

What do you mean by "10000 genes"? The following shows that 7688 probesets do not have Entrez ID mappings (using current packages).

  > length(ls(mouse4302ENTREZID))
  [1] 45101
  > length(setdiff(ls(mouse4302ENTREZID), mappedkeys(mouse4302ENTREZID)))
  [1] 7688

That's just a fact of life.

> Many of these are genes with known functions. If I cannot map a EntrezID to
> these then I cannot retrieve GO annotations and consequently I cannot do a
> Gene Set Enrichment analysis using GOstats.

This is not really correct. You can use whatever groupings and mappings you like with GOstats. See the GOstatsForUnsupportedOrganisms vignette for extensive details on dealing with a somewhat more difficult situation. When you say the genes have "known functions", perhaps you can use that knowledge to provide GO associations for the unmapped genes, or, if the functions you refer to do not have names in GO, you can create your own functional grouping of genes.

> Does anyone have an update annotation file?

Your sessionInfo shows that you are not using the current version of R, but that is not the main concern. If you have gene:GO mappings and gene sets that you prefer to those available through the annotation packages, you can use those mappings and sets to drive the GOstats analysis.

My sessionInfo:

R version 2.12.0 Patched (2010-10-15 r53331)
Platform: x86_64-apple-darwin10.4.0/x86_64 (64-bit)

locale:
[1] C

attached base packages:
[1] stats graphics grDevices datasets tools utils methods
[8] base

other attached packages:
[1] mouse4302.db_2.4.5 org.Mm.eg.db_2.4.6 RSQLite_0.9-2
[4] DBI_0.2-5 AnnotationDbi_1.11.9 Biobase_2.10.0
[7] weaver_1.15.0 codetools_0.2-2 digest_0.4.2
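To illustrate the point about driving an enrichment analysis with your own groupings: at its core, a hypergeometric over-representation test only needs a gene universe, a list of selected genes, and a collection of gene sets. The following base-R sketch uses `phyper()` on toy gene sets. It is not the GOstats implementation (which, among other things, can condition on the GO graph structure), and all gene names, set names, and the helper `enrichment_p` are hypothetical.

```r
# Hypothetical gene universe, selected genes, and custom gene sets
universe <- paste0("gene", 1:20)
selected <- c("gene1", "gene2", "gene3", "gene4", "gene15")
gene_sets <- list(
  set_A = c("gene1", "gene2", "gene3", "gene4", "gene5"),
  set_B = c("gene10", "gene11", "gene12", "gene15")
)

# One-sided hypergeometric test for over-representation of a gene set
enrichment_p <- function(set, selected, universe) {
  set <- intersect(set, universe)
  selected <- intersect(selected, universe)
  k <- length(intersect(set, selected))  # selected genes in the set
  m <- length(set)                       # genes in the set
  n <- length(universe) - m              # genes outside the set
  # P(X >= k) when drawing length(selected) genes without replacement
  phyper(k - 1, m, n, length(selected), lower.tail = FALSE)
}

p_values <- vapply(gene_sets, enrichment_p, numeric(1),
                   selected = selected, universe = universe)
p_values
```

The gene sets here are plain named lists of gene identifiers, so any identifier scheme works, including probe IDs for genes that have no Entrez ID, as long as the universe, the selected genes, and the sets all use the same one.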