mogene10stprobeset.db error


Maxim ▴ 170

@maxim-3843

Last seen 9.7 years ago

Hi,

I am trying to analyze MoGene-1_0-st gene arrays. I used the aroma package for this and obtained an expression matrix, but I have no clue how to assign real gene names to the respective "IDs" (column "item numbers" after aroma normalization and summarization).

As a workaround I simply loaded the mogene10stprobeset.db library and did

u<-mget(row.names(x),mogene10stprobesetSYMBOL)

with x being the expression matrix and rownames(x) being the IDs. But the majority of IDs are unknown:

Error in .checkKeys(value, Lkeys(x), x@ifnotfound) :
  value for "10471503" not found

But why? This ID is clearly correct:

10471503 chr2:32530629-32530765 chr2 NC_000068.6 + 32530629 32530765 25 --- ENSMUST00000082819 // ENSEMBL // ncrna:snoRNA chromosome:NCBIM37:2:32530629:32530765:1 gene:ENSMUSG00000064753 // chr2 // 100 // 100 // 25 // 25 // 0 /// ENSMUST00000083292 // ENSEMBL // ncrna:snoRNA chromosome:NCBIM37:9:15119289:15119425:1 gene:ENSMUSG00000065226 // chr2 // 72 // 100 // 18 // 25 // 0 main

What is my problem? Obviously I am missing something.

Maxim
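A minimal sketch of how such a lookup can be made tolerant of unknown IDs, so that one missing key no longer aborts the whole call. It uses a plain environment with made-up IDs and symbols as a stand-in for the annotation map; the assumption is that the same `ifnotfound = NA` argument can be passed when the second argument is a map from an annotation package such as mogene10stprobesetSYMBOL.

```r
# Toy stand-in for an annotation map (hypothetical IDs and symbols,
# not taken from any real annotation package).
toy_map <- list2env(list("10000001" = "GeneA", "10000002" = "GeneB"))

# Hypothetical row names of the expression matrix; the last one is
# deliberately absent from the map.
ids <- c("10000001", "10000002", "10471503")

# Without ifnotfound, mget() stops at the first unknown key.
# With ifnotfound = NA it returns NA for that key instead.
hits <- mget(ids, toy_map, ifnotfound = NA)
symbols <- unlist(hits)

symbols               # named character vector, NA where no symbol was found
sum(is.na(symbols))   # number of IDs without a symbol
```

Counting the NA entries gives a quick measure of how many row names the chosen map does not know, which helps to decide whether the wrong annotation package is being used.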

Normalization ASSIGN • 1.2k views

updated 13.9 years ago by Vincent J. Carey, Jr. 6.7k • written 13.9 years ago by Maxim ▴ 170


Vincent J. Carey, Jr. 6.7k

@vincent-j-carey-jr-4

Last seen 5 days ago

United States

Others will have to comment on the details of this mapping. You are finding a "hit" to an eight-digit token of weakly specified provenance (generated with aroma, no indication of version etc.) and asserting that an "ID is clearly correct", without telling us the resource where you found the "hit". We can use biomaRt to follow up a bit:

> library(biomaRt)
> ss = useDataset("mmusculus_gene_ensembl", mart=useMart("ensembl"))
> fff = getBM(mart=ss, filters="affy_mogene_1_0_st_v1", values="10471503", attributes=c("ensembl_gene_id",
+    "ensembl_transcript_id", "chromosome_name", "mgi_symbol"))
> fff
     ensembl_gene_id ensembl_transcript_id chromosome_name mgi_symbol
1 ENSMUSG00000088569    ENSMUST00000157944               2         NA
2 ENSMUSG00000065226    ENSMUST00000083292               9         NA
3 ENSMUSG00000088929    ENSMUST00000158304               9         NA
4 ENSMUSG00000065282    ENSMUST00000083348               9         NA

suggesting that current Ensembl annotation maps "the ID" to transcripts on chr 2 and chr 9. Perhaps biomaRt will yield more clues for you.

> sessionInfo()
R version 2.12.0 Under development (unstable) (2010-06-30 r52417)
Platform: x86_64-apple-darwin10.3.0/x86_64 (64-bit)

locale:
[1] C

attached base packages:
 [1] grid      splines   stats     graphics  grDevices datasets  tools
 [8] utils     methods   base

other attached packages:
 [1] biomaRt_2.5.1                        mogene10stprobeset.db_5.0.2
 [3] mogene10sttranscriptcluster.db_5.0.1 org.Mm.eg.db_2.4.1              etc...

13.9 years ago by Vincent J. Carey, Jr. 6.7k


Vincent J. Carey, Jr. 6.7k

@vincent-j-carey-jr-4

Last seen 5 days ago

United States

Please keep responses on list. I am glad your results seem to make more sense now. Nevertheless, a tightening of the preprocessing/analysis/annotation pipeline for Affy gene/exon 1.0 ST arrays would be welcome, and the oligo package addresses this to some extent. More fully worked examples for these arrays are in development.

On Sun, Jul 4, 2010 at 5:19 AM, Maxim wrote:
> Ooops, sorry, I was not aware of the complexity of the situation.
> Meanwhile I found out that with the (previously unspecified)
> mogene-1_0-st-v1,r3.cdf I just get the gene summaries anyhow and not
> the individual probesets for individual exons (ending up with roughly
> 35000 IDs as compared to the larger 10^6 total IDs/probes), at least
> when using the "short" protocol as suggested on the aroma homepage. I
> use the latest aroma and R 2.10!
>
> Given this situation, I found an older mailing list post that
> suggested using mogene10sttranscriptcluster.db instead. Indeed,
> this solved all my problems (most of them; a number of IDs are still
> not identified).
>
> Anyhow, for my toptable presentation in limma it's fine.
>
> Thank you!
>
> Maxim
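The choice between the probeset-level and the transcript-cluster-level annotation described in the quoted message could be checked before any lookup by measuring how many row names occur among the keys of each map. The following is only a sketch under stated assumptions: the two environments are toy stand-ins with made-up IDs, `key_coverage` is a hypothetical helper, and it is assumed that `ls()` lists the keys of the real maps (for example mogene10stprobesetSYMBOL and mogene10sttranscriptclusterSYMBOL) once their packages are loaded.

```r
# Hypothetical helper: fraction of IDs found among the keys of a map.
# ls() lists the keys of a plain environment; the assumption is that
# annotation-package maps answer the same call.
key_coverage <- function(ids, map) mean(ids %in% ls(map))

# Toy stand-ins with made-up IDs: one map keyed by probeset-level IDs,
# one keyed by transcript-cluster-level IDs.
probeset_map <- list2env(list("10000011" = "GeneA", "10000012" = "GeneA"))
cluster_map  <- list2env(list("10000001" = "GeneA", "10000002" = "GeneB"))

# Hypothetical row names of a gene-level expression matrix.
ids <- c("10000001", "10000002", "10471503")

coverage <- c(probeset          = key_coverage(ids, probeset_map),
              transcriptcluster = key_coverage(ids, cluster_map))
coverage
```

A map that covers most of the row names is the better candidate for annotation; IDs that neither map knows can then be followed up separately, for instance with biomaRt.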

13.9 years ago by Vincent J. Carey, Jr. 6.7k


Good to hear about this! This is actually my first 1.0 ST gene chip analysis; before, I only did classical 3'-IVT arrays. Although the aroma manuals I was able to find may not be as comprehensive as those for the limma/affy packages, I was still surprised to get my array results in less than 2 hours. What actually took me the most time was coming up with the idea to simply use my normal "limma related approach" for the final analysis of the expression matrix and the subsequent annotation.

"Fully worked examples" is indeed something many people would love to see, especially those (like me) who come across the analysis of an Affymetrix array only very occasionally.

Many thanks for all the efforts (of everyone on the mailing list)

Maxim

13.9 years ago by Maxim ▴ 170
