AffyID mapping question


Guest User ★ 13k

@guest-user-4897

Last seen 9.6 years ago

I am working on a project that maps Affymetrix probeset IDs to Entrez IDs, gene symbols, and chromosomal locations. I used the R package biomaRt together with mouse4302.db, the annotation package for the Affymetrix Mouse430 2.0 array. In the results, for genes that have multiple probesets, only a small proportion of the probesets have their own precise transcription start location; most of them share the same start location as the gene itself. Is there any way to get a more precise transcription start location for each probeset?

-- output of sessionInfo():

R version 2.12.2 (2011-02-25)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] mouse4302.db_2.4.5 org.Mm.eg.db_2.4.6 RSQLite_0.10.0
[4] DBI_0.2-5 AnnotationDbi_1.12.1 mouse4302cdf_2.7.0
[7] affy_1.28.1 Biobase_2.10.0 biomaRt_2.6.0

loaded via a namespace (and not attached):
[1] affyio_1.18.0 preprocessCore_1.12.0 RCurl_1.5-0.1
[4] tools_2.12.2 XML_3.2-0.2

-- Sent via the guest posting facility at bioconductor.org.

Transcription mouse4302 • 1.5k views

ADD COMMENT • link updated 11.8 years ago by Marc Carlson ★ 7.2k • written 11.8 years ago by Guest User ★ 13k


Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 14 months ago

United States

Hi,

On Monday, July 2, 2012, Jiayi Hou [guest] wrote:

> I am working on a project trying to mapping Affymetrix probeset ID to
> Entrez ID, Gene Symbol and its chromosomal location. I used R package
> biomaRt and another one named mouse4302.db for Affymetrix Mouse430 2.0
> array specifically. I noticed from the result, for genes have multiple
> probesets attached, only a small proportion of these probesets have a
> precise transcription start locations.

Can you clarify what you mean by a "transcription start location" for a probeset? Is this a function of the probes themselves? Or are you talking about the TSS of the gene that the probeset's probes land in? If it's the latter, are these different TSSs just the annotated TSSs of different isoforms of the gene?

> While most of these probesets share the same start location with the
> given gene. Is there anyway I can get a better match in terms of the
> precise transcription start location for each probeset?

I guess I don't understand what you mean by the "start location" of a probeset. Perhaps you can clarify a bit more what you are trying to do? More details about the problem you are trying to solve would also be helpful.

> -- output of sessionInfo():
>
> R version 2.12.2 (2011-02-25)
> Platform: i386-pc-mingw32/i386 (32-bit)

Thanks for also including your sessionInfo output. While we're trying to sort this out, you might take this opportunity to upgrade R to the latest version (2.15.1), since we don't really try to support outdated versions of Bioconductor packages.

HTH,
-Steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

ADD COMMENT • link 11.8 years ago Steve Lianoglou ★ 13k


Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 14 months ago

United States

Hi Jiayi,

Side note: please CC the Bioconductor list when replying to emails so they can stay online. You'll get better help (more eyeballs on your problem), and the list can be used as a resource for others.

I guess this might be a pain using the "guest posting" stuff, but subscribing to the mailing list is easy, and you'll learn a lot by skimming the posts that come through here.

OK, now to solve your problem:

On Mon, Jul 2, 2012 at 11:03 AM, Jiayi Hou <houj2 at vcu.edu> wrote:

> Hey Steve,
>
> Sorry, let me put it this way: when a probeset hybridizes to a given gene,
> the gene has a chromosomal location in terms of base pairs. For a given gene,
> on average there may be 2-3 probesets attached to the same gene. However,
> these 2-3 probesets, carrying different sequences of base pairs, are expected
> to attach to different locations on the given gene. What I am looking
> for is where precisely these probesets attach to the gene.

Thanks, that's a bit clearer now.

In the past I've done this with a little elbow grease: you can get the probe sequence info for the chip you're using from this package:

http://bioconductor.org/packages/2.10/data/annotation/html/htmg430aprobe.html

There's a short vignette on matching probe sequences (against each other, which isn't all that helpful for you, but can be a start) using the Biostrings package here:

http://bioconductor.org/packages/2.10/bioc/vignettes/Biostrings/inst/doc/matchprobes.pdf

You can extend the examples there by matching your probes against the mouse genome using the appropriate BSgenome package (BSgenome.Mmusculus.UCSC.mm9).
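To make the probe-matching idea concrete, here is a minimal sketch in base R. It only illustrates the principle (exact matching of each probe sequence, and of its reverse complement, against a reference sequence); the reference string, probe sequences, and probeset names are all made up, and `revcomp` and `first_hit` are hypothetical helpers, not functions from any package. For a real chip you would take the sequences from the chip's probe package (presumably mouse4302probe for the Mouse430 2.0 array, which is an assumption on my part) and use the Biostrings/BSgenome matching functions rather than `regexpr`.

```r
# Toy reference sequence and probes; every sequence here is invented.
reference <- "ATGGCGTACGTTAGCCGATAGGCTTACGATCGGATCCGTA"

probes <- data.frame(
  probeset = c("ps_A", "ps_A", "ps_B"),
  sequence = c("GCGTACGTTA", "GGCTTACGAT", "TACGGATCCG"),
  stringsAsFactors = FALSE
)

# Reverse complement of a DNA string (hypothetical helper).
revcomp <- function(s) {
  comp <- chartr("ACGT", "TGCA", s)
  paste(rev(strsplit(comp, "", fixed = TRUE)[[1]]), collapse = "")
}

# 1-based start of the first exact match of `pattern` in `subject`,
# or NA if there is no match (hypothetical helper).
first_hit <- function(pattern, subject) {
  pos <- regexpr(pattern, subject, fixed = TRUE)[1]
  if (pos == -1) NA_integer_ else as.integer(pos)
}

# Look for each probe on the given strand and on the opposite strand.
probes$plus_start <- vapply(probes$sequence, first_hit, integer(1),
                            subject = reference, USE.NAMES = FALSE)
probes$minus_start <- vapply(probes$sequence,
                             function(s) first_hit(revcomp(s), reference),
                             integer(1), USE.NAMES = FALSE)
probes$width <- nchar(probes$sequence)

print(probes)
```

Each row then carries a per-probe start position instead of a gene-level one, and the probes of one probeset can be summarized afterwards (for example by taking the range of their starts).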
Alternatively, you can follow section 4.1 of the biomaRt vignette here:

http://bioconductor.org/packages/2.10/bioc/vignettes/biomaRt/inst/doc/biomaRt.pdf

For example:

R> ensembl <- useMart("ensembl", dataset="hsapiens_gene_ensembl")
R> affyids <- c("202763_at", "209310_s_at", "207500_at")
R> getBM(attributes=c('affy_hg_u133_plus_2', 'hgnc_symbol',
                      'chromosome_name', 'start_position', 'end_position', 'band'),
         filters='affy_hg_u133_plus_2', values=affyids, mart=ensembl)

  affy_hg_u133_plus_2 hgnc_symbol chromosome_name start_position end_position  band
1           202763_at       CASP3               4      185548850    185570663 q35.1
2         209310_s_at       CASP4              11      104813593    104840163 q22.3
3           207500_at       CASP5              11      104864962    104893895 q22.3

You'll have to change the mart/dataset you are using, as well as the chip IDs, but you should get the idea.

HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
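For the Mouse430 2.0 array, the same query might look roughly like the sketch below. The dataset name (`mmusculus_gene_ensembl`) and the attribute and filter names (`affy_mouse430_2`, `mgi_symbol`) are assumptions that should be checked against `listDatasets()`, `listAttributes()` and `listFilters()` for the Ensembl release you connect to. `map_probesets` is a hypothetical wrapper, used here only so that nothing touches the network until it is called.

```r
# Sketch: the getBM() query from the vignette example, pointed at mouse.
# Dataset and attribute names are assumptions; verify them with
# listDatasets(), listAttributes() and listFilters() before relying on them.
mouse_dataset    <- "mmusculus_gene_ensembl"
chip_attribute   <- "affy_mouse430_2"
mouse_attributes <- c(chip_attribute, "mgi_symbol", "chromosome_name",
                      "start_position", "end_position", "band")

# Hypothetical helper: wraps the network call so it only runs on demand.
map_probesets <- function(affyids) {
  ensembl <- biomaRt::useMart("ensembl", dataset = mouse_dataset)
  biomaRt::getBM(attributes = mouse_attributes,
                 filters    = chip_attribute,
                 values     = affyids,
                 mart       = ensembl)
}

# Example call (needs the biomaRt package and network access):
# map_probesets(c("1415670_at", "1415671_at", "1415672_at"))
```

Note that `start_position` and `end_position` are gene-level coordinates, so probesets that map to the same gene will report the same values. Per-probe positions need the sequence-matching route instead.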

ADD COMMENT • link 11.8 years ago Steve Lianoglou ★ 13k


Although it is not a BioConductor solution, you should check out the Splice Center website at the NIH for a nice view of probe locations across isoforms.

ADD REPLY • link 11.8 years ago Kevin Coombes ▴ 430


Marc Carlson ★ 7.2k

@marc-carlson-2264

Last seen 7.7 years ago

United States

Hi Jiayi,

If you first upgrade to a modern version of R, then you should be able to do stuff like this:

    library(mouse4302.db)

    # A few probeset IDs to look up
    keys = c("1415670_at", "1415671_at", "1415672_at")

    # List the annotation fields you can retrieve
    # (cols() was renamed columns() in later AnnotationDbi releases)
    cols(mouse4302.db)

    # List the ID types that can be used as keys
    keytypes(mouse4302.db)

    # Gene symbol and chromosomal location for each probeset
    select(mouse4302.db, keys=keys, cols=c("SYMBOL","CHRLOC"), keytype="PROBEID")

Please let us know if you need more help,

Marc

ADD COMMENT • link 11.8 years ago Marc Carlson ★ 7.2k
