Annotating HGU133plus2 genes with number of coding changes

marco zucchelli ▴ 320

@marco-zucchelli-1987

Last seen 11.3 years ago

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070419/9500deb7/attachment.pl

• 1.8k views

ADD COMMENT • link updated 18.7 years ago by Steffen Durinck ▴ 580 • written 18.7 years ago by marco zucchelli ▴ 320

Sean Davis 21k

@sean-davis-490

Last seen 10 months ago

United States

On Thursday 19 April 2007 09:33, marco zucchelli wrote:
> Hi Steffen,
>
> one more question: in the example I reported before, it seems that some probes
> are reported twice, i.e. 207893_at is listed 2 times matched to the same gene ID.
> In total, the "probes" vector contains the probes from hgu133plus2 (54675), but
> the query returns 66565 rows.
>
> I do not really understand the meaning of this.
>
> Regards
>
> Marco
>
> probe.list <- getBM(attributes=c("ensembl_gene_id","affy_hg_u133_plus_2"),
>                     filters="affy_hg_u133_plus_2", values=probes, mart=mart)
>
> head(probe.list)
>
>   ensembl_gene_id affy_hg_u133_plus_2
> 1 ENSG00000184895           207893_at
> 2 ENSG00000184895           207893_at
> 3 ENSG00000129824           201909_at
> 4 ENSG00000129824           201909_at
> 5 ENSG00000067646         207247_s_at
> 6 ENSG00000067646           207246_at
>
> On 4/3/07, Steffen Durinck <durincks at mail.nih.gov> wrote:
> > Hi Marco,
> >
> > It matches the transcripts and then maps those transcripts to the genes,
> > even if you don't include the transcript id in the query.
> > To see this you could set attributes =
> > c("ensembl_gene_id","ensembl_transcript_id","affy_hg_u133_plus_2") in
> > your query. Also, if Ensembl didn't find a match for the affy probe then
> > it won't be included in the output, and if they find multiple matches
> > then all of them will be returned.

Marco,

Try the suggestion that Steffen gave above (setting the attributes to include the transcript). The mapping is NOT done to the gene, but to the transcript, and there may be multiple transcripts for the same gene, each of which may be mapped to one or more affy_ids.

Sean
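To make the duplication concrete, here is a minimal sketch in base R that needs no Ensembl connection. The data frame only imitates a getBM() result that includes the transcript column: the gene and probe IDs are taken from the output quoted above, while the transcript IDs (TX_A and so on) are made-up placeholders, not real Ensembl identifiers.

```r
# Toy stand-in for a getBM() result; transcript IDs are hypothetical placeholders.
res <- data.frame(
  ensembl_gene_id       = c("ENSG00000184895", "ENSG00000184895",
                            "ENSG00000129824", "ENSG00000129824"),
  ensembl_transcript_id = c("TX_A", "TX_B", "TX_C", "TX_D"),
  affy_hg_u133_plus_2   = c("207893_at", "207893_at", "201909_at", "201909_at"),
  stringsAsFactors = FALSE
)

# With the transcript column present, no row is repeated.
n_dup_transcript_level <- sum(duplicated(res))

# Dropping the transcript column is what produces the apparent duplicates.
gene_probe <- res[, c("ensembl_gene_id", "affy_hg_u133_plus_2")]
n_dup_gene_level <- sum(duplicated(gene_probe))

# Collapse to one row per gene/probe pair.
gene_probe_unique <- unique(gene_probe)
print(gene_probe_unique)
```

If only the gene-to-probe relationship matters, calling unique() on the two-column result is probably the simplest way to get rid of the extra rows.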

ADD COMMENT • link 18.7 years ago Sean Davis 21k

Steffen Durinck ▴ 580

@steffen-durinck-1780

Last seen 11.3 years ago

Hi Marco,

Ensembl maps everything to the transcript level, and when there are multiple transcripts for one gene, a query will return multiple hits for that gene. To see this better you could add "ensembl_transcript_id" to your query:

    probe.list <- getBM(attributes=c("ensembl_gene_id","ensembl_transcript_id","affy_hg_u133_plus_2"),
                        filters="affy_hg_u133_plus_2", values=probes, mart=mart)

You'll see that each repeated row corresponds to a different transcript, and that at this level there is no redundancy. The mapping to the transcript level is a choice of the Ensembl team and we cannot change this. It makes sense for other annotation information such as protein domains: some alternatively spliced transcripts might have a certain domain and other transcripts of the same gene might not. Or, if you query for 3' UTRs, mapping to the transcript level lets you retrieve all the different UTRs associated with a gene. Different transcripts of the same gene might even have different functions, and the current strategy would allow transcript-specific GO annotations...

Best regards,
Steffen

marco zucchelli wrote:
> Hi Steffen,
>
> one more question: in the example I reported before, it seems that some
> probes are reported twice, i.e. 207893_at is listed 2 times matched to the
> same gene ID. In total, the "probes" vector contains the probes from
> hgu133plus2 (54675), but the query returns 66565 rows.
>
> I do not really understand the meaning of this.
>
> Regards
>
> Marco
>
> probe.list <- getBM(attributes=c("ensembl_gene_id","affy_hg_u133_plus_2"),
>                     filters="affy_hg_u133_plus_2", values=probes, mart=mart)
>
> head(probe.list)
>
>   ensembl_gene_id affy_hg_u133_plus_2
> 1 ENSG00000184895           207893_at
> 2 ENSG00000184895           207893_at
> 3 ENSG00000129824           201909_at
> 4 ENSG00000129824           201909_at
> 5 ENSG00000067646         207247_s_at
> 6 ENSG00000067646           207246_at
>
> On 4/3/07, Steffen Durinck <durincks at mail.nih.gov> wrote:
> > Hi Marco,
> >
> > It matches the transcripts and then maps those transcripts to the genes,
> > even if you don't include the transcript id in the query.
> > To see this you could set attributes =
> > c("ensembl_gene_id","ensembl_transcript_id","affy_hg_u133_plus_2") in
> > your query. Also, if Ensembl didn't find a match for the affy probe then
> > it won't be included in the output, and if they find multiple matches
> > then all of them will be returned.
> >
> > For the second part of your question: no, the ordering is random, so
> > you'll have to reorder the output with e.g. the match function or loop
> > over it.
> >
> > Cheers,
> > Steffen
> >
> > marco zucchelli wrote:
> > > Steffen,
> > >
> > > Anyway, does this procedure match the affy_ID to the specific
> > > transcript(s) that the probeset is targeting, or does it match it
> > > to a gene and then get all the available transcripts for the gene?
> > >
> > > Moreover, it seems that the values returned by getBM are not ordered
> > > like the input values. In fact, if I use:
> > >
> > > head(probes)
> > > [1] "AFFX-BioB-5_at"  "AFFX-BioB-M_at"  "AFFX-BioB-3_at"
> > > [4] "AFFX-BioC-5_at"  "AFFX-BioC-3_at"  "AFFX-BioDn-5_at"
> > >
> > > probe.list <- getBM(attributes=c("ensembl_gene_id","affy_hg_u133_plus_2"),
> > >                     filters="affy_hg_u133_plus_2", values=probes, mart=mart)
> > >
> > > head(probe.list)
> > >
> > >   ensembl_gene_id affy_hg_u133_plus_2
> > > 1 ENSG00000184895           207893_at
> > > 2 ENSG00000184895           207893_at
> > > 3 ENSG00000129824           201909_at
> > > 4 ENSG00000129824           201909_at
> > > 5 ENSG00000067646         207247_s_at
> > > 6 ENSG00000067646           207246_at
> > >
> > > Is there any rule by which the probes are ordered by getBM?
> > > Or am I doing something wrong?
> > >
> > > Marco
> > >
> > > On 3/30/07, Steffen Durinck <durincks at mail.nih.gov> wrote:
> > > > Hi Marco,
> > > >
> > > > You can do this with the biomaRt package (use the devel version,
> > > > >= 1.9.21). Here's how:
> > > >
> > > > library(biomaRt)
> > > > mart=useMart("ensembl", dataset="hsapiens_gene_ensembl")
> > > > getBM(attributes=c("ensembl_gene_id","ensembl_transcript_id",
> > > >                    "synonymous_snp_count","non_synonymous_snp_count"),
> > > >       filters="affy_hg_u133_plus_2",
> > > >       values=c("201746_at","231640_at"), mart=mart)
> > > >
> > > > it will give:
> > > >
> > > >   ensembl_gene_id ensembl_transcript_id synonymous_snp_count non_synonymous_snp_count
> > > > 1 ENSG00000141510       ENST00000269305                    5                       20
> > > > 2 ENSG00000133703       ENST00000256078                    1                        1
> > > > 3 ENSG00000133703       ENST00000311936                    1                        1
> > > >
> > > > Unfortunately you won't be able to get the affy id in the output, but
> > > > you can use biomaRt to map the Ensembl ids in the output back to the
> > > > affy ids.
> > > >
> > > > Cheers,
> > > > Steffen
> > > >
> > > > marco zucchelli wrote:
> > > > > Hi,
> > > > >
> > > > > I was wondering if there is an annotation package for Affy 133plus2
> > > > > reporting the number of synonymous & non-synonymous changes for the
> > > > > genes on the array.
> > > > >
> > > > > If it does not exist, does anybody have a good suggestion about how
> > > > > to retrieve this information from databases?
> > > > >
> > > > > Marco

--
Steffen Durinck, Ph.D.
Oncogenomics Section
Pediatric Oncology Branch
National Cancer Institute, National Institutes of Health
URL: http://home.ccr.cancer.gov/oncology/oncogenomics/
Phone: 301-402-8103
Address: Advanced Technology Center, 8717 Grovemont Circle, Gaithersburg, MD 20877
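The reordering with match() mentioned in the quoted 4/3 reply can be sketched in base R as follows. This is only an illustration under stated assumptions: the data frame is a toy stand-in for a getBM() result (no Ensembl connection is made), it has already been reduced to one row per probe, and the three-element `probes` vector is invented for the example, with "AFFX-BioB-5_at" playing the role of a probe for which no match was returned.

```r
# Hypothetical input order; the first probe has no row in the result below.
probes <- c("AFFX-BioB-5_at", "201909_at", "207893_at")

# Toy getBM()-style result in arbitrary order, one row per probe.
probe.list <- data.frame(
  ensembl_gene_id     = c("ENSG00000184895", "ENSG00000129824"),
  affy_hg_u133_plus_2 = c("207893_at", "201909_at"),
  stringsAsFactors = FALSE
)

# For each input probe, the row index of its first hit (NA if there is none).
idx <- match(probes, probe.list$affy_hg_u133_plus_2)

# One row per input probe, in input order; unmatched probes get NA gene IDs.
ordered <- data.frame(
  affy_hg_u133_plus_2 = probes,
  ensembl_gene_id     = probe.list$ensembl_gene_id[idx],
  stringsAsFactors = FALSE
)
print(ordered)
```

Note that match() only reports the first hit, so a probe that maps to several genes would lose the additional ones; in that case grouping the gene IDs by probe (for example with split()) may be the better choice.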

ADD COMMENT • link 18.7 years ago Steffen Durinck ▴ 580

Dear all!

I assume that some of you are aware of a paper by Allison et al. from 2002 in which they discuss fitting a mixture of beta distributions (one of them being the uniform) to the distribution of p-values from a microarray experiment.

Has this been implemented in Bioconductor somewhere, or is one of you aware of other R libraries that could help with fitting a mixture of betas?

Thanks,
Claus

--
Dr Claus-D. Mayer
Biomathematics & Statistics Scotland
Rowett Research Institute
Aberdeen AB21 9SB, Scotland, UK.
http://www.bioss.ac.uk
email: claus at bioss.ac.uk
Telephone: +44 (0) 1224 716652
Fax: +44 (0) 1224 715349

ADD REPLY • link 18.7 years ago Claus Mayer ▴ 340

Hi,

An equivalent method (BUM) was also described by Pounds and Morris in Bioinformatics in early 2003. BUM is implemented in OOMPA, which is available at http://bioinformatics.mdanderson.org/software.html, and if I ever feel like I have time to maintain it through the steady pace of new Bioconductor releases, then I'll submit OOMPA.

For more general mixtures of betas, you might also look at

Ji Y, Wu C, Liu P, Wang J, Coombes KR. Applications of beta-mixture models in bioinformatics. Bioinformatics. 2005 May 1;21(9):2118-22.

-- Kevin

Claus Mayer wrote:
> Dear all!
>
> I assume that some of you are aware of a paper by Allison et al from
> 2002 where they discuss fitting a mixture of beta-distributions (one of
> them being the uniform) to the distribution of p-values from a
> microarray experiment.
>
> Has this been implemented into Bioconductor somewhere or is one of you
> aware of other R libraries that could help with fitting mixture of betas?
>
> Thanks,
>
> Claus
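For readers who just want to see the idea, a beta-uniform mixture can be fitted by maximum likelihood in a few lines of base R. The sketch below is not the OOMPA implementation; it assumes the two-parameter density f(p) = lambda + (1 - lambda) * a * p^(a - 1) with 0 < lambda < 1 and 0 < a < 1, uses simulated p-values, and picks optim() with its default Nelder-Mead method purely for convenience. All variable names are made up for the example.

```r
set.seed(1)

# Simulated p-values (assumed for illustration): about 70% uniform nulls,
# the rest from Beta(0.3, 1), which piles up near zero.
n <- 5000
is_null <- runif(n) < 0.7
p <- ifelse(is_null, runif(n), rbeta(n, shape1 = 0.3, shape2 = 1))
p <- pmax(p, .Machine$double.xmin)  # guard against p == 0

# Negative log-likelihood of f(p) = lambda + (1 - lambda) * a * p^(a - 1).
# Both parameters are optimised on the logit scale so they stay inside (0, 1).
negloglik <- function(par, pvals) {
  lambda <- plogis(par[1])
  a      <- plogis(par[2])
  -sum(log(lambda + (1 - lambda) * a * pvals^(a - 1)))
}

start <- c(0, 0)                            # lambda = 0.5, a = 0.5
fit <- optim(start, negloglik, pvals = p)   # Nelder-Mead by default

# Back-transform to the original parameter scale.
est <- c(lambda = plogis(fit$par[1]), a = plogis(fit$par[2]))
print(round(est, 3))
```

The logit parametrisation avoids the need for box constraints; a bounded optimiser such as optim(..., method = "L-BFGS-B") on the raw parameters would be a reasonable alternative.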

ADD REPLY • link 18.7 years ago Kevin R. Coombes ▴ 140

marco zucchelli ▴ 320

@marco-zucchelli-1987

Last seen 11.3 years ago

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070419/e99c32b6/attachment.pl

ADD COMMENT • link 18.7 years ago marco zucchelli ▴ 320
