how to annotate Illumina HumanHT-12 v3 chips?

Feng Tian ▴ 110

@feng-tian-5581

Last seen 9.6 years ago

Hi all,

I want to annotate Illumina HumanHT-12 v3 chips using the annotation file downloaded from Illumina. The Illumina probes are classified as follows:

RefSeq
NM: Coding transcript, well-established annotation
XM: Coding transcript, provisional annotation
NR: Non-coding transcript, well-established annotation
XR: Non-coding transcript, provisional annotation

Supplementary Content
UniGene (Build 199): Experimentally confirmed mRNA sequences that align to EST clusters.

I have the following questions:

1) Should I use all of these kinds of probes, or only the RefSeq NM probes?

2) If different kinds of probes (such as RefSeq NM and RefSeq XM) map to the same gene, how should I combine them?

Thanks.

annotate • 2.7k views

updated 11.1 years ago by Ying W ▴ 90 • written 11.2 years ago by Feng Tian ▴ 110

Ying W ▴ 90

@ying-w-4341

Last seen 8.7 years ago

United States

The answer depends on what you are interested in biologically. If you are only interested in well-annotated coding genes for downstream pathway analysis, then use only the NM_ probes.

Is there a reason why you want to combine the probes for each gene? Some software that I've used takes the median or mean to collapse the signals from multiple probes into one value per gene, but I have found it useful to work at the probe level.

As a side note, there is a Bioconductor package with Illumina annotations that might be better than the ones the manufacturer provides. It can be found here:
http://www.bioconductor.org/packages/2.11/data/annotation/html/illuminaHumanv3.db.html

Best,
Ying

On 3/4/2013 3:00 AM, bioconductor-request at r-project.org wrote:
> Date: Sun, 3 Mar 2013 16:30:30 -0500
> From: Feng Tian <fengtian at bu.edu>
> To: bioconductor at r-project.org
> Subject: [BioC] how to annotate Illumina HumanHT-12 v3 chips?
>
> 1) Should I use all kinds of these probes? Should I only use the RefSeq NM probes?
>
> 2) If different kinds of probes (such as RefSeq NM and RefSeq XM) are mapped to the same gene, how to combine them?
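A minimal sketch of the two ideas in this answer, written in Python purely for illustration (it is not the Bioconductor workflow): keep only probes whose RefSeq accession starts with NM_, then collapse the remaining probes to one value per gene and sample by taking the median. The probe IDs, accessions, gene symbols, intensities, and the tuple layout are all made up here; they are not the format of the Illumina annotation file.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: (probe_id, refseq_accession, gene_symbol, intensities per sample)
probes = [
    ("P1", "NM_000001.1", "GENE_A", [8.0, 9.0, 10.0]),
    ("P2", "NM_000001.1", "GENE_A", [6.0, 7.0, 12.0]),
    ("P3", "XM_000002.1", "GENE_A", [5.0, 5.0, 5.0]),
    ("P4", "NR_000003.1", "GENE_B", [4.0, 4.5, 5.0]),
    ("P5", "NM_000004.1", "GENE_C", [7.0, 7.5, 8.0]),
]

def keep_by_prefix(records, prefixes=("NM_",)):
    """Keep records whose accession starts with one of the given prefixes."""
    return [r for r in records if r[1].startswith(tuple(prefixes))]

def collapse_by_median(records):
    """Collapse probes to one row per gene using the per-sample median."""
    by_gene = defaultdict(list)
    for _probe_id, _accession, gene, values in records:
        by_gene[gene].append(values)
    # zip(*rows) walks the samples column by column across a gene's probes
    return {gene: [median(col) for col in zip(*rows)]
            for gene, rows in by_gene.items()}

nm_only = keep_by_prefix(probes)
gene_level = collapse_by_median(nm_only)
print(gene_level)
```

Passing `prefixes=("NM_", "NR_")` would keep both well-established categories. Whether to collapse at all is a separate decision; the sketch only shows what a median collapse does to the numbers.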

written 11.1 years ago by Ying W ▴ 90

I tend to work at the probe level too. I have seen many examples of genes where some probes target obscure transcripts or are just badly designed. By including them in the averaging, you would be diluting the signal from the good probes.

Sometimes you have to collapse the data to one probe per gene, and for this I tend to pick the "best" probe for each gene by using a measure such as IQR across the dataset.

On Wed, Mar 6, 2013 at 1:52 AM, Ying Wu <daiyingw at gmail.com> wrote:
> Is there a reason why you want to combine the probes for each gene? Some
> software that I've used has used median/mean to collapse multiple probes
> signals into one gene but I found it useful to work with things on the probe
> level.
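A rough sketch of the one-probe-per-gene selection described in this reply, in Python for illustration only. It assumes that "best" means the probe with the largest IQR across samples; the reply itself does not say which direction is meant, so treat that as an assumption. The probe IDs, gene symbols, and intensities are hypothetical.

```python
from statistics import quantiles

def iqr(values):
    """Interquartile range across samples (inclusive quartile method)."""
    q1, _q2, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

def best_probe_per_gene(records):
    """Return {gene: probe_id}, keeping the probe with the largest IQR.

    Assumption: a larger IQR marks the more informative probe.
    Ties keep the first probe encountered.
    """
    best = {}
    for probe_id, gene, values in records:
        score = iqr(values)
        if gene not in best or score > best[gene][1]:
            best[gene] = (probe_id, score)
    return {gene: probe_id for gene, (probe_id, _score) in best.items()}

# Hypothetical records: (probe_id, gene_symbol, intensities per sample)
records = [
    ("P1", "GENE_A", [1.0, 2.0, 3.0, 4.0, 5.0]),
    ("P2", "GENE_A", [3.0, 3.0, 3.0, 3.0, 3.0]),   # flat probe, IQR of zero
    ("P3", "GENE_B", [2.0, 4.0, 6.0, 8.0, 10.0]),
    ("P4", "GENE_B", [5.0, 5.5, 6.0, 6.5, 7.0]),
]

selected = best_probe_per_gene(records)
print(selected)
```

Unlike averaging, this keeps the measured values of a single probe unchanged, so a poorly performing probe cannot dilute the signal of a good one.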

written 11.1 years ago by Mark Dunning ★ 1.1k

Hi,

For the sake of discussion/pedagogy, when you say:

> Sometimes you have to collapse the data to one probe per-gene, and for
> this I tend to pick the "best" probe for each gene by using a measure
> such as IQR across the dataset.

In this scenario, which probe is "best": the one with the largest IQR, or the smallest?

Assuming both probes map uniquely to the genome, and both are (as far as we know) "probing" the same gene of interest, what are some "sound" reasons to pick one over the other?

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

written 11.1 years ago by Steve Lianoglou ★ 13k

My opinion: best = largest IQR (as a first approximation). Some probes are broken and never give meaningful values; they tend to have a small IQR.

Keep in mind that this still does not deal with the fact that, regardless of what we (or the companies designing the probes) might like to think, some probes simply do not measure what they are supposed to measure.

Here is an interesting (in the ambiguous sense of "may you live in interesting times") exercise. Pick any data set using Affymetrix U133 arrays with about 100 samples. Extract the data for all probe-sets that target fibronectin. Construct a pairs plot of the extracted data. Then ask yourself (1) whether you really believe all probes are measuring the same thing and (2) which probes you want to use as your best guess at a reasonable measure of fibronectin expression.

Kevin

On 3/6/2013 10:20 AM, Steve Lianoglou wrote:
> In this scenario, which probe is "best" -- is it the one with the
> largest IQR, or the smallest?
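A numeric counterpart to the pairs plot in the exercise above could be the pairwise Pearson correlation between all probe-sets that target one gene. The sketch below is in Python for illustration only; the probe-set names and values are invented and are not real fibronectin data from a U133 array.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical probe-sets for one gene, one value per sample
probe_sets = {
    "PS1": [1.0, 2.0, 3.0, 4.0],
    "PS2": [2.0, 4.0, 6.0, 8.0],   # tracks PS1
    "PS3": [4.0, 3.0, 2.0, 1.0],   # moves opposite to PS1
}

# One correlation per unordered pair of probe-sets
pairwise = {
    (a, b): pearson(probe_sets[a], probe_sets[b])
    for a, b in combinations(sorted(probe_sets), 2)
}

for pair, r in pairwise.items():
    print(pair, round(r, 3))
```

Probe-sets that really measure the same transcript should correlate strongly with each other; a probe-set that correlates weakly or negatively with the rest is a candidate for exclusion, though a plot remains the better way to spot non-linear disagreement.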

written 11.1 years ago by Kevin Coombes ▴ 430