Summarizing two-channel data (RGList, MAList) for limma analysis
@stephen-turner-4916
Last seen 5.7 years ago
United States
Hello. I have 4 Agilent two-channel arrays that I read in using read.maimages(). I've done normalization and background subtraction. How do I now summarize the probe information (62976 probes) to gene-level expression values (39430 Entrez RNAs, 16251 lincRNAs)? I normally do this using rma() or gcrma() from the affy package when I have Affymetrix data. Thanks, Stephen
Normalization probe affy • 1.5k views
@sean-davis-490
Last seen 12 weeks ago
United States
Hi, Stephen. You can use an average value, but for long-oligo arrays like Agilent, people have often used the probe measurements directly. You can use the genefilter package to remove probes that do not vary across samples, which reduces some of the redundancy; this increases power to detect differential expression by reducing the number of tests that enter the multiple-testing correction. If you feel a strong need to summarize, averaging is probably not too bad an approach, assuming that the probes for the same gene are correlated with each other (and many will be). Sean
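A minimal sketch of the non-specific filtering idea, written in base R on a toy matrix rather than with genefilter's own functions. The toy data, the use of IQR as the spread measure, and the median cutoff are all assumptions made for illustration; with real data the matrix would be the M-values of the normalized MAList.

```r
# Toy matrix of log-ratios: 8 probes (rows) x 4 arrays (columns).
set.seed(42)
M <- matrix(rnorm(32), nrow = 8,
            dimnames = list(paste0("probe", 1:8), paste0("array", 1:4)))

# Per-probe spread across arrays; IQR is one common choice.
spread <- apply(M, 1, IQR)

# Keep probes whose spread exceeds the median spread (arbitrary cutoff).
keep <- spread > quantile(spread, 0.5)
M.filtered <- M[keep, , drop = FALSE]

dim(M.filtered)
```

The filter uses only the spread of each probe and never looks at the sample labels. With only 4 arrays the spread estimates are noisy, so the cutoff deserves some care.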
Samuel Wuest ▴ 330
@samuel-wuest-2821
Last seen 9.6 years ago
Hi Stephen, one option is to simply average multiple probes matching the same transcript, using the avereps() function from the limma package. It takes an ID argument, where you specify the transcript (or maybe even "locus") names, so that probes with the same IDs will be averaged. If you have duplicate probes (technical replicates, that is, the same probe sequences), then avedups() will do the job. There might be other options too. As far as I know, rma()/gcrma() for Affy arrays are wrappers for a combination of steps: background correction, quantile normalization, and median-polish summarization of the 11 probes. As you have already done the first two steps with your arrays, they are no longer necessary (and I guess you often have only 1-2 probes per transcript, so median polish would not be an option anyway). Hope this helps, best, Sam
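The averaging step could look roughly like the sketch below. It builds a small toy MAList by hand; the column name GeneName and the probe-to-gene assignments are made up for illustration, and with real Agilent data the identifier column in the genes annotation may be named differently.

```r
library(limma)

# Toy data: 6 probes x 4 arrays of log-ratios (M) and average intensities (A).
set.seed(1)
M <- matrix(rnorm(24), nrow = 6)
A <- matrix(runif(24, 6, 14), nrow = 6)
genes <- data.frame(
  ProbeName = paste0("P", 1:6),
  GeneName  = c("g1", "g1", "g2", "g3", "g3", "g3"),  # hypothetical mapping
  stringsAsFactors = FALSE
)
MA <- new("MAList", list(M = M, A = A, genes = genes))

# Average all probes that share a gene identifier.
MA.gene <- avereps(MA, ID = MA$genes$GeneName)

dim(MA.gene)
```

The result keeps one row per distinct ID, in order of first appearance, and the annotation row of the first probe for each ID.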
@wolfgang-huber-3550
Last seen 3 months ago
EMBL European Molecular Biology Laborat…
Dear Stephen, hasn't the array vendor already provided you with some guidance on this? If there are multiple probes with different sequences supposedly targeting the same gene, I think you need to assess (in some automated way) the alignment of the probes to the genome and to the gene model in order to see which of them is the 'better' one. Best wishes, Wolfgang
@daniel-aaen-hansen-5052
Last seen 10 months ago
Denmark
Dear Stephen, I have a similar situation, and my approach has been to use avereps() from the limma package to average repeated probes. Then I use the biomaRt package to retrieve Ensembl's mapping for the probes, since this is updated with every new release of Ensembl. That should ensure a more up-to-date mapping to gene level. If you need one expression value for each gene, you could then take the median or mean of repeated genes. If you want to convert the MA values back to RG values, you can use the RG.MA() function from the limma package; whether you need that depends on how you are going to use your data. I think the most common choice is to use the MA values for downstream analysis, but I would also like to hear other opinions on this. Best, Daniel
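For the conversion mentioned above, a minimal sketch on a hand-built toy MAList (the numbers are made up). RG.MA() inverts the usual definitions M = log2(R) - log2(G) and A = (log2(R) + log2(G)) / 2, so the red and green intensities come back on the raw scale.

```r
library(limma)

# Toy MAList: 2 probes x 2 arrays.
M <- matrix(c(1, -1, 0.5, 0), nrow = 2)
A <- matrix(c(10, 8, 9, 12), nrow = 2)
MA <- new("MAList", list(M = M, A = A))

# Convert back to red/green intensities:
#   R = 2^(A + M/2),  G = 2^(A - M/2)
RG <- RG.MA(MA)

RG$R
RG$G
```

Taking log2(RG$R) - log2(RG$G) recovers the original M-values, which is an easy sanity check after the round trip.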