[EDGER] Normalization issue


François RICHARD ▴ 20

@francois-richard-5410


Dear all,

I am a master's student in France working on RNA-seq data. I am trying to run a differential gene expression analysis with edgeR, starting with 2 conditions × 2 replicates = 4 runs (Illumina, mapped with Bowtie to a known reference genome). I have a few questions about the normalization of the dataset.

As I understand it, normalization is needed to correct for differences in library size between samples. It is done with the TMM method by calling the calcNormFactors() function. This gives a normalization factor that corresponds to an offset in the model used to test for differentially expressed genes.

The function estimateCommonDisp() estimates the dispersion, and exactTest() runs the differential analysis (performing a negative binomial test). But according to the edgeR manual, those two functions call equalizeLibSizes() to generate pseudo-counts (which corrects for library size as well).

What I do not understand is that the library size should already be corrected by the TMM method.

My questions, finally: What is the difference between calcNormFactors() and equalizeLibSizes()? Do the pseudo-counts generated by equalizeLibSizes() take the normalization factor into account?

I hope I have been clear enough and that you will be able to help me.

Thanks a lot,

François
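For reference, the workflow described in the question can be sketched roughly as follows. This is only a minimal sketch, assuming edgeR is installed from Bioconductor; the count matrix is simulated, and the object names (`counts`, `group`, `y`, `et`) are made up for illustration rather than taken from the original analysis.

```r
library(edgeR)

set.seed(1)

# Hypothetical toy data: 1000 genes x 4 samples (2 conditions x 2 replicates).
counts <- matrix(
  rnbinom(4000, mu = 50, size = 10),
  ncol = 4,
  dimnames = list(paste0("gene", 1:1000), c("A1", "A2", "B1", "B2"))
)
group <- factor(c("A", "A", "B", "B"))

# DGEList() records the library size (column sum) of each sample.
y <- DGEList(counts = counts, group = group)

# TMM scaling factors; stored in y$samples$norm.factors.
y <- calcNormFactors(y)
print(y$samples)

# Common dispersion estimate; pseudo-counts are generated along the way.
y <- estimateCommonDisp(y)

# Exact negative binomial test between the two groups.
et <- exactTest(y)
print(topTags(et))
```

Printing `y$samples` shows the raw `lib.size` and the TMM `norm.factors` side by side. The edgeR User's Guide calls their product the effective library size, and, as far as I can tell from its help page, equalizeLibSizes() adjusts counts so that these effective library sizes are equal.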

Normalization GO edgeR • 1.2k views

Updated 11.8 years ago by James W. MacDonald 65k • written 11.8 years ago by François RICHARD ▴ 20


James W. MacDonald 65k

@james-w-macdonald-5106


United States

Hi Francois,

On 7/20/2012 9:57 AM, François RICHARD wrote:

> Dear all,
>
> I am a master student in France, working on RNA-seq data.
> I am trying to go through a differential gene expression analysis
> using EdgeR and starting with 2 conditions * 2 replicates = 4 runs
> (illumina, mapped with bowtie on known reference genome). I have few
> questions about the normalization of the dataset.
>
> As I understood, the normalization is needed to correct the library
> size between each samples. It is given by the TMM method, calling the
> calcNormFactors() function.

No, the calcNormFactors() function is used to account for 'RNA composition', not library size. See section 2.3.3 in the edgeR User's guide.

> This give a normalization factor that will correspond to an offset in
> the model that will test for differential expressed genes.
>
> The function estimateCommonDisp() give the dispersion and the
> exactTest() run the differential analysis (performing negative
> binomial test). But according to the edgeR manual, those two functions
> called the equalizeLibSizes() function in order to generate pseudo
> counts (which corrected the library size as well).

Right. The library size is automatically corrected. You _may_ need to use calcNormFactors() to account for situations where technical effects can bias your results. Two examples are given in 2.3.3 of the edgeR user's guide.

Best,

Jim

> What I do not understand here is that the library size should be
> already corrected by the TMM method.
>
> My question is, finally :
> What is the difference between the calcNormFactors() and
> equalizeLibSizes()? Does the pseudo-counts generated by
> equalizeLibSizes() are taking care of the normalization factor?
>
> I hope I have been clear enough, and that you will be able to help me,
>
> Thanks a lot,
>
> François

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
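The distinction between library size and 'RNA composition' can be illustrated with a small numeric toy in base R. This is a sketch under stated assumptions: the expression values are invented, and the median-ratio factor is a deliberately simple stand-in for the idea behind TMM, not the TMM algorithm that calcNormFactors() implements.

```r
# Hypothetical true expression for 5 genes in two samples.
# Genes 1-4 are unchanged; gene 5 is 6-fold higher in sample B.
true_expr_A <- c(100, 100, 100, 100, 100)
true_expr_B <- c(100, 100, 100, 100, 600)

# Both samples are sequenced to the same depth (same library size).
depth <- 1000
counts_A <- depth * true_expr_A / sum(true_expr_A)
counts_B <- depth * true_expr_B / sum(true_expr_B)

# Library sizes are identical, so scaling by library size changes nothing.
lib_A <- sum(counts_A)
lib_B <- sum(counts_B)

# Yet the unchanged genes look 2-fold down in B, because gene 5
# takes up a larger share of the reads in that sample.
ratio_unchanged <- counts_B[1:4] / counts_A[1:4]

# Simple composition factor: the median of the per-gene ratios
# (an illustrative stand-in, NOT edgeR's TMM).
f <- median(counts_B / counts_A)

# After dividing sample B by the factor, the unchanged genes line up
# again and gene 5 shows its true 6-fold change.
corrected_ratio <- (counts_B / f) / counts_A

print(ratio_unchanged)
print(corrected_ratio)
```

Here equal library sizes do not prevent a spurious apparent down-regulation of the unchanged genes, which is the kind of bias a composition factor is meant to correct.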

11.8 years ago • James W. MacDonald 65k
