Normalizing time-course RNA-Seq data

Mark Robinson

Hi Anand,

Some comments injected below ...

On 28.12.2011, at 10:50, AKSR wrote:

> Hi all,
>
> I have some RNA-Seq data:
> 4 reps per sample, 4 different genotypes & 9 time points
> = 144 data points
>
> I want to essentially know the best method to normalize across
> ALL time points and for each INDIVIDUAL genotype.
> Is the state of the art normalization method today, TMM?

I'm not sure if TMM is "best", but it can certainly improve things. Basically, the whole idea with TMM is that naively using totals of mapped reads can bias differential expression, since different experimental conditions can express different "repertoires" (e.g. if a few genes are very highly expressed in one condition, they take up a larger share of the reads, and every other gene looks under-represented there).

> If yes, is TMM step-by-step procedure available any where?
> (I do some Perl scripting, but I am pretty new to R)

TMM is available in edgeR's calcNormFactors() function.

> I realize that edgeR might be using TMM for pair-wise
> comparison, but I need to perform normalization across
> time points for each genotype.
> Irrespective of normalization strategy, will I have to choose
> the base level sample aka reference for normalization?
> Or can normalization be done independent of an
> overtly defined reference state?
> - I know this is a naive question, sorry...
> (If required, I would use time point zero as my reference state)

With TMM, you can manually define which reference sample to use (the refColumn argument), or leave it unspecified, which is the default. The docs for calcNormFactors() say:

----
If 'refColumn' is unspecified, the library whose upper quartile is closest to the mean upper quartile is used.
----

While TMM is pairwise in nature, it may work just fine this way across your genotypes and time points. I think it's worth trying it and looking at "smear" plots (plotSmear() in edgeR) between some of your time points (of the same genotype, say), just to see whether the normalization factors are centering the bulk of the M values (log fold changes) around zero.

There are other normalization strategies implemented too that are not explicitly pairwise; see ?calcNormFactors.
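As a rough illustration of what calcNormFactors() computes for TMM, here is a minimal pure-Python sketch (Python only for illustration; on real data you would simply call calcNormFactors() in R). It follows my understanding of edgeR's defaults: the reference library is picked by the upper-quartile rule quoted above, genes with a zero count in either library are dropped, 30% of genes are trimmed from each end of the M values (logratioTrim=0.3) and 5% from each end of the A values (sumTrim=0.05), and the remaining M values are averaged with inverse-variance weights (doWeighting=TRUE). All function names here are hypothetical, and details such as the final rescaling to a geometric mean of one may differ between edgeR versions.

```python
import math


def upper_quartile(values):
    """75th percentile with linear interpolation (like R's default quantile type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * 0.75
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def average_ranks(values):
    """1-based ranks, with ties given their average rank (like R's rank())."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pick_reference(counts):
    """Index of the library whose upper quartile (of count/library size) is closest to the mean."""
    uq = [upper_quartile([c / sum(lib) for c in lib]) for lib in counts]
    mean_uq = sum(uq) / len(uq)
    return min(range(len(uq)), key=lambda j: abs(uq[j] - mean_uq))


def tmm_factor(obs, ref, logratio_trim=0.3, sum_trim=0.05):
    """Weighted trimmed mean of M values of library `obs` against library `ref` (hypothetical helper)."""
    n_o, n_r = sum(obs), sum(ref)
    m_vals, a_vals, weights = [], [], []
    for y_o, y_r in zip(obs, ref):
        if y_o == 0 or y_r == 0:
            continue  # M and A are infinite when either count is zero
        var = (n_o - y_o) / (n_o * y_o) + (n_r - y_r) / (n_r * y_r)
        if var <= 0:
            continue  # degenerate gene holding a whole library; weight would be infinite
        p_o, p_r = y_o / n_o, y_r / n_r
        m_vals.append(math.log2(p_o / p_r))        # M: log ratio
        a_vals.append(0.5 * math.log2(p_o * p_r))  # A: average log abundance
        weights.append(1.0 / var)                  # inverse of approximate variance of M
    n = len(m_vals)
    lo_m = math.floor(n * logratio_trim) + 1
    lo_a = math.floor(n * sum_trim) + 1
    rank_m, rank_a = average_ranks(m_vals), average_ranks(a_vals)
    keep = [i for i in range(n)
            if lo_m <= rank_m[i] <= n + 1 - lo_m and lo_a <= rank_a[i] <= n + 1 - lo_a]
    if not keep:
        return 1.0
    num = sum(weights[i] * m_vals[i] for i in keep)
    den = sum(weights[i] for i in keep)
    return 2 ** (num / den)


def tmm_norm_factors(counts):
    """counts[j] holds the gene counts of library j; returns one factor per library."""
    ref = counts[pick_reference(counts)]
    f = [tmm_factor(lib, ref) for lib in counts]
    # Rescale so the factors multiply to one (assumed to match recent edgeR behaviour).
    gm = math.exp(sum(math.log(x) for x in f) / len(f))
    return [x / gm for x in f]


if __name__ == "__main__":
    ref = [100] * 20
    obs = [100] * 19 + [1900]  # one gene takes half of the second library's reads
    print(tmm_norm_factors([ref, obs]))
```

In the usage example, the 19 unchanged genes keep their M values inside the trimmed window while the one dominant gene is trimmed away, so the second factor comes out below the first (their ratio is 2000/3800). Using raw library sizes alone would instead make those 19 genes look down-regulated in the second library.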
One of the non-pairwise options in calcNormFactors() is method="RLE", as proposed by the DESeq authors:

----
'method="RLE"' is the scaling factor method proposed by Anders and Huber (2010). We call it "relative log expression", as median library is calculated from the geometric mean of all columns and the median ratio of each sample to the median library is taken as the scale factor.
----

As well, people are actively considering this problem from other directions (e.g. GC content). For example:

http://www.bioconductor.org/packages/release/bioc/html/cqn.html
http://www.biomedcentral.com/1471-2105/12/480/abstract

Hope that helps,
Mark

> Thanks in advance for guiding me
> AKSR
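The RLE description quoted above can be sketched in a few lines of pure Python (again only for illustration, with a hypothetical function name). It builds the "median library" as the per-gene geometric mean across all libraries, skips genes with a zero in any library (their geometric mean is zero), and takes each library's median ratio to that pseudo-reference as its scale factor. As far as I know, calcNormFactors(method="RLE") then rescales these into edgeR's own normalization-factor convention (relative to library size), so its numbers will not match this sketch one-to-one.

```python
import math
import statistics


def rle_size_factors(counts):
    """counts[j][g] is the count of gene g in library j; returns one scale factor per library."""
    n_genes = len(counts[0])
    # Per-gene geometric mean across libraries: the "median library" pseudo-reference.
    geo_means = []
    for g in range(n_genes):
        col = [lib[g] for lib in counts]
        if min(col) > 0:  # genes with any zero count have geometric mean 0 and are skipped
            geo_means.append((g, math.exp(sum(math.log(c) for c in col) / len(col))))
    # Median ratio of each library to the pseudo-reference is its scale factor.
    return [statistics.median([lib[g] / gm for g, gm in geo_means]) for lib in counts]


if __name__ == "__main__":
    # Second library is the first sequenced twice as deeply.
    print(rle_size_factors([[10, 20, 30], [20, 40, 60]]))
```

Because the median is taken over many genes, a handful of strongly changing genes has little influence on the factors, which is the same robustness goal TMM pursues through trimming.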

Tags: Normalization, edgeR, DESeq