**0**wrote:

Hello,

I have counts from 32 RNA-seq samples and am trying to study drug effect using DESeq2. I have some trouble of understanding the dispersion estimation after applying a set of customized normalization factors.

First I used raw counts as input and called DESeq function directly. DESeq2 estimated size factor and dispersion automatically. I found that fitted values reflected the regression of gene-wised dispersion. Although final value looked almost exact the same as gene-wised dispersion, from my understanding it's fine as I have quite a lot samples. Please correct me if I'm wrong. The figure links to here:

https://drive.google.com/file/d/0B0LC8cC9lS-ycVU3cTdGcy1RY0k/view?usp=sharing

Then I used a set of gene-wised normalization factors. In order to keep the raw counts at a similar level, those normalization factors have been divided by their geometric means. However, after dispersion estimation, I found that the fitted values seemed to be overestimated, as most of gene-wised dispersions were under the line. And final adjustment didn't lift them toward the line. The figure links to here:

https://drive.google.com/file/d/0B0LC8cC9lS-yVkRkeUM4aGRZT28/view?usp=sharing

I also tried to calculate fitted value locally, it changed almost nothing on the overall pattern.

Is this because that the normalization factors here breaks the rule of sharing information across genes? Any help is really appreciated. Codes attached below:

library(DESeq2)`library(matrixStats)`

`gm_mean = function(x, na.rm=TRUE, zero.propagate = FALSE){ if(any(x < 0, na.rm = TRUE)){ return(NaN) } if(zero.propagate){ if(any(x == 0, na.rm = TRUE)){ return(0) } exp(mean(log(x), na.rm = na.rm)) } else { exp(sum(log(x[x > 0]), na.rm=na.rm) / length(x)) } }`

`group=read.table("conditions_new.txt",header=T) group$Phenotype=relevel(group$Phenotype,"wt") group$Treatment=relevel(group$Treatment,"ctrl")`

d=read.table("SEMatrix.txt",header=T,row.names=1,check.names=F) d=d[,2:dim(d)[2]] #get rid of the annotation column d=as.matrix(d)`gc=d[,1:(dim(d)[2]/2)] tc=d[,(1+dim(d)[2]/2):dim(d)[2]] #Raw normalization factors`

`dd = DESeqDataSetFromMatrix(countData = gc, colData = group, design = ~1) gm=apply(tc,1,gm_mean) #Calculate geometric mean for normalization factors normFactors <- tc / gm #dividing geometric means normFactors[normFactors==0]=1 #Set normalization factors of 0 to 1 normalizationFactors(dd) <- normFactors #insert normalization factors back to DESeq2`

`design(dd) = ~Phenotype*Treatment+Gender+Batch`

`dd = estimateDispersions(dd)`

`plotDispEsts(dd)`

Thanks a lot,

Dadi