Question: normalisation of count data
sara.beier wrote (11 months ago):


I am using both the normMatrix and the controlGenes arguments in DESeq2 to estimate size factors for an RNA-Seq time-series experiment. I then use the likelihood ratio test (LRT) to test for changes relative to T0.

I compared a full dataset with a second dataset that contains counts for only a subset of genes. I specified the same genes for the controlGenes option in both cases. As expected, the normalization factors are identical for the genes present in both datasets. The baseMean values in the results are also identical for these shared genes. However, I was surprised to see that the log2FoldChange values against T0 differ slightly. Shouldn't these values be constant for a given gene if both its raw counts and its normalization factors are unchanged, regardless of which other genes are in the dataset? Can somebody explain this to me?

Thanks in advance.


Tags: deseq2
Answer: normalisation of count data
Ryan C. Thompson (The Scripps Research Institute, La Jolla, CA) wrote (11 months ago):

If you were fitting a linear model, the log fold changes would be identical. In a negative binomial GLM, however, the coefficient estimates, and therefore the log fold changes, depend on the estimated dispersion. DESeq2 shrinks each gene's dispersion toward a trend fitted across all genes, so the dispersion of one gene depends on which other genes are in the dataset. Change the gene set and the trend moves slightly. That shifts each gene's final dispersion and, with it, its log fold change. You might consider subsetting your genes after dispersion estimation, so that both analyses use the same dispersion estimates. This is usually the better approach anyway, because using more genes for dispersion estimation yields a more robust trend. (The exception would be genes that you discard as outliers, since they might distort the dispersion trend.)
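Here is a minimal Python sketch that illustrates why the dispersion matters. It is not DESeq2's implementation, which fits the full design by IRLS in R. The sketch makes these assumptions:

* It covers a single gene.
* There are two groups (T0 and one later time point).
* Each sample has a hypothetical normalization factor s_j, like those supplied through normMatrix.

The maximum-likelihood group mean solves sum_j (y_j - s_j*mu) / (1 + alpha*s_j*mu) = 0. When the s_j differ between samples, the estimate shifts with the dispersion alpha, and so does the log2 fold change, even though counts and factors are fixed. The function names and all numbers are hypothetical.

```python
import math


def nb_group_mean(counts, norm_factors, alpha, tol=1e-12, max_iter=100):
    """ML estimate of a group mean mu for one gene under a negative binomial model.

    Assumes E[y_j] = s_j * mu and Var[y_j] = m_j + alpha * m_j**2 with m_j = s_j * mu,
    where s_j are per-sample normalization factors (hypothetical values here).
    Solves sum_j (y_j - s_j*mu) / (1 + alpha*s_j*mu) = 0 by Newton's method on log(mu).
    """
    # Start from the Poisson (alpha = 0) solution, which has a closed form.
    log_mu = math.log(sum(counts) / sum(norm_factors))
    for _ in range(max_iter):
        mu = math.exp(log_mu)
        score = 0.0
        slope = 0.0
        for y, s in zip(counts, norm_factors):
            m = s * mu
            denom = 1.0 + alpha * m
            score += (y - m) / denom
            # Derivative of (y - m) / (1 + alpha*m) with respect to log(mu).
            slope -= m * (1.0 + alpha * y) / denom ** 2
        step = score / slope
        log_mu -= step
        if abs(step) < tol:
            break
    return math.exp(log_mu)


def log2_fold_change(counts_a, nf_a, counts_b, nf_b, alpha):
    """log2(mu_b / mu_a), with both group means estimated at the same dispersion."""
    return math.log2(nb_group_mean(counts_b, nf_b, alpha)
                     / nb_group_mean(counts_a, nf_a, alpha))


# Hypothetical counts and per-sample normalization factors for one gene.
t0_counts, t0_nf = [100, 40, 70], [1.5, 0.6, 1.0]
t1_counts, t1_nf = [300, 90, 200], [1.2, 0.5, 1.8]

for alpha in (0.0, 0.05, 0.2):
    lfc = log2_fold_change(t0_counts, t0_nf, t1_counts, t1_nf, alpha)
    print(f"alpha={alpha}: log2FC={lfc:.4f}")
```

With alpha = 0 the estimate reduces to sum(y)/sum(s), so the fold change does not depend on the dispersion. For any alpha > 0, the estimate becomes a weighted mean of y_j/s_j with weights s_j/(1 + alpha*s_j*mu). A small change in a gene's dispersion therefore nudges its log fold change whenever its normalization factors are unequal.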


Agree with Ryan. One detail: dispersion outliers shouldn't affect the dispersion trend. The trend is fit iteratively, and genes that are outliers are excluded at each iteration. This DESeq2 procedure goes back to the DESeq method, which fits the trend with a gamma-family GLM. If the fit does not converge after 10 iterations, DESeq2 gives up and falls back to a local regression fit.

(Reply by Michael Love, 11 months ago)
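The iterative trend fit described in the reply above can be sketched in Python. This is a simplified illustration, not DESeq2's code. It uses the trend form dispersion = a0 + a1/mean, which is the form documented for DESeq2's parametric fit. The trend is fitted with a gamma-family GLM on an identity link, written out here as plain IRLS. On each round, the sketch drops genes whose gene-wise dispersion divided by the current trend falls outside a window of (1e-4, 15). The function names, starting values, window, and tolerances are assumptions chosen for illustration. Local-regression fallback is only indicated, not implemented.

```python
import math


def gamma_identity_fit(means, disps, start, max_irls=50, tol=1e-12):
    """Fit disp ~ a0 + a1 / mean with a gamma-family GLM (identity link) via IRLS.

    With an identity link the working response is the observed dispersion itself,
    and the gamma variance function V(mu) = mu**2 gives weights 1 / fitted**2.
    """
    a0, a1 = start
    for _ in range(max_irls):
        sw = swx = swxx = swy = swxy = 0.0
        for m, d in zip(means, disps):
            x = 1.0 / m
            fitted = a0 + a1 * x
            if fitted <= 0:
                raise ValueError("non-positive fitted dispersion")
            w = 1.0 / fitted ** 2
            sw += w
            swx += w * x
            swxx += w * x * x
            swy += w * d
            swxy += w * x * d
        # Solve the 2x2 weighted least-squares normal equations for (a0, a1).
        det = sw * swxx - swx * swx
        new_a0 = (swxx * swy - swx * swxy) / det
        new_a1 = (sw * swxy - swx * swy) / det
        done = abs(new_a0 - a0) + abs(new_a1 - a1) < tol
        a0, a1 = new_a0, new_a1
        if done:
            break
    return a0, a1


def parametric_trend(means, disps, lo=1e-4, hi=15.0, max_outer=10):
    """Refit the trend repeatedly, excluding genes whose dispersion/trend ratio
    lies outside (lo, hi). Raises if not converged after max_outer rounds."""
    coefs = (0.1, 1.0)  # assumed starting values
    for _ in range(max_outer):
        keep = [(m, d) for m, d in zip(means, disps)
                if lo < d / (coefs[0] + coefs[1] / m) < hi]
        if len(keep) < 2:
            raise ValueError("too few genes left to fit the trend")
        kept_means, kept_disps = zip(*keep)
        new = gamma_identity_fit(kept_means, kept_disps, coefs)
        if min(new) <= 0:
            raise ValueError("trend coefficients must be positive")
        change = sum(math.log(n / o) ** 2 for n, o in zip(new, coefs))
        coefs = new
        if change < 1e-6:
            return coefs
    # A caller would fall back to a local regression fit here (not implemented).
    raise RuntimeError("parametric dispersion trend did not converge")


# Hypothetical gene-wise dispersions lying exactly on 0.05 + 2/mean,
# plus three genes inflated 50-fold to act as dispersion outliers.
means = [1.0 + 10.0 * i for i in range(100)]
disps = [0.05 + 2.0 / m for m in means]
for i in (10, 40, 70):
    disps[i] *= 50.0

a0, a1 = parametric_trend(means, disps)
print(f"asymptotic dispersion ~ {a0:.4f}, extra-Poisson term ~ {a1:.4f}")
```

The three inflated genes sit far above the trend, so they are excluded in every round. The fit then recovers the coefficients used to generate the remaining genes. With real data, gene-wise estimates scatter around the trend, so the set of kept genes can change from round to round until it stabilises.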