Hello
I am using both the normMatrix and the controlGenes options in DESeq2 to estimate size factors in an RNA-Seq time series experiment, and I test for changes relative to T0 using the LRT.
I work with a full dataset and a second dataset that contains count data for only a subset of genes (the same genes are passed to the controlGenes option in both cases). As expected, the normalization factors are identical for the genes present in both the full dataset and the subset. The baseMean in the results is also equal for genes present in both datasets. However, I was surprised to see that the log2FoldChange values against T0 change slightly. Shouldn't these values be constant for a specific gene if both the raw count data and the normalization factors for this gene are constant, independent of the presence of other genes in the dataset? Can somebody explain this to me?
Thanks in advance,
Sara
Agree with Ryan. One detail is that dispersion outliers shouldn’t affect the dispersion trend, because the trend is fit iteratively while excluding genes that are outliers. This procedure in DESeq2 goes back to the DESeq method for fitting the trend using a gamma GLM. If it doesn’t converge after 10 iterations, it quits and falls back to a local (loess-type) regression fit.
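
To make the iterative trend fit concrete, here is a rough Python sketch of the idea. It is not DESeq2's actual code (which is written in R). The trend form a0 + a1/mean, the starting values, the outlier cutoffs (1e-4 and 15) and the simulated data are all assumptions chosen for illustration. It also fits the trend once on all genes and once on a subset, which shows how a quantity shared across genes can shift when genes are removed.

```python
import numpy as np


def gamma_identity_glm(x, y, start, n_iter=200, tol=1e-8):
    """Fit y ~ a0 + a1 * x with a gamma family and identity link (IRLS)."""
    X = np.column_stack([np.ones_like(x), x])
    coefs = np.asarray(start, dtype=float)
    for _ in range(n_iter):
        mu = X @ coefs
        w = 1.0 / mu**2  # gamma variance function is mu^2; identity link
        WX = X * w[:, None]
        new = np.linalg.solve(X.T @ WX, WX.T @ y)
        if np.max(np.abs(new - coefs)) < tol:
            return new, True
        coefs = new
    return coefs, False


def fit_dispersion_trend(means, disps, max_iter=10):
    """Refit the trend repeatedly, dropping genes far from the current fit."""
    coefs = np.array([0.1, 1.0])  # assumed starting values [a0, a1]
    for _ in range(max_iter + 1):
        ratio = disps / (coefs[0] + coefs[1] / means)
        good = (ratio > 1e-4) & (ratio < 15)  # assumed outlier cutoffs
        old = coefs
        coefs, converged = gamma_identity_glm(1.0 / means[good], disps[good], old)
        if not np.all(coefs > 0):
            raise RuntimeError("parametric fit failed")
        if converged and np.sum(np.log(coefs / old) ** 2) < 1e-6:
            return coefs
    # This is the point where a caller would switch to a local regression fit.
    raise RuntimeError("trend fit did not converge")


# Simulated gene-wise dispersion estimates scattered around a known trend.
rng = np.random.default_rng(0)
n_genes = 2000
means = 10 ** rng.uniform(1, 4, n_genes)
true_trend = 0.05 + 2.0 / means
disps = true_trend * rng.gamma(shape=4.0, scale=0.25, size=n_genes)

# Turn 20 genes into dispersion outliers, 100 times above the trend.
outliers = rng.choice(n_genes, size=20, replace=False)
disps[outliers] = 100.0 * true_trend[outliers]

coefs_full = fit_dispersion_trend(means, disps)
coefs_subset = fit_dispersion_trend(means[:500], disps[:500])

print("trend from all genes:    ", coefs_full)
print("trend from first 500 only:", coefs_subset)
```

In this simulation the outliers are dropped from the fit, so the full-data coefficients should land near the simulated trend, while the subset fit gives slightly different coefficients because it sees different genes.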