Hi all,
My understanding is that DESeq2 normalizes for differences in the number of reads by default. However, the magnitude of the difference in my data seems extreme, and I wonder whether a different and/or additional normalization is needed to account for it. I included the number of assigned reads as a covariate in the design, but I still see a very high number of significant genes (~900 with padj < 0.01 out of ~3000 genes).
dds = DESeqDataSetFromMatrix(countData=Counts,
                             colData=metadata,
                             design=~Batch + HTSeq_assigned_reads_M + Treatment)
For reference, below is a plot that shows the number of input reads for each sample. The darker green is treatment A, the lighter green is treatment B.
Thank you in advance for any guidance!
Hi Michael,
Many thanks for your feedback! Apologies that I missed this in the 2014 paper. I will be sure to review that again. Would you recommend keeping the input reads in the DESeq2 design, or is that redundant?
No, I don't recommend that. Sequencing depth differences are already taken care of by size factor estimation.
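To make the size factor point concrete, here is a minimal base-R sketch of the median-of-ratios idea on a toy matrix. The counts and the helper name `median_of_ratios` are made up for illustration and are not part of DESeq2; the package's own implementation handles more cases than this does.

```r
# Toy count matrix (made-up numbers): 4 genes x 3 samples.
# s2 is s1 sequenced at twice the depth, s3 at half the depth.
counts <- matrix(
  c( 10,  20,  5,
    100, 200, 50,
     30,  60, 15,
      8,  16,  4),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3"))
)

# Hypothetical helper: median-of-ratios size factors computed by hand.
# 1. per-gene geometric mean across samples (on the log scale)
# 2. per-sample median of log(count) - log(geometric mean)
median_of_ratios <- function(counts) {
  log_counts <- log(counts)
  log_geo_means <- rowMeans(log_counts)
  # A gene with a zero count in any sample has a -Inf log geometric mean; skip it.
  keep <- is.finite(log_geo_means)
  apply(log_counts[keep, , drop = FALSE], 2, function(col) {
    exp(median(col - log_geo_means[keep]))
  })
}

size_factors <- median_of_ratios(counts)
print(size_factors)  # approximately s1 = 1, s2 = 2, s3 = 0.5

# Dividing each column by its size factor removes the depth difference.
normalized <- sweep(counts, 2, size_factors, "/")
print(normalized)
```

In DESeq2 this step corresponds to `estimateSizeFactors()`, which `DESeq()` runs for you, so the design from the original post would reduce to `~ Batch + Treatment` with no read-depth covariate.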