Unbalanced design and large difference in number of input reads for DESeq2
@jillianwaters-14491

Hi all,

My understanding is that DESeq2 normalizes for differences in the number of reads by default. However, the difference in my data is quite extreme, and I wonder whether a different and/or additional normalization is needed to account for it. I tried to account for it by including the number of assigned reads in the design, but I still get a very high number of significant genes (~900 with padj < 0.01 out of ~3000 genes).

library(DESeq2)

# Design as originally specified, with assigned reads as a covariate
dds <- DESeqDataSetFromMatrix(countData = Counts,
                              colData = metadata,
                              design = ~ Batch + HTSeq_assigned_reads_M + Treatment)

For reference, below is a plot that shows the number of input reads for each sample. The darker green is treatment A, the lighter green is treatment B.

[Plot: number of input reads per sample; darker green = treatment A, lighter green = treatment B]

Thank you in advance for any guidance!

@mikelove

We discuss this in the 2014 paper: when large differences in sequencing depth are confounded with the condition, it is a pathological case. The problem is that lowly expressed genes will show positive counts in one group vs 0 in the other, so genes that are truly null can look non-null in the data because of the confounding. This cannot be fixed by adding sequencing depth to the design (which is in any case redundant with size factor estimation).
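To make the "positive counts vs 0" problem concrete, here is a minimal base-R simulation sketch. It is not from the paper: the library sizes, the relative abundance, and the use of Poisson counts (instead of the negative binomial that DESeq2 models) are all simplifying assumptions chosen for illustration. Every simulated gene is null, i.e. it has the same relative abundance in both groups.

```r
set.seed(1)

# Hypothetical setup: 4 deep samples (group A) and 4 shallow samples (group B)
n_per_group <- 4
group <- rep(c("A", "B"), each = n_per_group)
depth <- ifelse(group == "A", 20e6, 2e6)   # assumed library sizes

# A lowly expressed NULL gene: identical relative abundance in both groups
rel_abundance <- 2.5e-7
mu <- depth * rel_abundance                # expected counts: 5 (A) vs 0.5 (B)

# Simulate many such null genes (Poisson is a simplification)
n_genes <- 1000
counts <- matrix(
  rpois(n_genes * length(mu), lambda = rep(mu, each = n_genes)),
  nrow = n_genes,
  dimnames = list(NULL, paste0(group, seq_along(group)))
)

# Fraction of null genes that are positive in every A sample and 0 in every B sample
all_pos_A  <- rowSums(counts[, group == "A"] > 0) == n_per_group
all_zero_B <- rowSums(counts[, group == "B"] == 0) == n_per_group
frac_misleading <- mean(all_pos_A & all_zero_B)
print(frac_misleading)
```

Under these assumptions the probability of such a pattern for a single null gene is (1 - exp(-5))^4 * exp(-2), roughly 13%, so a sizeable share of unchanged genes shows an "all positive vs all zero" split purely because of depth.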

If you are concerned about comparing light green vs dark green, I think a safer approach would be to only consider genes for which at least 4 samples have a count of 10 or more.

keep <- rowSums(counts(dds) >= 10) >= 4
dds <- dds[keep,]
# then DESeq()

Hi Michael,

Many thanks for your feedback! Apologies that I missed this in the 2014 paper. I will be sure to review that again. Would you recommend keeping the input reads in the DESeq design or is that redundant?

No, I don't recommend it. Sequencing depth differences are taken care of by size factor estimation.
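For readers wondering what size factor estimation does, below is a rough base-R sketch of the median-of-ratios idea on a toy matrix. The helper name `median_of_ratios` and the toy counts are hypothetical and only for illustration; in practice you would rely on `estimateSizeFactors(dds)` (run automatically by `DESeq()`) and inspect the result with `sizeFactors(dds)`.

```r
# Sketch of median-of-ratios size factors (illustrative helper, not the DESeq2 source)
median_of_ratios <- function(counts) {
  # Log geometric mean of each gene across samples; -Inf if any count is 0
  log_geo_means <- rowMeans(log(counts))
  apply(counts, 2, function(col) {
    # Use only genes with a finite reference value and a positive count
    ok <- is.finite(log_geo_means) & col > 0
    exp(median((log(col) - log_geo_means)[ok]))
  })
}

# Toy data: the same 4 genes, with samples s3 and s4 sequenced 10x deeper
base_expr <- c(10, 50, 200, 1000)
depth_factor <- c(s1 = 1, s2 = 1, s3 = 10, s4 = 10)
counts <- outer(base_expr, depth_factor)

sf <- median_of_ratios(counts)
print(sf)

# Dividing each column by its size factor puts the samples on a common scale
normalized <- sweep(counts, 2, sf, "/")
print(normalized)
```

In this toy case the size factors of the deep samples are 10 times those of the shallow ones, and the normalized counts agree across samples. That is why a read-depth covariate in the design adds nothing; it also cannot undo the zeros described in the answer above.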
