Question

Unbalanced design and large difference in number of input reads for DESeq2

jillian.waters • 0

Hi all,

My understanding is that DESeq2 normalizes for differences in the number of reads by default. However, I am concerned that the difference in my data is extreme, and I wonder whether a different or additional normalization is needed to account for it. I included the number of assigned reads as a covariate in the design, but I still get a very high number of significant genes (~900 with padj < 0.01 out of ~3000 genes).

library(DESeq2)
dds <- DESeqDataSetFromMatrix(countData = Counts,
                              colData = metadata,
                              design = ~ Batch + HTSeq_assigned_reads_M + Treatment)

For reference, below is a plot that shows the number of input reads for each sample. The darker green is treatment A, the lighter green is treatment B.

[Figure: number of input reads per sample, colored by treatment]

Thank you in advance for any guidance!

deseq2 normalization • 641 views

score 0 · Answer 1 · 2020-04-17

Michael Love 43k

We actually discuss this in the 2014 paper: when large differences in sequencing depth are confounded with the condition, it is a pathological case. The problem is that lowly expressed genes will show positive counts in one group vs 0 in the other, so genes that are truly null can look non-null in the data because of the confounding. This cannot be fixed by adding sequencing depth to the design (which is in any case redundant with size factor estimation).
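To make the "positive counts vs 0" point concrete, here is a small back-of-the-envelope sketch in base R. The expression rate and the two sequencing depths are made-up values for illustration, not numbers from this thread, and a Poisson model is used as a simplification (DESeq2 itself fits a negative binomial).

```r
# Assumed values for illustration only (not taken from the thread).
rate <- 1e-7            # same true fraction of reads from this gene in both groups
depth_shallow <- 1e6    # reads per sample in the shallow group
depth_deep <- 5e7       # reads per sample in the deep group

# Expected raw counts differ only because of depth.
expected_shallow <- rate * depth_shallow
expected_deep <- rate * depth_deep

# Probability of observing a zero count under a simple Poisson model.
p_zero_shallow <- dpois(0, lambda = expected_shallow)
p_zero_deep <- dpois(0, lambda = expected_deep)

c(shallow = p_zero_shallow, deep = p_zero_deep)
```

Under these assumptions the gene is almost always 0 in the shallow samples and almost always positive in the deep ones, even though its true expression is identical in both groups.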

If you are concerned about comparing light green vs dark green, I think a safer approach would be to only consider genes for which at least 4 samples have a count of 10 or more.

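# keep genes with a count of 10 or more in at least 4 samples
keep <- rowSums(counts(dds) >= 10) >= 4
dds <- dds[keep, ]
# then run DESeq() on the filtered object
dds <- DESeq(dds)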
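To see what the filtering rule above keeps and drops, here is a minimal sketch on a toy matrix using base R only. The gene names, sample names, and counts are invented for illustration.

```r
# Toy count matrix: 5 genes (rows) x 6 samples (columns); values are made up.
toy_counts <- matrix(
  c( 0,  0,  1, 12, 15, 20,   # geneA: only 3 samples reach 10
    25, 30, 11, 40, 10, 18,   # geneB: all 6 samples reach 10
     0,  0,  0,  3,  5,  2,   # geneC: no sample reaches 10
    10, 10, 10, 10,  0,  0,   # geneD: exactly 4 samples reach 10
     9,  9,  9,  9,  9,  9),  # geneE: always just under 10
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("gene", LETTERS[1:5]), paste0("s", 1:6))
)

# Same rule as above: a count of 10 or more in at least 4 samples.
keep <- rowSums(toy_counts >= 10) >= 4
filtered <- toy_counts[keep, , drop = FALSE]
rownames(filtered)
```

Only geneB and geneD pass. A gene like geneA, which is near 0 in some samples and positive in others, is removed before testing.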

Hi Michael,

Many thanks for your feedback! Apologies that I missed this in the 2014 paper. I will be sure to review that again. Would you recommend keeping the input reads in the DESeq design or is that redundant?

jillian.waters • 5.2 years ago

No, I don't recommend it. Sequencing depth differences are taken care of by size factor estimation.
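For intuition, here is a rough base R sketch of the median-of-ratios idea behind size factor estimation. It is a simplified illustration on invented counts, not DESeq2's actual code; in practice `estimateSizeFactors()` (called inside `DESeq()`) does this for you.

```r
# Hypothetical example: s2 is the same library as s1, sequenced 4x deeper.
raw_counts <- matrix(
  c( 10,  40,
     20,  80,
     50, 200,
    100, 400),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("s1", "s2"))
)

# Per-gene geometric mean (on the log scale) serves as a pseudo-reference sample.
log_geo_means <- rowMeans(log(raw_counts))

# Size factor per sample: median ratio of its counts to the reference,
# skipping genes with a zero count.
size_factors <- apply(raw_counts, 2, function(col) {
  usable <- is.finite(log_geo_means) & col > 0
  exp(median((log(col) - log_geo_means)[usable]))
})

# Dividing by the size factors puts the samples on a common scale.
normalized <- sweep(raw_counts, 2, size_factors, "/")
```

Here the 4x depth difference is fully absorbed by the size factors, which is why a separate depth covariate in the design adds nothing.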

Michael Love • 5.2 years ago