Question

DESeq2: calculating size factors with the median-of-ratios method

KEREN • 0

Hi there,

When using DESeq2, can I drop the bottom 10% and top 10% of genes by read count when calculating the size factors with the median-of-ratios method (that is, exclude the most highly and most lowly expressed genes from the size factor calculation)?

Best,

Keren

median DESeq2 • 1.2k views

ADD COMMENT • link 23 months ago KEREN • 0

Technically yes, but excluding the high-count genes removes the genes that are most resistant to technical fluctuations, which I don't think is smart. Is there any reason to do so?
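For reference, the median-of-ratios calculation and the proposed trimming can be sketched as follows. This is a minimal pure-Python illustration, not DESeq2's implementation; the function names `size_factors` and `middle_genes` and the toy counts are hypothetical. Within DESeq2 itself, `estimateSizeFactors()` has a `controlGenes` argument for restricting which genes enter the calculation.

```python
import math
from statistics import median

def size_factors(counts, keep=None):
    """Median-of-ratios size factors.

    counts: list of gene rows, each a list of per-sample counts.
    keep:   optional list of row indices to use (hypothetical trimming hook).
    """
    n_samples = len(counts[0])
    rows = range(len(counts)) if keep is None else keep
    log_ratios = [[] for _ in range(n_samples)]
    for i in rows:
        row = counts[i]
        if any(c <= 0 for c in row):
            continue  # a zero count makes the geometric mean zero; skip the gene
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Size factor = median ratio of a sample's counts to the per-gene geometric mean
    return [math.exp(median(r)) for r in log_ratios]

def middle_genes(counts, lower=0.10, upper=0.90):
    """Row indices of genes ranked between the lower and upper fractions by total count."""
    order = sorted(range(len(counts)), key=lambda i: sum(counts[i]))
    return order[int(len(order) * lower):int(len(order) * upper)]

# Toy data (made up): sample 2 is sequenced exactly twice as deeply as sample 1
counts = [[c, 2 * c] for c in (1, 5, 10, 20, 50, 100, 200, 500, 1000, 5000)]
all_sf = size_factors(counts)
trimmed_sf = size_factors(counts, keep=middle_genes(counts))
```

In this toy data the depth difference is uniform across genes, so trimming changes nothing: the ratio of the two size factors is 2 either way. Trimming only matters when the excluded genes behave differently from the rest.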

ADD REPLY • link 23 months ago ATpoint ★ 4.0k

Yes. As you can see in the RLE plot, the variance of the WT samples is larger than that of the KO samples. The dispersion of these samples is also very high. As a result, I get very few significantly changed genes. I'm wondering whether this was introduced by the normalization step.

[Figures: RLE plot; dispersion plot]

I've tried normalizing the dataset using only the genes in the [10%, 90%] range of read counts, but this strategy doesn't seem to work. Do you have any ideas on this?

Thank you

ADD REPLY • link 23 months ago KEREN • 0

You can easily check normalization using MA-plots, e.g. with the plotMA() function. I would rather look at diagnostics such as PCA to check for unwanted technical variation, such as batch effects, that needs to be addressed. Unless you have extreme DE profiles with asymmetrical changes, the default normalization usually performs very well.
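The idea behind the MA-plot check can be sketched as follows: after normalization, the bulk of the log fold changes (M) should be centred on zero across the range of mean expression (A). This is a minimal pure-Python illustration with made-up counts; the function name `ma_values` and the pseudocount of 0.5 are assumptions, and DESeq2 reports model-based log2 fold changes, so this only approximates what plotMA() shows.

```python
import math
from statistics import mean, median

def ma_values(counts, size_factors, group_a, group_b, pseudo=0.5):
    """Return (A, M) per gene: mean normalized count and log2 fold change B vs A."""
    points = []
    for row in counts:
        norm = [c / s for c, s in zip(row, size_factors)]
        a_mean = mean(norm[j] for j in group_a)
        b_mean = mean(norm[j] for j in group_b)
        # Pseudocount avoids log of zero; its value is an arbitrary choice here
        m = math.log2((b_mean + pseudo) / (a_mean + pseudo))
        points.append((mean(norm), m))
    return points

# Toy data (made up): sample 2 has twice the depth and no real expression changes
counts = [[10, 20], [100, 200], [50, 100]]
normalized = ma_values(counts, [1.0, 2.0], group_a=[0], group_b=[1])
unnormalized = ma_values(counts, [1.0, 1.0], group_a=[0], group_b=[1])

median_m_normalized = median(m for _, m in normalized)      # centred on zero
median_m_unnormalized = median(m for _, m in unnormalized)  # shifted away from zero
```

A median M close to zero indicates that the depth difference has been removed; a systematic shift away from zero would point to a normalization problem.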

ADD REPLY • link 23 months ago ATpoint ★ 4.0k

I've checked the normalization using MA-plots. The result looks strange to me: the log2FC values are almost all around 0.

[Figure: MA plot]

ADD REPLY • link 23 months ago KEREN • 0

score 0 · Answer 1 · 2022-05-16

Michael Love 41k

I wouldn't recommend the original suggestion. I'm not sure exactly what the concern is here: DESeq2 adapts to the higher dispersion in the dataset by being more conservative about what it deems significant. The method has done its job.

ADD COMMENT • link 23 months ago Michael Love 41k

Hi Michael, thank you for your reply.

My concern here is mainly the high variance among the samples of condition "mESC_mRNA_WT". I observed that the total read count of one of these samples is smaller than that of the other samples before normalization, but higher than that of any other sample after normalization. I think this issue must be caused by the normalization step.
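One way such a reversal can arise is sketched below: the median-of-ratios size factor follows the typical gene rather than the library total, so a few very highly expressed genes can dominate the total without moving the size factor. This is a minimal pure-Python illustration with made-up counts; the function name `median_of_ratios` is hypothetical, and nothing here is derived from the actual dataset.

```python
import math
from statistics import median

def median_of_ratios(counts):
    """Size factors as the median ratio to the per-gene geometric mean."""
    n = len(counts[0])
    logs = [[] for _ in range(n)]
    for row in counts:
        if min(row) <= 0:
            continue  # genes with a zero count are skipped
        ref = sum(math.log(c) for c in row) / n
        for j, c in enumerate(row):
            logs[j].append(math.log(c) - ref)
    return [math.exp(median(v)) for v in logs]

# Toy data (made up): nine typical genes plus one dominant, highly expressed gene
counts = [[50, 100] for _ in range(9)] + [[10000, 12000]]

sf = median_of_ratios(counts)
raw_totals = [sum(row[j] for row in counts) for j in range(2)]
norm_totals = [t / s for t, s in zip(raw_totals, sf)]
```

Here sample 1 has the smaller raw total but the larger normalized total, because its size factor is set by the nine typical genes while its total is dominated by the single highly expressed gene.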

Considering @ATpoint's point that the high-count genes are the most resistant to technical fluctuations, I'm wondering whether I can normalize the data while eliminating the influence of the lowly expressed genes (for example, by removing the bottom 10%).

ADD REPLY • link 23 months ago KEREN • 0