DESeq2: calculating size factors with the median-of-ratios method
KEREN

Hi there,

When using DESeq2, can I drop the bottom 10% and top 10% of genes by read count when calculating size factors with the median-of-ratios method (that is, exclude the most highly and most lowly expressed genes from the size-factor calculation)?

Best,

Keren
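For reference, here is a minimal Python/NumPy sketch of the median-of-ratios idea behind DESeq2's default size factors (DESeq2 itself is an R package, so this is not its code). Each gene's geometric mean across samples serves as a pseudo-reference, genes with a zero count in any sample are skipped, and each sample's size factor is the median of its count-to-reference ratios, taken on the log scale. The function name `median_of_ratios_size_factors` and the toy matrix are hypothetical.

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix (sketch)."""
    counts = np.asarray(counts, dtype=float)
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts)              # log(0) -> -inf
    log_geo_means = log_counts.mean(axis=1)      # -inf for any gene with a zero
    usable = np.isfinite(log_geo_means)          # skip genes with a zero anywhere
    log_ratios = log_counts[usable] - log_geo_means[usable][:, None]
    # Median of each sample's log ratios, transformed back to the count scale
    return np.exp(np.median(log_ratios, axis=0))

# Toy data: sample 2 is exactly 2x sample 1, sample 3 is exactly 0.5x sample 1
toy = np.array([
    [10, 20, 5],
    [100, 200, 50],
    [30, 60, 15],
    [0, 4, 2],    # contains a zero, so it is ignored
])
sf = median_of_ratios_size_factors(toy)
print(sf)
```

Because samples 2 and 3 are exact multiples of sample 1, the size factors come out as approximately 1, 2 and 0.5, and the fourth gene does not contribute because it has a zero.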

Technically yes, but excluding the high-count genes removes the genes that are most robust to technical fluctuations, so I don't think that is wise. Is there a reason to do so?
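One way to see why high-count genes are more robust: under pure Poisson counting noise (an assumption; real replicates add biological variability on top), the spread of the log ratio between two draws at the same expression level shrinks roughly like 1/sqrt(mean count). The simulation below is a sketch with a hypothetical helper `log2_ratio_spread`; replacing zero draws with 0.5 is an arbitrary choice to keep the log finite, not something DESeq2 does.

```python
import numpy as np

def log2_ratio_spread(mean_count, n_draws=100_000, seed=0):
    """Std. dev. of log2(x1 / x2) for two independent Poisson draws with equal mean.

    Both draws share the same true expression level, so all spread is counting noise.
    """
    rng = np.random.default_rng(seed)
    x1 = rng.poisson(mean_count, n_draws).astype(float)
    x2 = rng.poisson(mean_count, n_draws).astype(float)
    # Illustrative choice: replace zero counts so the log stays finite
    x1[x1 == 0] = 0.5
    x2[x2 == 0] = 0.5
    return float(np.std(np.log2(x1 / x2)))

for mu in [5, 50, 500, 5000]:
    print(mu, round(log2_ratio_spread(mu), 3))
```

The printed spread drops steadily as the mean count grows, so ratios from high-count genes pin down a size factor more precisely than ratios from low-count genes.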

Yes. As you can see in the RLE plot, the variance of the WT samples is larger than that of the KO samples, and the dispersion of these samples is also very high. In this case, I got very few genes that changed significantly. I'm wondering whether this could be introduced by the normalization step.

I've tried using only the genes between the 10th and 90th percentiles of read counts to normalize the dataset, but this strategy doesn't seem to work. Do you have any ideas?

Thank you
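The RLE (relative log expression) plot mentioned above can also be summarized numerically. A sketch, assuming a genes x samples matrix of already-normalized counts and a pseudocount of 1 (both illustrative choices): subtract each gene's across-sample median log2 value, then look at each sample's median and interquartile range (IQR). Medians near 0 suggest the normalization is centered; a wider IQR in some samples reflects extra variability that a single size factor per sample cannot remove. The simulated "noisier" first three samples are hypothetical, and since all simulated samples have the same depth, the raw counts stand in for normalized counts.

```python
import numpy as np

def rle_summary(normalized_counts, pseudocount=1.0):
    """Per-sample median and IQR of relative log expression (RLE) values."""
    log_expr = np.log2(np.asarray(normalized_counts, dtype=float) + pseudocount)
    # RLE: each gene's log2 value minus that gene's median across samples
    rle = log_expr - np.median(log_expr, axis=1, keepdims=True)
    medians = np.median(rle, axis=0)
    q75, q25 = np.percentile(rle, [75, 25], axis=0)
    return medians, q75 - q25

# Simulated data (hypothetical): 2000 genes x 6 samples, equal sequencing depth;
# the first three samples get more biological noise than the last three.
rng = np.random.default_rng(1)
base = rng.lognormal(mean=5, sigma=1, size=2000)
noise_sd = np.array([0.5, 0.5, 0.5, 0.1, 0.1, 0.1])
means = base[:, None] * rng.lognormal(mean=0, sigma=noise_sd, size=(2000, 6))
counts = rng.poisson(means)

medians, iqrs = rle_summary(counts)
print("RLE medians:", np.round(medians, 2))
print("RLE IQRs:   ", np.round(iqrs, 2))
```

In this simulation all medians stay near 0 while the first three IQRs are clearly wider, i.e. a centered normalization can coexist with higher within-group variability.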

You can easily check the normalization with MA plots, e.g. using the plotMA() function. I would rather look at diagnostics such as PCA to check for unwanted technical variation, such as batch effects, that needs to be addressed. Unless you have extreme DE profiles with asymmetric changes, the default normalization usually performs very well.
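DESeq2's plotMA() is applied to a results object and plots log2 fold changes against the mean of normalized counts. To check normalization between two individual samples, one can also compute M (log2 ratio) and A (mean log2 level) by hand. In the sketch below (simulated data, hypothetical helper `ma_values`, pseudocount 0.5), sample 1 is sequenced about twice as deep: the median M is about 1 on raw counts and about 0 after dividing by median-of-ratios size factors, which is what a well-normalized MA plot should show when most genes are unchanged.

```python
import numpy as np

def ma_values(sample_x, sample_y, pseudocount=0.5):
    """M = log2 ratio and A = average log2 level for two samples."""
    x = np.asarray(sample_x, dtype=float) + pseudocount
    y = np.asarray(sample_y, dtype=float) + pseudocount
    return np.log2(x / y), 0.5 * np.log2(x * y)

# Simulated data (hypothetical): same expression, sample 1 has ~2x the depth
rng = np.random.default_rng(2)
mu = rng.lognormal(5, 1, size=5000)
counts = np.column_stack([rng.poisson(2 * mu), rng.poisson(mu)])

# Median-of-ratios size factors, skipping genes with a zero in either sample
pos = (counts > 0).all(axis=1)
log_c = np.log(counts[pos])
sf = np.exp(np.median(log_c - log_c.mean(axis=1, keepdims=True), axis=0))
norm = counts / sf

m_raw, _ = ma_values(counts[:, 0], counts[:, 1])
m_norm, a_norm = ma_values(norm[:, 0], norm[:, 1])
print("median M raw:", round(float(np.median(m_raw)), 3))
print("median M normalized:", round(float(np.median(m_norm)), 3))
```

Plotting `a_norm` against `m_norm` would give the familiar MA cloud centered on M = 0.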

I've checked the normalization using MA plots. The result looks strange to me: the log2 fold changes are almost all around 0.

@mikelove

I wouldn't recommend the original suggestion. I'm not sure what exactly the concern is here: DESeq2 adapts to the higher dispersion in the dataset by being more conservative about what it deems significant. The method has done its job.
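To illustrate why higher dispersion makes DESeq2 more conservative: with the negative binomial variance mu + alpha * mu^2, a larger dispersion alpha inflates the standard error of the log fold change, so the same fold change yields a smaller Wald statistic. The function below is a rough delta-method approximation (both groups assumed to have a mean of about `mean_count`, equal group sizes), not DESeq2's actual GLM fit with shrunken dispersions; the name `approx_wald_z` is hypothetical.

```python
import math

def approx_wald_z(log2_fc, mean_count, dispersion, n_per_group):
    """Rough Wald statistic for a two-group negative binomial comparison.

    NB variance mu + alpha * mu**2 gives Var(log of a group mean) ~ (1/mu + alpha) / n.
    """
    var_ln_fc = 2 * (1 / mean_count + dispersion) / n_per_group
    se_log2_fc = math.sqrt(var_ln_fc) / math.log(2)   # convert SE to the log2 scale
    return log2_fc / se_log2_fc

for alpha in [0.01, 0.05, 0.2]:
    print(alpha, round(approx_wald_z(1.0, 200, alpha, 3), 2))
```

With 3 replicates per group and a mean count of 200, a 2-fold change (log2FC = 1) gives z of about 6.9 at alpha = 0.01 but below 1.96 at alpha = 0.2, so fewer genes reach significance when dispersion is high.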

Hi Michael, thank you for your reply.

My concern mainly focuses on the high variance among the samples of condition "mESC_mRNA_WT". I observed that the total read count of one of the samples is smaller than that of the other samples before normalization, but higher than that of any other sample after normalization. I think this issue must be caused by the normalization step.
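Median-of-ratios does not try to equalize total read counts; it rescales each sample so that its typical gene matches the pseudo-reference, so a sample's rank by total counts can flip after normalization without anything going wrong. A toy example with hypothetical numbers (not this dataset): sample 3 has half the counts of the others for nine typical genes but the same count for one very highly expressed gene. It has the smallest raw total, gets a small size factor (about 0.63 versus about 1.26), and ends up with the largest normalized total. This is one possible scenario, not a diagnosis of the data in this thread.

```python
import numpy as np

def size_factors(counts):
    """Median-of-ratios size factors (toy matrix has no zeros)."""
    log_c = np.log(counts)
    log_geo = log_c.mean(axis=1, keepdims=True)   # per-gene log geometric mean
    return np.exp(np.median(log_c - log_geo, axis=0))

# Hypothetical 10 genes x 3 samples: nine typical genes plus one very high gene
counts = np.array([[100, 100, 50]] * 9 + [[1000, 1000, 1000]], dtype=float)

sf = size_factors(counts)
raw_totals = counts.sum(axis=0)
norm_totals = (counts / sf).sum(axis=0)
print("size factors:", np.round(sf, 3))
print("raw totals:", raw_totals)
print("normalized totals:", np.round(norm_totals, 1))
```

Raw totals are 1900, 1900 and 1450; normalized totals are roughly 1508, 1508 and 2302.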

Considering @ATpoint's point that high-count genes are the most robust to technical fluctuations, I'm wondering whether I can normalize the data while eliminating the influence of the lowly expressed genes (e.g. by removing the bottom 10%).
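A sketch of the trimming idea from the question: compute median-of-ratios size factors using only genes whose mean count lies between the 10th and 90th percentiles. Ranking genes by mean count is one assumed interpretation of "bottom/top 10%", and `trimmed_size_factors` is a hypothetical helper. Within DESeq2 itself, the way to base size factors on a chosen subset of genes is the `controlGenes` argument of `estimateSizeFactors()`. On the simulated data below, trimmed and untrimmed factors both land close to the true relative depths, because the median is already driven by mid-range genes; real data may behave differently.

```python
import numpy as np

def trimmed_size_factors(counts, lower=0.10, upper=0.90):
    """Median-of-ratios size factors using only genes whose mean count lies
    between the `lower` and `upper` quantiles (hypothetical helper)."""
    counts = np.asarray(counts, dtype=float)
    with np.errstate(divide="ignore"):
        log_c = np.log(counts)                      # log(0) -> -inf
    log_geo = log_c.mean(axis=1)                    # -inf if any zero
    gene_means = counts.mean(axis=1)
    lo, hi = np.quantile(gene_means, [lower, upper])
    keep = np.isfinite(log_geo) & (gene_means >= lo) & (gene_means <= hi)
    log_ratios = log_c[keep] - log_geo[keep][:, None]
    return np.exp(np.median(log_ratios, axis=0))

# Simulated data (hypothetical): 5000 genes, 4 samples with known relative depths
rng = np.random.default_rng(3)
mu = rng.lognormal(4, 1.5, size=5000)
depth = np.array([1.0, 2.0, 0.5, 1.5])
counts = rng.poisson(mu[:, None] * depth)

full = trimmed_size_factors(counts, 0.0, 1.0)      # no trimming
trimmed = trimmed_size_factors(counts, 0.10, 0.90)
expected = depth / np.exp(np.log(depth).mean())    # depths relative to their geometric mean
print("untrimmed:", np.round(full, 3))
print("trimmed:  ", np.round(trimmed, 3))
print("expected: ", np.round(expected, 3))
```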