DESeq2: calculating size factors with the median-of-ratios method
KEREN • 0

Hi there,

When using DESeq2, can I drop the genes in the bottom 10% and top 10% of read counts when calculating the size factors with the median-of-ratios method (that is, exclude the most highly and most lowly expressed genes from the size factor calculation)?

Best,

Keren

median DESeq2 • 1.2k views

Technically yes, but excluding the high-count genes removes the genes that are most resistant to technical fluctuations, so I don't think that is smart. Is there any reason to do so?
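
For reference, here is a minimal sketch of what the median-of-ratios calculation does and how restricting it to a subset of genes would look. It is plain Python written for illustration, not DESeq2's own code; the function names `size_factors` and `middle_genes` and the toy counts are assumptions made up for this example. In DESeq2 itself, `estimateSizeFactors()` accepts a `controlGenes` argument for restricting the genes used.

```python
import math
from statistics import median


def size_factors(counts, keep=None):
    """Median-of-ratios size factors (illustrative sketch).

    counts: list of genes, each a list of raw counts (one per sample).
    keep:   optional set of gene indices to use; default is all genes.
    Genes with a zero count in any sample are skipped, because their
    geometric mean is zero and the log ratio is undefined.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for g, row in enumerate(counts):
        if keep is not None and g not in keep:
            continue
        if min(row) <= 0:
            continue
        # Log of the gene's geometric mean across samples.
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Size factor = median ratio of a sample's counts to the geometric means.
    return [math.exp(median(r)) for r in log_ratios]


def middle_genes(counts, lower=0.10, upper=0.90):
    """Indices of genes whose total count ranks between the two quantiles."""
    order = sorted(range(len(counts)), key=lambda g: sum(counts[g]))
    lo, hi = int(lower * len(order)), int(upper * len(order))
    return set(order[lo:hi])


# Toy data (hypothetical): 10 genes x 2 samples; sample 2 is sequenced twice as deep.
counts = [[c, 2 * c] for c in (1, 5, 10, 20, 40, 80, 160, 320, 640, 5000)]
all_sf = size_factors(counts)
trimmed_sf = size_factors(counts, keep=middle_genes(counts))
print(all_sf, trimmed_sf)
```

Because the estimate is a median over genes, it is already robust to a moderate number of extreme genes, so on well-behaved data trimming the tails changes little.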


Yes. As you can see in the RLE plot, the variance of the WT samples is larger than that of the KO samples. The dispersion of these samples is also very high. As a result, I get very few significantly changed genes. I'm wondering whether this may be introduced by the normalization step.

[Figures: RLE plot; dispersion plot]

However, I've tried using only the genes within the [10%, 90%] range of read counts to normalize the dataset, and this strategy doesn't seem to help. Do you have any ideas on this?

Thank you
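
For readers unfamiliar with the RLE (relative log expression) plot mentioned above, the sketch below shows one common way the plotted values are computed: each gene's log count minus that gene's median log count across samples. This is an illustrative plain-Python version under assumed conventions (log2 with a pseudocount of 1); the function name `rle` and the toy numbers are hypothetical, and the tool that produced the poster's plot may differ in detail.

```python
import math
from statistics import median


def rle(norm_counts, pseudocount=1.0):
    """Relative log expression per sample (illustrative sketch).

    norm_counts: list of genes, each a list of normalized counts per sample.
    Returns one list per sample holding log2(count + pseudocount) minus the
    gene's median log2 value across samples.
    """
    n_samples = len(norm_counts[0])
    per_sample = [[] for _ in range(n_samples)]
    for row in norm_counts:
        logs = [math.log2(c + pseudocount) for c in row]
        centre = median(logs)
        for j, value in enumerate(logs):
            per_sample[j].append(value - centre)
    return per_sample


# Toy data (hypothetical): 4 genes x 3 samples; sample 3 is noisier than the others.
norm_counts = [[15, 15, 63], [31, 31, 7], [63, 63, 255], [127, 127, 31]]
deviations = rle(norm_counts)
spread = [max(d) - min(d) for d in deviations]
print([median(d) for d in deviations], spread)
```

A per-sample median near zero suggests the sample is centred like the others, while a wide spread reflects sample-to-sample variability that rescaling by a single size factor cannot remove.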


You can easily check normalization using MA-plots, e.g. with the plotMA() function. I would rather look at diagnostics such as PCA to check for unwanted technical variation, such as batch effects, that needs to be addressed. Unless you have extreme DE profiles with asymmetrical changes, the default normalization usually performs very well.


I've checked the normalization using MA-plots. The result looks strange to me: the log2FC values are almost all around 0.

[Figure: MA plot]

@mikelove

I wouldn't recommend the original suggestion. I'm not sure what exactly the concern is here: DESeq2 adapts to the higher dispersion in the dataset by being more conservative about what it deems significant. The method has done its job.


Hi Michael, thank you for your reply.

My concern here mainly focuses on the high variance among the samples of condition "mESC_mRNA_WT". I observed that the total read count of one of these samples is smaller than that of the other samples before normalization, but higher than that of any other sample after normalization. I think this issue must be caused by the normalization step.

Considering the suggestion from @ATpoint that the high-count genes are the most resistant to technical fluctuations, I'm wondering whether I can normalize the data while eliminating the effect of the lowly expressed genes (for example, by removing the bottom 10%).
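
On the observation that a sample's total count can be the smallest before normalization and the largest afterwards: median-of-ratios size factors are not proportional to library size, so this ordering can flip when a few genes hold a large share of a sample's reads. The toy example below is a sketch in plain Python with made-up counts (not the poster's data, and not DESeq2's own code) that reproduces the effect.

```python
import math
from statistics import median

# Toy counts (hypothetical): rows are genes, columns are samples A and B.
# Sample A has fewer reads overall, but one gene takes up most of them.
counts = [[50, 100], [50, 100], [50, 100], [50, 100], [1100, 1000]]

n = len(counts[0])
log_ratios = [[] for _ in range(n)]
for row in counts:
    # Log ratio of each count to the gene's geometric mean across samples.
    log_geo_mean = sum(math.log(c) for c in row) / n
    for j, c in enumerate(row):
        log_ratios[j].append(math.log(c) - log_geo_mean)
size_factors = [math.exp(median(r)) for r in log_ratios]

raw_totals = [sum(row[j] for row in counts) for j in range(n)]
norm_totals = [raw_totals[j] / size_factors[j] for j in range(n)]
print(raw_totals, size_factors, norm_totals)
```

Here most genes in sample A are at half the level of sample B, so A gets the smaller size factor, and dividing by it pushes A's total above B's. Whether the same mechanism explains the dataset discussed in this thread cannot be told from the thread alone.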
