Hi there,
When using DESeq2, can I exclude the bottom 10% and top 10% of genes by read count when calculating the size factors with the median-of-ratios method (i.e. leave the most lowly and most highly expressed genes out of the size-factor calculation)?
Best,
Keren
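For reference, here is a minimal Python sketch of median-of-ratios size factors with an optional trimming step. It is a simplified re-implementation, not DESeq2's code. The `trim` parameter is hypothetical: it assumes "bottom/top 10%" means percentiles of each gene's mean raw count. In DESeq2 itself, the set of genes used for size factors can be restricted through the `controlGenes` argument of `estimateSizeFactors()` (see its help page) rather than by editing the count matrix.

```python
import numpy as np

def size_factors(counts, trim=None):
    """Simplified median-of-ratios size factors (DESeq-style sketch).

    counts: genes x samples array of raw counts.
    trim:   optional (low, high) percentiles of the per-gene mean count;
            genes outside that range are left out of the median
            (hypothetical option, not a DESeq2 parameter).
    """
    counts = np.asarray(counts, dtype=float)
    # Only genes with a non-zero count in every sample have a finite log geometric mean
    usable = np.all(counts > 0, axis=1)
    if trim is not None:
        means = counts.mean(axis=1)
        lo, hi = np.percentile(means[usable], trim)
        usable = usable & (means >= lo) & (means <= hi)
    log_counts = np.log(counts[usable])
    # Log geometric mean of each gene across samples = the pseudo-reference sample
    log_geo_means = log_counts.mean(axis=1)
    # Per-sample median of the log ratios to the pseudo-reference
    log_ratios = log_counts - log_geo_means[:, None]
    return np.exp(np.median(log_ratios, axis=0))
```

With `trim=(10, 90)` the median is taken only over genes whose mean count lies between the 10th and 90th percentiles; with `trim=None` every gene without zeros contributes.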
I wouldn't recommend the original suggestion. I'm not sure exactly what the concern is here: DESeq2 adapts to the higher dispersion in the dataset by being more conservative about what it deems significant. The method has done its job.
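To see why higher dispersion makes the test more conservative, the sketch below uses a rough delta-method approximation for negative binomial counts. Since Var(Y) = mu + alpha * mu^2, the variance of log Y is about 1/mu + alpha. The function name and the two-group, equal-replicate setup are assumptions for illustration. This is not DESeq2's GLM fit or its shrinkage, only a demonstration that the same fold change gives a smaller test statistic as the dispersion grows.

```python
import math

def approx_wald_z(log2fc, mean_count, dispersion, n_per_group):
    """Rough z-statistic for a two-group comparison of NB counts (illustration only).

    Delta method: Var(log Y) ~ 1/mu + alpha for NB counts with mean mu and
    dispersion alpha, because Var(Y) = mu + alpha * mu**2.
    """
    var_log = 1.0 / mean_count + dispersion         # per replicate, natural-log scale
    se_ln = math.sqrt(2.0 * var_log / n_per_group)  # SE of the difference of two group means
    se_log2 = se_ln / math.log(2)                   # convert the SE to the log2 scale
    return log2fc / se_log2

# Same 2-fold change and mean count, increasing dispersion
for alpha in (0.01, 0.1, 0.5):
    print(alpha, round(approx_wald_z(1.0, 500, alpha, 3), 2))
```

The statistic falls steadily as alpha increases, so fewer genes pass a given significance threshold when the replicates are noisier.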
Hi Michael, thank you for your reply.
My concern is mainly the high variance among the samples of condition "mESC_mRNA_WT". I observed that one of these samples has a smaller total read count than the other samples before normalization, but a higher total than any other sample after normalization. I think this must be caused by the normalization step.
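A smaller raw total that becomes the largest normalized total can arise from median-of-ratios itself when a few very highly expressed genes carry much of the library. The toy counts below are invented purely for illustration and are not a diagnosis of the mESC data. Samples A and B are identical, while sample C has half the counts for the four "typical" genes but the same count for one dominant gene. Because the size factor is a median over genes, it follows the typical genes, and C's dominant gene is then scaled up.

```python
import numpy as np

# Hypothetical toy counts (genes x samples A, B, C): four typical genes plus one dominant gene
counts = np.array([
    [100, 100, 50],
    [100, 100, 50],
    [100, 100, 50],
    [100, 100, 50],
    [1000, 1000, 1000],
], dtype=float)

# Median-of-ratios size factors
log_counts = np.log(counts)
log_geo_means = log_counts.mean(axis=1)
size_factors = np.exp(np.median(log_counts - log_geo_means[:, None], axis=0))

raw_totals = counts.sum(axis=0)                     # C has the smallest raw total
norm_totals = (counts / size_factors).sum(axis=0)   # ...but the largest normalized total
print("size factors:", size_factors.round(3))
print("raw totals:", raw_totals)
print("normalized totals:", norm_totals.round(1))
```

Median-of-ratios does not aim to equalise totals. It assumes most genes are not changing, so a sample's normalized total can end up anywhere relative to the others.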
Following the point from @ATpoint that high-count genes are the most resistant to technical fluctuations, I'm wondering whether I can normalize the data while eliminating the influence of the lowly expressed genes (e.g. by removing the bottom 10%).
Technically yes, but excluding the high-count genes removes exactly those genes that are most resistant to technical fluctuations, so I don't think that is a good idea. Is there a reason to do so?
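One reason high-count genes are more stable is counting (sampling) noise. For a Poisson count with mean mu, the relative noise sd/mean is 1/sqrt(mu). This is only a lower bound on RNA-seq variability, because biological variation comes on top of it. Still, it shows why ratios computed from high-count genes are less noisy than those from low-count genes.

```python
import math

def poisson_cv(mean_count):
    """Coefficient of variation of a Poisson count: sd / mean = 1 / sqrt(mean)."""
    return 1.0 / math.sqrt(mean_count)

# Relative sampling noise shrinks as the expected count grows
for mu in (5, 50, 500, 5000):
    print(f"mean={mu:>5}  relative sampling noise={poisson_cv(mu):.3f}")
```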
Yeah, as you can see in the RLE plot, the variance of the WT samples is larger than that of the KO samples, and the dispersion of these samples is also very high. As a result, very few genes are significantly changed, and I'm wondering whether this may have been introduced by the normalization step.
However, I've tried normalizing the datasets using only the genes between the 10th and 90th percentiles of read counts, and this strategy doesn't seem to work. Do you have any ideas on this?
Thank you
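For readers unfamiliar with RLE (relative log expression) plots, here is a minimal sketch of how the values are computed. Each gene's log2 normalized count is compared with that gene's median across samples, and the plot is a boxplot per sample. The function names and the pseudocount of 1 are assumptions. As a common rule of thumb, per-sample medians away from 0 point to a normalization offset, while a wider box (larger IQR) reflects more spread in that sample, which the size factors alone do not remove.

```python
import numpy as np

def rle(normalized_counts, pseudocount=1.0):
    """Relative log expression: log2 count minus that gene's median across samples.

    normalized_counts: genes x samples array. An RLE plot is a boxplot of each column.
    """
    logc = np.log2(np.asarray(normalized_counts, dtype=float) + pseudocount)
    return logc - np.median(logc, axis=1, keepdims=True)

def rle_summary(rle_matrix):
    """Per-sample median and interquartile range of the RLE values."""
    q1, med, q3 = np.percentile(rle_matrix, [25, 50, 75], axis=0)
    return med, q3 - q1
```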
You can easily check normalization using MA-plots, e.g. with the plotMA() function. I would rather look at diagnostics such as PCA to check for unwanted technical variation, such as batch effects, that needs to be addressed. Unless you have extreme DE profiles with asymmetrical changes, the default normalization usually performs very well.

I've checked the normalization using MA-plots. The result looks strange to me: the log2FC values are almost all around 0.