How to calculate dispersion in DESeq2 - Step by Step
Zeke Smith ▴ 10
@44055e98
Last seen 9 days ago
Argentina

This is my example data frame (it is just a sample for testing the formulas):

            dhBMEC_1 dhBMEC_2 dhBMEC_3 cryo.dhBMEC_1 cryo.dhBMEC_2 cryo.dhBMEC_3 iPSC_1 iPSC_2 iPSC_3
653635         1217      689     1089          1200          1372          1099    729    661    657
102466751       16        5       16             8            24            25      1      1      4
729737         1281     1187     1188          1482          1379          1591   3056   2799   2268


I'm doing the calculations for the 3 steps of DESeq2 BY HAND:

1. Estimate Size Factors
2. Estimate Dispersions
3. Negative Binomial GLM fitting and Wald statistics

For the first step, I used this page: https://github.com/hbctraining/DGE_workshop_salmon/blob/master/lessons/02_DGE_count_normalization.md Using that link, I was able to estimate the size factors in only 4 steps.

That's what I need right now: an easy step-by-step description of the algorithm. The problem is that I only have that information for step 1, and now I need the same for the dispersion calculation, which is also the hardest step. Could anyone help me with this?
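For reference, here is a minimal Python sketch of the median-of-ratios size factor calculation applied to the three genes above. The function name `size_factors` and the dictionary layout are my own choices for illustration, not anything from DESeq2 itself, and with only three genes the resulting numbers are just for checking the arithmetic, not meaningful estimates.

```python
import math
from statistics import median

# Raw counts from the example table (rows = genes, columns = the 9 samples).
counts = {
    "653635":    [1217, 689, 1089, 1200, 1372, 1099, 729, 661, 657],
    "102466751": [16, 5, 16, 8, 24, 25, 1, 1, 4],
    "729737":    [1281, 1187, 1188, 1482, 1379, 1591, 3056, 2799, 2268],
}


def size_factors(rows):
    """Median-of-ratios size factors for a list of per-gene count lists."""
    # A gene with any zero count has a geometric mean of zero, so skip it.
    usable = [r for r in rows if all(c > 0 for c in r)]
    if not usable:
        raise ValueError("every gene contains a zero count")
    # Step 1: log of the per-gene geometric mean (the pseudo-reference sample).
    log_geo_means = [sum(math.log(c) for c in r) / len(r) for r in usable]
    factors = []
    for j in range(len(usable[0])):
        # Step 2: log ratio of each gene in sample j to the pseudo-reference.
        log_ratios = [math.log(r[j]) - g for r, g in zip(usable, log_geo_means)]
        # Step 3: the size factor is the median ratio (taken on the log scale).
        factors.append(math.exp(median(log_ratios)))
    return factors


sf = size_factors(list(counts.values()))
# Step 4: normalized counts = raw counts divided by the sample's size factor.
normalized = {g: [c / s for c, s in zip(r, sf)] for g, r in counts.items()}
```

Taking the median on the log scale gives the same result as the median of the plain ratios whenever the number of usable genes is odd, as it is here.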

The objective is to complete this formula:

I made a list of the inputs that I need based on the previous equation (I don't know if it's right). I put a ✔ on the items that I already have:

normalized counts ✔, dispersion ✔, vector of counts ✔, estimated coefficient vector ✘, model matrix X ✘ (or ✔, but I don't know what it is yet)

I'm extracting all the formulas from these links:
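On the model matrix X: it has one row per sample and one column per coefficient, and it encodes which condition each sample belongs to. A small Python sketch follows, assuming (this is not stated in the post) that the design is a single condition factor with three levels read off the column names, with dhBMEC as the reference level. The helper names `condition_of` and `design_matrix` are hypothetical.

```python
samples = [
    "dhBMEC_1", "dhBMEC_2", "dhBMEC_3",
    "cryo.dhBMEC_1", "cryo.dhBMEC_2", "cryo.dhBMEC_3",
    "iPSC_1", "iPSC_2", "iPSC_3",
]


def condition_of(sample):
    """Assumption: the condition is the sample name without its replicate suffix."""
    return sample.rsplit("_", 1)[0]


def design_matrix(sample_names, reference):
    """Treatment-coded matrix: an intercept plus one 0/1 column per non-reference level."""
    conditions = [condition_of(s) for s in sample_names]
    others = sorted(set(conditions) - {reference})
    columns = ["Intercept"] + [f"{c}_vs_{reference}" for c in others]
    rows = [[1] + [1 if c == o else 0 for o in others] for c in conditions]
    return columns, rows


columns, X = design_matrix(samples, reference="dhBMEC")
for name, row in zip(samples, X):
    print(name, row)
```

Under this coding, each gene's estimated coefficient vector has one entry per column of X: the intercept describes the reference condition, and the other entries are the log fold changes of each remaining condition against it.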

DGE DESeq2 Differential • 215 views
@gordon-smyth
Last seen 6 hours ago
WEHI, Melbourne, Australia

The formula you quote is for the edgeR package rather than for DESeq2.

The articles you cite are for three different packages.

I doubt very much that you can compute the formula by hand. edgeR is a sophisticated and substantial package and trying to recreate it from scratch is unrealistic.


I thought the third paper used that calculation because it was named in the dispersion section of the DESeq2 manual and also in the second paper, but it seems that I did not interpret the information correctly. I will check that.

I don't think it is unrealistic; I only need the correct explanation. I have already done the first step of the algorithm and made progress on the second of the three steps. Everyone can spend their time telling me that I can't do it, or they can use it to explain how to do it.

Anyway I appreciate your answer, Gordon, it helped me to identify that I got confused with the formula that I shared and that is really helpful to me.


Step 1 has been explained here.

However, I am also seeking to break down Steps 2 and 3 further to understand how the data is being transformed, but because of my limited statistical knowledge I barely understand them, and only theoretically. I would love to get some input, as DESeq2 steps 2 and 3 are a bit knotty and variegated for starters!


Yes, I found step 1 easily. But no one could explain the other 2 steps in an easy step-by-step way (like the link you posted), and it's difficult for me to understand the mathematical formulas.

@mikelove
Last seen 1 day ago
United States

There are many steps in dispersion estimation. It’s much more complicated than the size factor estimate. The steps are described in the paper but I don’t have time to rewrite those for you here.
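To give a feel for the very first piece only, here is a rough Python sketch of a method-of-moments dispersion value. It is not the DESeq2 procedure, which per the paper goes on to gene-wise maximum-likelihood estimates, a fitted trend over the mean, and shrinkage towards that trend. It relies only on the negative binomial relation Var = mu + alpha * mu^2. The function name `moment_dispersion` and its parameters are hypothetical, and it assumes the values passed in are normalized counts for replicates of a single condition.

```python
from statistics import mean, variance


def moment_dispersion(normalized, mean_inv_size_factor=1.0):
    """Crude moment estimate of the NB dispersion alpha for one gene.

    normalized: normalized counts for replicates of ONE condition.
    mean_inv_size_factor: average of 1/size_factor over those samples
        (assumed to be 1.0 when the size factors are close to 1).
    """
    m = mean(normalized)
    v = variance(normalized)  # sample variance with an n - 1 denominator
    if m <= 0:
        return 0.0
    # Rearranged from Var = (mean of 1/s) * mu + alpha * mu^2; floored at zero.
    return max((v - mean_inv_size_factor * m) / m ** 2, 0.0)


# Hypothetical normalized counts for three replicates of one gene.
example = [50.0, 100.0, 150.0]
alpha = moment_dispersion(example)  # (2500 - 100) / 100**2 = 0.24
```

With only three replicates this number is very noisy, which is the reason the full method borrows information across genes rather than stopping here.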


I understand. Thanks for the reply Michael!