Question

How to calculate dispersion in DESeq2, step by step

1

Zeke Smith ▴ 10

@44055e98

Last seen 3.5 years ago

Argentina

This is my example data frame (just a sample for testing the formulas):

            dhBMEC_1 dhBMEC_2 dhBMEC_3 cryo.dhBMEC_1 cryo.dhBMEC_2 cryo.dhBMEC_3 iPSC_1 iPSC_2 iPSC_3
653635         1217      689     1089          1200          1372          1099    729    661    657
102466751       16        5       16             8            24            25      1      1      4
729737         1281     1187     1188          1482          1379          1591   3056   2799   2268

I'm doing the calculations for the 3 steps of DESeq2 BY HAND:

1. Estimate Size Factors
2. Estimate Dispersions
3. Negative Binomial GLM fitting and Wald statistics

For the first step, I used this page: https://github.com/hbctraining/DGE_workshop_salmon/blob/master/lessons/02_DGE_count_normalization.md Using that link, I was able to estimate the size factors in only 4 steps.
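For reference, the 4 steps on that page (geometric mean per gene, ratio of each count to that mean, median ratio per sample, divide counts by the size factor) can be sketched in Python roughly as follows. This is only a minimal sketch of the median-of-ratios idea applied to the sample counts above, not the DESeq2 code itself; the function name `size_factors` is made up for illustration, and with only 3 genes the medians are not meaningful beyond testing the arithmetic.

```python
import numpy as np

# Raw counts from the example data frame above (rows = genes, columns = samples).
counts = np.array([
    [1217, 689, 1089, 1200, 1372, 1099, 729, 661, 657],
    [16, 5, 16, 8, 24, 25, 1, 1, 4],
    [1281, 1187, 1188, 1482, 1379, 1591, 3056, 2799, 2268],
], dtype=float)


def size_factors(counts):
    """Median-of-ratios size factors (hypothetical helper, illustration only)."""
    # Genes with a zero count in any sample have a geometric mean of zero,
    # so they are left out of the calculation.
    keep = (counts > 0).all(axis=1)
    log_counts = np.log(counts[keep])
    # Step 1: per-gene geometric mean, computed as the mean of the logs.
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Step 2: log ratio of each count to its gene's geometric mean.
    log_ratios = log_counts - log_geo_mean
    # Step 3: the size factor of a sample is the median ratio over genes.
    return np.exp(np.median(log_ratios, axis=0))


sf = size_factors(counts)
# Step 4: normalized counts are the raw counts divided by the size factors.
normalized = counts / sf
print(sf)
```

Working on the log scale is just a convenient way to get the geometric mean; taking the median of the log ratios and exponentiating gives the same result as taking the median of the ratios.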

That's what I need right now: an easy step-by-step description of the algorithm. The problem is that I only have that kind of information for step 1, and now I need the same for the dispersion calculation, which is also the hardest step. Could anyone help me with this?

The objective is to complete this formula:

[image of the formula, not reproduced in the text]

I made a list of the inputs that I need based on the previous equation (I don't know if it's right). I put a ✔ on the items that I already have:

normalized counts ✔, dispersion ✔, count vector ✔, estimated coefficient vector ✘, model matrix X ✘ (or ✔, but I don't know what it is yet)
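On the model matrix X: for a design with a single factor it is just a table of 0s and 1s with one row per sample and one column per coefficient. Below is a minimal sketch for the 9 samples above, assuming a one-factor design with an intercept and with dhBMEC as the reference level. Both the design and the choice of reference level are assumptions made for illustration, not something stated in the post.

```python
import numpy as np

# Sample names taken from the column headers of the example data frame.
samples = [
    "dhBMEC_1", "dhBMEC_2", "dhBMEC_3",
    "cryo.dhBMEC_1", "cryo.dhBMEC_2", "cryo.dhBMEC_3",
    "iPSC_1", "iPSC_2", "iPSC_3",
]

# Group label = sample name without the replicate suffix.
groups = [name.rsplit("_", 1)[0] for name in samples]

# Assumption: the first level is the reference (absorbed into the intercept).
levels = ["dhBMEC", "cryo.dhBMEC", "iPSC"]

# Column 0 is the intercept; each remaining column is a 0/1 indicator
# for membership in one of the non-reference groups.
X = np.zeros((len(samples), len(levels)))
X[:, 0] = 1.0
for j, level in enumerate(levels[1:], start=1):
    X[:, j] = [1.0 if g == level else 0.0 for g in groups]

print(X)
```

With this layout the estimated coefficient vector has three entries per gene: a baseline for the reference group and one difference from that baseline for each of the other two groups.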

I'm extracting all the formulas from these links:

DGE DESeq2 Differential • 2.1k views

ADD COMMENT • link 3.6 years ago Zeke Smith ▴ 10

score 2 · Answer 1 · 2021-08-25

2

Gordon Smyth 52k

@gordon-smyth

Last seen 9 hours ago

WEHI, Melbourne, Australia

The formula you quote is for the edgeR package rather than for DESeq2.

The articles you cite are for three different packages.

I doubt very much that you can compute the formula by hand. edgeR is a sophisticated and substantial package and trying to recreate it from scratch is unrealistic.

ADD COMMENT • link 3.6 years ago Gordon Smyth 52k

0

I thought the third paper used that calculation because it was named in the dispersion section of the DESeq2 manual and also in the second paper, but it seems that I did not interpret the information correctly. I will check that.

I don't think it is unrealistic; I only need the correct explanation. I have already done the first step of the algorithm and made progress on the second of the three. Everyone can spend their time telling me that I can't do it, or they can use it to explain how to do it.

Anyway, I appreciate your answer, Gordon. It helped me see that I got confused about the formula I shared, and that is really helpful to me.

ADD REPLY • link 3.6 years ago Zeke Smith ▴ 10

1

Step 1 has been explained here.

However, I am also trying to break down Steps 2 and 3 further to understand how the data is being transformed, but because of my limited statistical knowledge I barely understand them, and only in theory. I would love to get some input, as DESeq2 steps 2 and 3 are a bit knotty and variegated for beginners!

ADD REPLY • link 3.6 years ago rohitsatyam102 ▴ 20

0

Yes, I found step 1 easily. But no one could explain the other 2 steps in an easy step-by-step way (like the link you passed), and it's difficult for me to understand the mathematical formulas.

ADD REPLY • link 3.6 years ago Zeke Smith ▴ 10

score 1 · Answer 2 · 2021-08-29

1

Michael Love 43k

@mikelove

Last seen 5 days ago

United States

There are many steps in dispersion estimation. It’s much more complicated than the size factor estimate. The steps are described in the paper but I don’t have time to rewrite those for you here.
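As a rough illustration only, and not the DESeq2 procedure: the negative binomial model relates variance and mean through the dispersion α as Var = μ + α·μ², so a crude method-of-moments value for one gene is α ≈ (variance − mean) / mean². The sketch below applies that to the three iPSC replicates of gene 729737 from the question, assuming for simplicity that all size factors equal 1. The function name `moments_dispersion` is made up for illustration. The full estimate described in the paper goes well beyond this (gene-wise maximum likelihood, a fitted trend across genes, and shrinkage towards that trend), so this number is at best a starting point.

```python
import numpy as np


def moments_dispersion(normalized_counts):
    """Crude method-of-moments dispersion for one gene in one condition.

    Hypothetical helper for illustration; solves Var = mu + alpha * mu**2
    for alpha using the sample mean and sample variance.
    """
    x = np.asarray(normalized_counts, dtype=float)
    mean = x.mean()
    var = x.var(ddof=1)  # unbiased sample variance
    # A variance below the mean would give a negative value, so clamp at 0.
    return max((var - mean) / mean**2, 0.0)


# iPSC replicates of gene 729737, treated as already normalized
# (assumption: size factors of 1, purely to keep the example short).
alpha = moments_dispersion([3056, 2799, 2268])
print(alpha)
```

With only three replicates the sample variance is very noisy, which is one reason the real method borrows information across genes instead of trusting a per-gene value like this one.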