How to calculate dispersion DESeq2 - Step by Step
Zeke Smith ▴ 10
@44055e98
Argentina

This is my example data frame (just a sample for testing the formulas):

            dhBMEC_1 dhBMEC_2 dhBMEC_3 cryo.dhBMEC_1 cryo.dhBMEC_2 cryo.dhBMEC_3 iPSC_1 iPSC_2 iPSC_3
653635         1217      689     1089          1200          1372          1099    729    661    657
102466751       16        5       16             8            24            25      1      1      4
729737         1281     1187     1188          1482          1379          1591   3056   2799   2268

I'm doing the calculations for the 3 steps of DESeq2 BY HAND:

  1. Estimate Size Factors
  2. Estimate Dispersions
  3. Negative Binomial GLM fitting and Wald statistics

For the first step, I used this page: https://github.com/hbctraining/DGE_workshop_salmon/blob/master/lessons/02_DGE_count_normalization.md Using that link, I was able to estimate the size factors in only 4 steps.
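
The median-of-ratios calculation described on that page can be sketched in plain Python as follows. This is only an illustration on the three sample genes above, not DESeq2's own code: the function names `size_factors` and `normalize` are made up for this example, and with only three genes the resulting factors are not meaningful in practice.

```python
import math
from statistics import median

samples = ["dhBMEC_1", "dhBMEC_2", "dhBMEC_3",
           "cryo.dhBMEC_1", "cryo.dhBMEC_2", "cryo.dhBMEC_3",
           "iPSC_1", "iPSC_2", "iPSC_3"]

# Raw counts from the sample table (gene ID -> one count per sample).
counts = {
    "653635":    [1217, 689, 1089, 1200, 1372, 1099, 729, 661, 657],
    "102466751": [16, 5, 16, 8, 24, 25, 1, 1, 4],
    "729737":    [1281, 1187, 1188, 1482, 1379, 1591, 3056, 2799, 2268],
}

def size_factors(counts):
    """Median-of-ratios size factors (hypothetical helper for illustration)."""
    # Step 1: log of the geometric mean of each gene across all samples.
    # Genes with a zero count in any sample are skipped.
    log_geo_means = {
        gene: sum(math.log(k) for k in row) / len(row)
        for gene, row in counts.items()
        if all(k > 0 for k in row)
    }
    n_samples = len(next(iter(counts.values())))
    factors = []
    for j in range(n_samples):
        # Step 2: log ratio of each gene's count to its geometric mean.
        log_ratios = [math.log(counts[gene][j]) - lgm
                      for gene, lgm in log_geo_means.items()]
        # Step 3: the size factor is the median ratio for this sample.
        factors.append(math.exp(median(log_ratios)))
    return factors

def normalize(counts, factors):
    # Step 4: divide each raw count by the size factor of its sample.
    return {gene: [k / s for k, s in zip(row, factors)]
            for gene, row in counts.items()}

sf = size_factors(counts)
norm = normalize(counts, sf)
for name, s in zip(samples, sf):
    print(f"{name}: size factor {s:.3f}")
```

Working on the log scale gives the same result as taking the median of the raw ratios, because the exponential preserves order.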

That's what I need right now: an easy step-by-step description of the algorithm. The problem is that I only have this information for step 1, and now I need the same for the dispersion calculation, which is also the hardest step. Could anyone help me with this?

The objective is to complete this formula:

[image of the formula; not preserved]

I made a list of the inputs that I need based on the previous equation (I don't know if it's correct). I put a ✔ on the items that I already have:

normalized counts ✔, dispersion ✔, vector of counts ✔, estimated coefficient vector ✘, model matrix X ✘ (or ✔, but I don't know what it is yet)
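
For readers unsure what the model matrix X is: in a GLM of this kind, X has one row per sample and one column per coefficient, and it records which experimental group each sample belongs to. The sketch below builds such a matrix for the nine samples above. It assumes a simple one-factor design with `dhBMEC` as the reference group; that choice, the column names, and the helper `design_matrix` are assumptions made for illustration and do not come from the original post.

```python
samples = ["dhBMEC_1", "dhBMEC_2", "dhBMEC_3",
           "cryo.dhBMEC_1", "cryo.dhBMEC_2", "cryo.dhBMEC_3",
           "iPSC_1", "iPSC_2", "iPSC_3"]

# Group label for each sample, in the same order as the columns of the table.
groups = ["dhBMEC"] * 3 + ["cryo.dhBMEC"] * 3 + ["iPSC"] * 3

def design_matrix(groups, reference):
    """Treatment-coded design matrix (hypothetical helper for illustration).

    The first column is an intercept (all ones). Each further column is an
    indicator that is 1 when the sample belongs to that non-reference group.
    """
    other_levels = sorted(set(groups) - {reference})
    columns = ["Intercept"] + [f"{lvl}_vs_{reference}" for lvl in other_levels]
    rows = [[1] + [1 if g == lvl else 0 for lvl in other_levels]
            for g in groups]
    return columns, rows

columns, X = design_matrix(groups, reference="dhBMEC")
print(columns)
for name, row in zip(samples, X):
    print(name, row)
```

With this coding, the intercept coefficient describes the reference group and each remaining coefficient describes the difference between one group and the reference, which is why the coefficient vector has one entry per column of X.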

I'm extracting all the formulas from these links:

@gordon-smyth
WEHI, Melbourne, Australia

The formula you quote is for the edgeR package rather than for DESeq2.

The articles you cite are for three different packages.

I doubt very much that you can compute the formula by hand. edgeR is a sophisticated and substantial package and trying to recreate it from scratch is unrealistic.


I thought the third paper used that calculation because it was mentioned in the dispersion section of the DESeq2 manual and also in the second paper, but it seems that I did not interpret the information correctly. I will check that.

I don't think it is unrealistic; I only need the correct explanation. I have already completed the first step of the algorithm and made progress on the second of the three steps. People can spend their time telling me that I can't do it, or they can use it to explain how to do it.

Anyway I appreciate your answer, Gordon, it helped me to identify that I got confused with the formula that I shared and that is really helpful to me.


Step 1 has been explained here.

However, I am also trying to break down Steps 2 and 3 further to understand how the data is being transformed, but because of my limited statistical knowledge I barely understand them, and only in theory. I would love to get some input, as DESeq2 Steps 2 and 3 are rather intricate for beginners!


Yes, I found step 1 easily. But no one has been able to explain the other 2 steps in an easy, step-by-step way (like the link you shared), and it's difficult for me to understand the mathematical formulas.

@mikelove
United States

There are many steps in dispersion estimation. It’s much more complicated than the size factor estimate. The steps are described in the paper but I don’t have time to rewrite those for you here.
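
As a rough illustration of only the starting point, a crude per-gene dispersion can be obtained by the method of moments from the negative binomial variance relation Var = μ + α·μ². The sketch below is a simplification and not DESeq2's estimator: the full procedure described in the paper goes on to refine the gene-wise estimates by maximum likelihood, fit a trend across genes, and shrink each estimate toward that trend. The function name `moments_dispersion` is made up for this example, and the size factors are set to 1.0 purely as a placeholder assumption.

```python
from statistics import mean, variance

def moments_dispersion(norm_counts, size_factors):
    """Crude method-of-moments dispersion for one gene within one group.

    Hypothetical helper for illustration only. It solves
    Var = mu * mean(1/s) + alpha * mu^2 for alpha, where mu and Var are the
    mean and sample variance of the normalized counts and s are the size
    factors of the samples in the group.
    """
    mu = mean(norm_counts)
    var = variance(norm_counts)  # sample variance (divides by n - 1)
    inv_sf = mean(1.0 / s for s in size_factors)
    alpha = (var - inv_sf * mu) / mu ** 2
    # When the variance is below the Poisson-like term, the estimate would be
    # negative, so it is floored at zero.
    return max(alpha, 0.0)

# Gene 729737 in the three iPSC samples, using raw counts and size factors
# of 1.0 as a placeholder assumption (real size factors would differ).
ipsc_counts = [3056, 2799, 2268]
alpha = moments_dispersion(ipsc_counts, [1.0, 1.0, 1.0])
print(f"rough dispersion estimate: {alpha:.4f}")
```

With only three replicates an estimate like this is very noisy, which is the reason the full method borrows information across genes rather than using each gene's value on its own.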


I understand. Thanks for the reply Michael!
