Question

Dispersion outliers genes

0

Entering edit mode

raya.fai ▴ 60

@rayafai-9396

Last seen 10 months ago

Israel

Hi,

I am a big fan of DESeq2 and I use it to analyze my RNA-seq data. I am interested in looking at the genes that are marked as dispersion outliers.

I find in the Love et al., 2014 paper that a gene is considered as a dispersion outlier "if the residual from the trend fit is more than two standard deviations of logarithmic residuals".

When I look at my data I am surprised to find genes that have high variability in expression that are not marked as dispersion outliers while genes with a variability that seems a little less are marked as dispersion outliers. I understand from the description in the paper that this cutoff is made according to the standard deviation. I do not fully understand how is the calculation of shrinking the dispersion toward the trended prior mean is made.

I would appreciate to get some intuition to why some genes are not marked as outliers.

For example, here are the normalized reads of genes that have a high variability but are not marked as dispersion outliers (I have 12 replicates of the same condition):

1424, 1306, 1127, 618, 796, 768, 455, 466, 428, 590, 375, 306

2158, 1758, 1466, 784, 1021, 1061, 520, 494, 573, 720, 515, 438

2162, 1721, 2067, 244, 2501, 2630, 178, 272, 217, 2439, 349, 304

Here are the normalized reads of genes that are marked as a dispersion outliers:

3732, 3180, 2967, 2468, 956, 1742, 1597, 1738, 792, 573, 654, 611

3378, 3080, 2190, 1491, 1001, 1470, 1199, 1288, 874, 794, 515, 783

The genes in both categories have comparable baseMean expression. Why should not the genes in the first group also be considered as outliers? Under specific circumstances, is it correct to change the cutoff of what is considered a dispersion outlier? Is there a parameter that controls it?

Thanks you very much for the helpful forum.

All the best, Raya

Dispersion DESeq2 • 724 views

ADD COMMENT • link 11 months ago raya.fai ▴ 60

score 0 · Answer 1 · 2023-05-17

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 5 hours ago

United States

I do not fully understand how is the calculation of shrinking the dispersion toward the trended prior mean is made.

Can you say this a bit more precisely?

You don't understand how we calculate the prior?

Note that dispersion outlier status has a single effect: whether or not the dispersion estimate is moderated (shrunk toward the fitted line) or not.

Also you may benefit from plotting for these genes:

baseMean (x), dispGeneEst (y) and also the dispFit (y)

E.g.

with(mcols(dds), 
  plot(baseMean, dispGeneEst, log="xy")
)

ADD COMMENT • link 11 months ago Michael Love 41k

0

Entering edit mode

Thank you very much for your answer. I now understand better the prior calculation. A technical question, how can I add gene names to the dispersion plot using the code that you mentioned?

Thanks again, Raya

ADD REPLY • link 11 months ago raya.fai ▴ 60

0

Entering edit mode

Hi,

I hope I can ask another question regarding the dispersion minimal values. You kindly previously explained to me that: For the MLE estimate, estimate at the minimal value means that the data is essentially consistent with Poisson (the variance happened to be at or below the mean).

How to get the dispersion values before fitting?

When I manually calculate the dispersion of each gene according to (variance - mean)/mean^2 , I find that the group with negative values of dispersion is only part of the group that get the minimal estimate values of e-8 by the DESeq2 algorithm. I have two questions:

Is there another criteria (except of var < mean) under which a gene gets this minimal estimate value of dispersion?
Do these genes with minimal dispersion values (e-8) enter the prior calculation? Do they affect which genes are chosen as dispersion outliers (by affecting the standard deviation of logarithmic residuals above the fit)?

Thank you very much again, Raya

ADD REPLY • link 11 months ago raya.fai ▴ 60