Question

DEXSeq: question about usage coefficient and splicing plot

ywu • 0

@ywu-7514

Last seen 9.9 years ago

Canada

Hello,

I ran a DEXSeq test on samples from three conditions (C, L, S) and got an HTML report like the one below:

I noticed that the log2 fold changes are calculated from the usage coefficient columns (c, l, s), but it is not clear to me how those coefficients are calculated. I assume they come from the statistical formula described in the paper and vignette, but I lack the background to follow it. Could anyone explain it in an understandable way?

In addition, I have a question about the "splicing" plot in the HTML report. I read the post "DEXSeq Usage and Expression Confusion", in which Simon explained that the exon usage in the splicing plot is obtained by "averaging out effects of overall expression". I understand the basic concept, but I would like to know how the "averaging out" works and whether it is related to the usage coefficients.

I would also like to do further filtering based on the exon usage values in the splicing plot. However, these values are stored neither in the HTML report nor in the DEXSeqResults object; instead, they are calculated during plotting. Is there a way to output the exon usage values?

Please give me any comments and answers, thanks a lot.

Jason Wu

dexseq • 4.5k views

ADD COMMENT • link updated 4.2 years ago by rohitsatyam102 ▴ 20 • written 9.9 years ago by ywu • 0

score 1 · Answer 1 · 2015-04-09

Hi Jason,

The columns you are showing are the exon usage coefficients ("c", "s", "l"). The log2 fold changes are obtained by computing, for example, log2(l/c). These columns are the values to use if you would like to filter by effect size.
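As a rough illustration of the log2(l/c) computation and of filtering by effect size, here is a minimal Python sketch. The coefficient values, the `log2_fold_change` helper, and the `min_abs_lfc` threshold are all hypothetical and chosen only for illustration; none of them are DEXSeq output or part of the DEXSeq API.

```python
import math

# Hypothetical usage coefficients for one exon in conditions c, l and s.
# These numbers are made up for illustration; they are not DEXSeq output.
usage = {"c": 8.0, "l": 16.0, "s": 4.0}

def log2_fold_change(coefs, numerator, denominator):
    """Return log2(coefs[numerator] / coefs[denominator])."""
    return math.log2(coefs[numerator] / coefs[denominator])

lfc_l_vs_c = log2_fold_change(usage, "l", "c")  # log2(16 / 8)
lfc_s_vs_c = log2_fold_change(usage, "s", "c")  # log2(4 / 8)

# Hypothetical effect-size filter: keep exons whose usage changes
# at least two-fold in either direction between l and c.
min_abs_lfc = 1.0
passes_filter = abs(lfc_l_vs_c) >= min_abs_lfc
```

With these made-up numbers, the exon passes the filter for the l versus c comparison because its usage coefficient doubles.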

I can try to explain the exon usage coefficients with an example. The values in this example are not realistic compared to what is typically seen in an experiment; they are only meant to illustrate the idea. Imagine a gene that is expressed twice as much in condition A as in condition B: say condition A has an expression value of 10 and condition B has an expression value of 5. The gene consists of three exons. In condition A, the middle exon is included twice as frequently as in condition B. So for condition A the (relative) exon inclusion levels would be 1, 2, 1, while in condition B the (relative) values would be 1, 1, 1.

However, when counting the number of reads for each exon, we measure a combination of both effects (gene expression and exon inclusion levels). So the observed counts for the exons in condition A would be 10 * (1, 2, 1) = (10, 20, 10), while the counts for the exons in condition B would be 5 * (1, 1, 1) = (5, 5, 5). The GLM tries to separate these effects: it estimates gene expression and exon inclusion levels from the observed values, and thereby estimates differences in the relative inclusion of exons between conditions.
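The toy example can be sketched in a few lines of Python. This is not the GLM that DEXSeq fits. As a simplified stand-in, it assumes that most exons of the gene are unchanged and uses the median per-exon count ratio as the gene-level expression change. All variable names are illustrative.

```python
from statistics import median

# Values from the toy example: condition A has expression 10 and
# inclusion levels (1, 2, 1); condition B has expression 5 and (1, 1, 1).
expression = {"A": 10, "B": 5}
inclusion = {"A": (1, 2, 1), "B": (1, 1, 1)}

# Observed counts mix both effects: expression * inclusion.
counts = {
    cond: tuple(expression[cond] * level for level in inclusion[cond])
    for cond in expression
}

# Raw per-exon ratios between conditions still contain the expression effect.
raw_ratio = [a / b for a, b in zip(counts["A"], counts["B"])]

# Assumption for this sketch only: most exons are unchanged, so the median
# ratio approximates the overall change in gene expression.
gene_ratio = median(raw_ratio)

# Dividing out the gene-level change leaves the change in relative exon usage.
usage_ratio = [r / gene_ratio for r in raw_ratio]
```

Here `raw_ratio` is (2, 4, 2), `gene_ratio` is 2, and `usage_ratio` is (1, 2, 1): only the middle exon changes its relative usage once the overall expression difference is removed.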

Hope this helps!
Alejandro

score 0 · Answer 2 · 2015-08-26

micaela.polay • 0

@micaelapolay-8698

Last seen 9.4 years ago

France

Hello,

I'm very interested in this question.
I understand that the exon coefficient is like the NI (Normalized Intensity) in exon microarray analysis, that is, the exon inclusion normalized by overall gene expression. According to the DEXSeq vignette, the exon usage shown in the "splicing" visualization appears to be the same quantity. But the values for the exon coefficient and the exon usage are different.

So my questions: how is exon usage calculated in plotDEXSeq? And how is it related to the exon coefficient?

Thanks a lot.

Micaela

ADD COMMENT • link 9.4 years ago micaela.polay • 0

Hi Micaela,

As you mention, the plotted values are the same as the exon usage coefficients. The only difference is that the exon usage coefficients in the DEXSeqResults object are variance-stabilized, while the values labelled on the y-axis of plotDEXSeq are not.

Alejandro

ADD REPLY • link 9.4 years ago Alejandro Reyes ★ 1.9k

score 0 · Answer 3 · 2015-08-26

micaela.polay • 0

@micaelapolay-8698

Last seen 9.4 years ago

France

Thanks a lot for your very quick answer Alejandro! Makes more sense now.

May I ask why you chose to plot the values without VST? If I understand correctly, VST uses the dispersion values computed by the estimateDispersions function. Wouldn't it be more accurate to plot these values?

ADD COMMENT • link 9.4 years ago micaela.polay • 0

Hi micaela.polay

Were you able to get an answer as to why VST values aren't plotted?

ADD REPLY • link 4.2 years ago rohitsatyam102 ▴ 20