Question: DEXSeq: question about usage coefficient and splicing plot
ywu0 wrote:

Hello,

I ran a DEXSeq test on samples from three conditions (C, L, S) and got an HTML report like the one below:

I noticed that the log2 fold changes are calculated from the usage coefficient columns (c, l, s), but it is not clear to me how those coefficients are calculated. I assume they come from the statistical formula described in the paper and the vignette, but I lack the background to follow it. Could anyone explain it in an understandable way?

In addition, I have a question about the "splicing" plot in the HTML report. I read the post "DEXSeq Usage and Expression Confusion". In that post, Simon explained that the exon usage in the splicing plot is obtained by "averaging out effects of overall expression". I understand the basic concept, but I would like to know how the "averaging out" works and whether it is related to the usage coefficients.

I actually want to do further filtering based on the exon usage values in the splicing plot. However, these values are stored neither in the HTML report nor in the DEXSeqResults object; instead, they are calculated during plotting. Is there a way to output the exon usage values?

Jason Wu

dexseq • 2.1k views • modified 4.3 years ago by micaela.polay0 • written 4.7 years ago by ywu0
Alejandro Reyes1.7k wrote:

Hi Jason,

The columns that you are showing are the exon usage coefficients ("c", "s", "l"). The log fold changes are the result of computing, for example, log2(l/c). These columns hold the values to use if you would like to filter by effect size.
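As a minimal sketch of the log2(l/c) calculation and of filtering by effect size, the snippet below uses made-up usage values for one exon. The numbers, the `usage` dictionary, the `log2_fold_change` helper, and the threshold of 1 are all illustrative assumptions, not DEXSeq output or DEXSeq API.

```python
from math import log2

# Hypothetical usage values for a single exon in conditions c, l and s.
# These numbers are invented for illustration only.
usage = {"c": 40.0, "l": 80.0, "s": 40.0}


def log2_fold_change(values, numerator, denominator):
    """Return log2(numerator / denominator) for two conditions."""
    return log2(values[numerator] / values[denominator])


lfc_l_vs_c = log2_fold_change(usage, "l", "c")  # log2(80 / 40)
lfc_s_vs_c = log2_fold_change(usage, "s", "c")  # log2(40 / 40)

# Hypothetical effect-size filter: keep exons whose usage at least doubles
# or halves between the two conditions being compared.
threshold = 1.0
keep_l_vs_c = abs(lfc_l_vs_c) >= threshold
keep_s_vs_c = abs(lfc_s_vs_c) >= threshold

print(lfc_l_vs_c, keep_l_vs_c)
print(lfc_s_vs_c, keep_s_vs_c)
```

With these invented values, the l versus c comparison passes the filter and the s versus c comparison does not.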

I can try to explain the exon usage coefficients with an example. The values used here are not realistic compared to what is typically seen in an experiment; they only serve to illustrate the idea. Imagine a gene that is expressed twice as much in condition A as in condition B: let's say that condition A has an expression value of 10 and condition B has an expression value of 5. This gene consists of three exons. In condition A, the middle exon is included twice as frequently as in condition B. So, for condition A the (relative) exon inclusion levels would be 1, 2, 1, while for condition B the (relative) values would be 1, 1, 1.

However, when counting the number of reads for each exon, we measure a combination of both effects (gene expression and exon inclusion levels). So the observed counts for each exon in condition A would be 10 * (1, 2, 1) = (10, 20, 10), while the counts for each exon in condition B would be 5 * (1, 1, 1) = (5, 5, 5). What the GLM tries to do is separate these effects: it estimates gene expression and exon inclusion levels from the observed values, and thus estimates differences in the relative inclusion of exons between the conditions.
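The arithmetic of this toy example can be sketched in a few lines, using the expression values (10 and 5) and the relative inclusion levels (1, 2, 1 and 1, 1, 1) defined for conditions A and B. The separation step here is a deliberately naive stand-in that takes the median per-exon ratio as the gene-level change; it is an assumption made for illustration and is not the GLM that DEXSeq fits.

```python
from statistics import median

# Toy values from the example: gene expression per condition and
# relative inclusion of the three exons in each condition.
expression = {"A": 10.0, "B": 5.0}
inclusion = {"A": [1.0, 2.0, 1.0], "B": [1.0, 1.0, 1.0]}

# Observed counts mix both effects: expression * inclusion, per exon.
counts = {
    cond: [expression[cond] * level for level in inclusion[cond]]
    for cond in expression
}

# Per-exon ratio A/B still contains the gene-level expression change.
per_exon_ratio = [a / b for a, b in zip(counts["A"], counts["B"])]

# Naive stand-in for the model: treat the median ratio across exons as the
# overall expression change, then divide it out to leave relative usage.
gene_ratio = median(per_exon_ratio)
usage_ratio = [ratio / gene_ratio for ratio in per_exon_ratio]

print(counts)
print(per_exon_ratio)
print(gene_ratio, usage_ratio)
```

After dividing out the gene-level change, only the middle exon shows a difference between conditions, which is the kind of signal the exon usage coefficients are meant to capture.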

Hope this helps!
Alejandro

Hi Alejandro,

Thanks for your answer, it really helps me understand DEXSeq! Could you also answer my other question? I have updated my question with a splicing graph. From the graph, you can see that the "exon usage" values are completely different from the "exon usage coefficients" shown in the results table. For example, the exon usage values for E003 are all close to 1000 in the graph, while the coefficients are C: 54.310, L: 52.722, S: 53.664. But they do share the same trend (e.g. for E003, C has the highest "exon usage" value, and C: 54.310 is the largest coefficient as well). So my question is how to map the "exon usage" values in the splicing graph to the "exon usage coefficients" in the results table, or how I can extract the "exon usage" values shown in the splicing graph.

micaela.polay0 wrote:

Hello,

I'm very interested in this question.
As I understand it, the exon coefficient is like the NI (Normalized Intensity) in exon microarray analysis, that is to say, the exon inclusion normalized by overall gene expression. According to the DEXSeq vignette, the exon usage shown in the "splicing" visualization appears to be the same thing. But the values for the exon coefficient and the exon usage are different.

So my questions are: how is exon usage calculated in plotDEXSeq? And how is it related to the exon coefficient?

Thanks a lot.

Micaela

Hi Micaela,

As you mention, the plotted values are the same as the exon usage coefficients. The only difference is that the exon usage coefficients in the DEXSeqResults object are variance-stabilizing transformed, while the y-axis labels of plotDEXSeq are not.

Alejandro

micaela.polay0 wrote:

Thanks a lot for your very quick answer, Alejandro! It makes more sense now.

Could you explain why you chose to plot the values without VST? If I understand correctly, VST uses the dispersion values computed by the estimateDispersions function. Wouldn't it be more accurate to plot the transformed values?