Question

DEXSeq Usage and Expression Confusion


andrew.j.skelton73

United Kingdom

Hi,

I originally posted this on Biostars, but it was suggested I post here too!

I’ve been running DEXSeq and I’m having trouble understanding the results I’m getting. I’ve used the suggested design formula,

~ sample + exon + condition:exon

which should test for differential exon usage between my two conditions.

When I plot the exon usage and the exon expression for my two conditions, I generally find that they’re the inverse of each other. For example, if an exon in a gene appears to be differentially expressed between the two conditions, it doesn’t appear to have differential exon usage. Conversely, if an exon shows no differential expression between the two conditions, it’s reported as having differential usage.

I’m struggling to understand the difference between these two metrics. Does anyone have a good explanation?

Secondly, if I wanted to test for differential exon expression, what would I need to change my model to?

Thanks,

Sample Table:

      condition    libType                                        countName
F1_S1         A paired-end raw_counts//sorted_Flowcell_A_1.bam.dexseq.count
F1_S2         B paired-end raw_counts//sorted_Flowcell_A_2.bam.dexseq.count
F1_S3         A paired-end raw_counts//sorted_Flowcell_A_3.bam.dexseq.count
F1_S4         A paired-end raw_counts//sorted_Flowcell_A_4.bam.dexseq.count
F1_S5         B paired-end raw_counts//sorted_Flowcell_A_5.bam.dexseq.count
F1_S6         A paired-end raw_counts//sorted_Flowcell_A_6.bam.dexseq.count
F1_S7         B paired-end raw_counts//sorted_Flowcell_A_7.bam.dexseq.count
F1_S8         A paired-end raw_counts//sorted_Flowcell_A_8.bam.dexseq.count
F2_S1         A paired-end raw_counts//sorted_Flowcell_B_1.bam.dexseq.count
F2_S2         B paired-end raw_counts//sorted_Flowcell_B_2.bam.dexseq.count
F2_S3         A paired-end raw_counts//sorted_Flowcell_B_3.bam.dexseq.count
F2_S4         A paired-end raw_counts//sorted_Flowcell_B_4.bam.dexseq.count
F2_S5         B paired-end raw_counts//sorted_Flowcell_B_5.bam.dexseq.count
F2_S6         B paired-end raw_counts//sorted_Flowcell_B_6.bam.dexseq.count
F2_S7         A paired-end raw_counts//sorted_Flowcell_B_7.bam.dexseq.count
F2_S8         A paired-end raw_counts//sorted_Flowcell_B_8.bam.dexseq.count

Code Used:

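library(DEXSeq)

# Build the DEXSeqDataSet from the per-sample exon count files.
# Taking the file names from sampleTable (instead of list.files()) guarantees
# that the count files are read in the same order as the rows of sampleTable.
dxd <- DEXSeqDataSetFromHTSeq(as.character(sampleTable$countName),
                              sampleData    = sampleTable,
                              design        = ~ sample + exon + condition:exon,
                              flattenedfile = "../scrips/genome.gff")

dxd  <- estimateSizeFactors(dxd)   # normalize for library size
dxd  <- estimateDispersions(dxd)   # dispersion estimates per counting bin (exon)
dxd  <- testForDEU(dxd)            # likelihood-ratio test for differential exon usage
dxd  <- estimateExonFoldChanges(dxd, fitExpToVar="condition")  # usage fold changes between conditions

dxr1 <- DEXSeqResults(dxd)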

Example (plot not reproduced: expression and exon usage panels for one gene, red vs. blue conditions)

Question by andrew.j.skelton73; last updated by Simon Anders.

Accepted answer (score 7, 2015-01-09):

"Expression" is the expression strength of the gene, i.e., simply the average number of (normalized) reads that map to each exon in the samples of a given condition. "Exon usage" is an exon's usage compared to all the other exons of the same gene.
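To make the distinction concrete, here is a minimal R sketch using made-up counts for a hypothetical four-exon gene (the exon names E1 to E4 and all numbers are illustrative, not from the poster's data). It treats usage simply as an exon's share of the gene's total counts, which is a simplification of what DEXSeq actually models: when every exon scales up by the same factor, expression changes but usage does not.

```r
# Hypothetical counts for a four-exon gene; illustrative numbers only.
counts_blue <- c(E1 = 10, E2 = 20, E3 = 30, E4 = 40)
counts_red  <- counts_blue * 10   # the whole gene goes up 10x in red

# "Expression": per-exon counts, which all rise along with the gene
expr_fc <- counts_red / counts_blue

# Simplified "usage": each exon's share of the gene's total counts
usage_blue <- counts_blue / sum(counts_blue)
usage_red  <- counts_red  / sum(counts_red)
usage_fc   <- usage_red / usage_blue

print(expr_fc)   # 10 for every exon: expression changes
print(usage_fc)  # 1 for every exon: usage is unchanged
```

A uniform change in gene expression therefore shows up in the expression panel but cancels out in the usage panel.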

If a gene's overall expression goes up, each individual exon will also go up, but this is not what we are looking for with DEXSeq. In your example, the gene is much more strongly expressed in the red samples than in the blue samples, and hence, in the expression panel, most individual exons look stronger in red than in blue. To counter this and so reveal changes in transcript composition, DEXSeq (conceptually) "lifts up" the blue line by some factor and "pulls down" the red line by the same factor, so that they meet in the middle in the exon usage panel.

There, most exons now look similar between red and blue, but a few differ strongly. For example, while most exons went up by a factor of nearly 100 in read counts, exon E012 went up only by a factor of about 2. Hence, the usage of E012 (understood as the fraction of transcripts that include the exon) went down by a factor of about 50 (100 / 2).
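The arithmetic above can be sketched in R. The fold changes are hypothetical values chosen to mirror the example (most exons about 100-fold up, E012 about 2-fold up); the other exon names are invented, and using the median log2 fold change as the gene-level shift is an assumption standing in for the "lift up / pull down" step, not DEXSeq's actual model fit.

```r
# Hypothetical red-vs-blue fold changes; E010, E011 and E013 are invented names.
exon_fc <- c(E010 = 100, E011 = 100, E012 = 2, E013 = 100)

# Gene-level shift: the typical exon fold change, taken on the log2 scale
gene_log2fc <- median(log2(exon_fc))

# Remove the gene-level shift so the two conditions "meet in the middle";
# what remains is each exon's change relative to the rest of the gene
usage_log2fc <- log2(exon_fc) - gene_log2fc
usage_fc     <- 2^usage_log2fc

print(round(usage_fc, 3))  # E012: 2 / 100 = 0.02, i.e. a 50-fold drop in usage
```

Exons that simply follow the gene end up at a usage fold change of 1, while E012 stands out.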

More biologically: going from blue to red, your organism expresses the gene much more strongly, but nearly all of these additional transcripts seem to skip exon E012, i.e., its usage drops drastically, which is why the red line lies below the blue one at E012 in the exon usage plot. Likewise, E001-E003 and E005 appear to be skipped as well.