Check DESeq2 contrasts for exon vs intron changes
1
0
Entering edit mode
Jake ▴ 90
@jake-7236
Last seen 9 months ago
United States

I am trying to look at post-transcriptional regulation using exon and intron reads as discussed in this paper (http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.3269.html). Essentially they look for changes in exon reads between two samples and changes in intron reads between two samples and classify transcripts as post-transcriptionally regulated when there is a discrepancy between changes in exons and changes in introns.

They use a linear model ~ region + condition + region:condition. Where region is either exon(ex) or intron(in) and condition is treatment or control. I can get post-transcriptionally regulated genes from the default results(), but I also want to plot ∆exons vs ∆introns for treatment vs control and I wanted to check that the contrasts I'm using are correct.

layout <- data.frame(row.names = colnames(countMatrix),
condition = c(rep('control',3), rep('treatment',3)),
region = rep(c("ex","in"),each=ncol(cntEx)))

dds <- DESeqDataSetFromMatrix(countData = countMatrix, colData = layout, design = ~ region*condition)
dds <- DESeq(dds, betaPrior = FALSE)
results <- results(dds, alpha=0.1)
results_exons <- results(dds, contrast=c('condition','treatment','control'))
results_introns <- results(dds, contrast=list(c('condition_treatment_vs_control','regionin.conditiontreatment')))
plot(results_exons$log2FoldChange, results_introns$log2FoldChange)
deseq2 linear model • 1.3k views
0
Entering edit mode
@mikelove
Last seen 14 hours ago
United States

hi Jake,

The contrasts look right, but I'm wondering, how do you prepare the countMatrix? One row per gene? And then counts in exons for some columns, counts for introns in other columns? Then you have some pairing information about exons and introns from the same sample no? Also, can you print the layout? What is ncol(cntEx)? I'm confused what the layout actually looks like.

0
Entering edit mode

There is one row per gene in the count matrix. I counted exons (cntEx) and introns (cntIn) separately in feautureCounts and then combined them into one matrix.

cnt <- cbind(Ex=as.data.frame(cntEx[genes.selected,]), In=cntIn[genes.selected,])
countMatrix <- as.matrix(cnt)

Below is the layout:

condition region
cell_1_rep1.bam          cell_1     ex
cell_1_rep2.bam          cell_1     ex
cell_1_rep3.bam          cell_1     ex
cell_2_rep1.bam          cell_2     ex
cell_2_rep2.bam          cell_2     ex
cell_2_rep3.bam          cell_2     ex
cell_1_rep1.bam          cell_1     in
cell_1_rep2.bam          cell_1     in
cell_1_rep3.bam          cell_1     in
cell_2_rep1.bam          cell_2     in
cell_2_rep2.bam          cell_2     in
cell_2_rep3.bam          cell_2     in

1
Entering edit mode

Yes, then I'd use your code above. It is possible to also include a term for rep here (by adding a single term condition:rep), but then it makes the ∆exons and ∆introns main effects only for the reference rep, so this would just confuse things. I'd go with what you have.

0
Entering edit mode

Actually, adding such a term might confer quite some gain in power. This is how we do it in DEXSeq. The standard DEXSeq design formula, translated to the present setting, would read:

~ sample + region + condition:region

where sample is a factor with levels cell_1_rep1, cell_1_rep2, cell_1_rep3, cell_2_rep1, cell_2_rep2, and cell_2_rep3. (I'm assuming here that sample cell_2_rep1 is not closer to cell_1_rep1 than to the other cell1 replicates.)

With this, the intercept is the exon expression strength for each sample cell_1_rep1, the sample main effects are the differences between exon expression in the other samples to cell_1_rep1, the region main effect is the log ratio of intron to exon counts for cell_1 and the interaction effect is the logarithm of the double ratio

( cell_2_introns / cell_2_exons )  /  ( cell_1_introns / cell_1_exons ).

Mike, please correct me if I'm wrong. (It's late here.)

0
Entering edit mode

I have a naive follow up question. I thought that most of the differential expression software gains power by additional replicates because they can better calculate the range of expression (between replicates) of a given gene in a sample, it can then determine if the range of expression (between replicates) in another sample is significantly different. By making each replicate essentially it's own sample by cell_#_rep#, don't you lose the replicate information?

1
Entering edit mode

It's a good question, you gain power in this case because you remove some of the unexplained variance. The simplest example of how this works is the paired t-test, where the baseline for each individual are accounted for, and the test focuses on the differences. Even if the baselines have a wide range, and so a simple t-test would not reject the null, if the differences are consistent, the paired t-test will detect the difference. This is the same principal here, or when batches are accounted for in a blocked experimental design.

0
Entering edit mode

If I understand this right, doing it this way would tell me if across all samples the intron count differs significantly from the exon count for any given gene. However, if I want to know does the ratio of intron count to exon count for gene X differ between cell type 1 and cell type 2, I couldn't use this set up and would have to use my original design? Thanks

0
Entering edit mode

Yes, for actually extracting the condition effect, the design with each rep makes it difficult. You could average the effect for all the reps in one condition compared to the average of all the others using contrast=list(), and listValues. But this is not exactly equal to the condition effect in your original design. There are often multiple options in setting up a design and depending on priorities, one chooses one or the other. If you were only interested in testing the interaction term, I would recommend including 'condition:rep' which is equivalent to Simon's suggestion. Or maybe Simon has a better way for you to get the condition effect.

0
Entering edit mode

Yes, I agree that this is preferable for power (I think it's equivalent to adding condition:rep). But then Jake wants to plot the treatment vs control effect for exons and for introns. So pulling out that effect requires building the right contrast which will involve these sample terms I think.