I am trying to look at post-transcriptional regulation using exon and intron reads as discussed in this paper (http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.3269.html). Essentially they look for changes in exon reads between two samples and changes in intron reads between two samples and classify transcripts as post-transcriptionally regulated when there is a discrepancy between changes in exons and changes in introns.
They use a linear model ~ region + condition + region:condition. Where region is either exon(ex) or intron(in) and condition is treatment or control. I can get post-transcriptionally regulated genes from the default results(), but I also want to plot ∆exons vs ∆introns for treatment vs control and I wanted to check that the contrasts I'm using are correct.
layout <- data.frame(row.names = colnames(countMatrix), condition = c(rep('control',3), rep('treatment',3)), region = rep(c("ex","in"),each=ncol(cntEx))) dds <- DESeqDataSetFromMatrix(countData = countMatrix, colData = layout, design = ~ region*condition) dds <- DESeq(dds, betaPrior = FALSE) results <- results(dds, alpha=0.1) results_exons <- results(dds, contrast=c('condition','treatment','control')) results_introns <- results(dds, contrast=list(c('condition_treatment_vs_control','regionin.conditiontreatment'))) plot(results_exons$log2FoldChange, results_introns$log2FoldChange)
There is one row per gene in the count matrix. I counted exons (cntEx) and introns (cntIn) separately in feautureCounts and then combined them into one matrix.
Below is the layout:
condition region
cell_1_rep1.bam cell_1 ex
cell_1_rep2.bam cell_1 ex
cell_1_rep3.bam cell_1 ex
cell_2_rep1.bam cell_2 ex
cell_2_rep2.bam cell_2 ex
cell_2_rep3.bam cell_2 ex
cell_1_rep1.bam cell_1 in
cell_1_rep2.bam cell_1 in
cell_1_rep3.bam cell_1 in
cell_2_rep1.bam cell_2 in
cell_2_rep2.bam cell_2 in
cell_2_rep3.bam cell_2 in
Yes, then I'd use your code above. It is possible to also include a term for rep here (by adding a single term condition:rep), but then it makes the ∆exons and ∆introns main effects only for the reference rep, so this would just confuse things. I'd go with what you have.
Actually, adding such a term might confer quite some gain in power. This is how we do it in DEXSeq. The standard DEXSeq design formula, translated to the present setting, would read:
where sample is a factor with levels cell_1_rep1, cell_1_rep2, cell_1_rep3, cell_2_rep1, cell_2_rep2, and cell_2_rep3. (I'm assuming here that sample cell_2_rep1 is not closer to cell_1_rep1 than to the other cell1 replicates.)
With this, the intercept is the exon expression strength for each sample cell_1_rep1, the sample main effects are the differences between exon expression in the other samples to cell_1_rep1, the region main effect is the log ratio of intron to exon counts for cell_1 and the interaction effect is the logarithm of the double ratio
( cell_2_introns / cell_2_exons ) / ( cell_1_introns / cell_1_exons ).
Mike, please correct me if I'm wrong. (It's late here.)
I have a naive follow up question. I thought that most of the differential expression software gains power by additional replicates because they can better calculate the range of expression (between replicates) of a given gene in a sample, it can then determine if the range of expression (between replicates) in another sample is significantly different. By making each replicate essentially it's own sample by cell_#_rep#, don't you lose the replicate information?
It's a good question, you gain power in this case because you remove some of the unexplained variance. The simplest example of how this works is the paired t-test, where the baseline for each individual are accounted for, and the test focuses on the differences. Even if the baselines have a wide range, and so a simple t-test would not reject the null, if the differences are consistent, the paired t-test will detect the difference. This is the same principal here, or when batches are accounted for in a blocked experimental design.
If I understand this right, doing it this way would tell me if across all samples the intron count differs significantly from the exon count for any given gene. However, if I want to know does the ratio of intron count to exon count for gene X differ between cell type 1 and cell type 2, I couldn't use this set up and would have to use my original design? Thanks
Yes, for actually extracting the condition effect, the design with each rep makes it difficult. You could average the effect for all the reps in one condition compared to the average of all the others using contrast=list(), and listValues. But this is not exactly equal to the condition effect in your original design. There are often multiple options in setting up a design and depending on priorities, one chooses one or the other. If you were only interested in testing the interaction term, I would recommend including 'condition:rep' which is equivalent to Simon's suggestion. Or maybe Simon has a better way for you to get the condition effect.
Yes, I agree that this is preferable for power (I think it's equivalent to adding condition:rep). But then Jake wants to plot the treatment vs control effect for exons and for introns. So pulling out that effect requires building the right contrast which will involve these sample terms I think.