Question

Check DESeq2 contrasts for exon vs intron changes

0

Entering edit mode

Jake ▴ 90

@jake-7236

Last seen 20 months ago

United States

I am trying to look at post-transcriptional regulation using exon and intron reads as discussed in this paper (http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.3269.html). Essentially they look for changes in exon reads between two samples and changes in intron reads between two samples and classify transcripts as post-transcriptionally regulated when there is a discrepancy between changes in exons and changes in introns.

They use a linear model ~ region + condition + region:condition. Where region is either exon(ex) or intron(in) and condition is treatment or control. I can get post-transcriptionally regulated genes from the default results(), but I also want to plot ∆exons vs ∆introns for treatment vs control and I wanted to check that the contrasts I'm using are correct.

layout <- data.frame(row.names = colnames(countMatrix),
                     condition = c(rep('control',3), rep('treatment',3)),
                     region = rep(c("ex","in"),each=ncol(cntEx)))

dds <- DESeqDataSetFromMatrix(countData = countMatrix, colData = layout, design = ~ region*condition)
dds <- DESeq(dds, betaPrior = FALSE)
results <- results(dds, alpha=0.1)
results_exons <- results(dds, contrast=c('condition','treatment','control'))
results_introns <- results(dds, contrast=list(c('condition_treatment_vs_control','regionin.conditiontreatment')))
plot(results_exons$log2FoldChange, results_introns$log2FoldChange)

deseq2 linear model • 2.0k views

ADD COMMENT • link updated 8.8 years ago by Michael Love 41k • written 8.8 years ago by Jake ▴ 90

score 0 · Answer 1 · 2015-07-07

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 39 minutes ago

United States

hi Jake,

The contrasts look right, but I'm wondering, how do you prepare the countMatrix? One row per gene? And then counts in exons for some columns, counts for introns in other columns? Then you have some pairing information about exons and introns from the same sample no? Also, can you print the layout? What is ncol(cntEx)? I'm confused what the layout actually looks like.

ADD COMMENT • link 8.8 years ago Michael Love 41k

0

Entering edit mode

There is one row per gene in the count matrix. I counted exons (cntEx) and introns (cntIn) separately in feautureCounts and then combined them into one matrix.

cnt <- cbind(Ex=as.data.frame(cntEx[genes.selected,]), In=cntIn[genes.selected,])
countMatrix <- as.matrix(cnt)

Below is the layout:

condition region
cell_1_rep1.bam cell_1 ex
cell_1_rep2.bam cell_1 ex
cell_1_rep3.bam cell_1 ex
cell_2_rep1.bam cell_2 ex
cell_2_rep2.bam cell_2 ex
cell_2_rep3.bam cell_2 ex
cell_1_rep1.bam cell_1 in
cell_1_rep2.bam cell_1 in
cell_1_rep3.bam cell_1 in
cell_2_rep1.bam cell_2 in
cell_2_rep2.bam cell_2 in
cell_2_rep3.bam cell_2 in

ADD REPLY • link 8.8 years ago Jake ▴ 90

1

Entering edit mode

Yes, then I'd use your code above. It is possible to also include a term for rep here (by adding a single term condition:rep), but then it makes the ∆exons and ∆introns main effects only for the reference rep, so this would just confuse things. I'd go with what you have.

ADD REPLY • link 8.8 years ago Michael Love 41k

0

Entering edit mode

Actually, adding such a term might confer quite some gain in power. This is how we do it in DEXSeq. The standard DEXSeq design formula, translated to the present setting, would read:

~ sample + region + condition:region

where sample is a factor with levels cell_1_rep1, cell_1_rep2, cell_1_rep3, cell_2_rep1, cell_2_rep2, and cell_2_rep3. (I'm assuming here that sample cell_2_rep1 is not closer to cell_1_rep1 than to the other cell1 replicates.)

With this, the intercept is the exon expression strength for each sample cell_1_rep1, the sample main effects are the differences between exon expression in the other samples to cell_1_rep1, the region main effect is the log ratio of intron to exon counts for cell_1 and the interaction effect is the logarithm of the double ratio

( cell_2_introns / cell_2_exons ) / ( cell_1_introns / cell_1_exons ).

Mike, please correct me if I'm wrong. (It's late here.)

ADD REPLY • link 8.8 years ago Simon Anders ★ 3.7k

0

Entering edit mode

I have a naive follow up question. I thought that most of the differential expression software gains power by additional replicates because they can better calculate the range of expression (between replicates) of a given gene in a sample, it can then determine if the range of expression (between replicates) in another sample is significantly different. By making each replicate essentially it's own sample by cell_#_rep#, don't you lose the replicate information?

ADD REPLY • link 8.8 years ago Jake ▴ 90

1

Entering edit mode

It's a good question, you gain power in this case because you remove some of the unexplained variance. The simplest example of how this works is the paired t-test, where the baseline for each individual are accounted for, and the test focuses on the differences. Even if the baselines have a wide range, and so a simple t-test would not reject the null, if the differences are consistent, the paired t-test will detect the difference. This is the same principal here, or when batches are accounted for in a blocked experimental design.

ADD REPLY • link 8.8 years ago Michael Love 41k

0

Entering edit mode

If I understand this right, doing it this way would tell me if across all samples the intron count differs significantly from the exon count for any given gene. However, if I want to know does the ratio of intron count to exon count for gene X differ between cell type 1 and cell type 2, I couldn't use this set up and would have to use my original design? Thanks

ADD REPLY • link 8.8 years ago Jake ▴ 90

0

Entering edit mode

Yes, for actually extracting the condition effect, the design with each rep makes it difficult. You could average the effect for all the reps in one condition compared to the average of all the others using contrast=list(), and listValues. But this is not exactly equal to the condition effect in your original design. There are often multiple options in setting up a design and depending on priorities, one chooses one or the other. If you were only interested in testing the interaction term, I would recommend including 'condition:rep' which is equivalent to Simon's suggestion. Or maybe Simon has a better way for you to get the condition effect.

ADD REPLY • link 8.8 years ago Michael Love 41k

0

Entering edit mode

Yes, I agree that this is preferable for power (I think it's equivalent to adding condition:rep). But then Jake wants to plot the treatment vs control effect for exons and for introns. So pulling out that effect requires building the right contrast which will involve these sample terms I think.

ADD REPLY • link 8.8 years ago Michael Love 41k