DESeq2 on allelic reads
Nayrouz • 0


Can I use DESeq2 to perform differential gene expression on allelic reads?

  • I have allelic reads quantified for each parental allele/copy
  • I have 2 treatments (control vs ethanol-exposed)
  • I want to perform differential gene expression to see how one parental copy responds to treatment in comparison to the other parental copy.
RNASeq DifferentialExpression DESeq2 • 658 views

Yes, you can use DESeq2 to test for allelic imbalance (AI). We've used this in two papers, and the AI results agreed with QTL analysis, which uses an independent type of data to find imbalance. The basic design is ~sample + allele.
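As a rough sketch of that basic design: each sample contributes two rows, one per allele, so the allele coefficient is an alt-vs-ref log fold change estimated within samples. The sample names and the three-sample layout below are made up for illustration and are not taken from the papers mentioned.

```r
# Hypothetical layout: 3 samples, each with a ref and an alt count column
coldata <- data.frame(
  sample = factor(rep(c("s1", "s2", "s3"), each = 2)),
  allele = factor(rep(c("ref", "alt"), times = 3), levels = c("ref", "alt"))
)

# ~sample + allele: one baseline per sample plus a shared alt-vs-ref effect
mm <- model.matrix(~ sample + allele, coldata)
print(colnames(mm))

# Given a raw count matrix `cts` (assumed here) whose columns match the rows
# of coldata, the dataset would be built along these lines:
#   dds <- DESeq2::DESeqDataSetFromMatrix(countData = cts,
#                                         colData = coldata,
#                                         design = ~ sample + allele)
```

The last column of the model matrix (`allelealt`) is the term one would test for allelic imbalance under this design.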

You can also test for differential AI with an interaction between the allelic effect and the condition. You'd need to use the strategy outlined in the vignette (condition effect within samples, but comparing across groups where samples are nested within groups).

Just to note, we've also developed the Swish method (in the fishpond package) to allow for various allelic expression designs. For that method, though, you would need to quantify allelic expression with Salmon and have bootstrap replicates. We will be posting more about this soon, but you can already see the fishpond devel branch for details.


Hi Michael,

Thank you for the feedback. I appreciate it.

Just to confirm, when testing for differential AI, are the groups going to be the alleles? So I have 2 groups: allele_1 and allele_2?

Technically, I don't have 1 sample with 2 condition observations (control and treatment). But rather I have 2 samples, one for each treatment. So does that mean each sample will have only 1 condition observation?


For differential AI, we use colData like the following (columns: allele, sample, condition):

ref s1 ctr
alt s1 ctr
ref s2 ctr
alt s2 ctr
ref s3 trt
alt s3 trt
ref s4 trt
alt s4 trt

You can test whether the allelic imbalance (first column) changes with treatment (the last column) by testing condition:allele, but you also have to control for donor, which is what the DESeq2 vignette describes.

Sample size imbalance is not a problem: you just need to make the model matrix and then remove a column (see vignette).
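Here is a minimal base-R sketch of that model matrix for the eight-row colData above. The `sample.nested` column (samples renumbered within each condition) is a name chosen for illustration, and the formula follows the nested-individuals pattern from the vignette as I understand it, so treat it as an assumption to check against your own design.

```r
# colData from the post: two alleles per sample, two samples per condition
coldata <- data.frame(
  allele    = factor(rep(c("ref", "alt"), times = 4), levels = c("ref", "alt")),
  sample    = factor(rep(c("s1", "s2", "s3", "s4"), each = 2)),
  condition = factor(rep(c("ctr", "trt"), each = 4), levels = c("ctr", "trt"))
)

# Renumber samples within each condition (illustrative column name)
coldata$sample.nested <- factor(rep(rep(c("s1", "s2"), each = 2), times = 2))

# Condition effect, donor effect within condition, allelic effect within condition
m1 <- model.matrix(~ condition + condition:sample.nested + condition:allele,
                   coldata)

# Drop columns that are all zero; these appear when one condition has
# fewer samples than the other
all.zero <- apply(m1, 2, function(x) all(x == 0))
m1 <- m1[, !all.zero, drop = FALSE]
print(colnames(m1))
```

Indexing with the logical vector also behaves sensibly when no column is all zero (as in this balanced layout), whereas `m1[, -which(all.zero)]` would return a matrix with zero columns in that case.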


Hi Michael,

I nested the samples as indicated in the vignette. The colData looks like this (columns: allele, sample, condition, sample.nested):

ref s1 ctr s1
alt s1 ctr s1
ref s2 ctr s2
alt s2 ctr s2
ref s3 trt s1
alt s3 trt s1 
ref s4 trt s2
alt s4 trt s2
ref s5 trt s3 
alt s5 trt s3 

Then I removed the column with 0s and provided the corrected model matrix to the design slot of the dataset.

m1 <- model.matrix(~ condition + condition:sample.nested + condition:allele, coldata)
all.zero <- apply(m1, 2, function(x) all(x==0))
idx <- which(all.zero)
m1 <- m1[,-idx]

dds <- DESeqDataSetFromMatrix(countData = round(cts),
                              colData = coldata,
                              design=  m1)

Then I tested for differential AI as follows:

results(dds, contrast=list("ctr.alt","trt.alt"))

However, none of the genes had significant adjusted p-values.
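For reference, a hedged sketch of the differential AI test as a small helper. It assumes `dds` has already been run through `DESeq()`, and the default coefficient names below are only a guess at how the two `condition:allele` columns are named after fitting; check `resultsNames(dds)` for the actual names. The helper name `test_differential_ai` is made up for this example.

```r
# Differential AI: allelic effect in trt minus allelic effect in ctr.
# `num` and `den` are assumed names; confirm them with resultsNames(dds).
test_differential_ai <- function(dds,
                                 num = "conditiontrt.allelealt",
                                 den = "conditionctr.allelealt") {
  available <- DESeq2::resultsNames(dds)
  stopifnot(all(c(num, den) %in% available))
  # A list contrast is numerator first, denominator second
  DESeq2::results(dds, contrast = list(num, den))
}
```

Swapping the two names only flips the sign of the log fold change, not the p-values.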


Make a plotCounts of the gene with the smallest p-value. This helps you understand what role the variation is playing. If you know ggplot2, you can set returnData=TRUE and then group on the sample, using geom_line to connect the two alleles per donor (not the nested variable).
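A sketch of that plot, assuming the colData columns are named `allele`, `sample` and `condition` as earlier in the thread. The helper name `plot_allelic_counts` is made up for this example; the packages are called via `::`, so they only need to be installed when the function is actually run.

```r
# Plot normalized counts for one gene, connecting the two alleles of each
# sample with a line (group = sample, i.e. the donor, not the nested variable)
plot_allelic_counts <- function(dds, gene) {
  d <- DESeq2::plotCounts(dds, gene = gene,
                          intgroup = c("allele", "sample", "condition"),
                          returnData = TRUE)
  ggplot2::ggplot(d, ggplot2::aes(x = allele, y = count,
                                  group = sample, color = condition)) +
    ggplot2::geom_point() +
    ggplot2::geom_line() +
    ggplot2::scale_y_log10()
}
```

With a results table `res`, usage would be something like `plot_allelic_counts(dds, rownames(res)[which.min(res$pvalue)])`. Lines with clearly different slopes between ctr and trt are what differential AI would look like.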


I couldn't figure out geom_line but I used color = allele. The plot shows that alt and ref alleles respond differently to treatment when compared to control. However, both alleles of one of the control-treated samples don't have read counts that are similar to the other control samples (as shown in the picture below).

Also, I understand that in the vignette you emphasize the importance of working with un-normalized counts. However, the pipeline I used to quantify allelic reads only generates RPM values (which is what I used as input in my cts). Could that be a problem?

[Image not available: counts plot for the gene, colored by allele]


I think it's key to use counts here and not RPM. The variance modeling really matters for allelic analysis.
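If the pipeline cannot output raw counts directly, one possible workaround is to back-convert. This is only a sketch: it assumes RPM was computed as count / (total reads / 1e6), and that the per-sample totals used by the pipeline are known. The function name and the toy numbers are invented for illustration.

```r
# Back-convert RPM to approximate counts: count = RPM * total_reads / 1e6.
# `rpm` is a genes x columns matrix; `total_reads` has one entry per column.
rpm_to_counts <- function(rpm, total_reads) {
  stopifnot(ncol(rpm) == length(total_reads))
  round(sweep(rpm, 2, total_reads / 1e6, FUN = "*"))
}

# Toy example with made-up values
rpm <- matrix(c(5, 10, 2.5, 1), nrow = 2,
              dimnames = list(c("g1", "g2"), c("s1_ref", "s1_alt")))
cts <- rpm_to_counts(rpm, total_reads = c(2e6, 4e6))
print(cts)
```

Getting the raw allelic counts straight from the quantification step is preferable where possible, since any rounding or filtering applied before the RPM values were written cannot be undone this way.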

