Yes, you can use DESeq2 for allelic imbalance. We've used this in two papers, and the AI results were in agreement with QTL analysis that uses an independent type of data to find imbalance. The basic design is ~sample + allele.
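A minimal sketch of that basic design, for orientation only. The counts are toy data from makeExampleDESeqDataSet and carry no real allelic signal. The object names (cts, coldata), the column naming, and the choice to set all size factors to 1 are assumptions made for illustration, not details taken from the papers mentioned above.

```r
library(DESeq2)

set.seed(1)
# Toy counts only: 500 genes x 8 columns (4 samples x 2 alleles).
cts <- counts(makeExampleDESeqDataSet(n = 500, m = 8))

coldata <- data.frame(
  sample = factor(rep(c("s1", "s2", "s3", "s4"), each = 2)),
  allele = factor(rep(c("ref", "alt"), times = 4), levels = c("ref", "alt"))
)
colnames(cts) <- paste(coldata$sample, coldata$allele, sep = "_")
rownames(coldata) <- colnames(cts)

dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata,
                              design = ~ sample + allele)

# Assumption: both alleles of a sample come from the same library and the
# sample term absorbs depth differences, so no extra scaling is applied.
sizeFactors(dds) <- rep(1, ncol(dds))

dds <- DESeq(dds)

# The allele coefficient is the log2 fold change of alt over ref.
res <- results(dds, name = "allele_alt_vs_ref")
head(res[order(res$pvalue), ])
```

With ref as the reference level, genes with a log2 fold change far from 0 in res would be the candidates for allelic imbalance.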
You can also test for differential AI with an interaction between the allelic effect and the condition. You'd need to use the strategy outlined in the vignette (condition effect within samples, but comparing across groups where samples are nested within groups).
Just to note, we've also developed the Swish method (in the fishpond package) to allow for various allelic expression designs. For that method you would need to quantify allelic expression with Salmon and have bootstrap replicates. We will be posting more about this soon, but you can already see the fishpond devel branch for details.
Just to confirm, when testing for differential AI, are the groups going to be the alleles? So I have 2 groups: allele_1 and allele_2?
Technically, I don't have 1 sample with 2 condition observations (control and treatment). Rather, I have 2 samples, one for each treatment. So does that mean each sample will have only 1 condition observation?
For differential AI, we use colData like the following:

allele  sample  condition
ref     s1      ctr
alt     s1      ctr
ref     s2      ctr
alt     s2      ctr
ref     s3      trt
alt     s3      trt
ref     s4      trt
alt     s4      trt

You can test whether the allelic imbalance (first column) changes with treatment (last column) by testing condition:allele, but you also have to control for donor (the sample column). That is what is described in the DESeq2 vignette.
Sample size imbalance is not a problem: you just need to make the model matrix and then remove the column that is all zeros (see vignette).
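A sketch of how that model matrix could be built in base R, assuming the colData above plus one hypothetical extra treated donor (s5) so that the groups are unbalanced. The sample.nested variable and the formula follow the nested-individuals pattern from the vignette; the object names are illustrative.

```r
# colData as in the post, plus one hypothetical extra treated donor (s5)
coldata <- data.frame(
  allele = factor(rep(c("ref", "alt"), times = 5), levels = c("ref", "alt")),
  sample = factor(rep(paste0("s", 1:5), each = 2)),
  condition = factor(rep(c("ctr", "trt"), times = c(4, 6)))
)
# Donor index that restarts at 1 within each condition (the nested variable)
coldata$sample.nested <- factor(c(1, 1, 2, 2, 1, 1, 2, 2, 3, 3))

mm <- model.matrix(~ condition + condition:sample.nested + condition:allele,
                   coldata)

# ctr has no third donor, so its nested column contains only zeros: drop it
all.zero <- apply(mm, 2, function(x) all(x == 0))
mm <- mm[, !all.zero]

# Numeric contrast: allelic effect in trt minus allelic effect in ctr
contrast <- as.numeric(colnames(mm) == "conditiontrt:allelealt") -
  as.numeric(colnames(mm) == "conditionctr:allelealt")

colnames(mm)
contrast
```

The cleaned matrix could then be supplied as the full design, for example DESeq(dds, full = mm), and the numeric vector passed as results(dds, contrast = contrast) to test whether the allelic effect differs between conditions.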
Hi Michael,
Thank you for the feedback. I appreciate it.
Hi Michael,
I nested the samples as indicated in the vignette. The colData looks like this (columns: allele, sample, condition, sample.nested),
Then I removed the column with 0s and provided the corrected model matrix to the design slot of the dataset. Then I tested for differential AI as follows:
However, none of the genes had significant adj.p values.
Make a plotCounts of the gene with the smallest p-value. This helps you see how the variation comes into play. If you know ggplot2, you can set returnData=TRUE and then group on the sample, with geom_line to connect the two alleles per donor (not the nested variable).
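A sketch of the ggplot2 part, assuming a data frame shaped like the one plotCounts returns with returnData=TRUE (a count column plus the intgroup columns). The data frame here is a stand-in with made-up numbers so the plotting code can run on its own.

```r
library(ggplot2)

# Stand-in for the data frame that
#   plotCounts(dds, gene = which.min(res$pvalue),
#              intgroup = c("allele", "sample", "condition"),
#              returnData = TRUE)
# would return. The counts below are made-up numbers.
d <- data.frame(
  count = c(120, 80, 150, 95, 110, 130, 90, 115),
  allele = factor(rep(c("ref", "alt"), times = 4), levels = c("ref", "alt")),
  sample = rep(c("s1", "s2", "s3", "s4"), each = 2),
  condition = rep(c("ctr", "trt"), each = 4)
)

# group = sample makes geom_line connect the ref and alt points of each donor
p <- ggplot(d, aes(x = allele, y = count, group = sample, color = condition)) +
  geom_point() +
  geom_line() +
  scale_y_log10()
p
```

Lines that slope in opposite directions for ctr and trt donors would be the visual signature of differential AI, and a donor whose line sits far from the others in its group stands out immediately.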
I couldn't figure out geom_line but I used color = allele. The plot shows that alt and ref alleles respond differently to treatment when compared to control. However, both alleles of one of the control-treated samples don't have read counts that are similar to the other control samples (as shown in the picture below).
Also, I understand in the vignette you emphasize the importance of working with un-normalized count. However, the pipeline I used to quantify allelic reads only generates allelic reads in RPM values (which is what I used as input in my cts). Could that be a problem?
I think it's key to use counts here and not RPM. The variance modeling really matters for allelic analysis.
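If the pipeline also reports the total number of reads it used as the RPM denominator, approximate counts can be recovered, since RPM = count / total * 1e6. A sketch under that assumption: rpm and total_reads are hypothetical names and values, and whether the pipeline uses one denominator per sample for both alleles is something to check in its documentation.

```r
# Hypothetical RPM matrix: 2 genes x 2 columns (ref and alt allele of sample s1)
rpm <- matrix(c(12.5, 3.0, 10.0, 4.5), nrow = 2,
              dimnames = list(c("geneA", "geneB"), c("s1_ref", "s1_alt")))

# Hypothetical RPM denominators (total reads), one per column
total_reads <- c(s1_ref = 2e7, s1_alt = 2e7)

# count = RPM * total / 1e6, rounded back to integers
cts <- round(sweep(rpm, 2, total_reads, "*") / 1e6)
storage.mode(cts) <- "integer"
cts
```

Getting the raw allelic counts directly from the pipeline, when possible, avoids the rounding and any uncertainty about the denominator.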