Dear all, I am working on allele-specific expression (ASE) analysis of 60+ human samples (unrelated individuals, no parental genotypes available).
One of my applications involves comparing expression of the reference and alternative alleles of around 50k heterozygous SNPs within these individuals, between patients who develop cancer and those who do not. I have created a count matrix: for each individual heterozygous for a given SNP it contains the RNA read counts for the ref and alt alleles, and NA if the individual is not heterozygous for that SNP:
SNP | sample1 | sample2 |
kgp1876518_REF_G | 0 | NA |
kgp1876518_ALT_A | 0 | NA |
kgp1876747_REF_A | 0 | 77 |
kgp1876747_ALT_G | 0 | 35 |
kgp1877878_REF_C | 77 | NA |
kgp1877878_ALT_T | 34 | NA |
I would prefer to use DESeq2 (or edgeR) for this, based on good prior experience with this software.
Of course, for each SNP, multiple individuals are homozygous and therefore carry no information on ref/alt counts. Simply replacing the NA with 0 can bias the count variability and the dispersion estimate, and neither package allows NA counts.
Any thoughts on how to do this, or suggestions for an alternative solution?
If you have something like:
... you could do:
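As a minimal sketch of a setup consistent with the coefficients described in this answer: the variable names (patient, allele, disease) and the three-patient layout are assumptions for illustration, not the original code. It assumes the counts are arranged with one column per patient-allele combination, blocks on patient, and fits a separate ref-versus-alt effect within each disease group using base R's model.matrix.

```r
# Toy layout (an assumption): three patients, each contributing a "ref" and an
# "alt" library; patients 1 and 2 have cancer, patient 3 does not.
patient <- factor(rep(1:3, each = 2))
allele  <- factor(rep(c("ref", "alt"), times = 3))
disease <- factor(rep(c("cancer", "cancer", "normal"), each = 2))

# Block on patient and fit a separate allele effect within each disease group.
design <- model.matrix(~ 0 + patient + allele:disease)

# The "alt" interaction columns are redundant given the patient terms,
# so drop them to obtain a full-rank matrix.
design <- design[, !grepl("allelealt", colnames(design)), drop = FALSE]

colnames(design)
# "patient1" "patient2" "patient3"
# "alleleref:diseasecancer" "alleleref:diseasenormal"

# Contrast comparing the cancer ASE coefficient with the normal one.
contrast <- c(0, 0, 0, 1, -1)
```

With more patients the two ASE columns move further right, so in a real analysis it is probably safer to build the contrast from the column names than from fixed positions.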
If we now look at colnames(design), we get:
So, the fourth and fifth coefficients represent the ASE log-fold change of the reference over the alternative allele in cancer and normal patients, respectively. Drop either to test for ASE in each disease type, or compare them to each other with contrast=c(0,0,0,1,-1) in glmLRT to test for disease-specific ASE.
Is there any way to do this if the treated/control samples are also paired? Something like 3 paired samples, treated vs. untreated, so 3 treated and 3 untreated. Is it possible to implement this design in an allele-specific manner?
Please ask a new question and give more details about your experimental design.