I have 2 samples each in two genotype groups: wild-type (WT) and heterozygous (HET). And for each gene per sample, I have obtained the raw read counts for two alleles (allele 1 and allele 2)
sample genotype allele
individual 1 WT allele 1 count
individual 1 WT allele 2 count
individual 2 WT allele 1 count
individual 2 WT allele 2 count
individual 3 HET allele 1 count
individual 3 HET allele 2 count
individual 4 HET allele 1 count
individual 4 HET allele 2 count
I would like to know whether the relative abundance of allele 1:allele 2 differs between genotype groups. That is, does (allele 1)/(allele 1 + allele 2) differ between genotype groups? Comparing the raw read count of allele 1 between the WT and HET groups is not ideal. The idea to look at the relative abundance of allele 1 to allele 2 within an individual, for all the individuals in one genotype group and then compare that with the other genotype group.
Here is my code so far:
library(edgeR)
rawcounts <- read.csv("file:///input.csv", row.names = "gene")
group <- c(1,1,1,1,2,2,2,2)
y <- DGEList(counts = rawcounts, norm.factors=rep(c(1),8), group=group)
genotype <- factor(c("WT","WT", "WT", "WT","HET","HET", "HET","HET"))
individual <- factor(c(1,1,2,2,3,3,4,4))
allele <- factor(rep(c("allele1", "allele2"),4))
targets <- data.frame(genotype,individual,allele)
targets
individual <- gl(2,2,length = 8)
genotype <- factor(targets$genotype, levels=c("WT","HET"))
allele <- factor(targets$allele, levels=c("allele1","allele2"))
targets
design <- model.matrix(~individual:genotype+genotype:allele)
colnames(design)
y <- estimateDisp(y,design)
Hi Aaron, I hope it is suitable to expand on this experimental design a bit.
I have a study design similar to this, but in addition to contrasting ASE in different groups, I would like to test if the genotype has a significant effect on the degree of ASE across all individuals. So I would like to compare model differences of ASE across individuals where the genotype is included and omitted, respectively.
I this example, I can't just drop the
genotype:allele
term, because then the model does not test for ASE. Does this mean I have to create a new model like the following:~0 + individual + allele
and then use that as the reduced matrix for comparison against
~0 + individual + allele:genotype?
Kind regards,
Stephan
I don't quite understand how your question differs from the setup in the original post. When you're talking about groups, are you referring to your genotypes? If so, and you want to determine whether genotype has an effect on ASE, then you should just use the approach I've described above. After all, if genotype doesn't have any effect, the genotype-specific log-fold changes between alleles should be the same between genotypes. If you have more than two genotypes, the same reasoning can be applied to perform an ANODEV to test for any effect of genotype. Just set up a contrast matrix that compares one of the
allele:genotype
terms (doesn't matter exactly which one) to every other term.Hi Aaron,
Thank you for your rapid response and for filling in the gaps to my vague question. The reason why I was asking was indeed because I have four genotypes, not two. I suspected that this was the case, but just wanted to make sure. Thanks so much!
Hi Aaron, last question on this (I really appreciate you coming back to this old thread):
I have four genotypes, with varying heterozygosity - the same gene may have 5 SNPs in one genotype, but 10 SNPs in another. My gene counts are composed of adding phased allelic counts across multiple SNPs together, so naturally genes with more SNPs will have a higher read count. Can I still compare allelic ratios across multiple genotypes with the formula provided above? Would the
individual
term still account for these differences, or would you suggest adding an extra termgenotype
term to account for this?I realise its easier to conduct the analysis on SNP level, but I would like to obtain estimates of ASE differences due to genotype on the gene-level.
Systematic differences in counts are handled by
individual
. Indeed,genotype
is nested withinindividual
, so adding the former won't have any effect if the latter is already there. Fewer counts for particular genotypes will obviously reduce power to detect differences but it will not invalidate any contrasts performed within this framework. (Obviously, if you do something outside of the contrasts, like comparing the number of DE genes across genotypes, then differences in power become important.)