I have a differential expression design with multiple conditions, say A, B, C, D.
Using edgeR I'd like to find genes that are upregulated in condition A vs each of the others conditions, i.e. genes upregulated in: A vs B and in A vs C and in A vs D.
My current strategy is to do all the pairwise comparisons and than intersect the output to find genes with e.g. LogFC > 0 and FDR < 0.05 in each comparisons. Like (partially pseudo-code):
design <- model.matrix(~0 + group)
contrasts <- makeContrasts(
A_vs_B= A - B,
A_vs_C= A - C,
A_vs_D= A - D,
levels= design
)
fit <- glmFit(y, design)
foreach contrast in contrasts:
lfc <- glmTreat(fit, contrast= contrast)
detable <- append(detable, topTags(lfc, n= Inf))
# Now query detable
Is there a better strategy? I think a contrast as A_vs_all= A - (B + C +
D)/3
won't work because it compares A vs the average of B, C, D.
(For the record, I don't have genes, I have ATACseq regions and I'm trying to answer the question which ATAC sites are specific to one particular condition?)
Thanks Gordon, I put a longer comment to Kevin's answer - any thought much appreciated!
I edited my answer to address your question about reference regions.
Thanks for the nice overview. I'm working on a non model organism with very different life stages (it's a multi-host parasite) and I think library quality depends on the life stage. Also, read depth varies a lot - between 30 and 160x. I have opted for a combination of your option 3 and 4: Call peaks combining libraries within stages (I'm using Genrich peak caller) then take the union of the stages. About your option 3, I'm worried that the high-depth/high-quality libraries will dominate the pool.
Aaro Lun and I showed that your method, like Method 4, does not control the FDR correctly (PMID: 24852250). Your method is ok though if you want a super sensitive method and you understand that the p-values can't be taken literally.