Hello everyone, I've been working on an RNA-seq project requiring an accurate DEG analysis. I've always used a pipeline involving edgeR and, more specifically, the function glmQLFTest (as explained in the workflow RNASeqGeneEdgeRQL). However, I'd like to perform my analysis with DESeq2 as well. Basically, edgeR seems to perform pairwise comparisons between pairs of groups, for example, given the following experimental design:
Sample Variable1 Variable2
1 A x
2 A x
3 A x
4 A y
5 A y
6 A y
7 B x
8 B x
9 B x
10 B y
11 B y
12 B y
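group <- paste(targets$Variable1, targets$Variable2, sep=".")
group <- factor(group)
design <- model.matrix(~0+group)
# Name the columns after the group levels so makeContrasts can refer to A.x, B.x, ...
colnames(design) <- levels(group)
# fit is assumed to come from glmQLFit(y, design) on the filtered, normalized DGEList y
Contrasts <- makeContrasts(A.x-B.x, levels=design)
res <- glmQLFTest(fit, contrast=Contrasts)
Contrasts2 <- makeContrasts(A.x-B.y, levels=design)
# The argument is always named 'contrast', whichever contrast object is passed
res2 <- glmQLFTest(fit, contrast=Contrasts2)
Contrasts3 <- makeContrasts(A.y-B.x, levels=design)
res3 <- glmQLFTest(fit, contrast=Contrasts3)
Contrasts4 <- makeContrasts(A.y-B.y, levels=design)
res4 <- glmQLFTest(fit, contrast=Contrasts4)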
I've read the vignette several times, but I'm not sure how to perform this simple analysis with DESeq2.
I think I may need to work with the 'Interactions' section of the DESeq2 vignette, so my question is...
Is the following code performing 4 pairwise comparisons, like edgeR did?
I wrote this (from the Interactions paragraph in the DESeq2 vignette):
dds$group <- factor(paste0(dds$Variable1, dds$Variable2))
design(dds) <- ~ group
dds <- DESeq(dds)
resultsNames(dds)
results(dds, contrast=c("group", "Ax", "Ay", "Bx", "By"))
Also, since I have to define the dds object before this, it's not clear what the design formula should look like... maybe like:
dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ Variable1 + Variable2)
or possibly
dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ Variable1 + Variable2 + Variable1:Variable2)
From the help of this specific function, while defining the design formula, I read:
"a formula which expresses how the counts for each gene depend on the variables in colData. Many R formula are valid, including designs with multiple variables, e.g., ~ group + condition, and designs with interactions, e.g., ~ genotype + treatment + genotype:treatment"
What exactly is the difference between ~ Variable1 + Variable2 and ~ Variable1 + Variable2 + Variable1:Variable2?
Thanks in advance for your help!
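For reference, the difference between the two formulas asked about above can be seen by expanding them into design matrices with base R. This is only a sketch on a toy sample sheet that mirrors the 12-sample layout in the question; the names `coldata`, `additive`, `with_interaction` and `grouped` are illustrative and not part of either package.

```r
# Toy sample sheet mirroring the 12-sample layout in the question (assumed, not real data)
coldata <- data.frame(
  Variable1 = factor(rep(c("A", "B"), each = 6)),
  Variable2 = factor(rep(rep(c("x", "y"), each = 3), times = 2))
)

# Additive model: one Variable1 effect and one Variable2 effect,
# so the x-vs-y difference is forced to be the same in A and in B
additive <- model.matrix(~ Variable1 + Variable2, data = coldata)

# Interaction model: the extra column lets the x-vs-y difference differ between A and B
with_interaction <- model.matrix(~ Variable1 + Variable2 + Variable1:Variable2,
                                 data = coldata)

# One-column-per-group design, as in the edgeR code
group <- factor(paste(coldata$Variable1, coldata$Variable2, sep = "."))
grouped <- model.matrix(~ 0 + group)

print(colnames(additive))
print(colnames(with_interaction))

# The interaction design and the grouped design span the same column space:
# stacking them side by side does not raise the rank above 4
print(qr(cbind(with_interaction, grouped))$rank)
```

In other words, the additive formula fits three coefficients, while the interaction formula fits four and is a reparameterisation of the one-mean-per-group design, so a combined `group` factor is usually the simpler way to ask for specific pairwise comparisons.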
I saw that, but it still doesn't answer my question.
Can I use DESeq2 to analyze paired samples?
Yes, you should use a multi-factor design which includes the sample information as a term in the design formula. This will account for differences between the samples while estimating the effect due to the condition. The condition of interest should go at the end of the design formula, e.g. ~ subject + condition.
It doesn't expand on how to do this between specific values of 'subject' and 'condition'.
I think I misunderstood. I see that you don't have paired samples.
You can make `group` before making the DESeqDataSet. Make this variable in the colData.
Thanks for your answer, how about performing multiple comparisons instead?
Again, in the RNASeqGeneEdgeRQL workflow (using edgeR), the paragraph 'Analysis of Deviance' describes how to extend a comparison between two groups to three or more, in a specific experimental design.
The aim is to identify genes which are DE among 3 or more groups (so, for instance, among the comparisons A.x-A.y, B.x-B.y and A.x-B.x). An output table is reported, showing logFoldChange and logCPM for each group, and just one statistical value per gene (p-value, FDR, F-statistic).
How can an equivalent analysis be performed in DESeq2?
Thanks in advance for your help
You can use a LRT in DESeq2 to test more than one coefficient at a time.
https://master.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#likelihood-ratio-test
For setting up the statistical design and interpreting coefficients, I recommend consulting with local statisticians. On the support site, I have to restrict my time to software-related questions.
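As a rough sketch of how the two suggestions in this thread might look in code (a `group` column in the colData for pairwise comparisons, and an LRT for a single test across all groups): this assumes DESeq2 is installed and uses simulated counts from `makeExampleDESeqDataSet` in place of the real data. The group labels Ax, Ay, Bx and By follow the question; the object names and the choice of `~ 1` as the reduced design are illustrative assumptions.

```r
library(DESeq2)

# Simulated counts stand in for the real data (assumption: 500 genes, 12 samples)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 12)

# Put the combined factor in the colData; order follows the sample table in the question
dds$group <- factor(rep(c("Ax", "Ay", "Bx", "By"), each = 3))
design(dds) <- ~ group

# Wald tests: one results() call per pairwise comparison.
# A character contrast has exactly three elements: factor name, numerator, denominator.
dds_wald <- DESeq(dds)
res_Ax_Bx <- results(dds_wald, contrast = c("group", "Ax", "Bx"))
res_Ay_By <- results(dds_wald, contrast = c("group", "Ay", "By"))

# LRT: one p-value per gene for "any difference among the four groups",
# comparing the full design against an intercept-only reduced design
dds_lrt <- DESeq(dds, test = "LRT", reduced = ~ 1)
res_lrt <- results(dds_lrt)

print(head(res_Ax_Bx))
print(head(res_lrt))
```

Note that in the LRT table the p-value refers to the whole test, while the log2 fold change column still describes a single coefficient, so fold changes for particular pairs of groups would come from separate `results()` calls with a `contrast`.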