Hello,
I would like to find differentially expressed genes between different groups in RNA-seq data. I have 55 samples that were prepared either in batch 3, batch 4 or batch 6. These samples represent 4 groups (NR, NT, P, F). When I checked the MDS plot, all the samples were separated by group and by batch, so it seems that the batch effect should be corrected. The problem is that one group (group F) has only 4 samples, which were all prepared in batch 3. All these groups are distinct from each other, so none of the groups can be considered as a reference/control.
I would like do the following comparisons:
- NR vs. NT
- NR and NT vs. P
- NR, NT and P vs. F
- P vs. NR, NT and F
- NR vs. P, F and NT
- NT vs. P, F and NT
--------------------------------------------------------------------
I am a bit unsure about a couple of things:
1) How to build the design matrix. I don't have a clear reference/control or "baseline", so I guess I should not include the intercept to the design matrix?
2) How group F affects the de-analysis. For example, if I compare group NT to group F, is the model comparing samples which were prepared within the same batch (NT and F samples which were prepared in batch 3), and ignoring NT samples which were prepared in batch 4 or batch 6? Or should I just ignore group F samples?
3) How to make the correct contrasts for the glmQLFTest function. I think the first three comparisons are correct (the code below), but what about the last three comparisons? How can I make those?
--------------------------------------------------------------------
Here are the samples and the design matrix:
group <- factor(c(rep("NR", 11), rep("NT", 26), rep("P", 14), rep("F", 4)))
batch <- factor(c(3,3,3,4,4,6,3,3,3,6,6,6,6,6,4,6,3,3,3,3,3,4,4,4,4,6,6,6,6,3,3,6,6,6,6,6,6,3,3,3,3,4,4,4,6,3,3,3,3,6,6,3,3,3,3))
design <- model.matrix(~0 + batch + group)
Example how the columns in the design matrix look like:
batch3 | batch4 | batch5 | groupNR | groupNT | groupP |
---|
Contrasts for the first three comparisons:
- NR vs. NT --> c(0,0,0,1,-1,0)
- NR and NT vs. P --> c(0,0,0,0.5,0.5,-1)
- NR, NT and P vs. F --> c(0,0,0,0.33,0.33,0.33)
Thank you in advance for all the help.
Thanks a lot Aaron!
I have a similar question. I have a set of RNA-Seq data from human tissue and want to compare the gene expression in cases and controls using EdgeR. If I want to adjust for the confounding effect of RIN, age and PMI; what is the correct way to do this? Here is my code:
library(edgeR) x <- read.delim("rawCountMatrix.txt", row.names = "Gene") group <- factor(c(1,2,2,2,2,2,1,1,1,1)) RIN <- c(5.6,7.6,6.7,5.6,7.6,6.4,6.8,7.7,5.7,8.3) Age <- c(106,89,76,82,87,87,73,77,78,88) PMI <- c(20.58,15.16,20.17,19.16,21.66,22,22,13.5,12.83,14.5)
y <- DGEList(counts=x,group=group) y <- calcNormFactors(y) design <- model.matrix(~group + RIN + Age + PMI) y <- estimateDisp(y,design)
fit <- glmFit(y,design) lrt <- glmLRT(fit)
Do I need to specify coefficient in the glmLRT() function? I am not sure what coefficient mean, if I change it from 2 to 3 or 2:4? lrt <- glmLRT(fit, coef = 2)
I appriciate any comments.
I have a similar question. I have a set of RNA-Seq data from human tissue and want to compare the gene expression in cases and controls using EdgeR. If I want to adjust for the confounding effect of RIN, age and PMI; what is the correct way to do this? Here is my code:
library(edgeR) x <- read.delim("rawCountMatrix.txt", row.names = "Gene") group <- factor(c(1,2,2,2,2,2,1,1,1,1)) RIN <- c(5.6,7.6,6.7,5.6,7.6,6.4,6.8,7.7,5.7,8.3) Age <- c(106,89,76,82,87,87,73,77,78,88) PMI <- c(20.58,15.16,20.17,19.16,21.66,22,22,13.5,12.83,14.5)
y <- DGEList(counts=x,group=group) y <- calcNormFactors(y) design <- model.matrix(~group + RIN + Age + PMI) y <- estimateDisp(y,design)
fit <- glmFit(y,design) lrt <- glmLRT(fit)
Do I need to specify coefficient in the glmLRT() function? I am not sure what coefficient mean, if I change it from 2 to 3 or 2:4? lrt <- glmLRT(fit, coef = 2)
I appriciate any comments.
I have a similar question. I have a set of RNA-Seq data from human tissue and want to compare the gene expression in cases and controls using EdgeR. If I want to adjust for the confounding effect of RIN, age and PMI; what is the correct way to do this? Here is my code:
library(edgeR) x <- read.delim("rawCountMatrix.txt", row.names = "Gene") group <- factor(c(1,2,2,2,2,2,1,1,1,1)) RIN <- c(5.6,7.6,6.7,5.6,7.6,6.4,6.8,7.7,5.7,8.3) Age <- c(106,89,76,82,87,87,73,77,78,88) PMI <- c(20.58,15.16,20.17,19.16,21.66,22,22,13.5,12.83,14.5)
y <- DGEList(counts=x,group=group) y <- calcNormFactors(y) design <- model.matrix(~group + RIN + Age + PMI) y <- estimateDisp(y,design)
fit <- glmFit(y,design) lrt <- glmLRT(fit)
Do I need to specify coefficient in the glmLRT() function? I am not sure what coefficient mean, if I change it from 2 to 3 or 2:4? lrt <- glmLRT(fit, coef = 2)
I appriciate any comments.