I have RNA-seq data that has three groups - Control (C), Disease1 (D1), and Disease2 (D2). The patients of D1 and D2 were treated with a drug (T) and data was generated on pre- and post-treatment disease groups i.e. I have 5 grouping factors - C, D1-pre, D1-post, D2-pre, D2-post. I am a bit confused while modeling this nested data to identify DE genes in limma. I have two questions in mind:
- What are DE genes in D1 and D2 compared to control, and
- Whether the drug has a differential effect on the expression of DE genes in D1 compared to D2.
I tried to make a model matrix as:
library(limma) patient <- data.frame( patient$patient <- factor(paste(rep(1:5, times = 5),"_",patient$diagnosis)) diagnosis = factor(rep(c("D1", "D2","Control"), each=10, length.out=25)), treat = factor(rep(c("Pre", "Post"), each=5, times=3, length.out=25))) patient$group <- factor(paste(patient$diagnosis,patient$treat)) design <- model.matrix(~0+group,patient) colnames(design) <- c("C", "D1post", "D1pre", "D2post", "D2pre")
I am lost somewhere here, I am unable to decide if should I go for
fit <- lmFit(eset, design) #eset is the data matrix with expression levels
or make a contrast model, something like:
contrast.matrix <- makeContrasts(D1pre-C, D2pre-C, D1post-D1pre, D2post-D2pre, levels=design) fit2 <- contrasts.fit(fit, contrast.matrix)
I will be thankful if someone could please help me in modeling.