i all,
I have RNAseq data for treated and untreated samples (in triplicates) in three different cell lines: KO1, KO2, WT. The goal is to do three comparisons:
(1) KO1 treated vs. untreated (2) KO2 treated vs. untreated and (3) WT treated vs untreated
Here is the group information in a data.frame called "sampleInfo" :
> sampleInfo
| Group | |
| sample1 | KO_2.Treated |
| sample2 | KO_2.Treated |
| sample3 | KO_2.Treated |
| sample4 | KO_2.Untreated |
| sample5 | KO_2.Untreated |
| sample6 | KO_2.Untreated |
| sample7 | KO_1.Treated |
| sample8 | KO_1.Treated |
| sample9 | KO_1.Treated |
| sample10 | KO_1.Untreated |
| sample11 | KO_1.Untreated |
| sample12 | KO_1.Untreated |
| sample13 | WT.Treated |
| sample14 | WT.Treated |
| sample15 | WT.Treated |
| sample16 | WT.Untreated |
| sample17 | WT.Untreated |
| sample18 | WT.Untreated |
I have a counts results data frame that has 18 columns corresponding to the gene counts of the above 18 samples.
My question is, does it make sense if I combine all the data together (18 samples total), make constrasts that specifies the three different comparisons I want to make, fit them through glimFit, and then calculate the three different contrasts separately as below:
## Construct DGEList
d <- DGEList(counts=counts)
## Make design matrix
Group = factor(sampleInfo$Group)
design <- model.matrix(~0+Group)
colnames(design) <- levels(Group)
rownames(design) <- colnames(counts)
## Make contrasts
prestr <-"my.contrasts = makeContrasts("
mainStr <- paste("KO_2.Treated_vs_Untreated=KO_2.Treated-KO_2.Untreated,",
"KO_1.Treated_vs_Untreated=KO_1.Treated-KO_1.Untreated,",
"WT.Treated_vs_Untreated=WT.Treated-WT.Untreated",sep="")
poststr <-",levels=design)"
commandstr=paste(prestr,mainStr,poststr,sep="")
eval(parse(text=commandstr))
# annotationTable = read.csv(annotationsFile, row.names=1)
#################
## Filtering ##
#################
d <- calcNormFactors(d)
d <- estimateGLMCommonDisp(d, design)
d <- estimateGLMTagwiseDisp(d, design)
fit <- glmFit(d, design)
Or, do I need to do the glmFit separately?
Thanks very much in advance!

Thank you so much! This is super helpful. Good point about the unnecessary complications there - it was sort of copy paste from my previous project where the contrasts were a lot more complicated & should have changed it. Thanks a lot!!
Indeed, we find that the extra precision from more residual d.f. outweighs any loss of performance due to bias from using a common dispersion for all groups. Also, just to elaborate on
voomWithQualityWeights; if you want to model group-specific variances, you have to specify the design matrix in thevar.designargument, as well as indesign.