I am performing an analysis to look at the impact of 3 types of stimulation on 6 stem cell lines (each derived from 1 individual) and I ran the following code:
exprdata<-read.csv("Replicates Merged.csv", row.names=1)
metadata<-read.csv("Metadata.csv",row.names=1)
metadata$ID<-as.factor(metadata$ID)
precursors<-read.csv("Counts.csv",row.names=1)
form <- ~ 0 + Stim + (1 | ID)
compare <- makeContrastsDream(form, metadata, contrasts = c(compare3_1 = "StimFib - StimCon", compare3_2 = "StimFib - StimPDL", compare2_1 = "StimPDL - StimCon"))
fit <- dream(exprdata, form, metadata, compare, ddf = "Kenward-Roger")
fit2 <- variancePartition::eBayes(fit, robust=T, trend=log(precursors$Counts))
FibxCon<-variancePartition::topTable(fit, coef = "compare3_1", number=3469)
FibxPDL<-variancePartition::topTable(fit, coef = "compare3_2", number=3469)
PDLxCon<-variancePartition::topTable(fit, coef = "compare2_1", number=3469)
Of note, each cell line was plated in triplicate before stimulation. I see the following when running a PCA plot:
The most variation is attributed to the different cell lines. However, some of the technical replicates do not cluster well.
I thought it was best to merge the triplicates via median before running the model above. However, I started to wonder whether it be better to keep the triplicates separate? If so, would I have to change the structure of my random effect (ie nested)?
Any insights would be much appreciated!
gabriel.hoffman any insight?
It is almost always better to retain the full observed data, if the statistical software is able to model the repeated measures. This is a good application for a mixed model.
With count data, it is not clear how to collapse technical replicates, while allowing the statistical model to model count-error, varying library size and overdispersion.
1) In cases where there are multiple sources of expression variation, the variation partitioning plot can give insight not found in the PCA plot.
2) its not clear if you are using a voom-step here.
dream()
is designed to take the result ofvoom()
orvoomWithDreamWeights()
3) Fitting
eBayes()
with a trend is not recommended. Instead, incorporating precision weights from a voom-step intodream()
is the best way to handle counts.4) It looks like you have biological expression variation due to
Donor
andStimulus