Hi,
I have a question about how to deal with semi-technical replicates in bulk RNAseq. I thought someone would've asked it but I couldn't find any. Essentially, I have 10 cell lines grown in 2 different medium conditions with 1 replicate each (20 samples separated in 3 different batches). I have another 1 control line grown in 2 different conditions, but repeated the same way in each of the 3 batches (6 semi-technical replicates). The comparison is between the two conditions regardless of cell lines. My initial thought was to a linear mixed model with variancePartition::dream() ~Condition + (1| line) + (1|batch) + other technical covariates. However, with this model, the 3 pairs of semi-technical replicates of the control line are over-represented here. I think keeping all of them would help to resolve some of the batch effects, as the batch effects are pretty pronounced in this dataset. Samples are separated by batches on PCA even for the same control lines. Is there a way to reduce the contribution of these 3 pairs of replicates, so each pair counts as 1/3 compared to other cell lines? I read somewhere on the forum that the external weights can be applied to each samples in addition to precision weights calculated by zoom. Does it make sense to make the sample weights to 1/3 of their original value for the technical replicates? If not, what would be the right way to do this?
I thought about taking the average count of one condition of the control line in each batch as baseline and divide all the other samples to the corresponding control line on a per-gene basis. However it didn't seem to remove batch effects at all in PCA.
A slightly unrelated question, is there a way to apply voomwithQualityWeights using linear mixed model as in dreamwithvoomweights() and dream() to account for sample heterogeneity?
Thanks! Le
Your formula
~ Condition + (1| line) + (1|batch)
looks reasonable. But I don't really follow the motivation for down-weighting some of the samples. In principle, you can include sample-level weights into the estimation of precision weights usingvoomWithDreamWeights(...,weights=w)
wherew
is a vector storing a positive value for each sample. (You'll need a recent version ofvariancePartition
for this) But I'm not sure that would be useful here.Gordon also points out you can do this in
limma
as well using one random effect.I've thought about combining
voomWithQualityWeights()
with dream's linear mixed model. But the clean math and fast implementation from limma's linear model become more complicated and computationally expensive when including multiple random effects-- Gabriel
Thank you Dr. Hoffman for your reply and your package. They are very helpful.
My reasoning to down-weight technical replicates is to balance out the contribution of each line to the final results. Right now only the control lines had 3 technical replicates. My concern is that they'll make a bigger difference to identify DEGs than the other lines.
I'm also playing with combining voomWithQualityWeights() and dream() like this form=~ Condition + (1| line) + (1|batch)
vobjDream = voomWithDreamWeights( dge2, form, covariates )
design = model.matrix(~Condition+batch+line, covariates) #specifying fixed effect model to feed into voom
vobj_tmp = voomWithQualityWeights( dge2, design, plot=FALSE) #transfer sample weights to dream
vobjDream<-applyQualityWeights(vobjDream , vobj_tmp$targets$sample.weights)
fitmmKR = dream( vobjDream, form, covariates, ddf="Kenward-Roger", BPPARAM=param)
fitmmKR = eBayes(fitmmKR)
I did get way more DEGs, but not sure if this makes sense. My reasoning was that fixed effects model is probably close enough to mixed model to estimate sample weights.
Le