Semi-technical replicates and voom sample weights
1
0
Entering edit mode
Le • 0
@bf1730d5
Last seen 8 weeks ago
United States

Hi,

I have a question about how to deal with semi-technical replicates in bulk RNAseq. I thought someone would've asked it but I couldn't find any. Essentially, I have 10 cell lines grown in 2 different medium conditions with 1 replicate each (20 samples separated in 3 different batches). I have another 1 control line grown in 2 different conditions, but repeated the same way in each of the 3 batches (6 semi-technical replicates). The comparison is between the two conditions regardless of cell lines. My initial thought was to a linear mixed model with variancePartition::dream() ~Condition + (1| line) + (1|batch) + other technical covariates. However, with this model, the 3 pairs of semi-technical replicates of the control line are over-represented here. I think keeping all of them would help to resolve some of the batch effects, as the batch effects are pretty pronounced in this dataset. Samples are separated by batches on PCA even for the same control lines. Is there a way to reduce the contribution of these 3 pairs of replicates, so each pair counts as 1/3 compared to other cell lines? I read somewhere on the forum that the external weights can be applied to each samples in addition to precision weights calculated by zoom. Does it make sense to make the sample weights to 1/3 of their original value for the technical replicates? If not, what would be the right way to do this?

I thought about taking the average count of one condition of the control line in each batch as baseline and divide all the other samples to the corresponding control line on a per-gene basis. However it didn't seem to remove batch effects at all in PCA.

A slightly unrelated question, is there a way to apply voomwithQualityWeights using linear mixed model as in dreamwithvoomweights() and dream() to account for sample heterogeneity?

Thanks! Le

variancePa variancePartition limma-voom dream • 1.3k views
ADD COMMENT
0
Entering edit mode

Your formula ~ Condition + (1| line) + (1|batch) looks reasonable. But I don't really follow the motivation for down-weighting some of the samples. In principle, you can include sample-level weights into the estimation of precision weights using voomWithDreamWeights(...,weights=w) where w is a vector storing a positive value for each sample. (You'll need a recent version of variancePartition for this) But I'm not sure that would be useful here.

Gordon also points out you can do this in limma as well using one random effect.

I've thought about combining voomWithQualityWeights() with dream's linear mixed model. But the clean math and fast implementation from limma's linear model become more complicated and computationally expensive when including multiple random effects

-- Gabriel

ADD REPLY
0
Entering edit mode

Thank you Dr. Hoffman for your reply and your package. They are very helpful.

My reasoning to down-weight technical replicates is to balance out the contribution of each line to the final results. Right now only the control lines had 3 technical replicates. My concern is that they'll make a bigger difference to identify DEGs than the other lines.

I'm also playing with combining voomWithQualityWeights() and dream() like this form=~ Condition + (1| line) + (1|batch)

vobjDream = voomWithDreamWeights( dge2, form, covariates )

design = model.matrix(~Condition+batch+line, covariates) #specifying fixed effect model to feed into voom

vobj_tmp = voomWithQualityWeights( dge2, design, plot=FALSE) #transfer sample weights to dream

vobjDream<-applyQualityWeights(vobjDream , vobj_tmp$targets$sample.weights)

fitmmKR = dream( vobjDream, form, covariates, ddf="Kenward-Roger", BPPARAM=param)

fitmmKR = eBayes(fitmmKR)

I did get way more DEGs, but not sure if this makes sense. My reasoning was that fixed effects model is probably close enough to mixed model to estimate sample weights.

Le

ADD REPLY
0
Entering edit mode
@gordon-smyth
Last seen 6 hours ago
WEHI, Melbourne, Australia

voomWithQualityWeights() handles one random effect by setting the block and using duplicateCorrelation(). The easy way to automate that is to use edgeR::voomLmFit(), which will automatically iterate voom(), duplicateCorrelation() and arrayWeights() for you in an appropriate way. It has the added advantage of better handling of groups with entirely zero counts.

However it isn't entirely clear to me from your description that random effects are necessarily good for your experiment. I would not recommend a random effect for batch when there are just three batches and the batch effect is large. Random effects are more appropriate when there are more levels and the effects are more random. Setting a cell line to be a random effect also seems unusual. Whether that is appropriate depends on how the cell lines were generated.

I don't follow your reasoning about weights or why you think samples are "over-represented". Dividing weights by 3 is not correct. I don't know what you have read, but it is not true that external weights can be applied that are not related to the variances of the respective samples. The other hack you propose, diving some counts by others, is just terrible. Please do not do that.

ADD COMMENT
0
Entering edit mode

Thanks for your reply Dr. Smyth,

Batch probably doesn't need to be a random effects variable, but line (level =11) might be appropriate.

For down-weighting technical replicates, what if I have 100 technical replicates for line A (grown in 2 conditions A_high and A_low in 100 different batches, where culture batch accounts for a large amount of variance), and only 1 technical replicate for 100 other lines (B_high vs B_low, C_high vs C_low etc. distributed evenly in 3 batches). Let's say I use the formula ~Condition + (1|line) where Condition means high vs low. Wouldn't line A in this case make a much larger impact to the overall analysis? Please forgive my superficial understanding of mixed models. This is all based on my intuition.

ADD REPLY
0
Entering edit mode

If technical replicates are correctly specified in the linear model, either by using random effects or by summing the replicates, then having more replicates for one level than another will make very little difference. This sort of thing is handled organically by the statistical model, it isn't something you need to make ad hoc adjustments for.

On the other hand, if you have technical replicates but enter them into the model as if they are full biological replicates, then you could get unbalanced or over-significant results.

In your case, I'm not entirely clear to what extent your replicates are technical. From your descriptions, they just seem like regular replicates.

ADD REPLY

Login before adding your answer.

Traffic: 892 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6