Batch correction and repeated measures
Entering edit mode
Jonathan ▴ 10
Last seen 4 days ago
United States

(I asked this as a side question before, but I though it deserves it's own post)

I have a bulk-RNA experiment with 3 tissue types: Lesional, Non-Lesional and Controls, whereas Lesional and Non-Lesional are taken from the same patients. Most of the Controls were processed in batch#2. To batch-correct, two technical replicates of lesional samples and two technical replicates of non-lesional samples were also included in batch #2. All samples have Patient ID and Sample ID, like so:

Sample ID Sample Type Patient ID Batch
S1 Lesional P1 1
S2 Non-Lesional P1 1
S3 Lesional P2 1
S4 Non-Lesional P2 1
... ... ... ...
S55 Lesional P20 1
S56 Non-Lesional P20 1
S55 Lesional P20 2
S56 Non-Lesional P20 2
S57 Lesional P21 1
S58 Non-Lesional P21 1
S57 Lesional P21 2
S58 Non-Lesional P21 2
S59 Control P22 2
S60 Control P23 2
... ... ... ...


I've read some comments regarding batch correction in the forum, and a similar problem was raised here with a more thorough explanation here, but I couldn't find an answer for a case of batch correction followed by repeated measures. I can use sva and ComBat_seq (or RUVSeq), but the comments recommend utilizing limma's duplicateCorrelation. This is a valid option, but I don't know which blocking parameter to use - because of the repeated-measures nature of the experiment, I need to use PatientID, but this might not suit the batch correction, which should match SampleID.

My current approach is something like this, which I'm not sure is the "right" way.

exprs.mat.batchcorrected = ComBat_seq(counts=exprs.mat, batch=phenodata$batch, group = phenodata$SampleID, full_mod = T)
DGE.cpm<-DGEList(counts = exprs.mat.batchcorrected, samples = phenodata)
...  # filtering genes etc ... #
design<-model.matrix(~0 + Sample.Type + batch, data=DGE.cpm$samples)
voom.result <-voomWithQualityWeights(DGE.cpm, design, plot=T, normalize="quantile")
corfit.rna<-duplicateCorrelation(voom.result, design, block=DGE.cpm$samples$PatientID)
voom.result.randeffect <-voomWithQualityWeights(DGE.cpm, design, plot=TRUE, block=DGE.cpm$samples$PatientID, correlation=corfit.rna$consensus, normalize="quantile")
corfit.rna2<-duplicateCorrelation(voom.result.randeffect, design, block=DGE.cpm$samples$PatientID)
fit2.rna<-lmFit(voom.result.randeffect, design, block=voom.result.randeffect$targets$PatientID, correlation=corfit.rna2$consensus)

Thank you so much for helping!

RUVSeq sva limma • 630 views
Entering edit mode
Last seen just now
WEHI, Melbourne, Australia

You should just do a standard limma analysis as recommended in the previous posts.

design <- model.matrix(~0 + Sample.Type + batch)
i <- filterByExpr(exprs.mat, group=Sample.Type)
fit <- edgeR::voomLmFit(exprs.mat[i,], design, block=PatientID, sample.weights=TRUE)

Here I am assuming that exprs.mat is your count matrix (just counts, not cpm or anything else).

That's all you need. You should not use ComBat to pre batch-correct.

A caveat is that this analysis treats the four technical replicates as if they were biological replicates, but that can't be helped, and it will be relatively harmless given the number of samples you have. I don't know any practical way to do better.

Entering edit mode

Thank you for your swift answer :)

Entering edit mode

Have you checked from the MDS plot that there is a visible batch effect for the repeated samples? Batch correction has lots of negative consequences for the analysis so you should avoid it if there is little or no effect to remove.

Entering edit mode

The technical replicates do look rather similar on the MDS plot, but the axes percentages are rather low... is that enough to draw conclusions?

enter image description here

Thank you again!

Entering edit mode

I see no evidence of a batch effect. I would be tempted to sum the technical replicates and analyse the data without batch correction.

Entering edit mode

Thanks, I'll try that. BTW, what do you mean by "sum them"? Do you mean averaging them?

Entering edit mode

I mean add the counts, for example by sumTechReps. Technical RNA-seq replicates are always combined by summing rather than by "averaging", equivalent to what you would get by combining the FASTQ files. In fact I am not even sure what "averaging" would mean for a count.


Login before adding your answer.

Traffic: 747 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6