Question

Batch correction and repeated measures

1

Entering edit mode

Jonathan ▴ 10

@c31cf0e5

Last seen 20 days ago

United States

(I asked this as a side question before, but I though it deserves it's own post)

I have a bulk-RNA experiment with 3 tissue types: Lesional, Non-Lesional and Controls, whereas Lesional and Non-Lesional are taken from the same patients. Most of the Controls were processed in batch#2. To batch-correct, two technical replicates of lesional samples and two technical replicates of non-lesional samples were also included in batch #2. All samples have Patient ID and Sample ID, like so:

Sample ID	Sample Type	Patient ID	Batch
S1	Lesional	P1	1
S2	Non-Lesional	P1	1
S3	Lesional	P2	1
S4	Non-Lesional	P2	1
...	...	...	...
S55	Lesional	P20	1
S56	Non-Lesional	P20	1
S55	Lesional	P20	2
S56	Non-Lesional	P20	2
S57	Lesional	P21	1
S58	Non-Lesional	P21	1
S57	Lesional	P21	2
S58	Non-Lesional	P21	2
S59	Control	P22	2
S60	Control	P23	2
...	...	...	...

.

I've read some comments regarding batch correction in the forum, and a similar problem was raised here with a more thorough explanation here, but I couldn't find an answer for a case of batch correction followed by repeated measures. I can use sva and ComBat_seq (or RUVSeq), but the comments recommend utilizing limma's duplicateCorrelation. This is a valid option, but I don't know which blocking parameter to use - because of the repeated-measures nature of the experiment, I need to use PatientID, but this might not suit the batch correction, which should match SampleID.

My current approach is something like this, which I'm not sure is the "right" way.

exprs.mat.batchcorrected = ComBat_seq(counts=exprs.mat, batch=phenodata$batch, group = phenodata$SampleID, full_mod = T)
DGE.cpm<-DGEList(counts = exprs.mat.batchcorrected, samples = phenodata)
...  # filtering genes etc ... #
design<-model.matrix(~0 + Sample.Type + batch, data=DGE.cpm$samples)
voom.result <-voomWithQualityWeights(DGE.cpm, design, plot=T, normalize="quantile")
corfit.rna<-duplicateCorrelation(voom.result, design, block=DGE.cpm$samples$PatientID)
voom.result.randeffect <-voomWithQualityWeights(DGE.cpm, design, plot=TRUE, block=DGE.cpm$samples$PatientID, correlation=corfit.rna$consensus, normalize="quantile")
corfit.rna2<-duplicateCorrelation(voom.result.randeffect, design, block=DGE.cpm$samples$PatientID)
fit2.rna<-lmFit(voom.result.randeffect, design, block=voom.result.randeffect$targets$PatientID, correlation=corfit.rna2$consensus)

Thank you so much for helping!

RUVSeq sva limma • 1.9k views

ADD COMMENT • link updated 7 months ago by Gordon Smyth 52k • written 14 months ago by Jonathan ▴ 10

score 3 · Accepted Answer · 2023-09-26

3

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 12 minutes ago

WEHI, Melbourne, Australia

You should just do a standard limma analysis as recommended in the previous posts.

design <- model.matrix(~0 + Sample.Type + batch)
i <- filterByExpr(exprs.mat, group=Sample.Type)
fit <- edgeR::voomLmFit(exprs.mat[i,], design, block=PatientID, sample.weights=TRUE)

Here I am assuming that exprs.mat is your count matrix (just counts, not cpm or anything else).

That's all you need. You should not use ComBat to pre batch-correct.

A caveat is that this analysis treats the four technical replicates as if they were biological replicates, but that can't be helped, and it will be relatively harmless given the number of samples you have. I don't know any practical way to do better.

ADD COMMENT • link 14 months ago Gordon Smyth 52k

0

Entering edit mode

Thank you for your swift answer :)

ADD REPLY • link 14 months ago Jonathan ▴ 10

0

Entering edit mode

Have you checked from the MDS plot that there is a visible batch effect for the repeated samples? Batch correction has lots of negative consequences for the analysis so you should avoid it if there is little or no effect to remove.

ADD REPLY • link 14 months ago Gordon Smyth 52k

0

Entering edit mode

The technical replicates do look rather similar on the MDS plot, but the axes percentages are rather low... is that enough to draw conclusions?

enter image description here

Thank you again!

ADD REPLY • link 14 months ago Jonathan ▴ 10

0

Entering edit mode

I see no evidence of a batch effect. I would be tempted to sum the technical replicates and analyse the data without batch correction.

ADD REPLY • link 14 months ago Gordon Smyth 52k

0

Entering edit mode

Thanks, I'll try that. BTW, what do you mean by "sum them"? Do you mean averaging them?

ADD REPLY • link 14 months ago Jonathan ▴ 10

1

Entering edit mode

I mean add the counts, for example by sumTechReps. Technical RNA-seq replicates are always combined by summing rather than by "averaging", equivalent to what you would get by combining the FASTQ files. In fact I am not even sure what "averaging" would mean for a count.

ADD REPLY • link 14 months ago Gordon Smyth 52k

0

Entering edit mode

I'm sorry for raising this thread, but I have a very similar project, and after returning to your answer I have a few more question regarding your guidance:

Why did you instruct me to use "just counts, not cpm or anything else"? Can't I use the DGEList object for the voomLmFit?
Can I utilize normalize="quantile" option for the voomLmFit in this case?

Thank you again!

ADD REPLY • link 8 months ago Jonathan ▴ 10

0

Entering edit mode

In your code, it was clear that expr.mat was a matrix rather than a DGEList, so it has to be a count matrix. If you have a DGEList object, then of course you could use that instead.

normalize="quantile" is an option in the code so you, again, you can obviously use it. Or alternatively run normLibSizes on the DGEList in edgeR before voomLmFit().

ADD REPLY • link 7 months ago Gordon Smyth 52k