Batch correction and repeated measures
1
1
Entering edit mode
Jonathan ▴ 10
@c31cf0e5
Last seen 14 days ago
United States

(I asked this as a side question before, but I though it deserves it's own post)

I have a bulk-RNA experiment with 3 tissue types: Lesional, Non-Lesional and Controls, whereas Lesional and Non-Lesional are taken from the same patients. Most of the Controls were processed in batch#2. To batch-correct, two technical replicates of lesional samples and two technical replicates of non-lesional samples were also included in batch #2. All samples have Patient ID and Sample ID, like so:

Sample ID Sample Type Patient ID Batch
S1 Lesional P1 1
S2 Non-Lesional P1 1
S3 Lesional P2 1
S4 Non-Lesional P2 1
... ... ... ...
S55 Lesional P20 1
S56 Non-Lesional P20 1
S55 Lesional P20 2
S56 Non-Lesional P20 2
S57 Lesional P21 1
S58 Non-Lesional P21 1
S57 Lesional P21 2
S58 Non-Lesional P21 2
S59 Control P22 2
S60 Control P23 2
... ... ... ...

.

I've read some comments regarding batch correction in the forum, and a similar problem was raised here with a more thorough explanation here, but I couldn't find an answer for a case of batch correction followed by repeated measures. I can use sva and ComBat_seq (or RUVSeq), but the comments recommend utilizing limma's duplicateCorrelation. This is a valid option, but I don't know which blocking parameter to use - because of the repeated-measures nature of the experiment, I need to use PatientID, but this might not suit the batch correction, which should match SampleID.

My current approach is something like this, which I'm not sure is the "right" way.

exprs.mat.batchcorrected = ComBat_seq(counts=exprs.mat, batch=phenodata$batch, group = phenodata$SampleID, full_mod = T)
DGE.cpm<-DGEList(counts = exprs.mat.batchcorrected, samples = phenodata)
...  # filtering genes etc ... #
design<-model.matrix(~0 + Sample.Type + batch, data=DGE.cpm$samples)
voom.result <-voomWithQualityWeights(DGE.cpm, design, plot=T, normalize="quantile")
corfit.rna<-duplicateCorrelation(voom.result, design, block=DGE.cpm$samples$PatientID)
voom.result.randeffect <-voomWithQualityWeights(DGE.cpm, design, plot=TRUE, block=DGE.cpm$samples$PatientID, correlation=corfit.rna$consensus, normalize="quantile")
corfit.rna2<-duplicateCorrelation(voom.result.randeffect, design, block=DGE.cpm$samples$PatientID)
fit2.rna<-lmFit(voom.result.randeffect, design, block=voom.result.randeffect$targets$PatientID, correlation=corfit.rna2$consensus)

Thank you so much for helping!

RUVSeq sva limma • 1.9k views
ADD COMMENT
3
Entering edit mode
@gordon-smyth
Last seen 13 minutes ago
WEHI, Melbourne, Australia

You should just do a standard limma analysis as recommended in the previous posts.

design <- model.matrix(~0 + Sample.Type + batch)
i <- filterByExpr(exprs.mat, group=Sample.Type)
fit <- edgeR::voomLmFit(exprs.mat[i,], design, block=PatientID, sample.weights=TRUE)

Here I am assuming that exprs.mat is your count matrix (just counts, not cpm or anything else).

That's all you need. You should not use ComBat to pre batch-correct.

A caveat is that this analysis treats the four technical replicates as if they were biological replicates, but that can't be helped, and it will be relatively harmless given the number of samples you have. I don't know any practical way to do better.

ADD COMMENT
0
Entering edit mode

Thank you for your swift answer :)

ADD REPLY
0
Entering edit mode

Have you checked from the MDS plot that there is a visible batch effect for the repeated samples? Batch correction has lots of negative consequences for the analysis so you should avoid it if there is little or no effect to remove.

ADD REPLY
0
Entering edit mode

The technical replicates do look rather similar on the MDS plot, but the axes percentages are rather low... is that enough to draw conclusions?

enter image description here

Thank you again!

ADD REPLY
0
Entering edit mode

I see no evidence of a batch effect. I would be tempted to sum the technical replicates and analyse the data without batch correction.

ADD REPLY
0
Entering edit mode

Thanks, I'll try that. BTW, what do you mean by "sum them"? Do you mean averaging them?

ADD REPLY
1
Entering edit mode

I mean add the counts, for example by sumTechReps. Technical RNA-seq replicates are always combined by summing rather than by "averaging", equivalent to what you would get by combining the FASTQ files. In fact I am not even sure what "averaging" would mean for a count.

ADD REPLY
0
Entering edit mode

I'm sorry for raising this thread, but I have a very similar project, and after returning to your answer I have a few more question regarding your guidance:

  1. Why did you instruct me to use "just counts, not cpm or anything else"? Can't I use the DGEList object for the voomLmFit?
  2. Can I utilize normalize="quantile" option for the voomLmFit in this case?

Thank you again!

ADD REPLY
0
Entering edit mode

In your code, it was clear that expr.mat was a matrix rather than a DGEList, so it has to be a count matrix. If you have a DGEList object, then of course you could use that instead.

normalize="quantile" is an option in the code so you, again, you can obviously use it. Or alternatively run normLibSizes on the DGEList in edgeR before voomLmFit().

ADD REPLY

Login before adding your answer.

Traffic: 627 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6