Question

edgeR removing batch effect before using expression data for clustering

1

Entering edit mode

Biologist ▴ 120

@biologist-9801

Last seen 4.4 years ago

I have gene expression data for 8 samples, 4 controls and 4 FOXCUT OE samples.

The raw counts data were in a dataframe counts. coldata looks like below:

Samples   Group              Experiment
Sample1 Control                 EXP1
Sample2 Control                 EXP1
Sample3 Control                 EXP2
Sample4 Control                 EXP2
Sample5 FOXCUT Overexpression   EXP2
Sample6 FOXCUT Overexpression   EXP2
Sample7 FOXCUT Overexpression   EXP1
Sample8 FOXCUT Overexpression   EXP1

I have used following code for filtering and normallization:

library(edgeR)
group <- factor(paste0(coldata$Group))
y <- DGEList(counts,group = group)

## Filtering 
keep <- rowSums(cpm(y) > 1) >= 2
table(keep)
summary(keep)

y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y,method = "TMM") 
design2 <- model.matrix(~ 0 + group + coldata$Experiment)

So, after Normalization I did the batch correction using removeBatchEffect function like below:

y <- removeBatchEffect(y, design = design2)

## For Clustering Heatmap
logCPM <- cpm(y, prior.count=2, log=TRUE)
# Scale data to Z-scores
logCPM <- t(scale(t(logCPM)))

library(matrixStats)
## To select top 10% highly variable genes
vars <- apply(logCPM, 1, IQR) 
clust <- logCPM[vars > quantile(vars, 0.9), ]

I wanted to use clust for a clustering heatmap.

Question: Is the above way of batch correction is right or should I do batch correction after calculating logCPM

edger r batcheffect clustering removeBatchEffect • 1.8k views

ADD COMMENT • link updated 5.2 years ago by James W. MacDonald 66k • written 5.2 years ago by Biologist ▴ 120

score 0 · Answer 1 · 2019-05-14

0

Entering edit mode

James W. MacDonald 66k

@james-w-macdonald-5106

Last seen 1 hour ago

United States

You should run removeBatchEffect after computing logCPM, but you should probably use voom rather than cpm for that step. The difference being that voom will also generate observation-level weights that will be passed to lmFit for the batch correction.

ADD COMMENT • link 5.2 years ago James W. MacDonald 66k

0

Entering edit mode

Thanks for the reply. So, after calculating logCPM, then I have to use removeBatchEffect function on that and then scale that for clustering analysis. Is it fine?

logCPM <- cpm(y, prior.count=2, log=TRUE)

logCPM <- removeBatchEffect(logCPM)
logCPM <- t(scale(t(logCPM)))

Is this right?

ADD REPLY • link 5.2 years ago Biologist ▴ 120

0

Entering edit mode

Did you read my answer?

ADD REPLY • link 5.2 years ago James W. MacDonald 66k

0

Entering edit mode

Yes, James. But in this Remove batch effect in small RNASeq study (SVA or others?) Gordon recommends to do in the above way and then use it for MDS, PCA or heatmaps. So, I did it in that way.

Is this wrong?

ADD REPLY • link 5.2 years ago Biologist ▴ 120

0

Entering edit mode

No, it's not wrong, particularly if you are doing MDS, which is what he is talking about in that post. There is never any mention of heatmaps, and only the OP is talking about PCA. And Gordon specifically mentions that he isn't enthused with using the scale argument to prcomp, and I would interpret that as being cautious about using methods that assume normality on non-normal data.

You are using scale directly, which centers and scales the data by dividing by the standard deviation (well, the root-mean-square, which is the same if you also center). Since that assumes the data are at least normal-ish, I recommended you use voom, which will incorporate observation-level weights to the lmFit step, and hopefully reduce some of the heteroscedasticity inherent to logCPM data.

That said, if somebody gives you a recommendation, and you then use somebody else's recommendation (that is different from the recommendation you already have), why would you ask the first person if it's OK? You are basically asking me to critique advice that Gordon already gave in another context, without letting me know that's what you are doing.

ADD REPLY • link 5.2 years ago James W. MacDonald 66k

0

Entering edit mode

Sorry, I got you now. Before your first comment itself I saw the Gordon's comment in that post and thought of asking you whether it is right. Anyways now I got the point, I will use voom

But If you dont mind could you please help me with the all the syntax I have to use for that. I never did this. Thank you

ADD REPLY • link 5.2 years ago Biologist ▴ 120