Question: edgeR removing batch effect before using expression data for clustering
gravatar for Biologist
6 months ago by
Biologist70 wrote:

I have gene expression data for 8 samples, 4 controls and 4 FOXCUT OE samples.

The raw counts data were in a dataframe counts. coldata looks like below:

Samples   Group              Experiment
Sample1 Control                 EXP1
Sample2 Control                 EXP1
Sample3 Control                 EXP2
Sample4 Control                 EXP2
Sample5 FOXCUT Overexpression   EXP2
Sample6 FOXCUT Overexpression   EXP2
Sample7 FOXCUT Overexpression   EXP1
Sample8 FOXCUT Overexpression   EXP1

I have used following code for filtering and normallization:

group <- factor(paste0(coldata$Group))
y <- DGEList(counts,group = group)

## Filtering 
keep <- rowSums(cpm(y) > 1) >= 2

y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y,method = "TMM") 
design2 <- model.matrix(~ 0 + group + coldata$Experiment)

So, after Normalization I did the batch correction using removeBatchEffect function like below:

y <- removeBatchEffect(y, design = design2)

## For Clustering Heatmap
logCPM <- cpm(y, prior.count=2, log=TRUE)
# Scale data to Z-scores
logCPM <- t(scale(t(logCPM)))

## To select top 10% highly variable genes
vars <- apply(logCPM, 1, IQR) 
clust <- logCPM[vars > quantile(vars, 0.9), ]

I wanted to use clust for a clustering heatmap.

Question: Is the above way of batch correction is right or should I do batch correction after calculating logCPM

ADD COMMENTlink modified 6 months ago by James W. MacDonald51k • written 6 months ago by Biologist70
Answer: edgeR removing batch effect before using expression data for clustering
gravatar for James W. MacDonald
6 months ago by
United States
James W. MacDonald51k wrote:

You should run removeBatchEffect after computing logCPM, but you should probably use voom rather than cpm for that step. The difference being that voom will also generate observation-level weights that will be passed to lmFit for the batch correction.

ADD COMMENTlink written 6 months ago by James W. MacDonald51k

Thanks for the reply. So, after calculating logCPM, then I have to use removeBatchEffect function on that and then scale that for clustering analysis. Is it fine?

logCPM <- cpm(y, prior.count=2, log=TRUE)

logCPM <- removeBatchEffect(logCPM)
logCPM <- t(scale(t(logCPM)))

Is this right?

ADD REPLYlink written 6 months ago by Biologist70

Did you read my answer?

ADD REPLYlink written 6 months ago by James W. MacDonald51k

Yes, James. But in this Remove batch effect in small RNASeq study (SVA or others?) Gordon recommends to do in the above way and then use it for MDS, PCA or heatmaps. So, I did it in that way.

Is this wrong?

ADD REPLYlink modified 6 months ago • written 6 months ago by Biologist70

No, it's not wrong, particularly if you are doing MDS, which is what he is talking about in that post. There is never any mention of heatmaps, and only the OP is talking about PCA. And Gordon specifically mentions that he isn't enthused with using the scale argument to prcomp, and I would interpret that as being cautious about using methods that assume normality on non-normal data.

You are using scale directly, which centers and scales the data by dividing by the standard deviation (well, the root-mean-square, which is the same if you also center). Since that assumes the data are at least normal-ish, I recommended you use voom, which will incorporate observation-level weights to the lmFit step, and hopefully reduce some of the heteroscedasticity inherent to logCPM data.

That said, if somebody gives you a recommendation, and you then use somebody else's recommendation (that is different from the recommendation you already have), why would you ask the first person if it's OK? You are basically asking me to critique advice that Gordon already gave in another context, without letting me know that's what you are doing.

ADD REPLYlink written 6 months ago by James W. MacDonald51k

Sorry, I got you now. Before your first comment itself I saw the Gordon's comment in that post and thought of asking you whether it is right. Anyways now I got the point, I will use voom

But If you dont mind could you please help me with the all the syntax I have to use for that. I never did this. Thank you

ADD REPLYlink written 6 months ago by Biologist70
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 438 users visited in the last hour