Question

Combat for batch effects

0

Entering edit mode

zhouxiuqing • 0

@zhouxiuqing-7455

Last seen 10.1 years ago

Hi,

I want to use edgeR for DEGs between two groups, but I find two batches in samples. So I intend to use Combat to adjust for differences between batches. I know edgeR is designed to work with actual read counts, not rpkm or fpkm. The question is can i use Combat on read counts straightly? Or Combat only be used on expression data, for example rpkm?

combat • 4.6k views

ADD COMMENT • link 10.1 years ago zhouxiuqing • 0

score 2 · Answer 1 · 2015-03-17

2

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 1 hour ago

WEHI, Melbourne, Australia

It would usually be preferable to skip any pre-adjustment and to instead include a batch effect as part of the edgeR linear model analysis. See case study 4.6 in the edgeR User's Guide for an example of this sort of analysis.

ADD COMMENT • link 10.1 years ago Gordon Smyth 52k

Gordon Smyth · Answer 2 · 2015-03-17

0

Entering edit mode

zhouxiuqing • 0

@zhouxiuqing-7455

Last seen 10.1 years ago

Hi Gordon, thank you for your advice. I have checked on case study 4.6 in the edgeR User's Guide, I don't think our case is similar with it.

I have 40 pairs Normal-Tumor samples. And there ane two batches. Can I use edgeR to adjust for batch effects in such a case?

I have used Combat on log(rpkm+1) data to adjust for batch effects, and then just use it's result to compute DEG by edgeR, is this OK? (I mean I do not use read counts but adjusted rpkm for DEG by edgeR.)

ADD COMMENT • link updated 10.1 years ago by Gordon Smyth 52k • written 10.1 years ago by zhouxiuqing • 0

0

Entering edit mode

No, what you are doing with Combat and rpkm is not ok.

Do you have different pairs in different batches? In that case, there is no need to take any special action to remove the batch effect, because it is automatically removed as part of the edgeR paired analysis.

ADD REPLY • link 10.1 years ago Gordon Smyth 52k

score 0 · Answer 3 · 2015-03-17

Yes，we have different pairs in different batches. We use featureCounts in Rsubread package to get read counts. You mean that we just use the read counts for DEG by edgeR without removing the batch effect? We did before and got more than 3,000 different expression genes. My codes are as follows:

library(Rsubread)
library(limma)
library(edgeR)
library(splines)
targets <- readTargets("targets.txt")
Counts <- read.table("readscount.txt")
x <- DGEList(counts=Counts,group=targets$CellType)
keep <- rowSums(cpm(x)>10) >= 80
x <- x[keep,]
x$samples$lib.size <- colSums(x$counts)
x <- calcNormFactors(x)
Patient <- factor(rep(15:54,each=2)) ###the sample are CH15N, CH15T,CH16N,CH16T...CH54N,CH54T
Tissue <- factor(targets$CellType)
data.frame(Sample=colnames(x),Patient,Tissue)
design <- model.matrix(~Patient+Tissue)
rownames(design) <- colnames(x)
x <- estimateGLMCommonDisp(x, design, verbose=TRUE)
x <- estimateGLMTrendedDisp(x, design)
x <- estimateGLMTagwiseDisp(x, design)
fit <- glmFit(x, design)
lrt <- glmLRT(fit)
write.table(topTags(lrt, n=dim(lrt$table)[1]),file="result.txt",row.names=TRUE,sep="\t")

Are the codes OK? Thank you very much for your help!

score 0 · Answer 4 · 2015-03-18

0

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 1 hour ago

WEHI, Melbourne, Australia

The code looks ok. Yes, any batch effect associated with groups of patients is already being removed.

One would expect to get 3000 or more DE genes for tumor vs normal for most cancers.

ADD COMMENT • link 10.1 years ago Gordon Smyth 52k

score 0 · Answer 5 · 2015-03-18

0

Entering edit mode

zhouxiuqing • 0

@zhouxiuqing-7455

Last seen 10.1 years ago

OK, thank you very much!

ADD COMMENT • link 10.1 years ago zhouxiuqing • 0