Combat for batch effects
5
0
Entering edit mode
@zhouxiuqing-7455
Last seen 6.6 years ago

Hi,

  I want to use edgeR for DEGs between two groups, but I find two batches in samples. So I intend to use Combat to adjust for differences between batches. I know edgeR is designed to work with actual read counts, not rpkm or fpkm. The question is can i use Combat on read counts straightly? Or Combat only be used on expression data, for example rpkm?

combat • 2.8k views
ADD COMMENT
2
Entering edit mode
@gordon-smyth
Last seen 2 hours ago
WEHI, Melbourne, Australia

It would usually be preferable to skip any pre-adjustment and to instead include a batch effect as part of the edgeR linear model analysis. See case study 4.6 in the edgeR User's Guide for an example of this sort of analysis.

ADD COMMENT
0
Entering edit mode
@zhouxiuqing-7455
Last seen 6.6 years ago

    Hi Gordon, thank you for your advice. I have checked on  case study 4.6 in the edgeR User's Guide, I don't think our case is similar with it. 

    I have 40 pairs Normal-Tumor samples. And there ane two batches. Can I use edgeR to adjust for batch effects in such a case?

    I have used Combat on log(rpkm+1) data to adjust for batch effects, and then just use it's result to compute DEG by edgeR, is this OK? (I mean I do not use read counts but adjusted rpkm for DEG by edgeR.) 

ADD COMMENT
0
Entering edit mode

No, what you are doing with Combat and rpkm is not ok.

Do you have different pairs in different batches? In that case, there is no need to take any special action to remove the batch effect, because it is automatically removed as part of the edgeR paired analysis.

ADD REPLY
0
Entering edit mode
@zhouxiuqing-7455
Last seen 6.6 years ago

   Yes,we have different pairs in different batches. We use featureCounts in Rsubread package to get read counts. You mean that we just use the read counts for DEG by edgeR without removing the batch effect? We did before and got more than 3,000 different expression genes. My codes are as follows:

library(Rsubread)
library(limma)
library(edgeR)
library(splines)
targets <- readTargets("targets.txt")
Counts <- read.table("readscount.txt")
x <- DGEList(counts=Counts,group=targets$CellType)
keep <- rowSums(cpm(x)>10) >= 80
x <- x[keep,]
x$samples$lib.size <- colSums(x$counts)
x <- calcNormFactors(x)
Patient <- factor(rep(15:54,each=2)) ###the sample are CH15N, CH15T,CH16N,CH16T...CH54N,CH54T
Tissue <- factor(targets$CellType)
data.frame(Sample=colnames(x),Patient,Tissue)
design <- model.matrix(~Patient+Tissue)
rownames(design) <- colnames(x)
x <- estimateGLMCommonDisp(x, design, verbose=TRUE)
x <- estimateGLMTrendedDisp(x, design)
x <- estimateGLMTagwiseDisp(x, design)
fit <- glmFit(x, design)
lrt <- glmLRT(fit)
write.table(topTags(lrt, n=dim(lrt$table)[1]),file="result.txt",row.names=TRUE,sep="\t")

    Are the codes OK? Thank you very much for your help! 

 

ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 2 hours ago
WEHI, Melbourne, Australia

The code looks ok. Yes, any batch effect associated with groups of patients is already being removed.

One would expect to get 3000 or more DE genes for tumor vs normal for most cancers.

ADD COMMENT
0
Entering edit mode
@zhouxiuqing-7455
Last seen 6.6 years ago

OK, thank you very much!

ADD COMMENT

Login before adding your answer.

Traffic: 369 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6