Yes,we have different pairs in different batches. We use featureCounts in Rsubread package to get read counts. You mean that we just use the read counts for DEG by edgeR without removing the batch effect? We did before and got more than 3,000 different expression genes. My codes are as follows:
library(Rsubread)
library(limma)
library(edgeR)
library(splines)
targets <- readTargets("targets.txt")
Counts <- read.table("readscount.txt")
x <- DGEList(counts=Counts,group=targets$CellType)
keep <- rowSums(cpm(x)>10) >= 80
x <- x[keep,]
x$samples$lib.size <- colSums(x$counts)
x <- calcNormFactors(x)
Patient <- factor(rep(15:54,each=2)) ###the sample are CH15N, CH15T,CH16N,CH16T...CH54N,CH54T
Tissue <- factor(targets$CellType)
data.frame(Sample=colnames(x),Patient,Tissue)
design <- model.matrix(~Patient+Tissue)
rownames(design) <- colnames(x)
x <- estimateGLMCommonDisp(x, design, verbose=TRUE)
x <- estimateGLMTrendedDisp(x, design)
x <- estimateGLMTagwiseDisp(x, design)
fit <- glmFit(x, design)
lrt <- glmLRT(fit)
write.table(topTags(lrt, n=dim(lrt$table)[1]),file="result.txt",row.names=TRUE,sep="\t")
Are the codes OK? Thank you very much for your help!
No, what you are doing with Combat and rpkm is not ok.
Do you have different pairs in different batches? In that case, there is no need to take any special action to remove the batch effect, because it is automatically removed as part of the edgeR paired analysis.