Question

edgeR for DEG between two groups

zhouxiuqing • 0

Hi,

I am using edgeR to analyse differential expression in 65 paired Tumor-Normal samples. I want to find the genes that are differentially expressed between the Tumor samples and the Normal samples. My method follows section 4.6, "Profiles of Unrelated Nigerian Individuals", of the edgeR User's Guide. However, the results are not ideal, and this has interrupted our analysis. Could you please help me check the R code? The code is as follows:

library(Rsubread)
library(limma)
library(edgeR)
library(splines)

# Read the count matrix: genes in rows, samples in columns
Counts <- read.table("readcounts.txt", header=TRUE, row.names=1)

# Columns alternate Normal/Tumor for each of the 65 patients
gr <- c("N","T")
group <- rep(gr, times=65)

x <- DGEList(counts=Counts, group=group)
x$samples$lib.size <- colSums(x$counts)
x <- calcNormFactors(x)

# The sample names are "CH15N" "CH15T" "CH16N" "CH16T" ... "CH79N" "CH79T"
Patient <- factor(rep(15:79, each=2))
Tissue <- factor(group)
data.frame(Sample=colnames(x), Patient, Tissue)

# Paired design: Patient is the blocking factor, Tissue is the effect of interest
design <- model.matrix(~Patient+Tissue)
rownames(design) <- colnames(x)

x <- estimateGLMCommonDisp(x, design, verbose=TRUE)
x <- estimateGLMTrendedDisp(x, design)
x <- estimateGLMTagwiseDisp(x, design)

fit <- glmFit(x, design)
# By default glmLRT tests the last coefficient, here TissueT (Tumor vs Normal)
lrt <- glmLRT(fit)

write.table(topTags(lrt, n=nrow(lrt$table)), file="result.txt", row.names=TRUE, sep="\t")

If there is any mistake or question, please let me know. I look forward to hearing from you.

Thanks!

edgeR • 3.0k views

ADD COMMENT • link 9.1 years ago zhouxiuqing • 0

What do you mean by "not ideal" and "interrupted analysis"? The code itself looks fine.

ADD REPLY • link 9.1 years ago Aaron Lun ★ 28k

score 0 · Answer 1 · 2015-03-12

zhouxiuqing • 0

We ran the code and got more than 500 differentially expressed genes. We then used these genes for pathway enrichment analysis, but the enriched pathways do not include the genes we are interested in. So we doubt whether the differentially expressed genes are correct.

ADD COMMENT • link 9.1 years ago zhouxiuqing • 0

I'd check the CPM values to make sure that the detected genes are actually DE. In particular, for each putative DE gene, you can plot the log-fold change between tissue types for each patient. The aim is to make sure that the log-fold changes for most of the patients are in a consistent direction. If DE detection is being driven by a couple of misbehaving patients, then you might have a point. Otherwise, the analysis will only interpret the data as it is supplied, so if a gene is DE, edgeR will report it, regardless of whether you are interested in it or not.
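As a rough sketch of that per-patient check, something like the following could work. It uses base R only and a made-up log2-CPM matrix for three patients; in a real analysis the matrix would presumably come from something like cpm(x, log=TRUE) in edgeR. The helper name patient_logfc and the toy values are illustrative assumptions, not part of the original code.

```r
# Hypothetical helper: per-patient log-fold change (Tumor minus Normal)
# from a log2-CPM matrix with genes in rows and samples in columns.
patient_logfc <- function(logcpm, patient, tissue) {
  is_n <- tissue == "N"
  is_t <- tissue == "T"
  # The N and T columns must list the patients in the same order
  stopifnot(identical(as.character(patient[is_n]),
                      as.character(patient[is_t])))
  lfc <- logcpm[, is_t, drop = FALSE] - logcpm[, is_n, drop = FALSE]
  colnames(lfc) <- as.character(patient[is_t])
  lfc
}

# Toy data (assumed values): one gene, three patients, columns ordered N, T
patient <- factor(rep(15:17, each = 2))
tissue  <- factor(rep(c("N", "T"), times = 3))
logcpm  <- matrix(c(2, 4, 3, 5, 1, 0.5), nrow = 1,
                  dimnames = list("geneA", paste0("CH", patient, tissue)))

lfc <- patient_logfc(logcpm, patient, tissue)

# Fraction of patients whose change has the same sign as the median change
consistency <- rowMeans(sign(lfc) == sign(apply(lfc, 1, median)))

# One bar per patient; bars pointing the same way suggest a consistent effect
if (interactive()) {
  barplot(lfc["geneA", ], xlab = "Patient", ylab = "log2 fold change (T - N)")
}
```

A gene whose consistency is close to 1 is changing in the same direction in nearly all patients; a value near 0.5 would suggest the call is driven by a few samples.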

Conversely, if you have particular genes of interest, I'd have a look at their CPM values as well. Again, the idea here is to check that there are no outlier samples. For example, if several patients are behaving irregularly for a target gene, that may inflate the estimated dispersion and prevent DE detection.
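One simple way to look for such outlier samples, sketched in base R under stated assumptions: take the log2-CPM values of the target gene within one tissue type and flag samples that sit far from the median. The helper flag_outliers, the cutoff of 3 scaled MADs, and the toy values are all illustrative choices of mine, not an edgeR function or a recommended threshold.

```r
# Hypothetical helper: flag values more than k scaled MADs from the median.
flag_outliers <- function(values, k = 3) {
  centre <- median(values)
  spread <- mad(values)  # median absolute deviation, scaled by 1.4826
  if (spread == 0) {
    # No variation around the median, so nothing can be flagged
    return(setNames(rep(FALSE, length(values)), names(values)))
  }
  abs(values - centre) > k * spread
}

# Toy log2-CPM values (assumed) for one gene in five Normal samples
logcpm_gene <- c(CH15N = 5.1, CH16N = 4.9, CH17N = 5.0, CH18N = 5.2, CH19N = 9.5)

flagged <- flag_outliers(logcpm_gene)
names(which(flagged))  # only CH19N is flagged for these toy values
```

Any flagged samples are only candidates for a closer look, for example by plotting the raw values; the cutoff is arbitrary and does not by itself show that a sample is faulty.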

ADD REPLY • link 9.1 years ago Aaron Lun ★ 28k