edgeR for DEG between two groups
@zhouxiuqing-7455

Hi,

    I am using edgeR to analyze differential expression in 65 paired Tumor-Normal samples. I want to find the genes that are differentially expressed between the Tumor samples and the Normal samples. My method follows section 4.6, "Profiles of Unrelated Nigerian Individuals", of the edgeR User's Guide. However, the results are not ideal, and this has interrupted our analysis. Could you please help me check the R code? The code is as follows:

library(Rsubread)
library(limma)
library(edgeR)
library(splines)

# Read the count matrix: genes in rows, samples in columns.
Counts <- read.table("readcounts.txt", header=TRUE, row.names=1)

# Columns alternate Normal/Tumor for each of the 65 patients.
gr <- c("N","T")
group <- rep(gr, times=65)
x <- DGEList(counts=Counts, group=group)
x$samples$lib.size <- colSums(x$counts)
x <- calcNormFactors(x)

# The sample names are "CH15N" "CH15T" "CH16N" "CH16T" ...... "CH79N" "CH79T".
Patient <- factor(rep(15:79, each=2))
Tissue <- factor(group)
data.frame(Sample=colnames(x), Patient, Tissue)

# Paired design: Patient is the blocking factor, Tissue is the effect of interest.
design <- model.matrix(~Patient+Tissue)
rownames(design) <- colnames(x)

x <- estimateGLMCommonDisp(x, design, verbose=TRUE)
x <- estimateGLMTrendedDisp(x, design)
x <- estimateGLMTagwiseDisp(x, design)

fit <- glmFit(x, design)
# By default glmLRT tests the last coefficient of the design, here TissueT.
lrt <- glmLRT(fit)

write.table(topTags(lrt, n=dim(lrt$table)[1]), file="result.txt", row.names=TRUE, sep="\t")

    If there is any mistake or question, please let me know. I look forward to hearing from you.

    Thanks!


What do you mean by "not ideal" and "interrupted analysis"? The code itself looks fine.

@zhouxiuqing-7455

We ran the code and got more than 500 differentially expressed genes. We then used these genes for pathway enrichment analysis, but the enriched pathways do not include our genes of interest. So we doubt whether the differentially expressed genes are correct.


I'd check the CPM values to make sure that the detected genes are actually DE. In particular, for each putative DE gene, you can plot the log-fold change between tissue types for each patient. The aim is to make sure that the log-fold changes for most of the patients are changing in a consistent direction. If DE detection is being driven by a couple of misbehaving patients, then you might have a point. Otherwise - the analysis will only interpret the data as it is supplied, so if a gene is DE, edgeR will report it, regardless of whether you are interested in it or not.
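As a rough sketch of that per-patient check, something like the following could work. It uses only base R on a small simulated log-CPM matrix so that it runs on its own; in the real analysis the matrix would presumably come from edgeR's cpm(x, log=TRUE). The helper name paired_logfc and the toy data are made up for illustration, and the helper assumes exactly one Normal and one Tumor sample per patient.

```r
# Hypothetical helper: per-patient Tumor-vs-Normal log2 fold changes.
# `logcpm` is a genes x samples matrix of log2-CPM values.
# Assumes exactly one "N" and one "T" sample per patient.
paired_logfc <- function(logcpm, patient, tissue) {
  patient <- as.character(patient)
  is_t <- tissue == "T"
  is_n <- tissue == "N"
  # Order both halves by patient so each difference is within one patient
  tumor  <- logcpm[, is_t, drop = FALSE][, order(patient[is_t]), drop = FALSE]
  normal <- logcpm[, is_n, drop = FALSE][, order(patient[is_n]), drop = FALSE]
  lfc <- tumor - normal
  colnames(lfc) <- sort(patient[is_t])
  lfc
}

# Toy data: 3 genes, 5 patients, samples ordered N, T, N, T, ...
set.seed(1)
patient <- factor(rep(15:19, each = 2))
tissue  <- factor(rep(c("N", "T"), times = 5))
logcpm  <- matrix(rnorm(3 * 10, mean = 5), nrow = 3,
                  dimnames = list(paste0("gene", 1:3),
                                  paste0("CH", patient, tissue)))
# Make gene1 consistently up in Tumor by 2 log2 units
logcpm["gene1", tissue == "T"] <- logcpm["gene1", tissue == "N"] + 2

lfc <- paired_logfc(logcpm, patient, tissue)

# Fraction of patients whose log-fold change has the same sign as the mean
consistent <- rowMeans(sign(lfc) == sign(rowMeans(lfc)))
print(round(lfc, 2))
print(consistent)
```

A gene whose fraction is close to 1 is changing in the same direction in nearly all patients; a low fraction for a reported DE gene would suggest that a few patients are driving the result. Plotting a row of lfc (for example with barplot) gives the same picture visually.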

Conversely, if you have particular genes of interest, I'd have a look at their CPM values as well. Again, the idea here is to check that there are no outlier samples. For example, if several patients are behaving irregularly for a target gene, that may inflate the estimated dispersion and prevent DE detection.
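One possible way to look for such irregular patients for a single target gene is sketched below, again in base R only. The input is a named vector of per-patient log2 fold changes (Tumor minus Normal) for that gene, which could be computed from log-CPM values. The helper name flag_outlier_patients, the cutoff k = 3, and the example numbers are all assumptions for illustration, not anything edgeR provides or recommends.

```r
# Hypothetical helper: flag patients whose paired log2 fold change for one
# gene lies far from the median across patients.
# `lfc` is a named numeric vector (one value per patient).
# `k` is an arbitrary cutoff in units of the scaled MAD.
flag_outlier_patients <- function(lfc, k = 3) {
  centre <- median(lfc)
  spread <- mad(lfc)  # scaled median absolute deviation, a robust SD estimate
  names(lfc)[abs(lfc - centre) > k * spread]
}

# Made-up log-fold changes for one gene of interest in six patients
lfc_gene <- c(P15 = 1.1, P16 = 0.9, P17 = 1.0,
              P18 = 1.2, P19 = -4.0, P20 = 0.8)

outliers <- flag_outlier_patients(lfc_gene)
print(outliers)
```

Any flagged patients are only candidates for a closer look at their raw counts and CPM values; whether they should be treated differently is a judgment call that depends on the experiment.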
