Question: confusing output from EdgeR Glm
0
gravatar for amoltej
3.6 years ago by
amoltej10
Australia
amoltej10 wrote:

Hi, 

I am using following code to analyse two samples and differentially expressed gene in them. there are 3 replicates for each sample. after completing analysis I am seeing all the logFC values in negative, I don't know why? I actually went back and checked the raw count for them and they seem wrong. can you please tell me if I am doing anything wrong? also why pvalue is 0 (zero)?  Thank you 

here is the code that I am using -

#data_import
raw.data<-read.table("bcytvsgut.count",header=TRUE)

group<-c("Bcyt","Bcyt","Bcyt","mgut","mgut","mgut")
#make DGE list
cds<-DGEList(raw.data,group=group)
names(cds)        
head(cds$counts)
cds$samples
#normalize data using TMM
cds <- calcNormFactors( cds )
d$samples
#data filtering
cps <- cpm(cds)
k <- rowSums(cps>=10)>3
d <- cds[k,]

#reset the library size
d$samples$lib.size <- colSums(d$counts)
cols <- as.numeric(d$samples$group)
par(mfrow=c(2,2))

plotMDS(d,col=cols)

#setup desing
design <- model.matrix(~0+group, data=d$samples)
design

d <- estimateGLMCommonDisp(d,design,verbose=T)
d <- estimateGLMTrendedDisp(d,design)
d <- estimateGLMTagwiseDisp(d,design)
plotBCV(d)

plotMeanVar(d,show.raw=TRUE, show.tagwise=TRUE, show.binned=TRUE)
#calculate differentially expressed genes
fit <- glmFit(d,design)
lrt <- glmLRT(fit)

topTags(lrt)
names(lrt)
lrt$comparison
head(lrt$table)
#export data
saveme<-topTags( lrt, n = Inf , sort.by = "none" )
write.table(saveme,file="EdgeR_bcytvsgut_hiseq.txt",sep="\t",quote=F)
plotSmear(d)
abline(h = c(-2, 2), col = "blue")

ambiguities that i am seeing between data - 

Raw data - 

Raw_data          
G000241 1580 1726 1805 94 82 244
G000270 34849 34345 36117 50908 44831 46246
differential expression result        
G000241 -16.2926 6.382334 1060.65 1.18E-232 1.20E-232  
G000270 -7.90819 11.8414 5135.262 0 0

 

 
 
edger logfc pvalue • 821 views
ADD COMMENTlink modified 3.6 years ago by Aaron Lun23k • written 3.6 years ago by amoltej10

Sorry the data doesn't make sense there - here is updated version of sample data 

Raw_data          
  Bcyt Bcyt Bcyt Gut Gut Gut
G000241 1580 1726 1805 94 82 244
G000270 34849 34345 36117 50908 44831 46246
differential expression result        
G000241 -16.2926 6.382334 1060.65 1.18E-232 1.20E-232  
G000270 -7.90819 11.8414 5135.262 0 0  
ADD REPLYlink written 3.6 years ago by amoltej10
Answer: confusing output from EdgeR Glm
1
gravatar for Aaron Lun
3.6 years ago by
Aaron Lun23k
Cambridge, United Kingdom
Aaron Lun23k wrote:

You're using a design matrix without an intercept. This means that the first column/coefficient represents the average log-expression across the "Bcyt" group, while the second column represents the average log-expression across the "mgut" group. By default, glmLRT will drop the last coefficient in the design matrix. This is equivalent to testing for whether the average log-expression across all "mgut" samples is equal to zero. This will not be true for the vast majority of genes, resulting in many genes with very low p-values and negative log-FC's (presumably, the average log-expression for these genes is less than zero when adjusted for library size).

I'm guessing that you actually want to test for differential expression between "Bcyt" and "mgut" groups. If so, you need to define a contrast vector. For simplicity, I'll redefine your design matrix here:

grouping <- factor(group)
design <- model.matrix(~0+grouping)
colnames(design) <- levels(grouping)

Your contrast vector can be defined with makeContrasts, as shown below:

con <- makeContrasts(Bcyt - mgut, levels=design)

This can be supplied to the contrast= argument of glmLRT to test for DE between groups. The log-fold changes in the resulting DE list represent the change in expression of the "Bcyt" group over the "mgut" group.

ADD COMMENTlink written 3.6 years ago by Aaron Lun23k

fantastic !!! Thanks Aaron! 

ADD REPLYlink written 3.6 years ago by amoltej10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 150 users visited in the last hour