confusing output from EdgeR Glm
1
0
Entering edit mode
amoltej ▴ 10
@amoltej-7192
Last seen 7.2 years ago
Australia

Hi, 

I am using following code to analyse two samples and differentially expressed gene in them. there are 3 replicates for each sample. after completing analysis I am seeing all the logFC values in negative, I don't know why? I actually went back and checked the raw count for them and they seem wrong. can you please tell me if I am doing anything wrong? also why pvalue is 0 (zero)?  Thank you 

here is the code that I am using -

#data_import
raw.data<-read.table("bcytvsgut.count",header=TRUE)

group<-c("Bcyt","Bcyt","Bcyt","mgut","mgut","mgut")
#make DGE list
cds<-DGEList(raw.data,group=group)
names(cds)        
head(cds$counts)
cds$samples
#normalize data using TMM
cds <- calcNormFactors( cds )
d$samples
#data filtering
cps <- cpm(cds)
k <- rowSums(cps>=10)>3
d <- cds[k,]

#reset the library size
d$samples$lib.size <- colSums(d$counts)
cols <- as.numeric(d$samples$group)
par(mfrow=c(2,2))

plotMDS(d,col=cols)

#setup desing
design <- model.matrix(~0+group, data=d$samples)
design

d <- estimateGLMCommonDisp(d,design,verbose=T)
d <- estimateGLMTrendedDisp(d,design)
d <- estimateGLMTagwiseDisp(d,design)
plotBCV(d)

plotMeanVar(d,show.raw=TRUE, show.tagwise=TRUE, show.binned=TRUE)
#calculate differentially expressed genes
fit <- glmFit(d,design)
lrt <- glmLRT(fit)

topTags(lrt)
names(lrt)
lrt$comparison
head(lrt$table)
#export data
saveme<-topTags( lrt, n = Inf , sort.by = "none" )
write.table(saveme,file="EdgeR_bcytvsgut_hiseq.txt",sep="\t",quote=F)
plotSmear(d)
abline(h = c(-2, 2), col = "blue")

ambiguities that i am seeing between data - 

Raw data - 

Raw_data          
G000241 1580 1726 1805 94 82 244
G000270 34849 34345 36117 50908 44831 46246
differential expression result        
G000241 -16.2926 6.382334 1060.65 1.18E-232 1.20E-232  
G000270 -7.90819 11.8414 5135.262 0 0

 

 
 
edger logfc pvalue • 2.0k views
ADD COMMENT
0
Entering edit mode

Sorry the data doesn't make sense there - here is updated version of sample data 

Raw_data          
  Bcyt Bcyt Bcyt Gut Gut Gut
G000241 1580 1726 1805 94 82 244
G000270 34849 34345 36117 50908 44831 46246
differential expression result        
G000241 -16.2926 6.382334 1060.65 1.18E-232 1.20E-232  
G000270 -7.90819 11.8414 5135.262 0 0  
ADD REPLY
1
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 1 hour ago
The city by the bay

You're using a design matrix without an intercept. This means that the first column/coefficient represents the average log-expression across the "Bcyt" group, while the second column represents the average log-expression across the "mgut" group. By default, glmLRT will drop the last coefficient in the design matrix. This is equivalent to testing for whether the average log-expression across all "mgut" samples is equal to zero. This will not be true for the vast majority of genes, resulting in many genes with very low p-values and negative log-FC's (presumably, the average log-expression for these genes is less than zero when adjusted for library size).

I'm guessing that you actually want to test for differential expression between "Bcyt" and "mgut" groups. If so, you need to define a contrast vector. For simplicity, I'll redefine your design matrix here:

grouping <- factor(group)
design <- model.matrix(~0+grouping)
colnames(design) <- levels(grouping)

Your contrast vector can be defined with makeContrasts, as shown below:

con <- makeContrasts(Bcyt - mgut, levels=design)

This can be supplied to the contrast= argument of glmLRT to test for DE between groups. The log-fold changes in the resulting DE list represent the change in expression of the "Bcyt" group over the "mgut" group.

ADD COMMENT
0
Entering edit mode

fantastic !!! Thanks Aaron! 

ADD REPLY

Login before adding your answer.

Traffic: 563 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6