Question: confusing output from EdgeR Glm
0
3.8 years ago by
amoltej10
Australia
amoltej10 wrote:

Hi,

I am using following code to analyse two samples and differentially expressed gene in them. there are 3 replicates for each sample. after completing analysis I am seeing all the logFC values in negative, I don't know why? I actually went back and checked the raw count for them and they seem wrong. can you please tell me if I am doing anything wrong? also why pvalue is 0 (zero)?  Thank you

here is the code that I am using -

#data_import

group<-c("Bcyt","Bcyt","Bcyt","mgut","mgut","mgut")
#make DGE list
cds<-DGEList(raw.data,group=group)
names(cds)
head(cds$counts) cds$samples
#normalize data using TMM
cds <- calcNormFactors( cds )
d$samples #data filtering cps <- cpm(cds) k <- rowSums(cps>=10)>3 d <- cds[k,] #reset the library size d$samples$lib.size <- colSums(d$counts)
cols <- as.numeric(d$samples$group)
par(mfrow=c(2,2))

plotMDS(d,col=cols)

#setup desing
design <- model.matrix(~0+group, data=d$samples) design d <- estimateGLMCommonDisp(d,design,verbose=T) d <- estimateGLMTrendedDisp(d,design) d <- estimateGLMTagwiseDisp(d,design) plotBCV(d) plotMeanVar(d,show.raw=TRUE, show.tagwise=TRUE, show.binned=TRUE) #calculate differentially expressed genes fit <- glmFit(d,design) lrt <- glmLRT(fit) topTags(lrt) names(lrt) lrt$comparison
#export data
saveme<-topTags( lrt, n = Inf , sort.by = "none" )
write.table(saveme,file="EdgeR_bcytvsgut_hiseq.txt",sep="\t",quote=F)
plotSmear(d)
abline(h = c(-2, 2), col = "blue")

ambiguities that i am seeing between data -

Raw data -

 Raw_data G000241 1580 1726 1805 94 82 244 G000270 34849 34345 36117 50908 44831 46246 differential expression result G000241 -16.2926 6.382334 1060.65 1.18E-232 1.20E-232 G000270 -7.90819 11.8414 5135.262 0 0


edger logfc pvalue • 850 views
modified 3.8 years ago by Aaron Lun24k • written 3.8 years ago by amoltej10

Sorry the data doesn't make sense there - here is updated version of sample data

 Raw_data Bcyt Bcyt Bcyt Gut Gut Gut G000241 1580 1726 1805 94 82 244 G000270 34849 34345 36117 50908 44831 46246 differential expression result G000241 -16.2926 6.382334 1060.65 1.18E-232 1.20E-232 G000270 -7.90819 11.8414 5135.262 0 0
Answer: confusing output from EdgeR Glm
1
3.8 years ago by
Aaron Lun24k
Cambridge, United Kingdom
Aaron Lun24k wrote:

You're using a design matrix without an intercept. This means that the first column/coefficient represents the average log-expression across the "Bcyt" group, while the second column represents the average log-expression across the "mgut" group. By default, glmLRT will drop the last coefficient in the design matrix. This is equivalent to testing for whether the average log-expression across all "mgut" samples is equal to zero. This will not be true for the vast majority of genes, resulting in many genes with very low p-values and negative log-FC's (presumably, the average log-expression for these genes is less than zero when adjusted for library size).

I'm guessing that you actually want to test for differential expression between "Bcyt" and "mgut" groups. If so, you need to define a contrast vector. For simplicity, I'll redefine your design matrix here:

grouping <- factor(group)
design <- model.matrix(~0+grouping)
colnames(design) <- levels(grouping)

Your contrast vector can be defined with makeContrasts, as shown below:

con <- makeContrasts(Bcyt - mgut, levels=design)

This can be supplied to the contrast= argument of glmLRT to test for DE between groups. The log-fold changes in the resulting DE list represent the change in expression of the "Bcyt" group over the "mgut" group.

fantastic !!! Thanks Aaron!

Content
Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.