Question

Same number of topTags reported for different contrasts

0

Entering edit mode

MBWatson • 0

@mbwatson-7475

Last seen 7.5 years ago

United States

Good afternoon,

I am using edgeR to find differentially expressed genes among populations (SD and BR) and salinity treatments (15, 35, 60ppt). I have three biological replicates for each treatment. My experimental design matches very closely the example in section 3.2.4 of the (excellent) edgeR user guide and I have been using it as an example. The code appears to working correctly but when I write the topTags to a file, each pairwise contrast has the exact same number of tags- this seems unexpected. I am using 'topTags(x, n=Inf).' If anyone has an feedback on this observation, I would appreciate hearing it. My code is below. Thank you!

source("http://bioconductor.org/biocLite.R")
biocLite("edgeR")
library("edgeR")

x <- read.delim("~path_to_file", header=TRUE, row.names="Symbols") 
z <-x[ ,c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18)]
#make new file called y composed of these columns
keep <- rowSums(cpm(z)) >=18 
z<- z[keep,] 
group <- factor(c("B","B","B","A","A","A","E","E","E","D","D","D","C","C","C","G","G","G")) 
#BR15=B, BR35=A, SD15=D, SD35=C, BR60=E, SD60=G
y <- DGEList(counts=z, group=group) 
y <-calcNormFactors(y) 
design <- model.matrix(~0+group, data=y$samples)
colnames(design) <- levels(y$samples$group)
y <- estimateGLMCommonDisp(y, design, verbose=TRUE)
y <- estimateGLMTrendedDisp(y, design, verbose=TRUE)
y <- estimateGLMTagwiseDisp(y, design)
fit <-glmFit(y, design)

#Contrast 1: Genes differentially expressed between BR15 and BR35

BvsA <- makeContrasts(B-A, levels=design)
lrt <-glmLRT(fit, contrast=BvsA)
DGE_BR15vsBR35 <- topTags(lrt, n=Inf)
write.csv(DGE_BR15vsBR35, file="DGE_BR15vsBR35.txt")

#Contrast 2: Genes differentially expressed between SD15 and SD35

DvsC <- makeContrasts(D-C, levels=design)
lrt <-glmLRT(fit, contrast=DvsC)
DGE_SD15vsSD35 <-topTags(lrt, n=Inf)
write.csv(DGE_SD15vsSD35, file="DGE_SD15vsSD35.txt")

#Contrast 3: Genes differentially expressed between BR35 and BR60

AvsE <- makeContrasts(A-E, levels=design) 
lrt <-glmLRT(fit, contrast=AvsE)
DGE_BR35vsBR60 <-topTags(lrt, n=Inf)
write.csv(DGE_BR35vsBR60, file="DGE_BR35vsBR60.txt")

#Contrast 4: Genes differentially expressed between SD35 and SD60

CvsG <- makeContrasts(C-G, levels=design)
lrt <-glmLRT(fit, contrast=CvsG)
DGE_SD35vsSD60 <-topTags(lrt, n=Inf)
write.csv(DGE_SD35vsSD60, file="DGE_SD35vsSD60.txt")

edger topTags design and contrast matrix • 1.4k views

ADD COMMENT • link updated 9.1 years ago by Gordon Smyth 50k • written 9.1 years ago by MBWatson • 0

score 1 · Answer 1 · 2015-03-21

1

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 1 hour ago

WEHI, Melbourne, Australia

You are using n=Inf, hence the topTags table lists every gene, hence all the topTags tables must be the same length. The length of your topTags tables will be the same as nrow(fit).

If you intended to write a topTags tables of statistically significant genes only, then you needed to set the 'p' argument of topTags. For example, topTags with n=Inf and p=0.05 will give genes significant at FDR of 5%.