Question

edgeR - coefficients in 3-factor experiment, complex contrasts and decideTestsDGE

0

Entering edit mode

Emanuel Heitlinger ▴ 30

@emanuel-heitlinger-4912

Last seen 9.6 years ago

Hi all, I have some questions regarding multi-factor-glms in edgeR. I am working on a RNA-seq experiment: I have 24 samples from 3 "treatments" each having two levels. This means 3 biological replicates per treatment combination. I want to model the full set of possible interactions (sex.conds*eel.conds*pop.conds), as expecially two-fold interactions could be very interetsing for my research-question. I want to categorize genes (contigs form a 454-transcriptome assembly, genome is unsequenced) I mapped my reads against into categoris: a) only different for sex b) only different for eel c) only different for pop d1)d2)d3) different for sex:eel, sex:pop, eel:pop In glmLRT giving simple coefficients would compare the complete model to a model removing one coefficient at a time. From application of glms in ecology I remember that an interaction effect should not be left in the model if the main effect is removed. Does this apply here? Should I compare the full model against e.g. the model minus pop, sex:pop, eel:pop and sex:eel:pop, when I want to remove condition "pop" from the model? Hope this code demonstrates what I mean: ## CODE start library(edgeR) ## generate a df of neg. bionm. counts y <- as.data.frame(matrix(rnbinom(6000,mu=10,size=10),ncol=24)) names(y) <- c("AA_R11M", "AA_R16M", "AA_R18F", "AA_R28F", "AA_R2M", "AA_R8F", "AA_T12F", "AA_T20F", "AA_T24M", "AA_T3M", "AA_T42M", "AA_T45F", "AJ_R1F", "AJ_R1M", "AJ_R3F", "AJ_R3M", "AJ_R5F", "AJ_R5M", "AJ_T19M", "AJ_T20M", "AJ_T25M", "AJ_T26F", "AJ_T5F", "AJ_T8F") sex.conds <- factor(ifelse(grepl("M$", names(y)), "male", "female" )) eel.conds <- factor(ifelse(grepl("^AA", names(y)), "Aa", "Aj" )) pop.conds <- factor(ifelse(grepl("\\w\\w_R.*", names(y)), "EU", "TW" )) design <- model.matrix(~sex.conds*eel.conds*pop.conds) ## Counts frame to full DGEList d <- DGEList(y, lib.size=colSums(y)) d <- calcNormFactors(d) d <- estimateGLMCommonDisp(d, design=design) d <- estimateGLMTrendedDisp(d, design=design) d <- estimateGLMTagwiseDisp(d, design=design) glmfit <- glmFit(d, design, dispersion=d$tagwise.dispersion) glm.wrapper <- function(de.obj, fit.obj, conds.regex){ glm.list <- list() coe <- names(as.data.frame(fit.obj$design)) coe.l <- lapply(conds.regex, function (x) grep(x, coe)) for (i in 1:length(conds.regex)){ glm.list[[conds.regex[[i]]]] <- glmLRT(de.obj, fit.obj, coef=coe.l[[i]]) } return(glm.list) } ## selects all coefficients being contained in each other hierachically combi.conds <- gsub(":", ".*", names(as.data.frame(glmfit$design))[2:8]) glm.l <- glm.wrapper(d, glmfit, combi.conds) ## show what is compared lapply(glm.l, function (x) x$comparison) ## topTags works (same as using p.adjust directly) topTags.l <- lapply(glm.l, function (x){ tt <- topTags(x, n=40000) ## set as high as the length tt[tt$table$adj.P.Val<0.05] ## get only below adj.P }) ## Code End Then I would look through the topTags list to categorize the contigs as stated above. E.g. from topTags.l[[1]] I would take only those not in topTags.l[[c(4, 5, 7]] to get category a) stated above, from topTags.l[[4]] only those not in topTagl.l[[7]] to get d1. This seems all a bit complicated to me, is this a correct way of doing this? I am alos a bit worried that decideTestsDGE seems to not work on DGELRT-objects with complicated coefficients. Eg. ## Code Start ## decideTestsDGE does not work decideTestsDGE.l <- lapply(glm.l, function (x){ subset(x$table, (decideTestsDGE(x, p.value = .05))!=0)}) ## Code End I saw that for simple coefficients the results of decideTestsDGE differ from topTags. Is this expected, what is the difference between the two, thought both do p-value adjustment the same way (I could provide code if these differenced would not be the expected behaviour)? These are my questions for now. I would be very greatful for help! All the Best, Emanuel sessionInfo() R version 2.13.0 (2011-04-13) Platform: x86_64-redhat-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] splines stats graphics grDevices utils datasets methods base other attached packages: [1] edgeR_2.2.6 loaded via a namespace (and not attached): [1] limma_3.8.3 tools_2.13.0 -- Emanuel Heitlinger Karlsruhe Institute of Technology (KIT) Zoological Institute; Ecology/Parasitology Kornblumenstr. 13 76131 Karlsruhe Germany Telephone +49 (0)721-608 47654 or University of Edinburgh Institute of Evolutionary Biology Kings Buildings, Ashworth Laboratories, West Mains Road Edinburgh EH9 3JT Scotland, UK Telephone:+44 (0)131-650 7403

Category edgeR Category edgeR • 1.1k views

ADD COMMENT • link 12.5 years ago Emanuel Heitlinger ▴ 30