I am trying to get the set of differentially expressed genes between several breast cancer subtypes. To this end, I am using an ANOVA-like test from the edgeR package. I have several questions about it. The code looks like that:
# The conditions as a factor head(groups)  Basal-like Basal-like Basal-like Basal-like Basal-like Basal-like Levels: Basal-like HER2-enriched Luminal A Luminal B dge <- DGEList(tcga, group = groups) # tcga is a data.frame containing gene names as rownames and sample labels as colnames. keep <- filterByExpr(dge) dge <- dge[keep, , keep.lib.sizes = FALSE]
1) By setting keep.lib.sizes to FALSE in dge <- dge[keep, , keep.lib.sizes = FALSE] the library sizes are recalculated, right?
dge.n <- calcNormFactors(dge) # Calc norm factors design <- model.matrix( ~ groups) colnames(design) <- c("basal", "her2", "luma", "lumb") # change colnames of design matrix head(design) basal her2 luma lumb 1 1 0 0 0 2 1 0 0 0 3 1 0 0 0 4 1 0 0 0 5 1 0 0 0 6 1 0 0 0 dge.c <- estimateCommonDisp(dge.n) # Common disp dge.t <- estimateTagwiseDisp(dge.c) # Tagwise disp fit <- glmQLFit(dge.t, design, robust = TRUE) qlf <- glmQLFTest(fit, coef = 2:4) out <- topTags(qlf, n = "Inf")$table out <- out[out$FDR <= 0.05,]
2) It is correct to give dge.t to glmQLFit() or it is not necessary for this analysis?
3) If I want to filter the genes in out by an additional logFC threshold, do I have to use glmTreat()? In that case, how should I use the function to retrieve all the genes that meet a certain logFC threshold between at least two conditions (for example, 0.5849625 for fold change of 1.5) and are differentially expressed?
Thanks in advance.