Question

Pipeline for an Anova-like analysis in edgeR

0

Entering edit mode

Antonio M. • 0

@antonio-m-13072

Last seen 10 months ago

Spain

Dear all.

I am trying to get the set of differentially expressed genes between several breast cancer subtypes. To this end, I am using an ANOVA-like test from the edgeR package. I have several questions about it. The code looks like that:


# The conditions as a factor
head(groups)
[1] Basal-like Basal-like Basal-like Basal-like Basal-like Basal-like
Levels: Basal-like HER2-enriched Luminal A Luminal B

dge <- DGEList(tcga, group = groups) # tcga is a data.frame containing gene names as rownames and sample labels as colnames.

keep <- filterByExpr(dge)
dge <- dge[keep, , keep.lib.sizes = FALSE]

1) By setting keep.lib.sizes to FALSE in dge <- dge[keep, , keep.lib.sizes = FALSE] the library sizes are recalculated, right?

dge.n <- calcNormFactors(dge) # Calc norm factors
design <- model.matrix( ~ groups)
colnames(design) <- c("basal", "her2", "luma", "lumb") # change colnames of design matrix
head(design)
  basal her2 luma lumb
1     1    0    0    0
2     1    0    0    0
3     1    0    0    0
4     1    0    0    0
5     1    0    0    0
6     1    0    0    0

dge.c <- estimateCommonDisp(dge.n) # Common disp
dge.t <- estimateTagwiseDisp(dge.c) # Tagwise disp

fit <- glmQLFit(dge.t, design, robust = TRUE)
qlf <- glmQLFTest(fit, coef = 2:4)

out <- topTags(qlf, n = "Inf")$table
out <- out[out$FDR <= 0.05,]

2) It is correct to give dge.t to glmQLFit() or it is not necessary for this analysis?

3) If I want to filter the genes in out by an additional logFC threshold, do I have to use glmTreat()? In that case, how should I use the function to retrieve all the genes that meet a certain logFC threshold between at least two conditions (for example, 0.5849625 for fold change of 1.5) and are differentially expressed?

Thanks in advance.

edgeR anova rna-seq • 1.4k views

ADD COMMENT • link 3.0 years ago Antonio M. • 0

score 2 · Accepted Answer · 2021-04-20

2

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 1 hour ago

WEHI, Melbourne, Australia

The edgeR QL workflow illustrates a full pipeline:

https://www.bioconductor.org/packages/release/workflows/vignettes/RnaSeqGeneEdgeRQL/inst/doc/edgeRQL.html

Yes, see ?subsetting.
You will get exactly the same results whether you input dge.c or dge.t to glmQLFit. We do recommend however inputing trended dispersion to glmQLFit. Is there some reason why you have avoided that?
Yes, we recommend glmTreat. glmTreat is not the same as simply applying a logFC cutoff, which is something we strongly discourage, despite it being so popular in the literature.

ADD COMMENT • link 3.0 years ago Gordon Smyth 50k

0

Entering edit mode

Dear Gordon, thank you so much for your response.

I misunderstood a part of the manual and thought that the tagwise dispersion was recommended.

I have made a series of changes to use the trended dispersion and finally, get the set of all genes differentially expressed between the conditions with a fold change greater than 1.5. The code for this purpose is shown below:

design <- model.matrix(~0+groups)
colnames(design) <- c("basal", "her2", "luma", "lumb")
keep <- filterByExpr(dge, design)
dge <- dge[keep, , keep.lib.sizes = FALSE]
dge.n <- calcNormFactors(dge)
dge.disp <- estimateDisp(dge.n, design = design, robust = TRUE)
fit <- glmQLFit(dge.disp, design, robust = TRUE)


basher2 <- makeContrasts(basal-her2, levels=design)
basluma <- makeContrasts(basal-luma, levels=design)
baslumb <- makeContrasts(basal-lumb, levels=design)
her2luma <- makeContrasts(her2-luma, levels=design)
her2lumb <- makeContrasts(her2-lumb, levels=design)
lumalumb <- makeContrasts(luma-lumb, levels=design)

trbasher2 <- glmTreat(fit, contrast = basher2, lfc = log2(1.5))
trbasluma <- glmTreat(fit, contrast = basluma, lfc = log2(1.5))
trbaslumb <- glmTreat(fit, contrast = baslumb, lfc = log2(1.5))
trher2luma <- glmTreat(fit, contrast = her2luma, lfc = log2(1.5))
trher2lumb <- glmTreat(fit, contrast = her2lumb, lfc = log2(1.5))
trlumalumb <- glmTreat(fit, contrast = lumalumb, lfc = log2(1.5))

totalcontrasts <- list(trbasher2, trbasluma, trbaslumb, trher2luma, trher2lumb, trlumalumb)
genes <- c() # Vector with all genes differentially expressed
for(c in totalcontrasts){
    out <- topTags(c, n ="Inf")$table
    out <- out[which(out$FDR <= 0.05),]
    genes <- c(genes, rownames(out))
    genes <- unique(genes)
}

With this, the final set of genes differentially expressed at a fold change of 1.5 is given by each contrast. Is there any way to do this with coef while using a subtype as the intercept? Again, thanks for your help.

ADD REPLY • link 3.0 years ago Antonio M. • 0

1

Entering edit mode

There is no way to combine fold-change thresholding with an F-test. Fold-changes can only be associated with individual t-tests. So if you want to use fold-changes, there is no alternative but make pairwise comparisons, as you have done.