Query regarding the use of eBayes + topTable and treat + topTreat
1
1
Entering edit mode
Nikshay ▴ 10
@2dbde40d
Last seen 3.1 years ago
Australia

Hi all,

I'm currently re-analysing Agilent single-channel microarray data (3 paired pre/post-treatment human samples) using the limma package.

For differential expression (DE) analysis, I had previously used the eBayes method, followed by topTable to filter based on FDR<0.05 and then filtering based on FC>2 cut-off. (Method 1)

However, based on the forum posts and literature I have recently read, my understanding is that this method computes adjusted p-values independently of the FC cut-off whereas treat incorporates FC threshold in the hypothesis testing.

As a result, I am now using the treat method with fc=1.2, followed by topTreat to filter based on FDR < 0.05. (Method 2)

  • Which of these 2 methods is the recommended/preferred method for DE analysis of microarray data?
    • How is the estimated FC calculated based on the fc threshold parameter of treat? The literature mentions that "Setting fc=1.2 or fc=1.5 will usually cause most differentially expressed genes to have estimated fold-changes of 2-fold or greater, depending on the sample size and precision of the experiment." Is there a way to formally calculate this estimated FC?
    • Also, I am using trend=TRUE and robust=TRUE. Would this be appropriate to use by default?

Thank you very much,

Nik

Here is my code:

#Design matrix
Patient <- factor(targets$Patient)
Treatment <- factor(targets$Treatment)
Sex <- factor(targets$Sex)
Status <- factor(targets$Status)

design <- model.matrix(~0 + Treatment + Patient)

fit <- lmFit(yfilt, design)

#Remove duplicate probes for the same gene symbol
#Keep the one with the largest overall expression value
o <- order(fit$Amean, decreasing = TRUE)
fit <- fit[o, ]
d <- duplicated(fit$genes$Symbol)
fit <- fit[!d, ]

#Contrast matrix
contrast.matrix <- makeContrasts(TreatmentPost - TreatmentPre, levels = design)

fit2 <- contrasts.fit(fit, contrast.matrix)

Method 1:

fit2 <- eBayes(fit2, trend = TRUE, robust = TRUE)

genelist.T <- topTable(fit2, coef = 1, number = Inf, p.value = 0.05, adjust.method = "BH")

genelist.T.filt <- filter(genelist.T, (logFC < -log2(2) | logFC > log2(2)))

Method 2

fit2 <- treat(fit2, lfc = log2(1.2), trend = TRUE, robust = TRUE)

genelist.T.filt <- topTreat(fit2, coef = 1, number = Inf, p.value = 0.05, adjust.method = "BH")
Microarray limma DifferentialExpression • 2.9k views
ADD COMMENT
4
Entering edit mode
@gordon-smyth
Last seen 32 minutes ago
WEHI, Melbourne, Australia

The appropriate use of treat is illustrated and explained in the example workflow articles:

There is also a case study with Agilent microarray data in the limma User's Guide.

We strongly recommend against the use of FC cutoffs so we definitely do not recommmed your Method 1. I understand that FC cutoffs are common in the published biomedical literature, but they are unnecessary and poor practice in the limma context.

Method 2 is strongly recommended over Method 1. We recommend that you either use topTable without a FC cutoff or use topTreat. Unlike ordinary t-tests, limma always prioritizes large fold changes over small fold changes, whether you use treat or not, so the use of naive FC cutoffs is unnecessary and actually harmfull.

As the help page for topTable says

"Although topTable enables users to set p-value and lfc cutoffs simultaneously, this is not generally recommended. If the fold changes and p-values are not highly correlated, then the use of a fold change cutoff can increase the false discovery rate above the nominal level. Users wanting to use fold change thresholding are usually recommended to use treat and topTreat instead."

How is the estimated FC calculated based on the fc threshold parameter of treat? ... Is there a way to formally calculate this estimated FC?

Treat is not equivalent to a FC cutoff. It is not possible to estimate the equivalent FC cutoff because there is none. On the other hand, you will find that the all the DE genes after using treat will have an estimated fold-change higher than the nominated theshold, and quite a bit higher if the sample sizes are small or the data is noisy.

Also, I am using trend=TRUE and robust=TRUE. Would this be appropriate to use by default?

We do not recommend trend=TRUE in conjunction with voom and RNA-seq data because it is superfluous in that case. For Agilent microarray data, these two options are sufficiently robust that you could use them routinely.

ADD COMMENT
0
Entering edit mode

Thank you very much for your help Gordon!

ADD REPLY

Login before adding your answer.

Traffic: 406 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6