One replicate (one-color OR two-color) is enough to use Limma?
2
0
Entering edit mode
kevin.m.hao ▴ 10
@kevinmhao-12183
Last seen 5.0 years ago

Hello,

I am testing what condition Limma can not be used and do some simulation (as below). I find that no matter one-color or two-color microarray, if only have one replicate, Limma can be used to get the result.

But if no any replicate, then Limma refuses analysis in the step : fit2 <- eBayes(fit2)

My question is:

If in all arrays, just one replicate, may I use Limma to get reasonal result for Differentially Expressed Genes (DEG) even for several group, since Limma does can run?

If one replicate is not enough, how many replicates are needed to get reasonal result?

Thanks,

Kevin

##################################

#  one-color simulation

##################################

library(limma)

# good Targets

# simulate Targets frame for one-color

gr <- c("c", "t1", "t3", "t2", "c") ################ just "c" is replicate

# gr <- c("c", "t1", "t2") # wrong, no any replicate

Targets <- data.frame(group = gr)
rownames(Targets) <- paste0("a", 1:length(gr))
Targets

# simulate expression matrix according to Targets frame
expMat <- matrix(log2(runif(500 * length(gr))), nrow = 500, ncol = length(gr))
colnames(expMat) <- paste0("a", 1:length(gr))
expMat[1:3, ]

# extract groups
gf <- factor(Targets\$group)
gf

# design matrix
design <- model.matrix(~ 0 + gf)
design
colnames(design) <- levels(gf)
design

# first linear model
fit <- lmFit(expMat, design)

# make contrast matrix
treat <- setdiff(levels(gf), "c")
ct <- paste0(treat, "-", "c")
cm <- makeContrasts(contrasts = ct, levels = design)
cm

# fit contrast matrix
fit2 <- contrasts.fit(fit, cm)
fit2

##################################
# it gives wrong info if any
##################################
# final model
fit2 <- eBayes(fit2)
##################################

# extract Differentially Expressed Genes (DEG)
for (i in 1:length(ct)) {
print(topTable(fit2, coef = i))
}

#######################################

# two-color simulation

#######################################

library(limma)

# good Targets

# simulate Targets frame for two-color

# just c("c", "t1") and c("t1", c) is one replicate with dye-swap

cy3cy5 <- rbind(
c("c", "t1"),
c("c", "t2"),
c("t1", "c"),
c("t3", "c")
)

# wrong, no any replicate

#  cy3cy5 <- rbind(
#   c("c", "t1"),
#   c("c", "t2"),
#   c("t3", "c"),
#   c("t4", "c")
# )

colnames(cy3cy5) <- c("cy3", "cy5")
numArray <- nrow(cy3cy5)
Targets <- data.frame(Cy3 = cy3cy5[, "cy3"], Cy5 = cy3cy5[, "cy5"])
rownames(Targets) <- paste0("a", 1:numArray)
Targets

# simulate expression matrix according to Targets frame
expMat <- matrix(log2(runif(500 * numArray)), nrow = 500, ncol = numArray)
colnames(expMat) <- paste0("a", 1:numArray)
expMat[1:3, ]

# design matrix
design <- modelMatrix(Targets, ref = "c")
design

# first linear model
fit <- lmFit(expMat, design)

# make contrast matrix
treat <- setdiff(as.vector(cy3cy5), "c")
ct <- paste0(treat)
cm <- makeContrasts(contrasts = ct, levels = design)
cm

# fit contrast matrix
fit2 <- contrasts.fit(fit, cm)
fit2

##################################
# it gives wrong info if any
##################################
# final model
fit2 <- eBayes(fit2)
##################################

# extract Differentially Expressed Genes (DEG)
for (i in 1:length(ct)) {
print(topTable(fit2, coef = i))
}

limma microarray • 1.3k views
3
Entering edit mode
Aaron Lun ★ 27k
@alun
Last seen 12 hours ago
The city by the bay

In other words:

If I only have replicates in one group, will limma still work?

The answer is yes. limma will estimate the variance from the group with multiple replicates and use that estimate for all comparisons between groups. This will control the type I error rate assuming that the true variance is the same for all groups. In this respect, the results are "reasonable" as the error rate is properly controlled. However, comparisons between single-sample groups will be underpowered so you may not get many DE genes, i.e., the type II error rate will be quite high. This issue is inherent to the experimental design, so there's not much that can be done by limma to overcome it. If you fail to detect a DE gene, and that seems "unreasonable" to you, then the only general solution is to get more samples in the relevant groups.

2
Entering edit mode
@gordon-smyth
Last seen 4 hours ago
WEHI, Melbourne, Australia

Dear Kevin,

I have to correct one of the statements you made: limma does not "give wrong information" if there are no replicates. What it actually does is to refuse to estimate variances or to compute p-values when there are no replicates. In that case it simply returns log-fold-changes. In other words, it refuses to give wrong information.

The situation is pretty straightforward. limma is designed to make the best use of whatever data you give it. It will compute reasonable, defensible, p-values for any replicated experiment. It will work with any amount of replication at all. However it can't do the impossible -- it can't estimate variances if there is no replication whatsoever.

What limma doesn't do is to tell you how you should have designed your experiment. Obviously having more replicates is always better, both in terms of statistical power and in terms of confidence, but how many that needs to be depends on your aims and on the type of data you are analysing. limma just does the best it can with what you give it. Planning your experiment however is up to you.

A single replicate will not be viewed as adequate for many or most studies, but it's still better than no replicates at all. And you were yourself analysing a no-replicate experiment just a day ago.

You might like to read the discussion here:

See my responses to the reviewers.

0
Entering edit mode

Thanks Gordon. I modified the statements "...Limma refuses analysis in the step" according to you. Yes. I am trying to learning other people's work. They did analysis like this: If microarray is two-color: they identified DEG just using 2-fold change, no matter if there exists replicates. If microarray is one-color: if there are at least two replicates in each group, then they used SAM (significance analysis of microarray), otherwise, they just used 2-fold change to obtain DEG.

I would like to know: if it is better to using the combined 2-fold change and p-values than just 2-fold change when existing replicates, even just one replicate?

Thanks,

Kevin

1
Entering edit mode

I almost always use p-value, not combined with fold change. Please read the warning about combining fold change and p-value on the topTable help page: ?topTable.

Regarding the "other people's work", I suspect that their decisions relate more to the limitations of SAM (it can't analyse two colour arrays unless all common reference and it needs a minimum number of arrays) rather than what would one ideally want to do.

0
Entering edit mode