Question: meta analysis of microarrays of small sample sizes
2.9 years ago by Lian Liu20 (Tokyo) wrote:

Dear all,

I'm learning to perform meta-analysis of affymetrix microarrays.

I tested 4 studies with package geneMeta.

The sample sizes of the 4 studies are very small:

Study      Cases           Controls
Study 76-  10 (1 outlier)  20
Study 2a   5               5
Study 1-   16              3 (1 outlier)
Study 4-   6               6

1) I performed RMA normalization for each study separately, excluded the outliers and made 4 ExpressionSets.

2) Then I performed non-specific gene filtering with 'nsFilter' for each ExpressionSet and got 4 filtered ExpressionSets.

3) Matched the identifiers of the 4 filtered ExpressionSets to 'ENTREZID' using 'intersect', so that the 4 ExpressionSets had the same rows (ENTREZID).

4) Performed meta-analysis with 'GeneMeta'.
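For concreteness, here is a rough sketch of my steps 1)-4) in R. 'cel_batches' and 'classes' are placeholders for my own data; I assume each ExpressionSet's featureData carries an ENTREZID column, and zScores is used as in the GeneMeta vignette:

```r
library(affy)        # rma()
library(genefilter)  # nsFilter()
library(GeneMeta)    # zScores()
library(Biobase)     # fData()

# Step 1: RMA-normalize each study separately (outlier arrays already removed)
# 'cel_batches' is a placeholder list of AffyBatch objects, one per study
esets <- lapply(cel_batches, rma)

# Step 2: non-specific filtering per study
esets <- lapply(esets, function(e) nsFilter(e)$eset)

# Step 3: restrict all studies to the common ENTREZID set
ids    <- lapply(esets, function(e) fData(e)$ENTREZID)
common <- Reduce(intersect, ids)
esets  <- lapply(esets, function(e) e[fData(e)$ENTREZID %in% common, ])

# Step 4: random-effects meta-analysis
# 'classes' is a placeholder list of 0/1 vectors (control/case) per study
scores <- zScores(esets, classes, useREM = TRUE)
```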

My questions are:

(1) Should I preprocess the expression matrices, e.g. by centering or scaling the expression intensities after RMA, before step 3)? Or should I preprocess them with other methods?

(2) The sample sizes of the above studies are very small. Is it appropriate to use the GeneMeta package for meta-analysis in this case? How should I deal with studies with small sample sizes?

(3) In my test analysis with the GeneMeta package, the FDR curve of the combined (meta-analyzed) set in the FDR plot was higher than those of 3 individual studies (Study 2a, Study 1-, Study 4-) and only lower than that of one study (Study 76-). Theoretically, I think the FDR of the combined set should be lower than that of the individual studies. What should I do to improve the analysis?

(4) I'm not good at statistics and I think I must have missed some necessary and important steps in my analysis. Could you please tell me what I should do in addition to the above steps, or which of the above steps are wrong?

I want to learn the workflow of performing meta-analysis.

Sorry for my questions if they are too basic.

Thank you very much!

modified 2.9 years ago by alexvpickering110 • written 2.9 years ago by Lian Liu20
Answer: meta analysis of microarrays of small sample sizes
2.9 years ago by alexvpickering110 wrote:

Hi Lian,

I don't think it makes sense to perform non-specific filtering for meta-analysis. If you do, genes filtered from as few as one study will be excluded from the meta-analysis because GeneMeta will only analyse genes included in all studies. This could result, for example, in the exclusion of genes that are consistently differentially expressed in all but one study. You could get around this using crossmeta, which uses the same meta-analysis approach as GeneMeta, but will analyse genes not measured in all studies. However, the results would still be biased if you also perform non-specific filtering, because the meta-analysis for each gene would disregard studies where that gene fell below the variance threshold for inclusion.

Filtering makes more sense for differential expression analysis of a single study: you generally do not care about genes that are not differentially expressed, so excluding genes with little variation across all samples will reduce the number of statistical tests performed and thereby increase power. In contrast, for meta-analysis, a gene having little variation in one study but consistent variation in the others should not be discarded as a true effect of the treatment on that gene is possible.

If you want to use non-specific filtering, I would suggest that you only exclude those genes that are filtered in all (or above a certain threshold number of) studies. Depending on how many genes this ends up being, the additional power might not be worth the effort.

Regarding your other questions, here are my opinions (I'm not a statistics expert).

(1) You could try surrogate variable analysis (using the sva package) in order to discover and model unaccounted sources of variation within each study. My personal experience with surrogate variable analysis is that it has a small benefit (here is a blog post where I performed some analysis of it across 125 studies). Anecdotally, I suspect sva provides larger benefits for studies with larger sample sizes. The package crossmeta incorporates surrogate variable analysis.
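For anyone reading along, a minimal sketch of combining sva with a per-study limma analysis ('eset' and 'group' are placeholders for one study's ExpressionSet and its case/control factor):

```r
library(sva)
library(limma)
library(Biobase)

# full model (group effect) and null model (intercept only)
mod  <- model.matrix(~ group)
mod0 <- model.matrix(~ 1, data = data.frame(group))

# estimate surrogate variables from the expression matrix
svobj <- sva(exprs(eset), mod, mod0)

# add surrogate variables to the design and run limma
design <- cbind(mod, svobj$sv)
fit    <- eBayes(lmFit(exprs(eset), design))
```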

(2) I don't think you need to do anything special. Smaller sample sizes will reduce your power but shouldn't change how you perform your analysis (sample size is accounted for in the statistical tests).

(3) The FDR can be lower after meta-analysis, but not necessarily. Imagine a case where a gene is up-regulated in 60% of the studies and down-regulated in the other 40%. The meta-analysis might suggest up-regulation of that gene (depending on the magnitude of the individual effect sizes) with a high FDR, even if some or all of the results from the individual studies had lower FDRs.

Hi Alex,

Thank you very much for your reply. I'm so glad we can discuss here again.

I do think crossmeta is fantastic, not only because of its automation but also because it can perform meta-analysis for genes that are not detected in every study. It is especially attractive that one can specify the fraction of studies in which a gene must be measured for meta-analysis. It also resolves 'many-to-many' problems.

But I also want to filter out lowly expressed genes. It is said that fewer than 40% of genes are truly expressed in many tissues, and more than 60% are not robustly expressed. I am interested in gene expression in human heart tissue, so the studies I included only contain samples from human heart. I therefore imagine that I can filter out at least 50% of genes that are not robustly expressed in the heart. Even if such genes are differentially expressed, they would not help guide the follow-up functional studies if they are expressed at very low levels, because they are barely expressed in the heart. This is my main reason for performing non-specific gene filtering. What do you think of this reasoning?
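To filter on expression level rather than variance, I was thinking of something like genefilter's kOverA (the intensity cutoff log2(100) and the 25% sample fraction below are arbitrary placeholders; 'eset' stands for one study's ExpressionSet). Would that fit better than a variance filter here?

```r
library(genefilter)
library(Biobase)

# keep genes whose log2 intensity exceeds log2(100) in at least 25% of samples
k    <- ceiling(0.25 * ncol(exprs(eset)))
keep <- genefilter(exprs(eset), filterfun(kOverA(k, log2(100))))
eset_expressed <- eset[keep, ]
```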

Thank you so much.


Hi Lian,

If you want to use variance filtering as part of the workflow with crossmeta, I would first try to exclude only those genes that are consistently filtered by genefilter across all your esets. To do this (example data as in the vignette):

library(crossmeta)
library(lydata)
library(genefilter)
library(Biobase)

# Setup:
# ------

# gather all GSE names
gse_names  <- c("GSE9601", "GSE15069", "GSE50841", "GSE29689")

# location of/for raw data
data_dir <- system.file("extdata", package = "lydata")

# get_raw(gse_names, data_dir)

# load and annotate the esets
esets <- load_raw(gse_names, data_dir)

# Variance Filtering:
# -------------------

# gather symbols prior to filtering
syms <- lapply(esets, function(eset) unique(fData(eset)$SYMBOL))

# filtered esets
fesets <- lapply(esets, function(eset) varFilter(eset))

# symbols remaining after filtering
nfsyms <- lapply(fesets, function(eset) unique(fData(eset)$SYMBOL))

# filtered symbols
fsyms  <- mapply(setdiff, syms, nfsyms)

# determine commonly filtered symbols
cfsyms  <- Reduce(intersect, fsyms)

# exclude commonly filtered symbols
esets <- lapply(esets, function(eset) {
    eset[!fData(eset)$SYMBOL %in% cfsyms, ]
})

# Analysis:
# ---------

# differential expression analysis
anals <- diff_expr(esets, data_dir)

# meta analysis
es <- es_meta(anals)

Thank you, Alex.

Although I don't understand your code yet, I will learn it, try it out, and let you know my results.

Many thanks to you!