Question

meta analysis of microarrays of small sample sizes

Lian Liu ▴ 20

@lian-liu-11570

Tokyo

Dear all,

I'm learning to perform meta-analysis of Affymetrix microarrays.

I tested 4 studies with the GeneMeta package.

The sample sizes of the 4 studies are very small:

Study	Cases	Controls
Study 76-	10 (1 outlier)	20
Study 2a	5	5
Study 1-	16	3 (1 outlier)
Study 4-	6	6

1) I performed RMA normalization for each study separately, excluded the outliers and made 4 ExpressionSets.

2) Then I performed non-specific gene filtering with 'nsFilter' for each ExpressionSet and got 4 filtered ExpressionSets.

3) Matched the identifiers of the 4 filtered ExpressionSets by 'ENTREZID' using 'intersect', so that the 4 ExpressionSets had the same rows (ENTREZID).

4) Performed meta-analysis with 'GeneMeta'.

My questions are:

(1) Should I preprocess the expression matrix, for example by centering or scaling the expression intensities after RMA, before step 3)? Or should I preprocess the expression matrix with other methods?

(2) The sample sizes of the above studies are very small. Is it right for me to use the GeneMeta package to perform meta-analysis? How should I deal with studies with small sample sizes?

(3) In my test analysis with the GeneMeta package, the FDR curve of the combined (meta-analyzed) set in the FDR plot was higher than those of 3 individual studies (Study 2a, Study 1-, Study 4-) and lower only than that of one study (Study 76-). Theoretically, I think the FDR of the combined set should be lower than those of the individual studies. What should I do to improve the analysis?

(4) I'm not good at statistics and I think I must have missed some necessary and important steps in my analysis. Could you please tell me what I should do in addition to the above steps, or which of the above steps are wrong?

I want to learn the workflow of performing meta-analysis.

Sorry for my questions if they are too basic.

Thank you very much!

preprocessing microarray genemeta • 2.0k views

updated 8.4 years ago by alexvpickering ▴ 110 • written 8.4 years ago by Lian Liu ▴ 20

score 3 · Answer 1 · 2016-10-16

Hi Lian,

I don't think it makes sense to perform non-specific filtering for meta-analysis. If you do, genes filtered from as few as one study will be excluded from the meta-analysis, because GeneMeta will only analyse genes included in all studies. This could result, for example, in the exclusion of genes that are consistently differentially expressed in all but one study. You could get around this using crossmeta, which uses the same meta-analysis approach as GeneMeta but will also analyse genes not measured in all studies. However, the results would be biased if you also perform non-specific filtering, because the meta-analysis for each gene would disregard the studies where that gene fell below the variance threshold for inclusion.

Filtering makes more sense for differential expression analysis of a single study: you generally do not care about genes that are not differentially expressed, so excluding genes with little variation across all samples will reduce the number of statistical tests performed and thereby increase power. In contrast, for meta-analysis, a gene with little variation in one study but consistent variation in the others should not be discarded, because a true effect of the treatment on that gene is still possible.

If you want to use non-specific filtering, I would suggest that you only exclude those genes that are filtered in all (or above a certain threshold number of) studies. Depending on how many genes this ends up being, the additional power might not be worth the effort.
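To make that rule concrete, here is a minimal sketch in plain Python (not GeneMeta or crossmeta code). The function name `genes_to_drop`, the `min_studies` parameter and the Entrez-style IDs are all hypothetical and only there for illustration. It counts in how many studies the filter removed each gene and drops a gene only when that count reaches a threshold.

```python
from collections import Counter


def genes_to_drop(filtered_out_per_study, min_studies=None):
    """Return genes that the filter removed in at least `min_studies` studies.

    filtered_out_per_study: list of sets, one per study, holding the gene IDs
    that the non-specific filter removed in that study.
    min_studies: threshold number of studies; defaults to all studies.
    """
    n_studies = len(filtered_out_per_study)
    if min_studies is None:
        min_studies = n_studies
    # Count, for every gene, the number of studies in which it was filtered out.
    counts = Counter(g for removed in filtered_out_per_study for g in set(removed))
    return {g for g, c in counts.items() if c >= min_studies}


# Hypothetical IDs flagged as low-variance in each of 4 studies.
removed = [
    {"1", "2", "3"},
    {"2", "3"},
    {"2", "3", "4"},
    {"2", "5"},
]

drop_all = genes_to_drop(removed)      # filtered in every study
drop_3 = genes_to_drop(removed, 3)     # filtered in at least 3 of 4 studies
drop_any = genes_to_drop(removed, 1)   # filtered in at least 1 study
```

With a threshold of 1, the dropped set is the union of all per-study filters, which is what intersecting the four filtered ExpressionSets amounts to. Raising the threshold keeps genes that were low-variance in only one or two studies.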

Regarding your other questions, here are my opinions (I'm not a statistics expert).

(1) You could try surrogate variable analysis (using the sva package) in order to discover and model unaccounted sources of variation within each study. My personal experience with surrogate variable analysis is that it has a small benefit (here is a blog post where I performed some analysis of it across 125 studies). Anecdotally, I suspect sva provides larger benefits for studies with larger sample sizes. The package crossmeta incorporates surrogate variable analysis.

(2) I don't think you need to do anything special. Smaller sample sizes will reduce your power but shouldn't change how you perform your analysis (sample size is accounted for in the statistical tests).
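As a rough illustration of how sample size enters the calculation, the sketch below uses the textbook small-sample correction for a standardized mean difference (Hedges' g) and its approximate variance. This is an assumption about the general approach, written in plain Python; GeneMeta's internals may differ in detail. The function name `hedges_g` and the raw effect of 1.0 are hypothetical, and the case/control counts are taken from the table in the question at face value.

```python
def hedges_g(d, n_case, n_ctrl):
    """Small-sample corrected effect size and its approximate variance.

    d: uncorrected standardized mean difference (mean difference / pooled SD).
    """
    n = n_case + n_ctrl
    # Hedges' correction shrinks d slightly; the shrinkage is larger for small n.
    g = d - 3 * d / (4 * (n - 2) - 1)
    # The variance grows as either group gets smaller.
    var_g = 1 / n_case + 1 / n_ctrl + g ** 2 / (2 * n)
    return g, var_g


# Same hypothetical raw effect (d = 1.0) in each study; only group sizes differ.
sizes = {
    "Study 76-": (10, 20),
    "Study 2a": (5, 5),
    "Study 1-": (16, 3),
    "Study 4-": (6, 6),
}
results = {name: hedges_g(1.0, n1, n2) for name, (n1, n2) in sizes.items()}

# Inverse-variance weights: smaller studies contribute less to the combined estimate.
weights = {name: 1 / var for name, (g, var) in results.items()}
```

The small studies are not discarded or treated differently. They simply receive a larger variance and therefore a smaller weight when the studies are combined.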

(3) The FDR can be lower after meta-analysis but not necessarily. Imagine a case where a gene is up-regulated in 60% of the studies and down-regulated in the other 40%. The meta-analysis might suggest up-regulation of that gene (depending on the magnitude of the individual effect sizes) with a high FDR even if some/all of the results from the individual studies had lower FDRs.
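The scenario in (3) can be sketched numerically. The code below is a minimal random-effects combination using the DerSimonian-Laird estimate of between-study variance, written in plain Python as an illustration of the general technique rather than GeneMeta's actual code. The function name `random_effects`, the effect sizes and the variances are made up for the example.

```python
import math


def random_effects(effects, variances):
    """Combine per-study effect sizes with a DerSimonian-Laird random-effects model.

    Returns (combined effect, standard error, z-score, tau^2).
    """
    k = len(effects)
    w = [1 / v for v in variances]
    mu_fixed = sum(wi * d for wi, d in zip(w, effects)) / sum(w)
    # Cochran's Q measures how much the studies disagree with each other.
    q = sum(wi * (d - mu_fixed) ** 2 for wi, d in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_star = [1 / (v + tau2) for v in variances]
    mu = sum(wi * d for wi, d in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return mu, se, mu / se, tau2


variances = [0.2] * 5  # hypothetical within-study variances

# Gene A: up-regulated in all 5 studies.
mu_a, se_a, z_a, tau2_a = random_effects([1.0, 1.0, 1.0, 1.0, 1.0], variances)

# Gene B: up-regulated in 3 studies (60%), down-regulated in 2 (40%).
mu_b, se_b, z_b, tau2_b = random_effects([1.0, 1.0, 1.0, -1.0, -1.0], variances)

# z-score of any single study taken on its own (|effect| = 1.0).
z_single = 1.0 / math.sqrt(0.2)
```

For gene A the combined z-score is larger than that of any single study, so combining helps. For gene B the studies disagree, the between-study variance becomes large, and the combined z-score falls below the single-study value, which is the kind of situation where the meta-analysis result can look worse than the individual studies.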