Dear all,
I'm learning to perform meta-analysis of Affymetrix microarrays.
I tested 4 studies with the GeneMeta package.
The sample sizes of the 4 studies are very small:
Study | Cases | Controls |
Study 76- | 10 (1 outlier) | 20 |
Study 2a | 5 | 5 |
Study 1- | 16 | 3 (1 outlier) |
Study 4- | 6 | 6 |
1) I performed RMA normalization for each study separately, excluded the outliers and made 4 ExpressionSets.
2) Then I performed non-specific gene filtering with 'nsFilter' on each ExpressionSet and got 4 filtered ExpressionSets.
3) I matched the identifiers of the 4 filtered ExpressionSets by 'ENTREZID' using 'intersect', so that the 4 ExpressionSets had the same rows (ENTREZID).
4) I performed the meta-analysis with 'GeneMeta'.
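Step 3) can be sketched as follows. This is only an illustration in plain Python, not the R code that was run: the function name `restrict_to_common_genes`, the nested-dictionary layout and the toy values are all assumptions, and it presumes each gene ID occurs once per study.

```python
def restrict_to_common_genes(studies):
    """Keep only gene IDs present in every study, in one shared row order.

    studies: {study_name: {gene_id: [expression value per sample]}}
    (hypothetical layout; assumes gene IDs are unique within a study)
    """
    gene_sets = [set(genes) for genes in studies.values()]
    common = sorted(set.intersection(*gene_sets))
    # Rebuild every study with the same genes in the same order.
    return {
        name: {gene: genes[gene] for gene in common}
        for name, genes in studies.items()
    }


# Toy data: study "A" measures three genes, study "B" only two of them.
studies = {
    "A": {"1017": [7.1, 7.3], "7157": [5.2, 5.0], "672": [9.0, 8.8]},
    "B": {"672": [6.9, 7.0], "1017": [8.7, 9.1]},
}
aligned = restrict_to_common_genes(studies)
```

After this step both studies contain the same genes in the same order, which is what a row-wise meta-analysis needs. Genes measured in only some studies are dropped entirely.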
My questions are:
(1) Should I preprocess the expression matrix, for example by centering or scaling the expression intensities after RMA, before step 3)? Or should I preprocess it with other methods?
(2) The sample sizes of the above studies are very small. Is it appropriate to use the GeneMeta package for the meta-analysis? How should I deal with studies with small sample sizes?
(3) In my test analysis with the GeneMeta package, the FDR curve of the combined (meta-analyzed) set in the FDR plot was higher than those of 3 individual studies (Study 2a, Study 1-, Study 4-) and lower only than that of one study (Study 76-). In theory, I think the FDR of the combined set should be lower than that of the individual studies. What should I do to improve the analysis?
(4) I'm not good at statistics and I think I must have missed some necessary and important steps in my analysis. Could you please tell me what I should do in addition to the above steps, or which of the above steps are wrong?
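For questions (2) and (3), it may help to see what a combined estimate for one gene looks like. The sketch below assumes a random-effects combination of standardized effect sizes (Hedges' g with the DerSimonian-Laird between-study variance). It is not GeneMeta's code, and the function names and numbers are made up for illustration.

```python
import math
from statistics import mean, variance


def hedges_g(cases, controls):
    """Standardized mean difference with a small-sample correction."""
    n1, n2 = len(cases), len(controls)
    pooled_sd = math.sqrt(
        ((n1 - 1) * variance(cases) + (n2 - 1) * variance(controls))
        / (n1 + n2 - 2)
    )
    d = (mean(cases) - mean(controls)) / pooled_sd
    g = d * (1 - 3 / (4 * (n1 + n2) - 9))  # shrinks d when samples are few
    var_g = (n1 + n2) / (n1 * n2) + g * g / (2 * (n1 + n2))  # approximate
    return g, var_g


def random_effects(effects, variances):
    """DerSimonian-Laird random-effects combination for one gene."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    mu_fixed = sum(wi * d for wi, d in zip(w, effects)) / sum(w)
    # Cochran's Q measures disagreement between studies.
    q = sum(wi * (d - mu_fixed) ** 2 for wi, d in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Between-study variance tau2 is added to every study's variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * d for wi, d in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {"mu": mu, "se": se, "z": mu / se, "tau2": tau2, "Q": q}


g, var_g = hedges_g([2.0, 4.0], [1.0, 3.0])
agree = random_effects([1.0, 1.0], [0.5, 0.5])     # studies agree
disagree = random_effects([2.0, 0.0], [0.5, 0.5])  # same mean, studies disagree
```

In this toy case both pairs of studies have the same average effect, but the combined z-score is smaller when the studies disagree, because the estimated between-study variance widens the standard error. With small studies the per-study variances are large as well, so a combined result that looks weaker than one individual study is not necessarily a sign of an error.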
I want to learn the workflow of performing meta-analysis.
Sorry for my questions if they are too basic.
Thank you very much!
Hi Alex,
Thank you very much for your reply. I'm so glad we can discuss here again.
I do think crossmeta is fantastic, not only because it is automated but also because it can perform meta-analysis for genes that are not measured in every study. It is especially attractive that one can specify the fraction of studies in which a gene must be measured for it to be included in the meta-analysis. It also resolves 'many-to-many' problems.
But I also want to filter out the lowly expressed genes. It is said that fewer than 40% of genes are truly expressed in many tissues and more than 60% are not robustly expressed. I am interested in gene expression in human heart tissue, so the studies I included contain only samples from human heart. I therefore imagine that I can filter out at least 50% of the genes as not robustly expressed in the heart. Even if many of these genes are differentially expressed, they would not help guide follow-up functional studies if they are expressed at very low levels, because they are barely expressed in the heart. This is my main reason for performing non-specific gene filtering. What do you think of this view?
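A minimal sketch of such an expression-level filter within one study, in plain Python for illustration only. The function name `filter_low_expression`, the data layout and the 50% cutoff are assumptions taken from the description above, not settings of nsFilter or crossmeta.

```python
from statistics import mean


def filter_low_expression(expr, fraction_removed=0.5):
    """Drop the genes with the lowest mean expression in one study.

    expr: {gene_id: [log2 intensity per sample]} (hypothetical layout)
    fraction_removed: share of genes to discard, lowest means first
    """
    ranked = sorted(expr, key=lambda gene: mean(expr[gene]))
    n_drop = int(len(ranked) * fraction_removed)
    kept = set(ranked[n_drop:])
    # Preserve the original gene order among the survivors.
    return {gene: values for gene, values in expr.items() if gene in kept}


heart_study = {
    "geneA": [3.0, 3.2],    # low
    "geneB": [9.1, 8.9],    # high
    "geneC": [5.0, 5.2],    # low
    "geneD": [11.0, 11.4],  # high
}
expressed = filter_low_expression(heart_study, fraction_removed=0.5)
```

Because the cutoff is applied per study, a gene can survive in one study and be removed in another, which shrinks the set of genes common to all studies.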
Your answers have been very helpful and cleared my worries.
Thank you so much.
Hi Lian,
If you want to use variance filtering as part of the workflow with crossmeta, I would first try to exclude only those genes that are consistently filtered by genefilter across all your esets. To do this (example data as in vignette):
Thank you Alex.
Although I don't understand your code yet, I will study it, try it, and let you know my results.
Many thanks to you!