Dear community,
I am currently studying the biology of irradiated cells, and I have downloaded 9 human microarray datasets from the GEO repository in order to perform “some kind” of meta-analysis. In detail, I would like to identify genuinely differentially expressed genes and subsequently conduct functional enrichment analysis. My main problem is that I am a newbie in R and statistical analysis, so I wonder how I should proceed with the analysis of my datasets. Specifically, 7 of the 9 datasets are Agilent (6 use the same platform, Agilent-014850 Whole Human Genome Microarray 4x44K G4112F, and one uses Agilent-026652 Whole Human Genome Microarray 4x44K v2), whereas both Illumina datasets use the Illumina HumanWG-6 v3.0 expression beadchip platform.
Thus, my first naive thought was to perform some kind of cross-normalization between the datasets of each platform, run two separate analyses, and then compare the results (for instance, the final DE lists). However, apart from the obvious problem that could arise from the specific effects of each dataset, the experimental design adds further complexity: although there are actually 3 conditions in each dataset (control, bystander and irradiated cells), some time points differ or do not exist in some datasets. So, how could I proceed in a “safe way” with my actual analysis? Should I analyze each dataset separately, export the lists of differentially expressed genes (i.e. gene symbols), and then somehow identify the genes shared between matching comparisons? Moreover, is there a package that can take the statistics exported from each dataset and perform a meta-analysis?
Any suggestion or feedback on this matter would be very helpful!
Best,
Konstantinos
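For readers wondering what "exporting the statistics from each dataset and combining them" could look like in practice, here is a minimal sketch of one common idea: an inverse-variance (fixed-effect) combination of per-dataset effect sizes for a single gene. It is written in plain Python only to show the arithmetic. The function name `combine_fixed_effect` and all numbers are hypothetical, and this is not the code of any particular Bioconductor package; a real analysis would also need to check between-dataset heterogeneity.

```python
import math


def combine_fixed_effect(effects, variances):
    """Combine per-dataset effect sizes for one gene (fixed-effect model).

    effects   -- one effect size per dataset (e.g. a standardized mean difference)
    variances -- the sampling variance of each effect size, in the same order
    Returns (pooled effect, its standard error, z-score).
    """
    if not effects or len(effects) != len(variances):
        raise ValueError("need one variance per effect size")
    # Each dataset is weighted by the inverse of its variance,
    # so more precise datasets contribute more to the pooled estimate.
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, effects)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    return pooled, se, pooled / se


# Hypothetical effect sizes for one gene measured in three datasets.
pooled, se, z = combine_fixed_effect([0.8, 1.1, 0.5], [0.20, 0.25, 0.40])
print(f"pooled effect = {pooled:.3f}, SE = {se:.3f}, z = {z:.3f}")
```

The point of such a combination is that a gene with a consistent, moderate effect in every dataset can reach significance overall even if it misses the cutoff in each dataset individually.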
No, that isn't what I meant at all, which is why I gave you a link to the GeneMeta vignette, so that you could read about the package yourself, including how one would use it to do the analysis.
Dear James,
Please excuse me for returning with more questions, but I have had a first look at the vignette and would like to ask for some further explanation, to be certain that I implement things appropriately. In detail, you mentioned above that “The basic idea is to process all the different platforms separately, then subset to consistent reporter molecules”. Thus, for my specific experimental design, since the 4h comparison is common to 5 Agilent datasets, could I normalize each dataset separately and then use GeneMeta with the common probesets, like the first example in the vignette, in which one dataset is split into 2 datasets? My one major concern is that, of these 5 Agilent datasets, only 4 are of the same platform, which complicates the situation a bit, thus:
2. Alternatively, could I perform statistical inference for each of the 5 Agilent datasets separately (for the 4h time point), extract the DE lists at the final annotation level (gene symbol), and see whether I can find any common differentially expressed genes?
Best,
Konstantinos
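To make the second option above concrete, here is a minimal sketch of comparing DE lists at the gene-symbol level: count in how many datasets each symbol appears and keep those found in all of them, or in at least some chosen number. It is plain Python for illustration only. The function name `shared_de_genes`, the dataset labels and the gene names are all hypothetical placeholders, not real results.

```python
from collections import Counter


def shared_de_genes(de_lists, min_datasets=None):
    """Return gene symbols reported as DE in at least `min_datasets` datasets.

    de_lists     -- dict mapping a dataset label to its list of DE gene symbols
    min_datasets -- required number of datasets; defaults to all of them
    """
    if min_datasets is None:
        min_datasets = len(de_lists)
    counts = Counter()
    for genes in de_lists.values():
        # set() so a symbol hit by several probes counts once per dataset
        counts.update(set(genes))
    return sorted(g for g, n in counts.items() if n >= min_datasets)


# Hypothetical DE lists for the same comparison in three datasets.
example = {
    "dataset_1": ["GENE_A", "GENE_B", "GENE_C"],
    "dataset_2": ["GENE_A", "GENE_B"],
    "dataset_3": ["GENE_A", "GENE_D", "GENE_C", "GENE_C"],
}
print(shared_de_genes(example))                  # symbols DE in every dataset
print(shared_de_genes(example, min_datasets=2))  # symbols DE in at least two
```

Note that this kind of overlap depends heavily on the significance cutoff used in each dataset, so a strict intersection tends to be conservative, and it ignores the direction and size of the change unless those are tracked separately.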