Dear all,
I have a question about the use of the frma function.
I need to combine several datasets from GEO into a single large analysis. However, for each of these datasets (that is, each GSE series) I only need some specific patients (a selection of GSM samples).
I downloaded the .cel files for all of the selected GSM samples.
Now I want to normalize all these samples, and I am wondering which of the following approaches is best:
- Read all my .cel files, whatever the dataset (that is, whatever the GSE identifier), into a single AffyBatch object and apply frma to this object
- For each GSE series, first apply frma to all .cel files and then select the normalized patients I need to analyze
- For each GSE series, first select the .cel files for the patients I need to analyze and then apply frma only to these patients
In particular, I'm wondering whether the last two solutions are equivalent.
I hope I'm clear enough. Thank you for your help.
Caroline
These should all be equivalent since fRMA is a single-sample normalization method, no? Unlike RMA, it doesn't matter which other samples are included in your normalization: each array is normalized against fRMA's own frozen reference, which was built from a large pool of arrays. The only practical difference is that your first option would require a large amount of memory.
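To make the point concrete, here is a toy sketch in plain Python. It is not fRMA itself, only an illustration of the quantile normalization idea. The function names, the `FROZEN` reference and the sample values are all made up for the example. It compares normalizing against a fixed reference with normalizing against a reference computed from whatever arrays happen to be in the batch.

```python
from statistics import mean

def quantile_normalize_to_reference(sample, reference):
    """Replace each value by the reference value of the same rank."""
    order = sorted(range(len(sample)), key=lambda i: sample[i])
    ref_sorted = sorted(reference)
    out = [0.0] * len(sample)
    for rank, idx in enumerate(order):
        out[idx] = ref_sorted[rank]
    return out

def batch_reference(batch):
    """Reference built from the batch itself: mean of the sorted values at each rank."""
    return [mean(vals) for vals in zip(*(sorted(s) for s in batch))]

def normalize_frozen(batch, frozen_reference):
    # Single-sample style: every array only sees the fixed reference.
    return [quantile_normalize_to_reference(s, frozen_reference) for s in batch]

def normalize_batch(batch):
    # Multi-sample style: the reference changes with the arrays in the batch.
    ref = batch_reference(batch)
    return [quantile_normalize_to_reference(s, ref) for s in batch]

# Hypothetical toy data: one fixed reference and three "arrays" of four probes.
FROZEN = [1.0, 2.0, 3.0, 4.0]
a = [5.0, 1.0, 9.0, 3.0]
b = [2.0, 8.0, 4.0, 6.0]
c = [10.0, 40.0, 20.0, 30.0]

# Array `a` normalized alongside b only, or alongside b and c.
frozen_small = normalize_frozen([a, b], FROZEN)[0]
frozen_large = normalize_frozen([a, b, c], FROZEN)[0]
batch_small = normalize_batch([a, b])[0]
batch_large = normalize_batch([a, b, c])[0]

print(frozen_small == frozen_large)  # True: result for `a` ignores the other arrays
print(batch_small == batch_large)    # False: result for `a` depends on the batch
```

With a fixed reference, selecting samples before or after normalization gives the same values for the samples you keep. With a batch-derived reference it does not.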
Thank you for your clear answer.
Best, Caroline
Is the use of fRMA a requirement for your study? If not, you might consider using our SCAN.UPC package for this. Nothing against fRMA, but with our package you can specify the GSE identifier directly when you invoke the SCAN() function. This will download and normalize the data in one step. The concept is somewhat similar to fRMA except that you don't need an external reference set. Send me an email if you have any questions. https://bioconductor.org/packages/release/bioc/html/SCAN.UPC.html
Thank you, Stephen, for your answer. No, fRMA is not a requirement for my study. I will have a look at your package.
Best,
Caroline