About the best use of frma for several GEO series
ctruntzer (France)

Dear all,

I have a question about the use of the frma function.

I need to combine several GEO datasets into one large analysis. However, for each of these datasets (that is, each GSE series) I need to select only some specific patients (i.e., specific GSM samples).

I downloaded the .cel files for all of the selected GSM samples.

Now I want to normalize all of these samples, and I'm wondering which of the following approaches is best:
-    Read all my .cel files, regardless of dataset (GSE identifier), into a single AffyBatch object and apply frma to that object
-    For each GSE series, first apply frma to all of its .cel files and then select the normalized samples for the patients I need to analyze
-    For each GSE series, first select the .cel files for the patients I need to analyze and then apply frma only to those samples

I'm also wondering whether the last two options are equivalent.

I hope this is clear. Thank you for your help.

Caroline

Tags: frma, normalization, preprocessing

These should all be equivalent, since fRMA is a single-sample normalization method, no? Unlike with RMA, it doesn't matter which other samples are included in your normalization: each array is normalized against fRMA's large frozen reference pool of arrays. The only practical difference is that your first option would require a large amount of memory.
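
To illustrate why the order of subsetting shouldn't matter with a frozen reference, here is a toy Python sketch. It is not fRMA's actual code (which also uses frozen probe-level parameters during summarization); it only contrasts quantile normalization against a fixed, precomputed reference distribution with RMA-style quantile normalization, where the target distribution is averaged over whichever arrays are in the batch. The array values, the GSM names, and the reference distribution are all made up for illustration.

```python
from statistics import mean


def normalize_to_target(values, target):
    """Quantile-normalize one array: the k-th smallest value is replaced
    by the k-th value of the sorted target distribution."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    for rank, idx in enumerate(order):
        out[idx] = target[rank]
    return out


def frozen_normalize(arrays, frozen_reference):
    # Each array is mapped onto the same precomputed (frozen) reference,
    # so its result never depends on which other arrays are present.
    return {gsm: normalize_to_target(v, frozen_reference) for gsm, v in arrays.items()}


def batch_normalize(arrays):
    # RMA-style quantile normalization: the target is the rank-wise mean
    # of all arrays in the current batch, so it changes with the batch.
    sorted_arrays = [sorted(v) for v in arrays.values()]
    target = [mean(rank_values) for rank_values in zip(*sorted_arrays)]
    return {gsm: normalize_to_target(v, target) for gsm, v in arrays.items()}


# Hypothetical toy data: 4 probe intensities per array.
FROZEN_REFERENCE = [1.0, 2.0, 3.0, 4.0]
full_series = {"GSM_A": [5.0, 1.0, 3.0, 2.0], "GSM_B": [2.5, 9.0, 4.0, 1.0]}
selected = {"GSM_A": full_series["GSM_A"]}  # select first, then normalize

# Normalize all then select, vs. select then normalize:
print(frozen_normalize(full_series, FROZEN_REFERENCE)["GSM_A"])  # [4.0, 1.0, 3.0, 2.0]
print(frozen_normalize(selected, FROZEN_REFERENCE)["GSM_A"])     # [4.0, 1.0, 3.0, 2.0]
print(batch_normalize(full_series)["GSM_A"])  # [7.0, 1.0, 3.5, 2.25]
print(batch_normalize(selected)["GSM_A"])     # [5.0, 1.0, 3.0, 2.0]
```

In this toy, GSM_A gets identical values under the frozen scheme whether or not GSM_B is processed alongside it, while batch quantile normalization gives different values depending on the batch.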


Best, Caroline


Is the use of fRMA a requirement for your study? If not, you might consider using our SCAN.UPC package for this. Nothing against fRMA, but with our package you can specify the GSE identifier directly when you invoke the SCAN() function. This will download and normalize the data in one step. The concept is somewhat similar to fRMA, except that you don't need an external reference set. Send me an email if you have any questions. https://bioconductor.org/packages/release/bioc/html/SCAN.UPC.html


Thank you, Stephen, for your answer. No, fRMA is not a requirement for my study. I will have a look at your package.

Best,

Caroline

matthew-mccall-4459 (United States)

Sorry for the slow response (I'm on vacation at the moment). If you're using the default frma, then Levi is correct that all 3 options are identical. If you set summarize="random_effect", then all 3 options would produce different results. For most situations, the default version of frma works well. I would additionally suggest that you consider using the barcode function (also in the frma package) to convert your data to binary (expressed/unexpressed) calls or z-scores. This is one way to reduce some of the batch effects that likely exist between the GSEs you're combining.
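
A rough Python sketch of the idea behind barcode-style calls follows. It rests on a simplifying assumption: each gene has a frozen mean `mu` and standard deviation `tau` describing its unexpressed distribution, and a z-score above a cutoff means "expressed". The parameter values, gene names, and the cutoff of 3.0 are hypothetical and are not the parameters or defaults that the frma barcode function actually uses.

```python
# Hypothetical frozen parameters of each gene's *unexpressed* distribution.
MU = {"gene1": 4.0, "gene2": 4.0, "gene3": 5.0}
TAU = {"gene1": 1.0, "gene2": 0.5, "gene3": 0.25}
CUTOFF = 3.0  # hypothetical z-score threshold, not frma's default


def barcode_like(expr, mu, tau, cutoff):
    """Return (z_scores, binary_calls) for a single array.

    z = (expression - mu) / tau measures how far each gene lies above its
    unexpressed distribution; a gene is called expressed (1) if z > cutoff.
    """
    z = {g: (expr[g] - mu[g]) / tau[g] for g in expr}
    calls = {g: int(z[g] > cutoff) for g in z}
    return z, calls


# Two hypothetical arrays from different GSEs; the second has a +0.5 batch shift.
array_gse1 = {"gene1": 9.0, "gene2": 4.5, "gene3": 6.0}
array_gse2 = {g: v + 0.5 for g, v in array_gse1.items()}

z1, calls1 = barcode_like(array_gse1, MU, TAU, CUTOFF)
z2, calls2 = barcode_like(array_gse2, MU, TAU, CUTOFF)
print(z1)      # {'gene1': 5.0, 'gene2': 1.0, 'gene3': 4.0}
print(z2)      # {'gene1': 5.5, 'gene2': 2.0, 'gene3': 6.0}
print(calls1)  # {'gene1': 1, 'gene2': 0, 'gene3': 1}
print(calls2)  # {'gene1': 1, 'gene2': 0, 'gene3': 1}
```

In this toy, the z-scores move with the batch shift, but the binary calls do not, because no gene crosses the cutoff. That is the intuition for why calls can dampen moderate batch differences.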


I have an additional question, if you don't mind. In the frma vignette, I read "Summarization refers to the method used to combine probe-level expression values to obtain gene-level expression estimate". However, when I apply frma to my data with the default option (summarize="robust_weighted_average"), I still get several probeset identifiers corresponding to the same gene symbol. Did I miss something?


Summarization for Affy arrays is from probe to probeset, but as you state, there are often multiple probesets that map to the same gene symbol. James MacDonald's answer in the thread linked below gives a good overview of why there isn't a one-to-one mapping between probesets and gene symbols and suggests some approaches you could take:

Questions about gene identifiers and probesets regulation
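
One commonly used heuristic is to keep, for each gene symbol, the probeset with the highest mean expression across samples. This is an assumption here and not necessarily what the linked answer recommends. A minimal Python sketch, with hypothetical probeset IDs, symbols, and values:

```python
from statistics import mean


def collapse_to_genes(expr, probeset_to_symbol):
    """For each gene symbol, keep the probeset with the highest mean expression.

    expr maps probeset ID -> list of values (one per sample);
    probesets without a symbol in probeset_to_symbol are dropped.
    Returns symbol -> (chosen probeset ID, its values).
    """
    best = {}
    for probeset, values in expr.items():
        symbol = probeset_to_symbol.get(probeset)
        if symbol is None:
            continue  # unannotated probeset: skip it
        avg = mean(values)
        if symbol not in best or avg > best[symbol][1]:
            best[symbol] = (probeset, avg)
    return {symbol: (ps, expr[ps]) for symbol, (ps, _) in best.items()}


# Hypothetical frma output (probesets x 2 samples) and annotation.
expr = {
    "ps1_at": [7.0, 7.5],
    "ps2_s_at": [9.0, 8.5],
    "ps3_at": [5.0, 5.5],
    "ps4_x_at": [3.0, 3.0],  # no gene symbol in the annotation below
}
probeset_to_symbol = {"ps1_at": "GENE_A", "ps2_s_at": "GENE_A", "ps3_at": "GENE_B"}

print(collapse_to_genes(expr, probeset_to_symbol))
# {'GENE_A': ('ps2_s_at', [9.0, 8.5]), 'GENE_B': ('ps3_at', [5.0, 5.5])}
```

Averaging the probesets for a gene, or keeping all of them and handling the redundancy downstream, are alternatives. Which one fits depends on why the probesets for a gene differ.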


Thank you for this additional explanation!