Question: About the best use of frma for several GEO Series
2.9 years ago, ctruntzer wrote:

Dear all,

I have a question about the use of the frma function.

I need to combine several datasets from GEO series into one large analysis. However, from each of these datasets (that is, each GSE series) I need only specific patients (a selection of specific GSM samples).

I downloaded the .cel files for all of the selected GSM samples.

Now I want to normalize all these samples, and I am wondering which of the following options is best:
-    Read all my .cel files, whatever the dataset (that is, whatever the GSE identifier), into a single AffyBatch object and apply frma to this object
-    For each GSE series, first apply frma to all .cel files and then select the normalized patients I need to analyze
-    For each GSE series, first select the .cel files for the patients I need to analyze and then apply frma only to these patients

In particular, I'm wondering whether the last two options are equivalent.

I hope I'm clear enough. Thank you for your help.

Caroline

modified 2.9 years ago by Matthew McCall • written 2.9 years ago by ctruntzer

These should all be equivalent, since fRMA is a single-sample normalization method, no? Unlike RMA, it doesn't matter which other samples are included in your normalization: each array is normalized against the same frozen reference, estimated from a large pool of arrays. The only practical difference is that your first option would require a large amount of memory.

written 2.9 years ago by Levi Waldron
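
To make the single-sample point concrete, here is a minimal Python sketch. It is a toy illustration and not the frma implementation: it assumes a hypothetical frozen reference distribution (`reference`) and compares normalizing against it with classic within-batch quantile normalization. The function names and all numbers are made up for the example.

```python
# Toy illustration only: NOT the frma algorithm, just the underlying idea.

def normalize_to_reference(sample, reference):
    """Replace each value by the reference quantile of the same rank.

    Only this one sample and the fixed reference are used, so the
    result cannot depend on any other array.
    """
    ref_sorted = sorted(reference)
    order = sorted(range(len(sample)), key=lambda i: sample[i])
    out = [0.0] * len(sample)
    for rank, idx in enumerate(order):
        out[idx] = ref_sorted[rank]
    return out


def normalize_within_batch(batch):
    """Classic quantile normalization: the target distribution is the
    mean of the sorted samples, so it changes with the batch."""
    n = len(batch[0])
    sorted_samples = [sorted(s) for s in batch]
    target = [sum(s[r] for s in sorted_samples) / len(batch) for r in range(n)]
    return [normalize_to_reference(s, target) for s in batch]


reference = [2.0, 4.0, 6.0, 8.0]   # hypothetical frozen reference
a = [5.1, 3.0, 9.2, 7.7]           # the sample we care about
b = [1.0, 2.5, 2.0, 8.0]           # other samples that may or may not
c = [10.0, 12.0, 11.0, 30.0]       # be processed alongside it

# Frozen reference: sample a gets the same values whatever it is run with.
alone = normalize_to_reference(a, reference)
frozen_ab = [normalize_to_reference(s, reference) for s in (a, b)][0]
frozen_ac = [normalize_to_reference(s, reference) for s in (a, c)][0]

# Within-batch normalization: sample a changes with its companions.
batch_ab = normalize_within_batch([a, b])[0]
batch_ac = normalize_within_batch([a, c])[0]

print(alone == frozen_ab == frozen_ac)  # True
print(batch_ab == batch_ac)             # False
```

In this toy setting, selecting samples before or after normalization makes no difference when the reference is fixed, which is the situation described for the default frma settings.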

Thank you for your clear answer.

Best, Caroline

written 2.9 years ago by ctruntzer

Is the use of fRMA a requirement for your study? If not, you might consider using our SCAN.UPC package for this. Nothing against fRMA, but with our package you can specify the GSE identifier directly when you invoke the SCAN() function. This will download and normalize the data in one step. The concept is somewhat similar to fRMA, except that you don't need an external reference set. Send me an email if you have any questions. https://bioconductor.org/packages/release/bioc/html/SCAN.UPC.html

written 2.9 years ago by Stephen Piccolo

Thank you, Stephen, for your answer. No, fRMA is not a requirement for my study. I will have a look at your library.

Best,

Caroline

written 2.9 years ago by ctruntzer
Answer: About the best use of frma for several GEO Series
2.9 years ago, Matthew McCall (United States) wrote:

Sorry for the slow response (I'm on vacation at the moment). If you're using the default frma settings, then Levi is correct that all 3 options give identical results. If you set summarize="random_effect", then all 3 options would produce different results, since that summarization depends on which arrays are processed together. For most situations, the default version of frma works well. I would additionally suggest that you consider using the barcode function (also in the frma package) to convert your data to binary (expressed / unexpressed) calls or z-scores. This is one way to reduce some of the batch effects that likely exist between the GSEs you're combining.

written 2.9 years ago by Matthew McCall
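
As a rough illustration of what binary calls and z-scores mean here, the following Python sketch scores each probeset against an assumed "unexpressed" distribution. It is not the barcode function itself: the per-probeset means and standard deviations, the probeset names, and the cutoff of 2.0 are all hypothetical values chosen for the example.

```python
# Toy sketch of z-scores and binary calls; NOT the barcode implementation.

def barcode_sketch(expr, unexpr_mean, unexpr_sd, cutoff=2.0):
    """Return (z_scores, binary_calls) for one sample.

    expr        : probeset -> normalized log2 expression for this sample
    unexpr_mean : probeset -> assumed mean when the gene is not expressed
    unexpr_sd   : probeset -> assumed standard deviation when not expressed
    cutoff      : hypothetical z-score threshold for calling "expressed"
    """
    z_scores = {
        ps: (value - unexpr_mean[ps]) / unexpr_sd[ps]
        for ps, value in expr.items()
    }
    calls = {ps: int(z > cutoff) for ps, z in z_scores.items()}
    return z_scores, calls


# Hypothetical probesets and parameters.
sample = {"ps_A": 9.0, "ps_B": 4.2}
unexpr_mean = {"ps_A": 5.0, "ps_B": 4.0}
unexpr_sd = {"ps_A": 1.0, "ps_B": 0.5}

z_scores, calls = barcode_sketch(sample, unexpr_mean, unexpr_sd)
print(calls)  # ps_A is called expressed (1), ps_B unexpressed (0)
```

Because each probeset is scored against its own reference values, a moderate shift in raw intensity between GSEs often leaves the binary call unchanged, which is the intuition behind using such calls to dampen batch effects.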

Thank you for your helpful answer.

written 2.9 years ago by ctruntzer

I have an additional question, if you don't mind. When I look at the frma vignette, I read "Summarization refers to the method used to combine probe-level expression values to obtain gene-level expression estimate". However, when I apply frma to my data with the default option (summarize="robust_weighted_average"), I still have several probe names corresponding to the same gene symbol. Did I miss something?

written 2.9 years ago by ctruntzer

Summarization for Affy arrays is from probe to probeset, but as you state, there are often multiple probesets that map to the same gene symbol. James MacDonald's answer in the thread linked below gives a good overview of why there isn't a one-to-one mapping between probesets and gene symbols and suggests some approaches you could take:

Questions about gene identifiers and probesets regulation

 

written 2.9 years ago by Matthew McCall
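
For readers who do want one value per gene symbol, the sketch below shows one common convention: keep, for each symbol, the probeset with the highest mean expression across samples. This is only an assumption for illustration and not necessarily what the linked answer recommends; the probeset IDs, gene symbols, and values are hypothetical.

```python
# Toy sketch: collapse several probesets per gene symbol to a single row.
from statistics import mean


def collapse_to_genes(expr, probeset_to_symbol):
    """Keep one probeset per gene symbol (the one with the highest mean).

    expr               : probeset ID -> list of expression values across samples
    probeset_to_symbol : probeset ID -> gene symbol (unmapped IDs are skipped)
    """
    best = {}  # gene symbol -> probeset ID currently selected
    for probeset, values in expr.items():
        symbol = probeset_to_symbol.get(probeset)
        if symbol is None:
            continue  # no annotation for this probeset
        if symbol not in best or mean(values) > mean(expr[best[symbol]]):
            best[symbol] = probeset
    return {symbol: expr[probeset] for symbol, probeset in best.items()}


# Hypothetical data: two probesets map to GENE_A, one to GENE_B.
expr = {
    "ps_1": [5.0, 6.0],
    "ps_2": [8.0, 9.0],
    "ps_3": [4.0, 4.5],
}
annotation = {"ps_1": "GENE_A", "ps_2": "GENE_A", "ps_3": "GENE_B"}

gene_level = collapse_to_genes(expr, annotation)
print(gene_level)
```

Other selection rules (for example the most variable probeset, or averaging probesets) can be swapped in by changing the comparison; which rule is appropriate depends on the analysis.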

Thank you for this additional explanation!

written 2.9 years ago by ctruntzer