Question: Normalization of array data from GEO repository
Aleš Maver wrote (10.4 years ago):
Hi all,

I have obtained several GEO Series (GSE) entries from the GEO repository using the getGEO function (GEOquery package). Data obtained this way is stored as an ExpressionSet object. The problem is that I don't know how to perform quality control and normalization on ExpressionSet data, because functions like expresso (affy package) work only on AffyBatch objects. Is there anything I am missing?

Also, does anyone know whether the data in the GEO repository is already normalised?

Thank you for any replies!

Ales Maver
Ales.Maver@gmail.com
Answer: Normalization of array data from GEO repository
Steve Lianoglou wrote (10.4 years ago):
Hi,

On Jul 7, 2009, at 5:38 AM, Aleš Maver wrote:
> I have obtained several GEO Series (GSE) entries from GEO repository using getGEO function (GEOquery package). Data obtained in this manner is stored in ExpressionSet class. ... functions like expresso (affy package) work only on AffyBatch classes. Is there anything I am missing?

Sorry, I've never used the GEOquery package, so I can't say much about it, but I'd be surprised if there isn't an option to return your results as an AffyBatch object, because I'd dare say that you can get most of the data from GEO in its raw format (e.g., CEL files or whatever).

> And- does anyone know whether data in GEO repository is already normalised or not?

It depends. Sometimes you aren't given the raw files: the data may come from a custom array, and I've also seen some datasets provided only in post-processed form (already MAS5 normalized, for example). In my experience, though, you can get the raw data for most of the experiments you find there.

Also, for array quality assessment, look into the arrayQualityMetrics package:
http://www.bioconductor.org/packages/release/bioc/html/arrayQualityMetrics.html

Hope that helps,
-steve

--
Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
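For the "get the raw files" route, here is a minimal R sketch. It assumes an Affymetrix series whose submitters uploaded their CEL files as a supplementary archive named <accession>_RAW.tar, which, as Steve says, is common but not guaranteed. It downloads the supplementary files with GEOquery::getGEOSuppFiles(), builds an AffyBatch with affy::ReadAffy() (the class expresso() expects), writes an arrayQualityMetrics report on the raw data and returns RMA-normalized values. The helper name normalize_gse_cel() and the directory layout are hypothetical choices for illustration.

```r
# Minimal sketch: rebuild an AffyBatch from the raw CEL files of a GEO series,
# run QC and normalize. Assumes an Affymetrix series with a <gse_id>_RAW.tar
# supplementary archive; normalize_gse_cel() is a hypothetical helper name.
normalize_gse_cel <- function(gse_id, base_dir = tempdir()) {
  for (pkg in c("GEOquery", "affy", "arrayQualityMetrics")) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      stop("Package '", pkg, "' is required; install it with BiocManager::install()")
    }
  }

  # Download supplementary files into base_dir/<gse_id>/
  GEOquery::getGEOSuppFiles(gse_id, makeDirectory = TRUE, baseDir = base_dir)
  series_dir <- file.path(base_dir, gse_id)

  # Unpack the RAW archive (assumed name: <gse_id>_RAW.tar)
  raw_tar <- file.path(series_dir, paste0(gse_id, "_RAW.tar"))
  if (!file.exists(raw_tar)) stop("No RAW archive found; raw data may not be available")
  cel_dir <- file.path(series_dir, "cel")
  utils::untar(raw_tar, exdir = cel_dir)

  # Collect plain or gzipped CEL files
  cel_files <- list.files(cel_dir, pattern = "\\.cel(\\.gz)?$",
                          ignore.case = TRUE, full.names = TRUE, recursive = TRUE)
  if (length(cel_files) == 0) stop("The RAW archive contains no CEL files")

  # AffyBatch: the probe-level class that expresso() and rma() work on
  abatch <- affy::ReadAffy(filenames = cel_files)

  # QC report on raw intensities (log-transformed for the plots)
  arrayQualityMetrics::arrayQualityMetrics(abatch,
    outdir = file.path(series_dir, "qc_raw"), force = TRUE, do.logtransform = TRUE)

  # RMA: background correction, quantile normalization and median-polish
  # summarization; returns an ExpressionSet of log2 expression values
  affy::rma(abatch)
}
```

Usage would be something like `eset <- normalize_gse_cel("GSExxxx")` with your own series accession in place of the placeholder. If you prefer expresso(), you can call it on the same AffyBatch instead of rma().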
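When only the processed series matrix is available, one rough check is to look at the value distribution. The sketch below is modeled on the log-transform auto-detection in GEO2R-generated scripts (as far as I recall its thresholds): a 99th percentile above 100, or a wide range with a positive lower quartile, suggests unlogged intensities. This says nothing definitive about normalization. The submitter's own description is the better source, and getGEO usually exposes it as a data_processing column of pData() (an assumption worth checking for your series). The helper name looks_unlogged() is hypothetical.

```r
# Rough heuristic (modeled on GEO2R's log-transform check) for whether an
# expression matrix still holds unlogged intensities. looks_unlogged() is a
# hypothetical helper; for an ExpressionSet pass Biobase::exprs(eset).
looks_unlogged <- function(x) {
  qx <- as.numeric(quantile(x, c(0, 0.25, 0.5, 0.75, 0.99, 1), na.rm = TRUE))
  # Unlogged data typically has a 99th percentile above 100,
  # or a range wider than 50 together with a positive lower quartile
  (qx[5] > 100) || (qx[6] - qx[1] > 50 && qx[2] > 0)
}

# Toy data: log2-scale values versus the same values back-transformed
set.seed(1)
log_values <- matrix(rnorm(1000, mean = 8, sd = 2), ncol = 10)
raw_values <- 2^log_values

looks_unlogged(log_values)  # FALSE
looks_unlogged(raw_values)  # TRUE
```

On a series from getGEO you would call `looks_unlogged(Biobase::exprs(gse[[1]]))` and read `unique(Biobase::pData(gse[[1]])$data_processing)`, assuming that column is present.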
Hello,

just a small addendum: you may also want to have a look at the ArrayExpress package, which retrieves data sets from the ArrayExpress database at EBI and returns them as an AffyBatch, NChannelSet, RGList or similar object. Since GEO and ArrayExpress are regularly synchronized, you may be able to find your data sets of interest there as well.

Regards,
Joern

On Tue, 7 Jul 2009 13:59:19 -0400, Steve Lianoglou wrote
> I'd be surprised if there isn't an option to return your results as an AffyBatch object
Reply written 10.4 years ago by Joern Toedling
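A minimal sketch of Joern's suggestion, assuming the ArrayExpress package's ArrayExpress() function with accession and path arguments (its interface has changed across Bioconductor releases, so check ?ArrayExpress for your installed version). For Affymetrix experiments with raw data it is expected to return an AffyBatch, which rma() or expresso() accept directly. The helper name fetch_ae_affybatch() is hypothetical.

```r
# Sketch: fetch an ArrayExpress experiment and normalize it if it arrives as
# an AffyBatch. fetch_ae_affybatch() is hypothetical; the ArrayExpress()
# arguments (accession, path) are assumed from the package documentation.
fetch_ae_affybatch <- function(ae_accession, dest = tempdir()) {
  if (!requireNamespace("ArrayExpress", quietly = TRUE) ||
      !requireNamespace("affy", quietly = TRUE)) {
    stop("Packages 'ArrayExpress' and 'affy' are required")
  }
  raw <- ArrayExpress::ArrayExpress(accession = ae_accession, path = dest)
  if (!methods::is(raw, "AffyBatch")) {
    # Other platforms come back as other classes (e.g. NChannelSet),
    # which need a different preprocessing route
    stop("Got an object of class ", class(raw)[1], ", not an AffyBatch")
  }
  # Same RMA preprocessing as for locally read CEL files
  affy::rma(raw)
}
```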
Great! Thank you for all the info and the useful advice regarding arrayQualityMetrics and ArrayExpress!

Regards,
Ales

--
Aleš Maver
Ales.Maver at gmail.com
Reply written 10.4 years ago by Aleš Maver
On Wed, Jul 8, 2009 at 6:16 AM, Joern Toedling <joern.toedling@curie.fr> wrote:
> Since GEO and ArrayExpress are regularly synchronized, you may be able to find your data sets of interest there as well.

Actually, ArrayExpress and GEO are NOT synchronized. There is some overlap, where investigators have submitted to both or for other reasons, but GEO is still the larger of the two, and they each contain largely non-overlapping data sets.

> _______________________________________________
> Bioconductor mailing list
> Bioconductor@stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
Reply written 10.4 years ago by Sean Davis
Hi,

a caveat: this is my understanding and I might be quite wrong.

There is indeed no synchronization between the two databases, for lack of a common standard (each has its own flavour of MAGE-ML). In addition to investigators submitting to both repositories, ArrayExpress also imports experiments from GEO according to certain criteria. These carry the prefix 'E-GEOD' in the experiment ID. Querying ArrayExpress for these returns 5155 such experiments out of a total of 8372. GEO contains 12810 Series (experiments), so I would say GEO does contain more data.

HTH,
James.
Reply written 10.4 years ago by James F. Reid
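James's counts can be turned into shares to make the size comparison concrete. This only restates his numbers from the time of the thread and assumes each E-GEOD experiment corresponds to exactly one GEO Series.

```r
# James's counts at the time of the thread
ae_total  <- 8372   # all ArrayExpress experiments
ae_geod   <- 5155   # ArrayExpress experiments imported from GEO (E-GEOD-*)
geo_total <- 12810  # GEO Series (GSE)

round(100 * ae_geod / ae_total, 1)   # % of ArrayExpress imported from GEO: 61.6
round(100 * ae_geod / geo_total, 1)  # % of GEO Series with an E-GEOD copy: 40.2
ae_total - ae_geod                   # ArrayExpress experiments not from GEO: 3217
```

So, on these numbers, roughly 60% of ArrayExpress came from GEO, while only about 40% of GEO Series had an ArrayExpress counterpart.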
Hi,

have a look at the AE FAQ: http://www.ebi.ac.uk/microarray/doc/help/faq.html#submitter_FAQ_general

*How much over-lap is there between ArrayExpress and the Gene Expression Omnibus (GEO)?*
We import data on a weekly basis from GEO (NCBI). As a priority all GEO experiments which are in GEO datasets on catalogue Affymetrix and Agilent platforms are imported and we re-curate these before loading into ArrayExpress. We also import all GSE on these platforms and these are loaded uncurated if they pass our quality checks (e.g. no corrupt data files). All experiments imported from GEO have accession numbers in the format of E-GEOD-n, where n is a number. For more information see http://www.ebi.ac.uk/microarray/doc/help/GEO_data.html

I had a more detailed look at the "HG-U133A" chip type. There I found an overlap of more than 90%; in particular, all the new experiments are available in AE, too. Using R and Bioconductor for my analyses, I found the file format in AE more suitable.

Best
Markus

--
Dipl.-Tech. Math. Markus Schmidberger
Ludwig-Maximilians-Universität München
IBE - Institut für medizinische Informationsverarbeitung, Biometrie und Epidemiologie
Marchioninistr. 15, D-81377 Muenchen
URL: http://www.ibe.med.uni-muenchen.de
Mail: Markus.Schmidberger [at] ibe.med.uni-muenchen.de
Tel: +49 (089) 7095 - 4497
Reply written 10.4 years ago by Markus Schmidberger
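Markus and James both describe GEO imports as E-GEOD-n accessions. As far as I know, n is the original GSE number, so a series accession used with getGEO can be translated into the ArrayExpress ID to look up. A minimal sketch follows; gse_to_ae() is a hypothetical helper, the accessions in the example are arbitrary strings, and whether a given series was actually imported still has to be checked on the ArrayExpress side.

```r
# Translate GEO Series accessions into ArrayExpress import IDs,
# assuming the E-GEOD-n convention reuses the GSE number n
gse_to_ae <- function(gse) {
  ok <- grepl("^GSE[0-9]+$", gse)
  if (!all(ok)) stop("Not a GSE accession: ", paste(gse[!ok], collapse = ", "))
  sub("^GSE", "E-GEOD-", gse)
}

gse_to_ae(c("GSE1", "GSE12345"))  # "E-GEOD-1" "E-GEOD-12345"
```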