GEOquery on raw data and processed data?
Alex Tsoi ▴ 260
@alex-tsoi-2154
@sean-davis-490
Hi, Alex.

The typical process would be to use getGEO to get a GSE or GSEMatrix file and parse it into R. The data in these files are taken directly from submitters to GEO and so could have been processed by RMA, MAS5, or any of several other methods. One will often need to refer to the protocol information in GEO or to the associated paper to determine the exact methods.

As Saroj pointed out, in many cases there is a link to supplementary files in the GSE file or online on the summary page. For Affy, this link will usually lead to at least the .CEL files. One can then use the getGEO function to get the processed data and annotation, then get the raw .CEL files, process them however necessary, and replace the values that come from GEO with the ones derived locally.

Sean
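The first check described above, finding out how the submitter processed the values and whether raw files are attached, can be sketched in R. This is only an illustration and not part of the original reply: `summarize_gsm_meta` is a hypothetical helper, and it assumes the metadata is a named list with `data_processing` and `supplementary_file` fields, as in the GSM72287 printout quoted in this thread.

```r
# Hypothetical helper: summarize how a GEO sample was processed and
# which raw (.CEL) files it advertises as supplementary files.
# `meta` is assumed to be a named list like the one GEOquery::Meta()
# returns for a GSM object.
summarize_gsm_meta <- function(meta) {
  supp <- meta$supplementary_file
  if (is.null(supp)) supp <- character(0)
  list(
    accession  = meta$geo_accession,
    processing = meta$data_processing,
    # .CEL files hold the probe-level (raw) intensities
    cel_files  = grep("\\.CEL(\\.gz)?$", supp, value = TRUE, ignore.case = TRUE)
  )
}

# Example input copied from the GSM72287 metadata shown in this thread
meta <- list(
  geo_accession      = "GSM72287",
  data_processing    = "RMA (robust multi-array)",
  supplementary_file = c("file:///samples/GSM72287/GSM72287.CEL.gz",
                         "file:///samples/GSM72287/GSM72287.EXP.gz")
)
info <- summarize_gsm_meta(meta)
print(info$processing)
print(info$cel_files)
```

With GEOquery installed and a network connection, the same helper could presumably be fed real metadata via `summarize_gsm_meta(Meta(getGEO("GSM72287")))`.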
Alex Tsoi wrote:
> Thanks all of you for the information.
>
> However, as I mentioned in my previous emails, some GEO data (eg. GSM72287) has both the .CEL file and .EXP file, and I looked up their paper: http://www.ncbi.nlm.nih.gov/sites/entrez and the authors mentioned that they did put the processed data as .CEL and the raw as .EXP.

The .CEL files are, by definition, raw files. If a manuscript says otherwise, then I think you should probably contact the author to clarify the situation.

> I understand that I could first download the supplementary files manually from the GEO website, then input them as R object. But unfortunately, I am doing meta-analysis on cancer microarrays, so I would have to download 20+ datasets manually for getting the raw data. So I just wonder, in case the raw data is available in the GEO, is there any way I could parse that directly to R? (since some of those have both processed and raw, but once parsed using the getGEO, only the processed is shown)

The link for the supplementary files is embedded in the GSE header information, if available. You can certainly use R to download those files and uncompress them. You will still need to make some decisions about how you would like to treat these raw data after they are downloaded. Since you are setting up to do a meta-analysis, presumably you have thought a good deal about how to go about processing the raw data and analyzing the results across datasets.

Sean
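Downloading and uncompressing the supplementary files from within R, as suggested above, might look roughly like the sketch below. It is an illustration under assumptions, not the method used in the thread: `find_cel_files` and `fetch_raw_cel` are hypothetical helpers, the `geo_raw` directory name is arbitrary, and `getGEOSuppFiles()` is the function current GEOquery releases provide for this (it may not have existed in the 2007 version). The sketch also assumes the raw data arrive as a tar archive of CEL files, which is common but not guaranteed.

```r
# Hypothetical helper: list CEL files (plain or gzipped) under a directory.
find_cel_files <- function(dir) {
  list.files(dir, pattern = "\\.CEL(\\.gz)?$", ignore.case = TRUE,
             full.names = TRUE, recursive = TRUE)
}

# Hypothetical helper: download and unpack the supplementary files of
# one series, then return the paths of the CEL files found.
# Needs the GEOquery package and network access when it is called.
fetch_raw_cel <- function(gse, base_dir = "geo_raw") {
  dir.create(base_dir, showWarnings = FALSE, recursive = TRUE)
  # getGEOSuppFiles() stores the files under base_dir/<gse>/
  GEOquery::getGEOSuppFiles(gse, baseDir = base_dir)
  series_dir <- file.path(base_dir, gse)
  # Unpack any tar archives that were downloaded
  tars <- list.files(series_dir, pattern = "\\.tar$", full.names = TRUE)
  for (tf in tars) utils::untar(tf, exdir = series_dir)
  find_cel_files(series_dir)
}

# Usage (not run here; needs GEOquery, affy and a network connection):
# series <- c("GSE3218")   # extend with the other accessions of interest
# cel_by_series <- lapply(series, fetch_raw_cel)
# names(cel_by_series) <- series
# # Assumes all CEL files passed to ReadAffy() come from one chip type
# eset <- affy::rma(affy::ReadAffy(filenames = cel_by_series[[1]]))
```

Keeping the download step separate from the preprocessing step leaves room for the decisions mentioned above: a series may contain more than one platform, in which case the CEL files would need to be grouped by chip type before calling `ReadAffy()`.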
There are links to the .CEL files (I guess this would be "raw" files) at GEO.

E.g., GSM72287 is part of the series GSE3218. At the bottom of the page (below) there is a link under 'Supplementary files'.

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE3218

HTH

Saroj

Alex Tsoi wrote:
>I figure out that those are the RMA-processed data, so my question should be
>how could I get the rawdata ?
>
>On 7/3/07, Alex Tsoi <tsoi.teen at gmail.com> wrote:
>
>>Dear all,
>>
>>I use the function getGEO from GEOquery to retrieve different cancer data
>>sets from GEO to do a meta-analysis.
>>
>>However, I am not quite sure if the data I downloaded has already been
>>processed (eg. RMA, or MAS) or not, is it true that all the
>>.CEL might be processed while all the .EXP files are raw ?
>>
>>Also, if I assign as:
>>
>>>rawdata <- getGEO("GSM72287")
>>
>>"rawdata" has the data table with column names ID_REF and VALUE:
>>
>>but are those processed or raw data values ?
>>
>>My main goal is to get the raw data values from each sample so I could do
>>a meta analysis by applying my own processing
>>methods.
>>
>>Below is showing the rawdata.
>>
>>Greatly appreciate for any help.
>>
>>An object of class "GSM"
>>channel_count
>>[1] "1"
>>characteristics_ch1
>>[1] "mixed GCT (Embryonal Carcinoma, Seminoma)"
>>contact_address
>>[1] "1275 York Ave"
>>contact_city
>>[1] "New York"
>>contact_country
>>[1] "USA"
>>contact_department
>>[1] "Cell Biology"
>>contact_email
>>[1] " korkolaj at mskcc.org"
>>contact_institute
>>[1] "Memorial Sloan-Kettering"
>>contact_laboratory
>>[1] "Chaganti"
>>contact_name
>>[1] "James,,Korkola"
>>contact_phone
>>[1] "212-639-8281"
>>contact_state
>>[1] "NY"
>>contact_zip/postal_code
>>[1] "10021"
>>data_processing
>>[1] "RMA (robust multi-array)"
>>data_row_count
>>[1] "22645"
>>description
>>[1] "Adult Male Germ Cell Tumor"
>>extract_protocol_ch1
>>[1] "Frozen tissue from a germ cell tumor was minced and homogenized in
>>RLT buffer (Qiagen).Total RNA was extracted from the tissue lysate using an
>>RNeasy kit (Qiagen)."
>>geo_accession
>>[1] "GSM72287"
>>hyb_protocol
>>[1] "standard Affymetrix procedures"
>>label_ch1
>>[1] "biotin"
>>label_protocol_ch1
>>[1] "Approximately 12 ug of total RNA was processed to produce
>>biotinylated cRNA targets."
>>last_update_date
>>[1] "Oct 12 2005"
>>molecule_ch1
>>[1] "total RNA"
>>organism_ch1
>>[1] "Homo sapiens"
>>platform_id
>>[1] "GPL97"
>>scan_protocol
>>[1] "standard Affymetrix procedures"
>>series_id
>>[1] "GSE3218"
>>source_name_ch1
>>[1] "germ cell tumor"
>>status
>>[1] "Public on Nov 10 2005"
>>submission_date
>>[1] "Aug 29 2005"
>>supplementary_file
>>[1] "file:///samples/GSM72287/GSM72287.CEL.gz"
>>[2] "file:///samples/GSM72287/GSM72287.EXP.gz"
>>title
>>[1] "germ cell tumors (GCT) and normal controls 052B 1"
>>type
>>[1] "RNA"
>>An object of class "GEODataTable"
>>****** Column Descriptions ******
>>  Column Description
>>1 ID_REF \t
>>2 VALUE  RMA-calculated Signal intensity
>>****** Data Table ******
>>  ID_REF      VALUE
>>1 200000_s_at 9.913362
>>2 200001_at   9.822533
>>3 200002_at   11.318111
>>4 200003_s_at 12.280321
>>5 200004_at   11.068576
>>22640 more rows ...
>>
>>--
>>Lam C. Tsoi (Alex)
>>Medical University of South Carolina
CEL files contain the probe-level data, so by definition they contain 'raw' data (no background correction, normalization or summarization). So CEL files never contain processed data...

Cheers,
Jenny

At 02:39 PM 7/3/2007, Saroj Mohapatra wrote:
>There are links to the .CEL files (I guess this would be "raw" files) at GEO.
>
>E.g., GSM72287 is part of the series GSE3218. At the bottom of the
>page (below) there is a link under 'Supplementary files'.
>
>http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE3218

Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801 USA
ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at uiuc.edu