How to get the number of samples in a GSE with GEOquery
Dick Beyer
Would someone please tell me if there is a way to get the number of samples (the GSMs associated with a GSE) from a GEOmetadb SQL query that returns GSEs? Specifically, I have this query:

sql <- paste("SELECT DISTINCT gse.gse, gse.title, gse.overall_design, gse.repeats,",
             "gse.repeats_sample_list, gse.variable, gse.variable_description,",
             "gse.supplementary_file, gse.summary",
             "FROM",
             " gsm JOIN gse_gsm ON gsm.gsm=gse_gsm.gsm",
             " JOIN gse ON gse_gsm.gse=gse.gse",
             " JOIN gse_gpl ON gse_gpl.gse=gse.gse",
             " JOIN gpl ON gse_gpl.gpl=gpl.gpl",
             "WHERE",
             #" gsm.molecule_ch1 like '%total RNA%' AND",
             " gse.title LIKE '%colorectal cancer%' AND",
             " gpl.organism LIKE '%Homo sapiens%'",
             sep = " ")

So I am getting all the fields I can (I think) for the GSE entries, but I don't see how to get the number of samples for each GSE. I see from the GEOquery documentation that I could use getGEO(), but I don't want to download all the samples; I just want to know their names and how many there are.

If I use the NCBI website (http://www.ncbi.nlm.nih.gov/sites/entrez) and do a GEO DataSets query for a particular GSE, the number of samples and their names show up in the browser.

Any help or ideas will be greatly appreciated.

Thanks very much,
Dick

*******************************************************************************
Richard P. Beyer, Ph.D.     University of Washington
Tel.:(206) 616 7378         Env. & Occ. Health Sci., Box 354695
Fax: (206) 685 4696         4225 Roosevelt Way NE, # 100
                            Seattle, WA 98105-6099
http://depts.washington.edu/ceeh/ServiceCores/FC5/FC5.html
http://staff.washington.edu/~dbeyer
Dick Beyer
Please ignore my last request. The solution was pretty straightforward after a peek at the entity-relationship diagram:

## aliases keep the two 'title' columns apart in the returned data frame
sql <- paste("SELECT DISTINCT gse.gse, gse.title AS gse_title, gse_gsm.gsm, gsm.title AS gsm_title",
             "FROM",
             " gsm JOIN gse_gsm ON gsm.gsm=gse_gsm.gsm",
             " JOIN gse ON gse_gsm.gse=gse.gse",
             " JOIN gse_gpl ON gse_gpl.gse=gse.gse",
             " JOIN gpl ON gse_gpl.gpl=gpl.gpl",
             "WHERE",
             #" gsm.molecule_ch1 like '%total RNA%' AND",
             " gse.title LIKE '%colorectal cancer%' AND",
             " gpl.organism LIKE '%Homo sapiens%'",
             sep = " ")
## one row per GSE/GSM pair: the sample names are in the gsm column,
## and the number of rows for a given gse is that series' sample count

Thanks,
Dick

On Wed, 16 Dec 2009, Dick Beyer wrote:
> Would someone please tell me if there is a way to get the number of samples,
> GSMs associated with a GSE, from doing a GEOquery that returns GSEs?
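If only the counts are wanted, the same joins can be aggregated with GROUP BY so that the database returns one row per GSE with its sample total. A minimal sketch, assuming the DBI and RSQLite packages; the tiny in-memory tables and accessions (GSE_A, GSM_1, GPL_X, ...) are hypothetical toy data that only mimic the GEOmetadb tables used above. Against the real database you would skip the dbWriteTable() calls and connect to GEOmetadb.sqlite instead.

```r
# Sketch: count the GSMs per GSE with GROUP BY instead of listing one row per
# sample. The in-memory tables are HYPOTHETICAL toy data that only mimic the
# GEOmetadb tables (gse, gse_gsm, gse_gpl, gpl) used in the queries above.
library(DBI)
library(RSQLite)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

dbWriteTable(con, "gse", data.frame(
  gse   = c("GSE_A", "GSE_B"),
  title = c("colorectal cancer study A", "colorectal cancer study B")))
dbWriteTable(con, "gse_gsm", data.frame(
  gse = c("GSE_A", "GSE_A", "GSE_A", "GSE_B", "GSE_B"),
  gsm = c("GSM_1", "GSM_2", "GSM_3", "GSM_4", "GSM_5")))
# GSE_A is linked to two platforms, so the joins produce each of its GSMs twice
dbWriteTable(con, "gse_gpl", data.frame(
  gse = c("GSE_A", "GSE_A", "GSE_B"),
  gpl = c("GPL_X", "GPL_Y", "GPL_X")))
dbWriteTable(con, "gpl", data.frame(
  gpl      = c("GPL_X", "GPL_Y"),
  organism = c("Homo sapiens", "Homo sapiens")))

count_sql <- paste(
  "SELECT gse.gse, gse.title, COUNT(DISTINCT gse_gsm.gsm) AS n_samples",
  "FROM gse",
  "JOIN gse_gsm ON gse_gsm.gse = gse.gse",
  "JOIN gse_gpl ON gse_gpl.gse = gse.gse",
  "JOIN gpl ON gpl.gpl = gse_gpl.gpl",
  "WHERE gse.title LIKE '%colorectal cancer%'",
  "  AND gpl.organism LIKE '%Homo sapiens%'",
  "GROUP BY gse.gse, gse.title",
  "ORDER BY gse.gse")
counts <- dbGetQuery(con, count_sql)
print(counts)

# Sample names as one comma-separated string per GSE (SQLite does not
# guarantee the order of the concatenated names)
names_sql <- paste(
  "SELECT gse, GROUP_CONCAT(gsm, ',') AS gsms",
  "FROM gse_gsm GROUP BY gse ORDER BY gse")
sample_lists <- dbGetQuery(con, names_sql)
print(sample_lists)

dbDisconnect(con)
```

COUNT(DISTINCT ...) is the important part: after the gse_gpl join, a series run on two platforms shows up once per platform, so a plain COUNT(*) would report 6 samples for GSE_A instead of 3.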
Jack Zhu
Hi Richard,

Thanks for your suggestion. More examples are good, even for me. I will try to add some to the vignette (depending on my time frame). For the examples you mentioned, I tried something like this:

###########
library(GEOmetadb)
## assumes GEOmetadb.sqlite has already been downloaded, e.g. with getSQLiteFile()
con <- dbConnect(SQLite(), "GEOmetadb.sqlite")

## 'hgu133plus2' is in the 'bioc_package' field of table 'gpl'
dbListFields(con, "gpl")
#>  [1] "ID"                   "title"                "gpl"
#>  [4] "status"               "submission_date"      "last_update_date"
#>  [7] "technology"           "distribution"         "organism"
#> [10] "manufacturer"         "manufacture_protocol" "coating"
#> [13] "catalog_number"       "support"              "description"
#> [16] "web_link"             "contact"              "data_row_count"
#> [19] "supplementary_file"   "bioc_package"
dbGetQuery(con, "SELECT DISTINCT bioc_package FROM gpl")
#> ...
#> 12 hgu133plus2
#> ...
gpl_hgu133plus2 <- dbGetQuery(con, "SELECT DISTINCT gpl FROM gpl WHERE bioc_package = 'hgu133plus2'")
gpl_hgu133plus2
#>      gpl
#> 1 GPL570

## convert to GSE
gse_conversion1 <- geoConvert(gpl_hgu133plus2[[1]], 'gse')
gse_hgu133plus2 <- unique(gse_conversion1$gse$to_acc)

## It seems that the best field for finding all 'cell lines' is the 'characteristics_ch1' field of table 'gsm':
gsm_cell_line <- dbGetQuery(con, "SELECT DISTINCT gsm FROM gsm WHERE characteristics_ch1 LIKE '%cell%' AND characteristics_ch1 LIKE '%line%'")

## convert to GSE
gse_conversion2 <- geoConvert(gsm_cell_line[[1]], 'gse')
gse_cell_line <- unique(gse_conversion2$gse$to_acc)

## It seems that the best field in GSE for finding all 'colon cancer' or 'colorectal cancer' series is the 'summary' field of table 'gse':
gse_colon <- dbGetQuery(con, "SELECT DISTINCT gse FROM gse WHERE summary LIKE '%colon cancer%' OR summary LIKE '%colorectal cancer%'")
gse_colon <- gse_colon$gse

## 'all the colon cancer GSE objects that are primary cell lines that use hgu133plus2 arrays':
## intersect all three GSE vectors: gse_hgu133plus2, gse_cell_line and gse_colon
################

Hope this helps. Please let me know if you have any questions or suggestions.

Thanks.
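The last step of the workflow above is described only in a comment. A minimal base-R sketch of it, using Reduce() to apply intersect() across the three vectors; the accession values below are hypothetical placeholders standing in for the gse_hgu133plus2, gse_cell_line and gse_colon vectors built from the real database.

```r
# Final step of the workflow: keep only GSE accessions found by all three
# searches. The values are HYPOTHETICAL placeholders; in the real workflow
# these vectors come from geoConvert() and the gse summary query.
gse_hgu133plus2 <- c("GSE_A", "GSE_B", "GSE_C")
gse_cell_line   <- c("GSE_B", "GSE_C", "GSE_D")
gse_colon       <- c("GSE_C", "GSE_B", "GSE_E")

# Reduce() folds intersect() over the list: intersect(intersect(a, b), c)
gse_candidates <- Reduce(intersect, list(gse_hgu133plus2, gse_cell_line, gse_colon))
print(gse_candidates)
```

intersect() keeps the order of its first argument, so the result follows the order of gse_hgu133plus2; wrap it in sort() if a stable ordering matters.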
Jack

On Thu, Dec 17, 2009 at 11:18 AM, Dick Beyer <dbeyer at u.washington.edu> wrote:
> Hi Sean and Jack,
>
> That's good to know. My only immediate suggestion follows my questions. It is always nice to have lots of worked-out examples.
>
> I'm using GEOmetadb to find candidate GSE datasets that I can then process further. If I can set up my SQL queries correctly, I should be able to do that. But I need to get the right fields from the GSE objects, etc. Maybe your examples could be cast in that question/answer sort of way. For example: how do you find all the colon cancer GSE objects that are primary cell lines, or how do you find all the colorectal cancer GSE objects that use hgu133plus2 arrays?
>
> Thanks again,
> Dick
>
> On Thu, 17 Dec 2009, Sean Davis wrote:
>
>> On Thu, Dec 17, 2009 at 10:42 AM, Dick Beyer <dbeyer at u.washington.edu> wrote:
>>> Hi Sean,
>>>
>>> Well, I'm totally grateful to you for your work on this. As usual,
>>> Bioconductor is making my life easier and more fun!
>>
>> Thanks. The work is all Jack's, though. If you have any comments or
>> suggestions on the software, let us know.
>>
>> Sean
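Dick's second question (colorectal cancer GSEs that use hgu133plus2 arrays) can also be answered in a single query, without geoConvert(), by joining gse to gpl through gse_gpl and filtering on the bioc_package column shown in Jack's reply. A sketch, assuming DBI and RSQLite; the in-memory tables follow the GEOmetadb table and column names used in this thread, but the rows (apart from the GPL570/hgu133plus2 pairing seen in Jack's output) are hypothetical toy data.

```r
# Sketch: colorectal cancer GSEs that use hgu133plus2 arrays, via one join.
# The in-memory tables are HYPOTHETICAL toy data; only the table and column
# names follow the GEOmetadb schema used in this thread.
library(DBI)
library(RSQLite)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

dbWriteTable(con, "gse", data.frame(
  gse   = c("GSE_A", "GSE_B", "GSE_C", "GSE_D"),
  title = c("colorectal cancer cell lines", "colorectal cancer tumours",
            "breast cancer cell lines", "colorectal cancer xenografts")))
dbWriteTable(con, "gpl", data.frame(
  gpl          = c("GPL570", "GPL_OTHER"),
  bioc_package = c("hgu133plus2", "otherpkg")))
# GSE_D is linked to both platforms
dbWriteTable(con, "gse_gpl", data.frame(
  gse = c("GSE_A", "GSE_B", "GSE_C", "GSE_D", "GSE_D"),
  gpl = c("GPL570", "GPL_OTHER", "GPL570", "GPL570", "GPL_OTHER")))

sql <- paste(
  "SELECT DISTINCT gse.gse, gse.title",
  "FROM gse",
  "JOIN gse_gpl ON gse_gpl.gse = gse.gse",
  "JOIN gpl ON gpl.gpl = gse_gpl.gpl",
  "WHERE gpl.bioc_package = 'hgu133plus2'",
  "  AND gse.title LIKE '%colorectal cancer%'",
  "ORDER BY gse.gse")
hits <- dbGetQuery(con, sql)
print(hits)

dbDisconnect(con)
```

GSE_D is kept because at least one of its platforms matches; DISTINCT guards against duplicate rows when a series is linked to several matching platform records. Following Jack's observation, gse.summary could be searched instead of (or as well as) gse.title.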