Question: GEOmetadb query to retrieve sample groups
5.8 years ago by
Thomas Hampton wrote:
The following getGEO query retrieves data files and meta data for a recent GEO submission of mine, one that has been curated:

GDS4252 <- getGEO("GDS4252")
Columns(GDS4252)
> str(Columns(GDS4252))
'data.frame': 16 obs. of 4 variables:
 $ sample            : Factor w/ 16 levels "GSM754979","GSM754980",..: 5 6 7 8 1 2 3 4 13 14 ...
 $ genotype/variation: Factor w/ 2 levels "CFTR mutant",..: 1 1 1 1 1 1 1 1 2 2 ...
 $ agent             : Factor w/ 2 levels "PA01","unexposed": 1 1 1 1 2 2 2 2 1 1 ...

The folks at NCBI have correctly created two factors with two levels to describe the 16 samples in my experiment.

I am interested in retrieving similar information using GEOmetadb, but this has proved problematic.

getSQLiteFile(destdir = getwd(), destfile = "GEOmetadb.sqlite.gz")

con <- dbConnect(SQLite(), "GEOmetadb.sqlite")
dat <- dbGetQuery(con, "select * from gds where gds = 'GDS4252'")

> dat
 [1] ID                       gds                      title
 [4] description              type                     pubmed_id
 [7] gpl                      platform_organism        platform_technology_type
[10] feature_count            sample_organism          sample_type
[13] channel_count            sample_count             value_type
[16] gse                      order                    update_date
<0 rows> (or 0-length row.names)

It seems, for starters, that this GDS identifier for my particular submission isn't accounted for in the current database.

Others are, so it looks like my syntax and so forth is ok:

> dat <- dbGetQuery(con, "select gds from gds limit 10")
> dat
     gds
1   GDS5
2   GDS6
3  GDS10
4  GDS12
5  GDS15
6  GDS16
7  GDS17
8  GDS18
9  GDS19
10 GDS20

There is also the question of where a set of fields (variable in number) describing sample factors and their levels would actually "live" in the SQLite database.
This information does not seem to be an attribute of the GDS in any case:

> dat <- dbGetQuery(con, "select fieldname from geodb_column_desc where TableName = 'gds'")
> dat
   FieldName
1  ID
2  channel_count
3  description
4  feature_count
5  gds
6  order
7  platform
8  platform_organism
9  platform_technology_type
10 pubmed_id
11 reference_series
12 sample_count
13 sample_organism
14 sample_type
15 title
16 type
17 update_date
18 value_type

Nor does it seem to be a feature stored in the samples:

> dat <- dbGetQuery(con, "select fieldname from geodb_column_desc where TableName = 'gsm'")
> dat
   FieldName
1  ID
2  channel_count
3  characteristics_ch1
4  characteristics_ch2
5  contact
6  data_processing
7  data_row_count
8  description
9  extract_protocol_ch1
10 extract_protocol_ch2
11 gpl
12 gse
13 gsm
14 hyb_protocol
15 label_ch1
16 label_ch2
17 label_protocol_ch1
18 label_protocol_ch2
19 last_update_date
20 molecule_ch1
21 molecule_ch2
22 organism_ch1
23 organism_ch2
24 source_name_ch1
25 source_name_ch2
26 status
27 submission_date
28 supplementary_file
29 title
30 treatment_protocol_ch1
31 treatment_protocol_ch2
32 type

Any advice greatly appreciated.

Tom

Answer: GEOmetadb query to retrieve sample groups
5.8 years ago by
Sean Davis (United States) wrote:

Hi, Tom. Sorry to take so long to get back to you. See below.

On Thu, Jun 6, 2013 at 11:15 AM, Thomas H. Hampton <thomas.h.hampton at dartmouth.edu> wrote:
> The following getGEO query retrieves data files and meta data for a recent GEO submission of mine,
> one that has been curated:
>
> GDS4252 <- getGEO("GDS4252")
> Columns(GDS4252)
>> str(Columns(GDS4252))
> 'data.frame': 16 obs. of 4 variables:
> $ sample : Factor w/ 16 levels "GSM754979","GSM754980",..: 5 6 7 8 1 2 3 4 13 14 ...
> $ genotype/variation: Factor w/ 2 levels "CFTR mutant",..: 1 1 1 1 1 1 1 1 2 2 ...
> $ agent : Factor w/ 2 levels "PA01","unexposed": 1 1 1 1 2 2 2 2 1 1 ...
>
> The folks at NCBI have correctly created two factors with two levels to describe the 16 samples in my experiment.
>
> I am interested in retrieving similar information using GEOmetadb, but this has proved problematic.
>
> getSQLiteFile(destdir = getwd(), destfile = "GEOmetadb.sqlite.gz")
>
> con <- dbConnect(SQLite(), "GEOmetadb.sqlite")
> dat <- dbGetQuery(con, "select * from gds where gds = 'GDS4252'")
>
>> dat
>  [1] ID                       gds                      title
>  [4] description              type                     pubmed_id
>  [7] gpl                      platform_organism        platform_technology_type
> [10] feature_count            sample_organism          sample_type
> [13] channel_count            sample_count             value_type
> [16] gse                      order                    update_date
> <0 rows> (or 0-length row.names)
>
> It seems, for starters, that this GDS identifier for my particular submission isn't accounted for in the current
> database.
>
> Others are, so it looks like my syntax and so forth is ok:
>
>> dat <- dbGetQuery(con, "select gds from gds limit 10")
>> dat
>      gds
> 1   GDS5
> 2   GDS6
> 3  GDS10
> 4  GDS12
> 5  GDS15
> 6  GDS16
> 7  GDS17
> 8  GDS18
> 9  GDS19
> 10 GDS20
>
>
> There is also the question of where a set of fields (variable in number) describing sample factors and their levels would actually "live"
> in the SQLite database.

It does appear that our update script has a bug; GDS4252 is not present, so we'll check on that.

> This information does not seem to be an attribute of the GDS in any case:

You'll want to check out the gds_subset table for details of the GDS groups.
>> dat <- dbGetQuery(con, "select fieldname from geodb_column_desc where TableName = 'gds'")
>> dat
>    FieldName
> 1  ID
> 2  channel_count
> 3  description
> 4  feature_count
> 5  gds
> 6  order
> 7  platform
> 8  platform_organism
> 9  platform_technology_type
> 10 pubmed_id
> 11 reference_series
> 12 sample_count
> 13 sample_organism
> 14 sample_type
> 15 title
> 16 type
> 17 update_date
> 18 value_type
>
> Nor does it seem to be a feature stored in the samples:
>
>> dat <- dbGetQuery(con, "select fieldname from geodb_column_desc where TableName = 'gsm'")
>> dat
>    FieldName
> 1  ID
> 2  channel_count
> 3  characteristics_ch1
> 4  characteristics_ch2
> 5  contact
> 6  data_processing
> 7  data_row_count
> 8  description
> 9  extract_protocol_ch1
> 10 extract_protocol_ch2
> 11 gpl
> 12 gse
> 13 gsm
> 14 hyb_protocol
> 15 label_ch1
> 16 label_ch2
> 17 label_protocol_ch1
> 18 label_protocol_ch2
> 19 last_update_date
> 20 molecule_ch1
> 21 molecule_ch2
> 22 organism_ch1
> 23 organism_ch2
> 24 source_name_ch1
> 25 source_name_ch2
> 26 status
> 27 submission_date
> 28 supplementary_file
> 29 title
> 30 treatment_protocol_ch1
> 31 treatment_protocol_ch2
> 32 type
>
>
> Any advice greatly appreciated.
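
To illustrate the gds_subset suggestion, here is a minimal sketch of how per-sample factors might be rebuilt from that table. It runs against a small in-memory mock rather than the real GEOmetadb.sqlite. The column names used here (gds, type, description, sample_id), the comma-separated layout of sample_id, the accession GDS0000, and all sample values are assumptions made for illustration; check the real layout with dbListFields(con, "gds_subset") before relying on any of it.

```r
library(DBI)
library(RSQLite)

# In-memory stand-in for GEOmetadb.sqlite. Column names and contents are
# ASSUMED for illustration; verify them against the real gds_subset table.
con <- dbConnect(SQLite(), ":memory:")
mock <- data.frame(
  gds         = "GDS0000",  # hypothetical accession
  description = c("mutant", "wild type", "treated", "untreated"),
  sample_id   = c("GSM1,GSM2", "GSM3,GSM4", "GSM1,GSM3", "GSM2,GSM4"),
  type        = c("genotype/variation", "genotype/variation", "agent", "agent"),
  stringsAsFactors = FALSE
)
dbWriteTable(con, "gds_subset", mock)

# Each row is one subset: a factor name (type), one level of that factor
# (description), and the samples that carry that level.
subsets <- dbGetQuery(
  con,
  "SELECT type, description, sample_id FROM gds_subset WHERE gds = ?",
  params = list("GDS0000")
)

# Expand the comma-separated sample lists to one row per (sample, factor).
ids <- strsplit(subsets$sample_id, ",", fixed = TRUE)
long <- data.frame(
  sample      = trimws(unlist(ids)),
  factor_name = rep(subsets$type, lengths(ids)),
  level       = rep(subsets$description, lengths(ids)),
  stringsAsFactors = FALSE
)

# Build one row per sample and one factor column per subset type,
# roughly the shape that Columns() returns for a GDS object.
samples <- sort(unique(long$sample))
groups <- data.frame(sample = samples, stringsAsFactors = FALSE)
for (f in unique(long$factor_name)) {
  rows <- long[long$factor_name == f, ]
  groups[[f]] <- factor(rows$level[match(samples, rows$sample)])
}

dbDisconnect(con)
str(groups)
```

Against the real database, the same query would presumably be run with the GDS accession of interest in place of the hypothetical GDS0000, once that accession is actually present in the file.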