GEOquery and Sample Subsets
@thomas-hampton-2820
Last seen 9.6 years ago
I am using GEOquery to establish sample subsets of GEO data -- that is, I would like to know which samples are replicates.

I am doing it something like this:

    gds505 <- getGEO("GDS505")
    Columns(gds505)

    > str(Columns(gds505))
    'data.frame': 17 obs. of 4 variables:
     $ sample : Factor w/ 17 levels "GSM11805","GSM11814",..: 2 4 5 7 9 10 12 14 16 1 ...
     $ disease.state: Factor w/ 2 levels "normal","RCC": 2 2 2 2 2 2 2 2 2 1 ...
     $ individual : Factor w/ 10 levels "001","005","011",..: 6 4 1 2 3 5 8 9 10 6 ...
     $ description : chr "Value for GSM11814: C035 Renal Clear Cell Carcinoma U133A; src: Trizol...

The problem I have is that the getGEO command retrieves a rather large object:

    > print(object.size(gds505), units="Mb")
    12.6 Mb

This takes up a lot of time and bandwidth if you plan to do it for thousands of accessions.

Is there a way to retrieve less? I am happy to use R, Bioconductor, BioPerl or whatever.

Best,

Tom
GEOquery GEOquery • 1.3k views
@sean-davis-490
Last seen 3 months ago
United States
On Tue, Jun 4, 2013 at 12:38 PM, Thomas H. Hampton <thomas.h.hampton at dartmouth.edu> wrote:
> The problem I have is that the getGEO command retrieves a rather large object:
> [...]
> This takes up a lot of time and bandwidth if you plan to do it for thousands of accessions.
>
> Is there a way to retrieve less?

Hi, Tom. Are you saying that you really want just the metadata to start; in other words, you just want the sample information without the expression values?

Sean
Exactly! Thanks.
On Tue, Jun 4, 2013 at 1:14 PM, Thomas H. Hampton <thomas.h.hampton at dartmouth.edu> wrote:
> Exactly!

This might help:

http://www.bioconductor.org/packages/release/bioc/html/GEOmetadb.html

Let us know if you have questions.

Sean
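For readers following along, here is a minimal sketch of how a GEOmetadb-style SQLite database could answer the original question without fetching any expression values. It assumes the DBI and RSQLite packages are installed, and that the database has a `gds_subset` table with `gds`, `description`, `type`, and `sample_id` columns, where `sample_id` holds a comma-separated list of GSM accessions; check those names against the package vignette for your copy. The helper `gds_subsets` is a hypothetical name, and the in-memory table below is mock data standing in for the real file.

```r
library(DBI)
library(RSQLite)

# Hypothetical helper: return the subset definitions for one GDS accession.
# Assumes a table `gds_subset` with columns gds, description, type, sample_id.
gds_subsets <- function(con, gds_id) {
  dbGetQuery(
    con,
    "SELECT gds, description, type, sample_id FROM gds_subset WHERE gds = ?",
    params = list(gds_id)
  )
}

# Mock stand-in for the real database so the sketch runs offline.
# With the real file you would instead do something like:
#   library(GEOmetadb)
#   if (!file.exists("GEOmetadb.sqlite")) getSQLiteFile()  # one-time download
#   con <- dbConnect(SQLite(), "GEOmetadb.sqlite")
con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "gds_subset", data.frame(
  gds = c("GDS505", "GDS505"),
  description = c("normal", "RCC"),
  type = c("disease state", "disease state"),
  # "GSM_mockA" and "GSM_mockB" are placeholders, not real accessions.
  sample_id = c("GSM11805,GSM_mockA", "GSM11814,GSM_mockB"),
  stringsAsFactors = FALSE
))

subsets <- gds_subsets(con, "GDS505")

# Split the comma-separated sample lists into one row per sample.
samples <- do.call(rbind, lapply(seq_len(nrow(subsets)), function(i) {
  data.frame(
    sample = strsplit(subsets$sample_id[i], ",", fixed = TRUE)[[1]],
    subset = subsets$description[i],
    type = subsets$type[i],
    stringsAsFactors = FALSE
  )
}))
print(samples)

dbDisconnect(con)
```

The point of this design is that the database file is downloaded once and then queried locally, so looping over thousands of accessions costs no further bandwidth.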
This looks totally cool.

Is there a place where one can view the schema of the relational db?

In any case -- Thanks tons!

Tom
On Tue, Jun 4, 2013 at 2:02 PM, Thomas H. Hampton <thomas.h.hampton at dartmouth.edu> wrote:
> This looks totally cool.
>
> Is there a place where one can view the schema of the relational db?

Hi, Tom. See the vignette for a diagram and for examples. We obviously also assume some familiarity with SQL.

Sean
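Besides the diagram in the vignette, the schema of any SQLite file can be inspected from R directly. The following is a rough sketch using only generic DBI and RSQLite calls; the two tables created here are a cut-down mock (table and column names chosen to resemble GEOmetadb, which is an assumption), so that the example runs without the real download.

```r
library(DBI)
library(RSQLite)

# Mock database standing in for GEOmetadb.sqlite; with the real file,
# connect with dbConnect(SQLite(), "GEOmetadb.sqlite") instead.
con <- dbConnect(SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE gsm (gsm TEXT, title TEXT, gpl TEXT)")
dbExecute(con, "CREATE TABLE gse_gsm (gse TEXT, gsm TEXT)")

# List every table, then the columns of one table.
tables <- dbListTables(con)
fields <- dbListFields(con, "gsm")
print(tables)
print(fields)

# SQLite also keeps the CREATE statements themselves in sqlite_master.
schema <- dbGetQuery(
  con,
  "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
)
print(schema)

dbDisconnect(con)
```

Run against the real database, the same three calls list whatever tables and columns that release actually contains, which is a useful check before writing queries.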