Thomas Hampton
I am using GEOquery to identify sample subsets in GEO data; that is, I would
like to know which samples are replicates.
I am doing it something like this:
library(GEOquery)
gds505 <- getGEO("GDS505")
Columns(gds505)
> str(Columns(gds505))
'data.frame': 17 obs. of 4 variables:
 $ sample : Factor w/ 17 levels "GSM11805","GSM11814",..: 2 4 5 7 9 10 12 14 16 1 ...
 $ disease.state: Factor w/ 2 levels "normal","RCC": 2 2 2 2 2 2 2 2 2 1 ...
 $ individual : Factor w/ 10 levels "001","005","011",..: 6 4 1 2 3 5 8 9 10 6 ...
 $ description : chr "Value for GSM11814: C035 Renal Clear Cell Carcinoma U133A; src: Trizol...
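To make the goal concrete, here is a rough sketch of the grouping I am after, assuming the column table always has a "sample" column plus at least one phenotype column such as "disease.state" (true for GDS505, but I have not checked other accessions). The helper name replicate_groups is my own and is not part of GEOquery.

```r
# Sketch: group sample IDs by one phenotype column of a GDS column table.
# `cols` is assumed to be the data.frame returned by Columns(gds505).
# `replicate_groups` is a hypothetical helper name, not a GEOquery function.
replicate_groups <- function(cols, by = "disease.state") {
  # as.character() returns the GSM identifiers rather than factor codes
  split(as.character(cols$sample), cols[[by]])
}

# Usage (needs the gds505 object downloaded above):
# groups <- replicate_groups(Columns(gds505))
# names(groups)   # one entry per level of disease.state
```

Samples that share a level of the chosen column end up in the same list element, which is what I would treat as a replicate group.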
The problem I have is that the getGEO command retrieves a rather large
object:
> print(object.size(gds505), units="Mb")
12.6 Mb
This takes up a lot of time and bandwidth if you plan to do it for
thousands of accessions.
Is there a way to retrieve less?
I am happy to use R, Bioconductor, BioPerl, or whatever.
Best,
Tom