I've been successfully using GEOquery to download GEO datasets. I came across one (GSE56046) whose SOFT file seemed too small for the number of samples, and I get a lot of downstream errors (see below).
Before contacting the submitter, can anyone spot what the issue is?
This command downloads the data for 1203 samples surprisingly fast and results in a 61 Mb dataset, which seems too small for that many samples.
gse = getGEO("GSE56046", GSEMatrix=FALSE)
This also results in warning messages:
Warning messages:
1: In readLines(con, n = chunksize) :
seek on a gzfile connection returned an internal error
There is also something fishy about the S4 object:
dim(pData(gse[[1]]))
Error in gse[[1]] : this S4 class is not subsettable
The error you are getting comes from specifying GSEMatrix=FALSE. With that setting, getGEO returns
a single GSE object rather than a list, so gse[[1]] fails. If you use the default instead,
you will get a list of ExpressionSets. Unfortunately, in this case that will not help
much to get access to the actual expression values. This GSE record has only the metadata attached;
the expression data are not part of it. To get the actual data, you'll need to download the
supplemental files and then parse them manually.
getGEOSuppFiles("GSE56046")
Combined with getting the GSE metadata using:
gse = getGEO("GSE56046")
you may be able to construct an ExpressionSet or SummarizedExperiment. Unfortunately, when a GEO record
like this one does not supply data as part of the record but, instead, as supplemental files,
GEOquery does not attempt to "guess" what the submitters had in mind.
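
As a rough sketch of how the two pieces above might be combined: the code below fetches the sample metadata, lists and downloads the supplemental files, and defines a small helper for assembling an ExpressionSet. The helper build_eset and its sample_ids argument are hypothetical names made up for illustration, and the fetch_files argument is assumed to be available (it is in recent GEOquery versions). The layout of the supplemental files is not known in advance, so the final step is left commented out and would need adapting after inspecting the files.

```r
library(GEOquery)
library(Biobase)

# Sample metadata: with the default GSEMatrix = TRUE, getGEO returns a list
# of ExpressionSets, so [[1]] is valid here.
gse <- getGEO("GSE56046")
pheno <- pData(gse[[1]])

# List the supplemental files without downloading them.
supp <- getGEOSuppFiles("GSE56046", fetch_files = FALSE)
print(supp$fname)

# Download them. By default they go into ./GSE56046/, and the row names of
# the returned data frame are the local file paths.
files <- getGEOSuppFiles("GSE56046")
paths <- rownames(files)

# Look at the first lines of a file before deciding how to parse it.
# ASSUMPTION: the file is (gzipped) plain text; an archive would need
# unpacking first.
print(readLines(paths[1], n = 3))

# Hypothetical helper: combine an expression matrix with the GEO metadata.
#   expr       numeric matrix, features in rows, samples in columns
#   pheno      data frame from pData(), one row per sample
#   sample_ids character vector, one entry per row of pheno, naming the
#              column of expr that belongs to that sample
build_eset <- function(expr, pheno, sample_ids) {
  stopifnot(length(sample_ids) == nrow(pheno),
            all(sample_ids %in% colnames(expr)))
  expr <- expr[, sample_ids, drop = FALSE]  # reorder columns to match pheno
  colnames(expr) <- rownames(pheno)         # sample names must agree
  ExpressionSet(assayData = expr,
                phenoData = AnnotatedDataFrame(pheno))
}

# ASSUMPTION: tab-delimited table with feature IDs in the first column and
# column names that match pheno$title. Check both before running.
# expr <- as.matrix(read.delim(paths[1], row.names = 1, check.names = FALSE))
# eset <- build_eset(expr, pheno, sample_ids = pheno$title)
```

Which metadata column (if any) matches the column names in the supplemental file is something you have to work out by looking at both; if none lines up cleanly, that is probably the point at which to contact the submitter.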