SummarizedExperiment object from GEOquery obtained GSE
rbronste ▴ 60
@rbronste-12189
Last seen 5.0 years ago

I was wondering if there is a straightforward way to take a GSE downloaded with GEOquery, which I get in the following way:

gse <- getGEO("GSE63137",GSEMatrix=FALSE)

and create a SummarizedExperiment object with representative ranges? If the GSE has, for instance, 10 BED files, is there a quick way to make a single SummarizedExperiment object from these?


With 10 bed files, what would you want the ranges and the "assay" in the summarized experiment to contain?


No, I don't mean that it should be a single SummarizedExperiment object (sorry, bad wording on my part), just that I would obtain 10 SE objects at once, each with its associated GRanges. In general, what is the most straightforward and direct way to do this?

@sean-davis-490
Last seen 3 months ago
United States

I have just updated the GEOquery getGEOSuppFiles() function (version 2.47.16) to support filtering supplemental files (filter_regex='bed') and returning a listing of files without downloading them (fetch_files=FALSE). To try it immediately, install from GitHub:

biocLite('seandavi/GEOquery')

In 24-48 hours, the development version of GEOquery should be available.

After installation, you should be able to do something like:

library(GEOquery)
library(rtracklayer)
bedfiles = getGEOSuppFiles("GSE63137", filter_regex = 'bed')
# a data.frame with the filenames as rownames
bedfiles_as_granges = lapply(rownames(bedfiles), import, format = "bed")
bedfiles_as_granges[[1]] # GRanges for the first file
GRanges object with 103361 ranges and 0 metadata columns:
           seqnames               ranges strand
              <Rle>            <IRanges>  <Rle>
       [1]     chr1   [3094879, 3095533]      *
       [2]     chr1   [3119625, 3120840]      *
       [3]     chr1   [3121310, 3121944]      *
       [4]     chr1   [3292627, 3293590]      *
       [5]     chr1   [3322353, 3322979]      *
       ...      ...                  ...    ...
  [103357]     chrY [90808536, 90809176]      *
  [103358]     chrY [90810611, 90811697]      *
  [103359]     chrY [90812380, 90813359]      *
  [103360]     chrY [90828629, 90829131]      *
  [103361]     chrY [90838918, 90839418]      *
  -------
  seqinfo: 22 sequences from an unspecified genome; no seqlengths

I suspect that leaving these as GRanges objects may be the most useful form for analysis, but you could go ahead and convert to SummarizedExperiments if you like.
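
If you do want SummarizedExperiment objects, here is a minimal sketch of one way to wrap a GRanges. BED peak files carry no assay values of their own, so the sketch stores the peak width as a placeholder assay; that choice, the helper name granges_to_se, and the sample name are assumptions made for illustration and are not part of GEOquery. The toy ranges stand in for one imported file.

```r
library(GenomicRanges)
library(SummarizedExperiment)

# Toy GRanges standing in for one imported BED file
gr <- GRanges(
  seqnames = c("chr1", "chr1", "chrY"),
  ranges = IRanges(start = c(3094879, 3119625, 90838918),
                   end   = c(3095533, 3120840, 90839418))
)

# Hypothetical helper: wrap one GRanges in a single-column SummarizedExperiment.
# The "width" assay is only a placeholder; substitute scores or counts if you have them.
granges_to_se <- function(gr, sample_name) {
  peak_width <- matrix(width(gr), ncol = 1,
                       dimnames = list(NULL, sample_name))
  SummarizedExperiment(assays = list(width = peak_width), rowRanges = gr)
}

se <- granges_to_se(gr, "sample1")
rowRanges(se)  # the original ranges
```

To get one object per file, the same helper could be applied over the list, for example Map(granges_to_se, bedfiles_as_granges, basename(rownames(bedfiles))).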


Thanks, exactly what I was after!


Great. Let me know if you have any problems or have further suggestions on the GEOquery side of things. 


Actually, one other quick question: will filter_regex='bed' also exclude the .tar files that contain .bed files from retrieval, and fetch only the bed.gz files listed in the supplementary data?


The regex is matched directly against filenames, not file contents. The default behavior when no regex is specified is to fetch all supplemental files, so the .tar file will be downloaded by default. 
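
As a rough illustration of filename matching, the sketch below uses base R grepl() on made-up file names. Both the file names and the assumption that filter_regex behaves like grepl() are hypothetical and have not been checked against GEOquery's implementation. The point is that a loose pattern such as 'bed' keeps any file whose name contains "bed", including a tarball, while an anchored pattern keeps only names ending in .bed.gz.

```r
# Hypothetical supplemental file names, for illustration only
files <- c("GSE00000_RAW.tar",
           "GSE00000_peaks_a.bed.gz",
           "GSE00000_bedgraph_tracks.tar",
           "GSE00000_counts.txt.gz")

# Loose pattern: matches any name containing "bed"
loose <- files[grepl("bed", files)]

# Anchored pattern: matches only names ending in ".bed.gz"
strict <- files[grepl("\\.bed\\.gz$", files)]

print(loose)
print(strict)
```

A tarball whose name does not contain "bed" is skipped by either pattern, whatever it holds inside.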
