I have just updated the GEOquery getGEOSuppFiles() function (version 2.47.16) to support filtering supplemental files (filter_regex='bed') and to return a listing of files without downloading them (fetch_files=FALSE). To give it a try immediately, install from GitHub:
biocLite('seandavi/GEOquery')
In 24-48 hours, the updated development version of GEOquery should also be available from Bioconductor.
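Since filter_regex takes a pattern rather than a file extension, it can be worth checking what a pattern such as 'bed' would select. The following is only a base R sketch: it assumes the pattern is applied to the file names the way grep() applies a regular expression, and the file names are made up for illustration (they are not the actual contents of GSE63137).

```r
# Hypothetical supplemental file names (illustration only)
fnames <- c("GSM0001_peaks.bed.gz", "GSM0002_signal.bw",
            "GSM0003_embedded_notes.txt", "GSE0000_RAW.tar")

# A bare 'bed' matches anywhere in the name, including "embedded"
loose <- grep("bed", fnames, value = TRUE)

# Anchoring the pattern keeps only names ending in .bed or .bed.gz
strict <- grep("\\.bed(\\.gz)?$", fnames, value = TRUE)

print(loose)
print(strict)
```

If the looser pattern picks up files you do not want, an anchored pattern like the second one is probably the safer value to pass as filter_regex.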
After installation, you should be able to do something like:
library(GEOquery)
library(rtracklayer)
bedfiles = getGEOSuppFiles("GSE63137", filter_regex = 'bed')
# bedfiles is a data.frame with the downloaded file names as rownames
bedfiles_as_granges = lapply(rownames(bedfiles), import, format = "bed")
bedfiles_as_granges[[1]] # GRanges for the first file
[[1]]
GRanges object with 103361 ranges and 0 metadata columns:
           seqnames               ranges strand
              <Rle>            <IRanges>  <Rle>
       [1]     chr1   [3094879, 3095533]      *
       [2]     chr1   [3119625, 3120840]      *
       [3]     chr1   [3121310, 3121944]      *
       [4]     chr1   [3292627, 3293590]      *
       [5]     chr1   [3322353, 3322979]      *
       ...      ...                  ...    ...
  [103357]     chrY [90808536, 90809176]      *
  [103358]     chrY [90810611, 90811697]      *
  [103359]     chrY [90812380, 90813359]      *
  [103360]     chrY [90828629, 90829131]      *
  [103361]     chrY [90838918, 90839418]      *
  -------
  seqinfo: 22 sequences from an unspecified genome; no seqlengths
I suspect that leaving these as GRanges objects may be the most useful form for analysis, but you could go ahead and convert them to SummarizedExperiment objects if you like.
With 10 bed files, what would you want the ranges and the "assay" in the summarized experiment to contain?
No, I don't mean that it should be a single SummarizedExperiment object (sorry, bad wording on my part), just that I would obtain 10 SE objects at once, each with its associated GRanges. In general, what is the most straightforward and direct way to do this?
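One possible way to get one SE object per file is sketched below. This is an illustration, not a recommendation from the thread: granges_to_se() is a hypothetical helper, the toy gr_list stands in for bedfiles_as_granges, and the "present" assay (a single column of 1s) is just a placeholder, since what the assay should hold is exactly the open question above.

```r
library(GenomicRanges)
library(SummarizedExperiment)

# Hypothetical helper: wrap one GRanges in a RangedSummarizedExperiment.
# ASSUMPTION: the assay is a placeholder column of 1s marking each range
# as present; replace it with whatever per-range values you need.
granges_to_se <- function(gr, sample_name) {
  present <- matrix(1L, nrow = length(gr), ncol = 1,
                    dimnames = list(NULL, sample_name))
  SummarizedExperiment(assays = list(present = present), rowRanges = gr)
}

# Toy stand-ins for the GRanges imported from the BED files
gr_list <- list(
  fileA = GRanges("chr1", IRanges(start = c(100, 500), end = c(200, 650))),
  fileB = GRanges("chrY", IRanges(start = 1000, end = 1500))
)

# One SE object per GRanges, named after the input list
se_list <- Map(granges_to_se, gr_list, names(gr_list))

# The original ranges remain accessible from each SE object
rowRanges(se_list[["fileA"]])
```

With the real data, you would pass bedfiles_as_granges (named by rownames(bedfiles)) in place of gr_list.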