Question: SummarizedExperiment object from GEOquery obtained GSE
rbronste wrote (15 months ago):

I was wondering if there is a straightforward way to take a GSE downloaded with GEOquery, which I get as follows:

library(GEOquery)
gse <- getGEO("GSE63137", GSEMatrix = FALSE)

and create a SummarizedExperiment object with the associated ranges? For instance, if the GSE has 10 BED files, is there a quick way to make a single SummarizedExperiment object from them?

(Question written by rbronste; modified by Sean Davis, 15 months ago)

With 10 bed files, what would you want the ranges and the "assay" in the summarized experiment to contain?

(Sean Davis, 15 months ago)

No, I don't mean it should be a single SummarizedExperiment object (sorry, bad wording on my part); I just want to obtain 10 SE objects at once, each with its associated GRanges. In general, what is the most straightforward way to do this?

(rbronste, 15 months ago)
Answer: SummarizedExperiment object from GEOquery obtained GSE
Sean Davis (United States) wrote (15 months ago):

I have just updated the GEOquery getGEOSuppFiles() function (version 2.47.16) to support filtering supplementary files by name (e.g. filter_regex = 'bed') and returning a listing of files without downloading them (fetch_files = FALSE). To try it immediately, install from GitHub:

biocLite('seandavi/GEOquery')

In 24-48 hours, the development version of GEOquery should be available.

After installation, you should be able to do something like:

library(GEOquery)
library(rtracklayer)
bedfiles = getGEOSuppFiles("GSE63137", filter_regex = 'bed')
# a data.frame with the filenames as rownames
bedfiles_as_granges = lapply(rownames(bedfiles), import, format = "bed")
bedfiles_as_granges[[1]]  # GRanges for the first file
GRanges object with 103361 ranges and 0 metadata columns:
           seqnames               ranges strand
              <Rle>            <IRanges>  <Rle>
       [1]     chr1   [3094879, 3095533]      *
       [2]     chr1   [3119625, 3120840]      *
       [3]     chr1   [3121310, 3121944]      *
       [4]     chr1   [3292627, 3293590]      *
       [5]     chr1   [3322353, 3322979]      *
       ...      ...                  ...    ...
  [103357]     chrY [90808536, 90809176]      *
  [103358]     chrY [90810611, 90811697]      *
  [103359]     chrY [90812380, 90813359]      *
  [103360]     chrY [90828629, 90829131]      *
  [103361]     chrY [90838918, 90839418]      *
  -------
  seqinfo: 22 sequences from an unspecified genome; no seqlengths

I suspect that leaving these as GRanges objects may be the most useful form for analysis, but you could go ahead and convert to SummarizedExperiments if you like.
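If you do want SummarizedExperiment objects, here is a minimal sketch of two options, assuming the GenomicRanges and SummarizedExperiment packages are installed: one single-column SummarizedExperiment per BED file (what rbronste describes), or one SummarizedExperiment over the union of all intervals with a logical presence/absence assay (one possible answer to what the "assay" would contain). The two toy GRanges stand in for `bedfiles_as_granges`; their names, coordinates, and the `present` assay name are made up for illustration.

```r
library(GenomicRanges)
library(SummarizedExperiment)

# Toy stand-ins for bedfiles_as_granges (hypothetical ranges, not GSE63137 data)
bedfiles_as_granges <- list(
  sampleA = GRanges("chr1", IRanges(c(100, 500), width = 50)),
  sampleB = GRanges("chr1", IRanges(c(120, 900), width = 50))
)

# Option 1: one SummarizedExperiment per BED file, each with a single column.
# The hypothetical "present" assay simply marks every interval as TRUE.
se_list <- lapply(names(bedfiles_as_granges), function(nm) {
  gr <- bedfiles_as_granges[[nm]]
  SummarizedExperiment(
    assays = list(present = matrix(TRUE, nrow = length(gr), ncol = 1,
                                   dimnames = list(NULL, nm))),
    rowRanges = gr
  )
})
names(se_list) <- names(bedfiles_as_granges)

# Option 2: one SummarizedExperiment over the merged (reduced) union of all
# intervals; each column records whether that sample overlaps each union range.
union_ranges <- reduce(unlist(GRangesList(bedfiles_as_granges)))
present <- sapply(bedfiles_as_granges,
                  function(gr) overlapsAny(union_ranges, gr))
se_union <- SummarizedExperiment(assays = list(present = present),
                                 rowRanges = union_ranges)
se_union
```

With real data, give the list from the answer names first (for example `names(bedfiles_as_granges) <- basename(rownames(bedfiles))`), since option 1 iterates over those names. Option 2 uses overlaps rather than exact matches because peak boundaries generally differ between samples.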


Thanks, exactly what I was after!

(rbronste, 15 months ago)

Great. Let me know if you have any problems or have further suggestions on the GEOquery side of things. 

(Sean Davis, 15 months ago)

Actually, one other quick question: will filter_regex = 'bed' also exclude the .tar files that contain .bed files from retrieval, so that only the .bed.gz files listed in the supplementary data are downloaded?

(rbronste, 15 months ago)

The regex is matched directly against file names, not file contents, so with filter_regex = 'bed' a .tar file is skipped unless its name itself matches "bed". When no regex is specified, the default is to fetch all supplementary files, so in that case the .tar file is downloaded.
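A small base-R illustration of name-based filtering, assuming filter_regex behaves like a `grepl()` match on each supplementary file name (the file names below are hypothetical, not the actual GSE63137 listing). An unanchored pattern such as 'bed' also keeps any archive whose name happens to contain "bed"; anchoring the pattern to the extension avoids that.

```r
# Hypothetical supplementary file names (not the real GSE63137 listing)
supp_files <- c("GSE00000_RAW.tar",
                "GSE00000_bed_files.tar",
                "GSE00000_sample1_peaks.bed.gz",
                "GSE00000_sample2_peaks.bed.gz")

# Unanchored pattern: matches "bed" anywhere in the name, including one .tar file
bed_loose <- supp_files[grepl("bed", supp_files)]
print(bed_loose)

# Anchored pattern: keeps only names ending in ".bed.gz"
bed_strict <- supp_files[grepl("\\.bed\\.gz$", supp_files)]
print(bed_strict)
```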

(Sean Davis, 15 months ago)