Question: deconvolve counts per feature from summarizeOverlaps with GRangesLists
cosmin wrote (11 months ago):

 

I am trying to get read counts over a number of annotated and newly created features from S. cerevisiae so that I can detect differentially expressed regions.

The features are in a dozen or so GRanges objects that I created from TxDbs, and I have tens of bam files to process on a laptop with 8 GB of RAM. To accomplish this, I did the following:

- I created a BamFileList with all my bam files, read in chunks of 10^5 records

fls <- BamFileList(dir(pattern = "\\.bam$"), yieldSize=100000)

- I created a GRangesList containing all my GRanges

All <- GRangesList(c(gr1, gr2....., gr12))

- Passed them to summarizeOverlaps()

se=summarizeOverlaps(All, fls, mode="IntersectionNotEmpty", singleEnd=FALSE, ignore.strand = FALSE)
saveRDS(se, file = "Allcounts.rds")

To my surprise, I got a RangedSummarizedExperiment object that contains only one row, with a single aggregated count for each bam file, which doesn't seem very useful...

   sample1      sample2   ......sample85
   counts          counts          counts

I expected a table-like format with one count per feature (range) per bam file.

The question is whether I can somehow deconvolve the aggregated counts into counts per feature for each GRanges.

 

A test with a single GRanges object combining two GRanges, GR = c(gr1, gr2), and a couple of bam files in a BamFileList seems to do the trick. assay(se) shows the expected counts per feature for each bam file, so extending this approach to a BamFileList that contains all my files should solve my problem. I am testing it right now...

> assay(se)
     file1.bam file2.bam
[1,]       671       256
[2,]       250       178
[3,]        64        39

Thanks in advance for any suggestions/positive comments/tips.

Answer: Martin Morgan (United States) wrote (11 months ago):

This

All <- GRangesList(c(gr1, gr2....., gr12))

concatenates all the GRanges into a single GRanges, and puts it into a list with length 1. Probably you intended to

All <- GRangesList(gr1, gr2, ...)
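
The difference can be checked on toy data. The following is only a minimal sketch: the ranges are invented for illustration, and the object names (combined, one_elt, per_set) are hypothetical. It compares the length of the object produced by each construction, since summarizeOverlaps() counts once per element of a GRangesList and once per range of a plain GRanges.

```r
library(GenomicRanges)

# Toy ranges, made up purely for illustration
gr1 <- GRanges("chrI", IRanges(start = c(100, 500), width = 50), strand = "+")
gr2 <- GRanges("chrI", IRanges(start = c(1000, 2000, 3000), width = 50), strand = "-")

combined <- c(gr1, gr2)               # one GRanges holding all 5 ranges
one_elt  <- GRangesList(c(gr1, gr2))  # the concatenated GRanges wrapped in a list
per_set  <- GRangesList(gr1, gr2)     # one list element per input GRanges

length(combined)       # number of individual ranges
length(one_elt)        # number of list elements
length(per_set)        # number of list elements
elementNROWS(per_set)  # ranges inside each list element
```

The number of rows to expect in the summarizeOverlaps() result follows the length of whichever object is passed as the features argument.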

 

Answer: cosmin wrote (11 months ago):

Thank you, Martin, for taking the time to answer. I'll try your suggestion.

As I am new to this, I followed the example given in the documentation of the summarizeOverlaps function:

grl <- GRangesList(c(gr1, gr2))

In the meantime, I combined the GRanges into a single GRanges object

new_gr <- c(gr1, gr2)

and passed that to summarizeOverlaps:

summarizeOverlaps(new_gr, fls, mode="IntersectionNotEmpty", singleEnd=FALSE, ignore.strand = FALSE)

It worked...

I'll compare the results with the solution you suggested.

Thanks again

Can you point to where you see the example GRangesList(c(gr1, gr2))? It doesn't look correct.

Reply by Martin Morgan, 11 months ago

 

http://www.bioconductor.org/packages//2.11/bioc/vignettes/GenomicRanges/inst/doc/summarizeOverlaps.pdf

See page 4.

Reply by cosmin, 11 months ago

Thanks; note the '2.11' in the URL and the 'compiled' date on the document; this is from 2013 so very out of date. Current vignettes are available from your own session with

browseVignettes(package="SummarizedExperiment")
browseVignettes(package="GenomicRanges")

Or from the web site by browsing to, e.g., https://bioconductor.org/packages/SummarizedExperiment and looking for the 'Documentation' section.

Reply by Martin Morgan, 11 months ago

Thanks for pointing out the new resources. 

I have tried GRangesList(gr1, ..gr12) and the result is a 12 x (number of samples) object: each element of the list yields a single aggregated count. That is not what I want. I want the individual ranges and their feature information preserved, and as far as I can tell the GRangesList approach does not allow that.

So far, the only option  I found that does what I want is

new_gr<- c(gr1, gr2) 

summarizeOverlaps(new_gr, fls)

I tried it with two so far and it took a while... 
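
One way to keep track of which set each range came from after concatenating is to tag the ranges before combining them. This is only a sketch under assumptions: the toy ranges are invented, and the metadata column name "set" and the object name "features" are hypothetical. The summarizeOverlaps() call is left commented out because it needs real bam files (fls as defined in the question).

```r
library(GenomicRanges)

# Toy ranges, made up purely for illustration
gr1 <- GRanges("chrI", IRanges(start = c(100, 500), width = 50), strand = "+")
gr2 <- GRanges("chrI", IRanges(start = c(1000, 2000, 3000), width = 50), strand = "-")

# Tag every range with the set it came from ("set" is a hypothetical column name)
gr1$set <- "gr1"
gr2$set <- "gr2"

# Concatenate into one GRanges; the metadata column travels with each range
features <- c(gr1, gr2)
table(features$set)

# With real bam files (not run here; fls is the BamFileList from the question):
# library(GenomicAlignments)
# se <- summarizeOverlaps(features, fls, mode = "IntersectionNotEmpty",
#                         singleEnd = FALSE, ignore.strand = FALSE)
# rowRanges(se)$set  # set membership for each row of assay(se)
```

Because the features are stored as the rowRanges of the result, the counts in assay(se) can then be split or subset by rowRanges(se)$set.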

 

Reply by cosmin, 11 months ago