Deconvolve counts per feature from summarizeOverlaps with GRangesLists
cosmin • 0
@cosmin-15307
Last seen 6.8 years ago

 

I am trying to get read counts over a number of annotated and newly created features from S. cerevisiae so that I can detect differentially expressed regions.

The features are in a dozen or so GRanges objects that I created from TxDbs, and I have tens of BAM files to process on a laptop with 8 GB of RAM. To accomplish this, I did the following:

- I created a BamFileList with all my BAM files, read in chunks of 10^5 records

fls <- BamFileList(dir(pattern = "\\.bam$"), yieldSize=100000)

- I created a GRangesList containing all my GRanges

All <- GRangesList(c(gr1, gr2....., gr12))

- Passed them to summarizeOverlaps()

se=summarizeOverlaps(All, fls, mode="IntersectionNotEmpty", singleEnd=FALSE, ignore.strand = FALSE)
saveRDS(se, file = "Allcounts.rds")

To my surprise, I got a RangedSummarizedExperiment object that contains only one row, with an aggregated count for each BAM file, which doesn't seem very useful...

   sample1      sample2   ......sample85
   counts          counts          counts

I expected a table-like format with one count per feature in each GRanges.

The question is whether I can somehow deconvolve the aggregated counts into counts per feature per GRanges.

 

A test in which I passed a single GRanges object built from two GRanges, GR=c(gr1, gr2), together with a couple of BAM files in a BamFileList, seems to do the trick. assay(se) shows the expected count per feature for each BAM file, so extending this approach to a BamFileList that contains all my files would solve my problem. I am testing it right now...

> assay(se)
                                                file1.bam                                file2.bam
    [1,]                                           671                                         256
    [2,]                                           250                                         178
    [3,]                                            64                                          39

Thanks in advance for any suggestions/positive comments/tips.

@martin-morgan-1513
Last seen 4 months ago
United States

This

All <- GRangesList(c(gr1, gr2....., gr12))

concatenates all the GRanges into a single GRanges, and puts it into a list with length 1. Probably you intended to

All <- GRangesList(gr1, gr2, ...)
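
The difference between the two calls can be checked without any BAM files. The following is a minimal sketch, assuming two small toy GRanges (the coordinates are hypothetical stand-ins for gr1 and gr2). It only compares the lengths of the resulting objects, which is what determines how many rows summarizeOverlaps() returns: one row per list element for a GRangesList, one row per range for a plain GRanges.

```r
library(GenomicRanges)

# Toy ranges standing in for gr1 and gr2 (hypothetical coordinates)
gr1 <- GRanges("chrI", IRanges(start = c(100, 500), width = 50), strand = "+")
gr2 <- GRanges("chrI", IRanges(start = c(1000, 2000, 3000), width = 50), strand = "-")

# c() concatenates first, so the list ends up with a single element of 5 ranges
one_element <- GRangesList(c(gr1, gr2))

# Passing the objects separately gives one list element per GRanges
per_set <- GRangesList(gr1 = gr1, gr2 = gr2)

length(one_element)    # 1
length(per_set)        # 2
elementNROWS(per_set)  # gr1 = 2, gr2 = 3
length(c(gr1, gr2))    # 5 (a plain GRanges: one entry per range)
```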

cosmin • 0
@cosmin-15307
Last seen 6.8 years ago

Thank you, Martin, for taking the time to answer. I'll try your suggestion.

As I am new to this, I followed the example given in the documentation of the summarizeOverlaps function:

grl <- GRangesList(c(gr1, gr2))

In the meantime, I have created a single GRanges by concatenating the individual ones

new_gr <- c(gr1, gr2)

and passed that to summarizeOverlaps

summarizeOverlaps(new_gr, fls, mode="IntersectionNotEmpty", singleEnd=FALSE, ignore.strand = FALSE)

It worked...

I'll compare the results with the solution suggested by you.

Thanks again


Can you point to where you see the example GRangesList(c(gr1, gr2)) ? It doesn't look correct.


Thanks; note the '2.11' in the URL and the 'compiled' date on the document; this is from 2013 so very out of date. Current vignettes are available from your own session with

browseVignettes(package="SummarizedExperiment")
browseVignettes(package="GenomicRanges")

Or from the web site by browsing to, e.g., https://bioconductor.org/packages/SummarizedExperiment and looking for the 'Documentation' section.


Thanks for pointing out the new resources. 

I have tried GRangesList(gr1, ..gr12) and the result is an object with 12 rows and one column per sample, so each element of the list yields a single aggregated count. That is not what I want. I want the ranges and the feature information preserved, and as far as I can tell the GRangesList option does not allow for that.

So far, the only option I have found that does what I want is

new_gr<- c(gr1, gr2) 

summarizeOverlaps(new_gr, fls)

I tried it with two so far and it took a while... 
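
One way to keep track of which original GRanges each row came from is to tag the ranges before concatenating them. The following is a rough sketch under stated assumptions, not a tested recipe for this data set: the toy coordinates, the `source` column name and the row names are all hypothetical, and the count matrix is a zero-filled placeholder standing in for what summarizeOverlaps(new_gr, fls) would return, since no BAM files are available here.

```r
library(GenomicRanges)
library(SummarizedExperiment)

# Toy ranges standing in for gr1 and gr2 (hypothetical coordinates)
gr1 <- GRanges("chrI", IRanges(start = c(100, 500), width = 50), strand = "+")
gr2 <- GRanges("chrI", IRanges(start = c(1000, 2000, 3000), width = 50), strand = "-")

# Record the origin of every range in a metadata column (name is an assumption)
gr1$source <- "gr1"
gr2$source <- "gr2"

# Concatenate into one GRanges: one entry, and later one count row, per feature
new_gr <- c(gr1, gr2)
names(new_gr) <- sprintf("%s_%d", new_gr$source, seq_along(new_gr))

# With real data this step would be: se <- summarizeOverlaps(new_gr, fls, ...)
# Here a zero-filled placeholder matrix is used instead of real counts.
counts <- matrix(0L, nrow = length(new_gr), ncol = 2,
                 dimnames = list(names(new_gr), c("file1.bam", "file2.bam")))
se <- SummarizedExperiment(assays = list(counts = counts), rowRanges = new_gr)

# Coordinates, source label and counts side by side, one row per feature
tab <- cbind(as.data.frame(rowRanges(se)), assay(se))
tab
```

The resulting table can then be split or filtered on the `source` column to recover the counts belonging to each original GRanges.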
