I am trying to get the read counts over a number of annotated and newly created features from S, cerevisiae so that I can detect differentially expressed regions.
The features are in a dozen or so txdbs GRanges I created and I have tens of bam files to deal with on a 8Gb RAM laptop. To accomplish this, I did the following:
- I created a BamFileList with all my bam files that are accessed in e^5 chunks
fls <- BamFileList(dir(pattern = ".bam$"), yieldSize=100000)
- I created a GRangesList containing all my GRanges
All<-GRangesList(c(gr1, gr2....., gr12)
- Passed them to summarizeOverlaps()
se=summarizeOverlaps(All, fls, mode="IntersectionNotEmpty", singleEnd=FALSE, ignore.strand = FALSE) saveRDS(se, file = "Allcounts.rds")
To my surprise, I got a RangeSummerizedExperiment Object that contains only one row with an aggregated number of counts for each bam file, which doesn't seem very useful...
sample1 sample2 ......sample85 counts counts counts
I expected a table-like format with counts/feature/GRanges
The question is whether I can de-convolute somehow the counts /feature/ GRanges.
A test in which I used a GRanges object containing 2 GRanges - GR=c(gr1, gr2) and a couple of bam files in a BamFileList seems to do the trick. The assay(se) object shows the expected counts/feature for each bam file, so expanding this approach to a BamFileList that contains all my files would be a solution to my problem. I am testing it right now...
> assay(se) file1.bam file2.bam [1,] 671 256 [2,] 250 178 [3,] 64 39
Thanks in advance for any suggestions/positive comments/tips.