Dear All,
I am interested in counting the number of mapped reads in equal size intervals on genome (say 200 bp), my question is whether featurecounts any option for doing this. According to what I have found so far it sounds like it is more appropriate for counting reads mapping to exons, genes, and promoters. In order to use it for this purpose I define a set of features like following:
library("Rsubread")
library("BSgenome.Hsapiens.UCSC.hg19")
lengths = seqlengths(Hsapiens)
chrlengths = lengths[1:24]
intervalnum <- ceiling(chrlengths/200)
start <- rapply(as.list(intervalnum),function(x) seq(1,x*200,200))
end <- rapply(as.list(chrlengths),function(x) c(as.list(seq(200,x,200)),x))
chr <- rep(names(intervalnum),intervalnum)
ann <- data.frame(
GeneID=as.character(1:sum(intervalnum)),
Chr=chr,
Start=start,
End=end,
Strand = rep("+",sum(intervalnum)),
stringsAsFactors=FALSE)
bams <- dir(".","_sorted.bam$", full=TRUE)
fc <- featureCounts(files=bams, nthreads=5, readExtension3=150, annot.ext=ann)
Is there a better and faster way? (this way, it takes too long for each bam file)
How long did it take for each of your bam files?
Hi Wei,
I could run it more rapidly with more cores thanks to parallel feature in featurecounts.
Hadi
Hi Hadi, are you using the latest version of Rsubread?
We ran your code to generate 15 million bins and then ran featureCounts to assign a bam file including 11 million reads to the bins. FeatureCounts successfully completed the read counting in just 4 minutes.
Can you provide your session info and also featureCounts screen output?
Hi Wei,
Code above is the modified version of code I first used with featurecounts. Yes, it is fast enough with 5 cores. I have the latest version of Rsubread. I was not sure whether I was using featurecounts correctly, but with these responses I could modify the code and run it successfully. Thank you so much.
Hadi