qCount with a GRangesList is very slow
@vakulmohanty-8232

Hello,

I'm using the QuasR package to process some RNA-seq data and count reads in the exonic and intronic regions of all genes in the human genome. To this end, I have aligned the reads and am using qCount to do the counting, with a GRangesList as the query. The list has one entry per gene, and each entry contains either the exonic or the intronic ranges. My code snippet is below:

library(QuasR)
library(parallel)  # provides makeForkCluster() and stopCluster()
clusters <- makeForkCluster(nnodes = 8)
eCount <- qCount(proj, exons, clObj = clusters)
stopCluster(cl = clusters)

However, this is taking abnormally long to run. I think this is because qCount uses a for loop over all elements of the list and removes redundancies using setdiff(). Is there a way to speed up this redundancy removal step? I have ~20,000 genes (elements in the list), and the redundancy removal step isn't complete even after ~40 hours.

 

I'll be grateful for any pointers.

Thanking You,

Vakul


Hi Vakul

You should probably use a GRanges query rather than a GRangesList.

The GRangesList query is meant for a special analysis (see ?qCount) which partitions the genome into domains.

If you want one count per exon, use a GRanges of exons that either has no names or has a unique name for each exon. If you want one count per gene (the union of all its exons), use a GRanges of exons named by gene, so that all exons from the same gene share the same name.

This should be much faster.
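
As a minimal sketch of the two options above, the GRangesList can be flattened into a plain GRanges and named by gene. The toy `exons` object (gene names and coordinates) is made up for illustration and stands in for the real list; `exonsByGene` and `exonsPerExon` are hypothetical variable names. The qCount calls are left as comments because they need the `proj` and `clusters` objects from the question.

```r
library(GenomicRanges)

# Toy stand-in for the `exons` GRangesList (one element per gene);
# names and coordinates are invented for this example.
exons <- GRangesList(
  geneA = GRanges("chr1", IRanges(start = c(100, 300), end = c(200, 400)), strand = "+"),
  geneB = GRanges("chr1", IRanges(start = 1000, end = 1200), strand = "-")
)

# One count per gene: flatten the list and give every exon its gene's name,
# so that exons of the same gene share an identical name.
exonsByGene <- unlist(exons, use.names = FALSE)
names(exonsByGene) <- rep(names(exons), elementNROWS(exons))

# One count per exon: the same ranges, but without names.
exonsPerExon <- exonsByGene
names(exonsPerExon) <- NULL

# With the objects from the question, the calls would then look like:
# geneCount <- qCount(proj, exonsByGene,  clObj = clusters)
# exonCount <- qCount(proj, exonsPerExon, clObj = clusters)
```

The names are set explicitly with rep() rather than relying on the names that unlist() would generate, so the gene-to-exon mapping is visible in the code.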

Michael
