Question: qCount with a GRangesList is very slow
2.3 years ago, vakul.mohanty (United States) wrote:

Hello,

I'm using the QuasR package to process some RNA-seq data and count reads in the exonic and intronic regions of all genes in the human genome. To this end, I have aligned the reads and am using qCount to do the counting, with a GRangesList as the query. The list has one entry per gene, and each entry contains either the exonic or the intronic ranges of that gene. My code snippet is below:

clusters = makeForkCluster(nnodes = 8)
eCount = qCount(proj,exons,clObj = clusters)
stopCluster(cl = clusters)

However, this is taking abnormally long to run, which I think is because qCount uses a for loop over all elements of the list and removes redundancies using setdiff(). Is there a way to speed up this redundancy-removal step? I have ~20000 genes (elements in the list), and the step has not completed even after ~40 hours.

I'll be grateful for any pointers.

Thanking You,

Vakul


Hi Vakul

You should probably use a GRanges query rather than a GRangesList.

The GRangesList query is meant for a special analysis (see ?qCount) which partitions the genome into domains.

If you want one count per exon, use a GRanges of exons without names, or with a unique name per exon. If you want one count per gene (taking the union of all its exons), use a GRanges of exons named by gene, so that all exons from the same gene have the same name.

This should be much faster.
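
A minimal sketch of the per-gene case, assuming the GRangesList from the question is named by gene and its inner ranges carry no names of their own. The toy coordinates, the gene names and the variable exonsFlat are hypothetical; proj and clusters are the objects from the question, and the qCount calls are left commented out because they need a real qProject.

```r
library(GenomicRanges)

# Toy stand-in for the per-gene exon list (hypothetical coordinates and names)
exons <- GRangesList(
  geneA = GRanges("chr1", IRanges(start = c(100, 300), end = c(200, 400)), strand = "+"),
  geneB = GRanges("chr1", IRanges(start = 1000, end = 1200), strand = "-")
)

# Flatten to a single GRanges, then name every exon after its parent gene
exonsFlat <- unlist(exons, use.names = FALSE)
names(exonsFlat) <- rep(names(exons), elementNROWS(exons))

# One count per gene: exons sharing a name are combined, as described above
# geneCount <- qCount(proj, exonsFlat, clObj = clusters)

# One count per exon: remove the names (or make them unique) before counting
# names(exonsFlat) <- NULL
# exonCount <- qCount(proj, exonsFlat, clObj = clusters)
```

Setting the names explicitly from the list's element names, rather than relying on how unlist() builds names, keeps the gene-to-exon mapping unambiguous.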

Michael

written 7 months ago by Michael Stadler