qCount with a GRangesList is very slow
@vakulmohanty-8232

Hello,

I'm using the QuasR package to process some RNA-seq data and count reads in the exonic and intronic regions of all genes in the human genome. To this end, I have aligned the reads and am using qCount to do the counting, with a GRangesList as the query. The list has one entry per gene, and each entry contains either the exonic or the intronic ranges. My code snippet is below:

clusters = makeForkCluster(nnodes = 8)
eCount = qCount(proj, exons, clObj = clusters)
stopCluster(cl = clusters)

However, this is taking abnormally long to run, which I think is because qCount uses a for loop over all elements of the list and removes redundancies using setdiff(). Is there a way to speed up this redundancy-removal step? I have ~20,000 genes (elements in the list), and the step is not complete even after ~40 hours.

I'll be grateful for any pointers.

Thanking You,

Vakul


Hi Vakul

You should probably use a GRanges query instead of a GRangesList.

The GRangesList query is meant for a special analysis (see ?qCount) which partitions the genome into domains.

If you want one count per exon, use a GRanges of exons that has either no names or a unique name per exon. If you want one count per gene (the union of all its exons), use a GRanges of exons named by gene, so that all exons from the same gene share the same name.

This should be much faster.
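
As a minimal sketch of the per-gene case: a per-gene GRangesList can be flattened into a gene-named GRanges with unlist(). The toy ranges and the names geneA, geneB and exons_flat are made up for illustration; proj and clusters are assumed to be the objects from the question, so the qCount call is left as a comment.

```r
library(GenomicRanges)

# Toy stand-in for the per-gene GRangesList from the question
# (hypothetical genes and coordinates, for illustration only).
exons <- GRangesList(
  geneA = GRanges("chr1", IRanges(start = c(100, 300), end = c(200, 400)),
                  strand = "+"),
  geneB = GRanges("chr1", IRanges(start = 1000, end = 1200), strand = "-")
)

# Flatten to a single GRanges. Because the inner ranges carry no names,
# each exon inherits the name of the list element (gene) it came from.
exons_flat <- unlist(exons)
print(names(exons_flat))

# With a real qProject `proj` and cluster object `clusters`, the call
# would then be (not run here):
# eCount <- qCount(proj, exons_flat, clObj = clusters)
```

If the inner ranges already carry their own names, the names produced by unlist() may differ, so it is worth checking names(exons_flat) before passing it to qCount.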

Michael