Question: processAmplicons using very little memory on a big job?
Asked 14 months ago by sdalin:

I'm trying to use processAmplicons to generate sgRNA counts for a CRISPR screen. I have ~200 million reads per FASTQ file, 4-5 barcodes per file, and 180,000 guides. I'm running this on my school's cluster, which has ~30 GB of memory per node. When I submit one FASTQ file for analysis, I request 30 GB of memory and an entire node. After a full day, the output still says "-- Processing 10 million reads", yet when I check memory usage, the node is using just under 1 GB of RAM.

I'm not sure why the memory usage is so low; I would expect the script to need all 30 GB given the size of the job. Is there some option in Bioconductor or edgeR that may be throttling my memory usage? Any tips to speed this up? I have already tried "lazy parallelization", but because of this memory issue it doesn't run any faster.


This function breaks the data into smaller chunks that it processes serially, so you shouldn't need 30 GB of memory. It also shouldn't take that long to process the first 10 million reads. Can you provide your code, the output of sessionInfo(), and a small sample of the sequences from your FASTQ file (perhaps a few thousand reads) that we can test further?

Reply written 14 months ago by Matthew Ritchie
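To illustrate why chunked, serial processing keeps memory low regardless of file size, here is a minimal Python sketch. It is not edgeR's actual implementation; it assumes exact matching only and fixed barcode and guide positions in each read, and the helper names (`fastq_sequences`, `chunks`, `count_guides`) and the toy reads are hypothetical. Memory is bounded by the chunk size plus the counts table (at most guides × barcodes entries), not by the total number of reads, which is consistent with a job using ~1 GB even on a 200-million-read file.

```python
from collections import Counter
from itertools import islice
from typing import Dict, Iterable, Iterator, List

def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd line of each 4-line FASTQ record), lazily."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            return
        yield record[1].strip()

def chunks(seqs: Iterator[str], size: int) -> Iterator[List[str]]:
    """Group sequences into lists of at most `size` reads."""
    while True:
        batch = list(islice(seqs, size))
        if not batch:
            return
        yield batch

def count_guides(lines: Iterable[str], barcodes: Dict[str, str],
                 guides: Dict[str, str], bc_slice: slice, guide_slice: slice,
                 chunk_size: int = 1_000_000) -> Counter:
    """Count (guide, sample) pairs by exact match at fixed (assumed) positions."""
    counts: Counter = Counter()
    for batch in chunks(fastq_sequences(lines), chunk_size):
        for seq in batch:
            sample = barcodes.get(seq[bc_slice])
            guide = guides.get(seq[guide_slice])
            if sample is not None and guide is not None:
                counts[(guide, sample)] += 1
        # Each batch is discarded once the next one replaces it, so peak
        # memory is a bounded number of chunks plus the counts table.
    return counts

# Hypothetical toy data: 4-nt barcode followed by an 8-nt guide.
fastq = [
    "@r1", "ACGTAAAACCCC", "+", "IIIIIIIIIIII",
    "@r2", "ACGTGGGGTTTT", "+", "IIIIIIIIIIII",
    "@r3", "TTTTAAAACCCC", "+", "IIIIIIIIIIII",
    "@r4", "ACGTAAAACCCC", "+", "IIIIIIIIIIII",
]
barcodes = {"ACGT": "sample1", "TTTT": "sample2"}
guides = {"AAAACCCC": "sg1", "GGGGTTTT": "sg2"}

counts = count_guides(fastq, barcodes, guides, slice(0, 4), slice(4, 12), chunk_size=2)
print(dict(counts))
```

Passing an open file handle instead of a list (`with open(path) as fh: count_guides(fh, ...)`) streams lines from disk, so the whole FASTQ is never held in memory. Because counts from separate chunks simply add up, the same idea also explains why splitting a FASTQ into pieces and summing the per-piece count tables gives the same result; extra memory alone would not speed up a serial pass.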