Question

ProcessAmplicons using very little memory on big job?


sdalin • 0

@sdalin-15843


I'm trying to use processAmplicons to generate sgRNA counts for a CRISPR screen. I have ~200 million reads per FASTQ file, with 4-5 barcodes per file, and 180,000 guides. I'm running this on my school's cluster, which has ~30 GB of memory per node. When I submit one FASTQ file for analysis, I request 30 GB of memory and an entire node. After a full day, the output still says " -- Processing 10 million reads", but when I check the memory usage, the node is using just under 1 GB of RAM.

I'm not sure why the memory usage is so low; I would expect the script to need all 30 GB given the size of the job. Is there some option in Bioconductor or edgeR that may be throttling my memory usage? Any tips to speed this up? I have already tried "lazy parallelization", but because of this memory issue it doesn't run any faster.

edgeR processamplicons memory problem • 830 views

ADD COMMENT • link 5.8 years ago sdalin • 0


This function breaks the data into smaller chunks that it processes serially, so you shouldn't need 30 GB of memory. It also shouldn't take that long to process the first 10 million reads. Can you provide your code, the output of sessionInfo(), and a small sample of the sequences from your FASTQ file (perhaps a few thousand reads) so that we can test further?
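To illustrate the two points above, here is a minimal Python sketch. It is not edgeR's implementation. It only shows why reading a FASTQ file chunk by chunk keeps memory use small regardless of file size, and one way to cut out the first few thousand reads for a test case. The function names are hypothetical, and the code assumes standard 4-line FASTQ records (no wrapped sequence lines).

```python
import gzip
from itertools import islice
from typing import Iterator, List, TextIO


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def iter_read_chunks(handle: TextIO, reads_per_chunk: int) -> Iterator[List[str]]:
    """Yield the sequences of at most reads_per_chunk reads at a time.

    Only one chunk is held in memory at once, so peak memory depends on
    reads_per_chunk, not on the total number of reads in the file.
    """
    while True:
        # Assumption: each record is exactly 4 lines (header, sequence, +, quality).
        lines = list(islice(handle, 4 * reads_per_chunk))
        if not lines:
            break
        # The sequence is the second line of each 4-line record.
        yield [line.rstrip("\n") for line in lines[1::4]]


def write_first_reads(src: str, dst: str, n_reads: int) -> int:
    """Copy the first n_reads records of src to dst; return the number written."""
    with open_fastq(src) as fin, open(dst, "wt") as fout:
        lines = list(islice(fin, 4 * n_reads))
        fout.writelines(lines)
    return len(lines) // 4
```

For example, `write_first_reads("sample.fastq.gz", "subset.fastq", 2000)` (file names are placeholders) would write a 2,000-read subset that is small enough to share.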

ADD REPLY • link 5.8 years ago Matthew Ritchie ▴ 1000