Index genome generation with Subread for colorspace data
gokberk • 0
@gokberk-20463

Hi all,

I need to analyze some old SOLiD colorspace RNA-seq reads and have heard that Subread still supports colorspace data. So I downloaded version 1.6.4 and compiled it on my server. I've been trying to build a genome index with the command ./subread-buildindex -c -F -o macaca_fascicularis_5.0_index ../../bowtie_index/macaca_fascicularis_5.0_genome.fa and received the output below:

        ==========     _____ _    _ ____  _____  ______          _____  
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
      v1.6.4

//================================= setting ==================================\\
||                                                                            ||
||                Index name : macaca_fascicularis_5.0_index                  ||
||               Index space : color space                                    ||
||                    Memory : 8000 Mbytes                                    ||
||          Repeat threshold : 100 repeats                                    ||
||              Gapped index : no                                             ||
||                                                                            ||
||               Input files : 1 file in total                                ||
||                             o macaca_fascicularis_5.0_genome.fa            ||
||                                                                            ||
\\============================================================================//

//================================= Running ==================================\\
||                                                                            ||
|| Check the integrity of provided reference sequences ...                    ||
|| No format issues were found                                                ||
|| Scan uninformative subreads in reference sequences ...                     ||

However, it has been stuck at this point for about an hour and a half now, so I was wondering whether something is wrong or this is normal. The genome assembly I'm indexing is 3 GB.

I'd appreciate any help. Cheers, Gökberk

subread solid-seq data colorspace rna-seq • 270 views
Yang Liao ▴ 260
@yang-liao-6075

Hi Gökberk,

I downloaded version 5.0 of the Macaca fascicularis genome from Ensembl (the top-level sequences, 867 MB gzipped). I then ran the index builder in Subread 1.6.4 with the same arguments you used. The index was built in 45 minutes with no errors, and the "scan uninformative subreads" step took less than 15 minutes (on a Xeon E5-2690 v3 computer with 512 GB of memory). If this is the same genome you used, the index builder appears to be running very slowly on your computer.

The index builder uses around 10 GB of memory under your settings, so please check whether your computer has enough memory to run it. When physical memory runs out, the operating system may fall back to the swap volume on the HDD, which is very slow.
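
One way to check this on Linux is to compare the kernel's MemAvailable figure with the roughly 10 GB mentioned above. This is only a minimal sketch: it assumes a Linux system that exposes /proc/meminfo, and the function names and the 10000 MB threshold are illustrative choices, not part of Subread.

```shell
#!/bin/sh
# Print available memory in MB, read from a meminfo-style file (Linux only).
# The path defaults to /proc/meminfo, where MemAvailable is reported in kB.
available_mb() {
    awk '/^MemAvailable:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Succeed if available memory ($1, in MB) covers the requirement ($2, in MB).
enough_memory() {
    [ "$1" -ge "$2" ]
}

# Illustrative threshold: roughly 10 GB, the estimate given for these settings.
required_mb=10000
avail=$(available_mb)
if [ -n "$avail" ] && enough_memory "$avail" "$required_mb"; then
    echo "OK: ${avail} MB available"
else
    echo "Warning: only ${avail:-unknown} MB available; the build may swap"
fi
```

If the warning appears, the slow "scan uninformative subreads" step is consistent with the machine swapping to disk.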

BTW, if your computer has at least 24 GB of free memory, I suggest using the "-B" option to build a one-block index. This can greatly improve mapping speed.
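
As a rough illustration of that choice, the sketch below adds -B only when at least 24 GB is free, and prints the resulting command instead of running it. The helper name, the 24000 MB approximation of 24 GB, and the index and FASTA names are placeholders; -c, -F, -B and -o are the options already discussed in this thread.

```shell
#!/bin/sh
# Print a subread-buildindex command line, adding -B (one-block index)
# only when the given amount of free memory is at least about 24 GB.
# $1: free memory in MB, $2: index name, $3: reference FASTA file
buildindex_cmd() {
    opts="-c -F"
    if [ "$1" -ge 24000 ]; then
        opts="$opts -B"
    fi
    echo "subread-buildindex $opts -o $2 $3"
}

# Example with placeholder values; inspect the command before running it.
buildindex_cmd 32000 my_index genome.fa
```

With 32000 MB free, the example prints a command that includes -B; with less than 24000 MB it falls back to the original -c -F arguments.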

Cheers, Yang
