
Rsubread buildindex: Too many sections in reference?



Wei Shi ★ 3.6k

@wei-shi-2183


Australia/Melbourne

Dear Davis,

The buildindex function has a hard limit of 1000 on the number of chromosomes allowed in the reference. Your "rn5.fa" file contains more than 1000 chromosomes/contigs, which is why the function reported that message. As a consequence, the chromosomes/contigs located after the first 1000 in the file were not correctly indexed.

We have increased the limit to 50,000, which should be sufficient for your dataset. The changes have been committed to the Bioconductor devel SVN and should be available to you in a couple of days. Let us know if the problem persists.

Cheers,
Wei

On Oct 19, 2012, at 6:30 AM, Davis, Wade wrote:

> Dear Wei,
> I received the following message when building an index for rn5:
>
> >buildindex(basename="rn5_rsubread_index",reference="rn5.fa",memory= 12000)
>
> Building a base-space index.
> Size of memory used=12000 MB
> Base name of the built index = rn5_rsubread_index
> Scanning non-informative reads in the chromosomes...
> completed=85.27%; time used=216.9s; rate=14099.1k bps/s; total=2926m bps
> There are too many sections in the chromosome data files (more than 1000 sections).
> There are 663648 non-informative subreads found in the chromosomes.
> Index items per partition = 1375180800
>
> My question is: what is the consequence of the message "There are too many sections in the chromosome data files (more than 1000 sections)"?
>
> I imagine this is due to all of the nonstandard chromosomes in the reference. I could clean up the reference to get rid of them, but I am curious to know the (biological) opinion of others. This is to be used for a standard RNA-Seq run (on rat, of course).
>
> I am running the development version of R and Rsubread, as shown below.
>
> Thanks,
> Wade
>
> > sessionInfo()
> R Under development (unstable) (2012-09-24 r60800)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] Rsubread_1.9.0
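Per Wei's reply, buildindex at the time handled at most 1000 sequences (raised to 50,000 in the devel version), so it can be worth checking how many sequences a reference FASTA contains before indexing. If you instead follow Wade's idea of dropping the nonstandard contigs, you need a reduced FASTA. The base-R sketch below does both and does not use Rsubread itself. `count_fasta_sequences()` and `filter_fasta()` are hypothetical helpers, and the default `keep_pattern` assumes UCSC-style names (chr1, chr2, ..., chrX, chrY, chrM); adjust it for other naming schemes.

```r
# Hypothetical helpers (not part of Rsubread): inspect and filter a FASTA
# reference with base R only, reading it in chunks of lines.

# Count the sequences ("sections") in a FASTA file: one per header line,
# i.e. per line starting with ">".
count_fasta_sequences <- function(path, chunk_size = 100000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  n <- 0L
  repeat {
    chunk <- readLines(con, n = chunk_size)
    if (length(chunk) == 0L) break
    n <- n + sum(startsWith(chunk, ">"))
  }
  n
}

# Copy to out_path only the sequences whose name (header text up to the
# first whitespace) matches keep_pattern. The default pattern assumes
# UCSC-style names (chr1, chr2, ..., chrX, chrY, chrM). Returns the
# number of sequences kept.
filter_fasta <- function(in_path, out_path,
                         keep_pattern = "^chr([0-9]+|[XYM])$",
                         chunk_size = 100000L) {
  inp <- file(in_path, open = "r")
  on.exit(close(inp), add = TRUE)
  out <- file(out_path, open = "w")
  on.exit(close(out), add = TRUE)
  keep <- FALSE  # keep/drop state of the sequence currently being read
  kept <- 0L
  repeat {
    chunk <- readLines(inp, n = chunk_size)
    if (length(chunk) == 0L) break
    is_header <- startsWith(chunk, ">")
    seq_names <- sub("^>(\\S+).*$", "\\1", chunk[is_header], perl = TRUE)
    header_keep <- grepl(keep_pattern, seq_names)
    kept <- kept + sum(header_keep)
    # Each line inherits the decision of the most recent header; lines
    # before the first header in this chunk continue the previous sequence.
    line_keep <- c(keep, header_keep)[cumsum(is_header) + 1L]
    writeLines(chunk[line_keep], out)
    keep <- line_keep[length(line_keep)]
  }
  kept
}
```

For example, `count_fasta_sequences("rn5.fa")` reports how many sequences the file holds, and `filter_fasta("rn5.fa", "rn5_primary.fa")` writes only the matching chromosomes and returns how many were kept. Reading fixed-size chunks of lines keeps memory use bounded for a genome-sized file, at the cost of carrying the keep/drop state across chunk boundaries.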

Rsubread

12.2 years ago Wei Shi ★ 3.6k