Dear Davis,
The buildindex function has a hard limit of 1000 on the number of
chromosomes allowed. Your "rn5.fa" file contains more than 1000
chromosomes/contigs, and therefore the function reported that message.
The consequence was that the chromosomes/contigs located after the
first 1000 in the file were not correctly indexed.
We have increased the limit to 50,000, which should be enough for your
dataset. The changes have been committed to the Bioconductor devel svn
and should be available to you in a couple of days.
Let us know if the problem persists.
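To check whether a reference exceeds the limit before building an index,
one could count the header lines in the FASTA file. The following is a
minimal sketch in base R. It assumes that each header line (starting with
">") corresponds to one chromosome/contig as counted by buildindex.
count_fasta_sequences is a hypothetical helper written for illustration,
not part of Rsubread.

```r
# Count sequences (header lines starting with ">") in a FASTA file.
# The file is read in chunks so that a multi-gigabyte reference is
# never held in memory all at once.
count_fasta_sequences <- function(path, chunk_lines = 100000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  n <- 0L
  repeat {
    lines <- readLines(con, n = chunk_lines, warn = FALSE)
    if (length(lines) == 0L) break
    n <- n + sum(substr(lines, 1, 1) == ">")
  }
  n
}

# Tiny demonstration file; for a real check, pass the path to the
# reference (for example "rn5.fa") instead.
demo <- tempfile(fileext = ".fa")
writeLines(c(">chr1", "ACGT", ">chrUn_1", "GGCC", "TTAA"), demo)
n_seq <- count_fasta_sequences(demo)
```

Comparing the returned count with the limit (1000 before the change,
50,000 after) would indicate whether all sequences can be indexed.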
Cheers,
Wei
On Oct 19, 2012, at 6:30 AM, Davis, Wade wrote:
> Dear Wei,
> I received the following message when building an index for rn5:
>
> > buildindex(basename="rn5_rsubread_index",reference="rn5.fa",memory=12000)
>
> Building a base-space index.
> Size of memory used=12000 MB
> Base name of the built index = rn5_rsubread_index
> Scanning non-informative reads in the chromosomes...
> completed=85.27%; time used=216.9s; rate=14099.1k bps/s; total=2926m bps
> There are too many sections in the chromosome data files (more than 1000 sections).
> There are 663648 non-informative subreads found in the chromosomes.
> Index items per partition = 1375180800
>
> My question is: what is the consequence of the message "There are
> too many sections in the chromosome data files (more than 1000
> sections)."?
>
> I imagine this is due to all of the nonstandard chromosomes in the
> reference. I could clean up the reference to get rid of them, but
> I am curious to know the (biological) opinion of others. This is to
> be used for a standard RNA-Seq run (on rat, of course).
>
> I am running the development version of R and Rsubread, as shown
> below.
>
> Thanks,
> Wade
>
> > sessionInfo()
> R Under development (unstable) (2012-09-24 r60800)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] Rsubread_1.9.0