I am trying to build an index file with kallisto for mapping RNA-seq data. My laptop runs Ubuntu 18.04 and I am working in bash.
Version:
$ kallisto
kallisto 0.46.2
I downloaded the reference cDNA data from: ftp://ftp.ensembl.org/pub/release-99/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz
Then I ran the command:
$ kallisto index -i kallisto.idx Homo_sapiens.GRCh38.cdna.all.fa
I also tried:
$ kallisto index --index=kallistoidx Homo_sapiens.GRCh38.cdna.all.fa
Then I got the following error:
[build] loading fasta file homo_sapiens/Homo_sapiens.GRCh38.cdna.all.fa
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10)
from 1431 target sequences
[build] warning: replaced 5 non-ACGUT characters in the input sequence
with pseudorandom nucleotides
[build] counting k-mers ... Forced termination
My understanding is that kallisto does not need a large amount of memory, which is why I am trying it on this laptop. Could you please advise me on how to fix this?
Thank you
You can post this over at Biostars.org. If you do, please include some information about your machine (total and available memory, for example), and monitor memory usage while kallisto is building the index.
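To collect that information, something like the following bash sketch might help. It assumes a Linux system with /proc/meminfo and GNU time installed at /usr/bin/time (on Ubuntu this may need the `time` package). The helper names `mem_kb` and `run_with_peak_mem` are made up for this example and are not part of kallisto.

```shell
#!/usr/bin/env bash

# Print one field from /proc/meminfo (values are reported in kB).
# Hypothetical helper for this example.
mem_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' /proc/meminfo
}

# Run a command under GNU time so that its peak memory
# ("Maximum resident set size") is printed to stderr when it exits.
# Falls back to running the command directly if /usr/bin/time is missing.
run_with_peak_mem() {
    if [ -x /usr/bin/time ]; then
        /usr/bin/time -v "$@"
    else
        echo "GNU time not found at /usr/bin/time; running without it" >&2
        "$@"
    fi
}

echo "MemTotal:     $(mem_kb MemTotal) kB"
echo "MemAvailable: $(mem_kb MemAvailable) kB"
echo "SwapTotal:    $(mem_kb SwapTotal) kB"

# Example (not run here), using the command from the question:
# run_with_peak_mem kallisto index -i kallisto.idx Homo_sapiens.GRCh38.cdna.all.fa
```

If the run is terminated again, comparing the reported peak memory against MemAvailable should show whether the machine simply ran out of RAM. Watching `free -h` or `top` in a second terminal while the index is being built is another option.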