Question

Error with Rsubread at first index: normalizePath(path.expand(path), winslash, mustWork)

peter.r.hoyt • 0

@peterrhoyt-22385

United States

I was trying to start an scRNA-seq experiment by generating a count matrix with Rsubread. I am using RStudio version 1.1.453 and R 3.6.1 on a Windows 10 machine. The tutorial to start the project works perfectly:

library(Rsubread)
ref <- system.file("extdata","reference.fa",package="Rsubread")
buildindex(basename="reference_index",reference=ref)

Everything runs exactly as it says in the vignette at: https://bioconductor.org/packages/3.10/bioc/vignettes/Rsubread/inst/doc/Rsubread.pdf

But my reference genome from NCBI (concatenated into one file) gives an error when building the index.

ref <- system.file("extdata","cat9ref.fa",package="Rsubread")
buildindex(basename="reference_index",reference=ref)
Error in normalizePath(path.expand(path), winslash, mustWork) :
  path[1]="": The filename, directory name, or volume label syntax is incorrect

I'm fairly new to R, but this looks like a formatting error. I couldn't find anything on Google that helped. Hopefully someone has seen this and can tell me what I'm doing wrong. Thanks, Pete

normalization • 7.8k views

peter.r.hoyt • 6.1 years ago

peter.r.hoyt • 0

It turns out that the FASTA files downloaded from NCBI ALL started with ">ref|" in their header lines.

All I had to do was replace the ">ref|" with ">" to get the program to run. I used a sed command:

sed 's/>[^|]*|/>/; ' cat9ref.fa > cat9ref2.fa

Then I checked the head of the file and made sure no "ref|" remained anywhere:

grep "ref" cat9ref2.fa > catrefs.txt

catrefs.txt was empty, so I overwrote the old file:

mv cat9ref2.fa cat9ref.fa

Then re-ran.

> ref <- "cat9ref.fa"
> buildindex(basename="cat9_index",reference=ref)

The index was built, but it took 8.5 hours, so I am moving to our cluster now.
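A minimal sketch of the header cleanup described above, run on a tiny made-up FASTA file so the effect is easy to see. The file names demo.fa and demo_clean.fa and the two records are hypothetical; the sed expression is like the one above, only anchored to the start of the line.

```shell
#!/bin/sh
# Hypothetical two-record FASTA with NCBI-style ">ref|" headers.
printf '>ref|NC_000001.1| example chromosome 1\nACGT\n>ref|NC_000002.1| example chromosome 2\nTTGA\n' > demo.fa

# Strip everything from ">" up to and including the first "|" on header lines.
sed 's/^>[^|]*|/>/' demo.fa > demo_clean.fa

# Show only the cleaned header lines.
grep '^>' demo_clean.fa

# Count headers that still begin with ">ref|"; grep -c prints the count,
# and "|| true" ignores its non-zero exit status when the count is 0.
remaining=$(grep -c '^>ref|' demo_clean.fa || true)
echo "headers still starting with >ref|: $remaining"
```

Matching only lines that start with ">" keeps the check focused on headers, so a stray "ref" elsewhere in a description would not hide or fake a problem.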

Thanks! Pete

peter.r.hoyt • 6.1 years ago

score 1 · Accepted Answer · 2019-11-18

Mike Smith ★ 6.6k

@mike-smith

EMBL Heidelberg

The function system.file() locates where a particular R package (specified by the package argument) has been installed, and constructs a full path to files installed as part of that package.

That makes it very useful for example workflows in package vignettes, where a small dataset is distributed with the package and the vignette author doesn't need to care about exactly where each user has installed it - system.file() takes care of it.

However, it's unnecessary when you're working with your own data. Your cat9ref.fa is not part of the Rsubread package, so system.file() finds nothing and returns an empty string, which is the path[1]="" in your error message. You can just provide the path directly yourself, e.g. ref <- "/path/to/cat9ref.fa"

Mike Smith • 6.1 years ago

Thanks, that got me past step 1. I literally only had to put in: ref <- "cat9ref.fa" That's the answer!

I still got ERROR: repeated chromosome name 'ref' is observed in the FASTA file(s).

Can I ask about that here or start a new thread?

peter.r.hoyt • 6.1 years ago

I don't know for certain, but that error sounds like you duplicated some chromosome names when concatenating your files into a single reference. I guess there are quite a few ways that could happen, e.g.

- the individual files did not have distinct names
- you genuinely duplicated something
- your sequences have names like "ref 1" and "ref 2" but Rsubread doesn't like spaces

Those are all guesses, but I would check the reference FASTA file to try and identify the cause. If it's not obvious then I would start a new post about the issue, and make sure you add the Rsubread tag - that way the author of that package will be notified of your question.
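One way to run that check, sketched under assumptions: list the sequence names in the FASTA file and print any that occur more than once. The file demo_dups.fa and its contents are made up for illustration, and treating the text before the first space as the sequence name is an assumption, not confirmed Rsubread behaviour.

```shell
#!/bin/sh
# Hypothetical FASTA in which two headers share the first word "ref".
printf '>ref 1\nACGT\n>ref 2\nTTGA\n>chrM\nGGCC\n' > demo_dups.fa

# Take the header lines, drop the leading ">", keep the text before the
# first space (assumed to be the sequence name), and print every name
# that appears more than once.
grep '^>' demo_dups.fa | sed 's/^>//' | cut -d ' ' -f 1 | sort | uniq -d > dup_names.txt

cat dup_names.txt
```

An empty dup_names.txt would suggest the names are unique up to the first space, in which case the cause lies elsewhere.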

Mike Smith • 6.1 years ago