Error with Rsubread at first index: normalizePath(path.expand(path), winslash, mustWork)
@peterrhoyt-22385
United States

I was trying to start an scRNA-seq experiment by generating a count matrix with Rsubread, using RStudio version 1.1.453 and R 3.6.1 on a Windows 10 machine. The tutorial example for starting the project works perfectly.

But my reference genome from NCBI (concatenated into one file) gives an error when building the index.

ref <- system.file("extdata","cat9ref.fa",package="Rsubread")
buildindex(basename="reference_index",reference=ref)
Error in normalizePath(path.expand(path), winslash, mustWork) :
  path[1]="": The filename, directory name, or volume label syntax is incorrect

I'm fairly new to R, but this looks like a formatting error. I couldn't find anything on Google that helped. Hopefully someone has seen this and can tell me what I'm doing wrong. Thanks, Pete

Mike Smith ★ 5.8k
@mike-smith
EMBL Heidelberg / de.NBI

The function system.file() knows where a particular R package (specified by the package argument) has been installed, and constructs the full path to files installed as part of that package.

That means it's very useful for example workflows in package vignettes, where a small dataset is distributed with the package and the vignette author doesn't need to care about exactly where each user has installed it - system.file() takes care of it.

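However, it's unnecessary when you're working with your own data. Your cat9ref.fa isn't part of the Rsubread installation, so system.file() returns an empty string, which is why normalizePath() complains about path[1]="". You can just provide the path directly yourself, e.g. ref <- "/path/to/cat9ref.fa"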


Thanks, that got me past step 1. I literally only had to put in: ref <- "cat9ref.fa" That's the answer!

I still got ERROR: repeated chromosome name 'ref' is observed in the FASTA file(s).


I don't know for certain, but that error sounds like you have duplicated some chromosome names when concatenating your single reference file. I guess there are quite a few ways that could happen, e.g.

• the individual files did not have distinct names
• you genuinely duplicated something
• your sequences have names like ref 1 and ref 2 but Rsubread doesn't like spaces

Those are all guesses, but I would check the reference FASTA file to try and identify the cause. If it's not obvious then I would start a new post about the issue, and make sure you add the Rsubread tag - that way the author of that package will be notified of your question.
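One quick way to run that check from a shell is to list any sequence names that appear more than once. This is only a sketch: the demo file and the dup_names helper are made up for illustration, and cutting each name at the first space or "|" is an assumption based on the wording of the error message, not confirmed Rsubread behaviour.

```shell
# Hypothetical two-record FASTA standing in for a concatenated reference
demo_fa="$(mktemp)"
printf '>ref|seqA| demo one\nACGT\n>ref|seqB| demo two\nGGCC\n' > "$demo_fa"

# dup_names FILE: print every sequence name that occurs more than once.
# Assumption: the "name" is the header text up to the first space or "|".
dup_names() {
    grep '^>' "$1" | sed -e 's/^>//' -e 's/[ |].*//' | sort | uniq -d
}

dup_names "$demo_fa"
```

On a real reference you would call dup_names on your own file; no output means every name is unique under that assumption.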

@peterrhoyt-22385
United States

All I had to do was replace the ">ref|" with ">" to get the program to run. I used a sed script:

sed 's/>[^|]*|/>/' cat9ref.fa > cat9ref2.fa
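To see what that substitution does before running it on a large reference, it can be tried on a tiny file first. The two-record FASTA below is invented for illustration; only the sed expression comes from the post.

```shell
# Hypothetical two-record FASTA with "ref|" prefixes on the headers
demo_fa="$(mktemp)"
printf '>ref|seqA| demo one\nACGT\n>ref|seqB| demo two\nGGCC\n' > "$demo_fa"

# Same substitution: drop everything between ">" and the first "|", inclusive
cleaned="$(sed 's/>[^|]*|/>/' "$demo_fa")"

# Show only the header lines of the result
printf '%s\n' "$cleaned" | grep '^>'
```

Only the first "|" on each header line is consumed, so anything after it (including a later "|") is left as it was. Sequence lines contain no ">" and pass through unchanged.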


Then I checked the head of the file and made sure there was no "ref|" left anywhere:

grep "ref" cat9ref2.fa > catrefs.txt
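A slightly narrower variant of the same check, sketched here on a made-up demo file, counts only header lines that still start with ">ref|" and skips the intermediate text file. The file contents and the variable name remaining are assumptions for illustration.

```shell
# Hypothetical already-cleaned file standing in for cat9ref2.fa
demo_fa="$(mktemp)"
printf '>seqA| demo one\nACGT\n>seqB| demo two\nGGCC\n' > "$demo_fa"

# Count header lines that still begin with ">ref|".
# grep -c prints 0 when nothing matches but also exits non-zero,
# so "|| true" keeps the script going in that case.
remaining="$(grep -c '^>ref|' "$demo_fa" || true)"
echo "headers still prefixed with ref|: $remaining"
```

A count of 0 corresponds to the empty catrefs.txt described in the post.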


catrefs.txt was empty, so I overwrote the old file:

mv cat9ref2.fa cat9ref.fa


Then re-ran.

> ref <- "cat9ref.fa"
> buildindex(basename="cat9_index",reference=ref)


The index was built, but I am moving to our cluster now, as it took 8.5 hours.

Thanks! Pete