Error with Rsubread at first index: normalizePath(path.expand(path), winslash, mustWork)
@peterrhoyt-22385
United States

I was trying to start an scRNAseq experiment by generating a count matrix with Rsubread. I'm using RStudio version 1.1.453 and R 3.6.1 on a Windows 10 machine. The tutorial to start the project works perfectly:

library(Rsubread)
ref <- system.file("extdata","reference.fa",package="Rsubread")
buildindex(basename="reference_index",reference=ref)

Everything runs exactly as it says in the vignette at: https://bioconductor.org/packages/3.10/bioc/vignettes/Rsubread/inst/doc/Rsubread.pdf

But my reference genome from NCBI (concatenated into one file) gives an error when building the index.

ref <- system.file("extdata","cat9ref.fa",package="Rsubread")
buildindex(basename="reference_index",reference=ref)
Error in normalizePath(path.expand(path), winslash, mustWork) :
  path[1]="": The filename, directory name, or volume label syntax is incorrect

I'm fairly new to R, but this looks like a formatting error. I couldn't find anything on Google that helped. Hopefully someone has seen this and can tell me what I'm doing wrong. Thanks, Pete

Mike Smith ★ 6.6k
@mike-smith
EMBL Heidelberg

The function system.file() looks up where a particular R package (specified by the package argument) has been installed, and constructs the full path to a file installed as part of that package.

That makes it very useful for example workflows in package vignettes, where a small dataset is distributed with the package and the vignette author doesn't need to care about exactly where each user has installed it: system.file() takes care of that.

However, it's unnecessary when you're working with your own data. Your cat9ref.fa isn't part of the Rsubread installation, so system.file() can't find it and returns an empty string, which is where the path[1]="" in your error comes from. You can just provide the path directly yourself, e.g. ref <- "/path/to/cat9ref.fa"

Thanks, that got me past step 1. I literally only had to put in:

ref <- "cat9ref.fa"

That's the answer!

I still got ERROR: repeated chromosome name 'ref' is observed in the FASTA file(s).

Can I ask about that here or start a new thread?

I don't know for certain, but that error sounds like you have duplicated some chromosome names when concatenating your single reference file. I guess there are quite a few ways that could happen, e.g.

  • the individual files did not have distinct names
  • you genuinely duplicated something
  • your sequences have names like ref 1 and ref 2 but Rsubread doesn't like spaces

Those are all guesses, but I would check the reference FASTA file to try and identify the cause. If it's not obvious then I would start a new post about the issue, and make sure you add the Rsubread tag - that way the author of that package will be notified of your question.
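One way to check the reference FASTA for repeated names from the command line is sketched below. It is only an illustration: the demo file and its sequence names are made up, and it assumes (as the error message suggests, but I have not confirmed) that a name is cut at the first '|' or space.

```shell
# Build a tiny demo FASTA with made-up names; two headers share the prefix "ref".
demo_fa="$(mktemp)"
printf '>ref|seqA| demo one\nACGT\n>ref|seqB| demo two\nGGCC\n>chrM\nTTAA\n' > "$demo_fa"

# Keep only header lines, drop the leading '>', cut each name at the
# first '|' and then at the first space (assumed truncation rule),
# and print every name that occurs more than once.
dups="$(grep '^>' "$demo_fa" | sed 's/^>//' | cut -d'|' -f1 | cut -d' ' -f1 | sort | uniq -d)"
echo "$dups"
```

To check a real file, point the pipeline at it in place of "$demo_fa"; an empty result means no repeated names under that assumption.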

@peterrhoyt-22385

Turns out the fasta files downloaded from NCBI ALL started with ">ref|" in the header.

All I had to do was replace the ">ref|" with ">" to get the program to run. I used a sed script:

sed 's/>[^|]*|/>/' cat9ref.fa > cat9ref2.fa

Then checked the head of the file, and made sure there were no more "ref|" anywhere:

grep "ref" cat9ref2.fa > catrefs.txt

catrefs.txt was an empty file, so I overwrote the old file:

mv cat9ref2.fa cat9ref.fa
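For anyone who wants to see what the substitution does before running it on a full genome, here is a minimal sketch on a made-up two-sequence file (the names seqA and seqB are hypothetical). It replaces the '>' and everything up to and including the first '|' with a bare '>', then counts how many headers still start with ">ref|".

```shell
# Tiny demo FASTA with made-up names, both prefixed with "ref|".
demo_fa="$(mktemp)"
printf '>ref|seqA| demo one\nACGT\n>ref|seqB| demo two\nGGCC\n' > "$demo_fa"

# Same substitution as in the post: strip the text between '>' and the
# first '|' (inclusive). Lines without '>' are left untouched.
cleaned="$(sed 's/>[^|]*|/>/' "$demo_fa")"
echo "$cleaned"

# Count headers that still begin with '>ref|'. grep -c prints 0 but
# exits non-zero when nothing matches, hence the '|| true'.
remaining="$(printf '%s\n' "$cleaned" | grep -c '^>ref|' || true)"
echo "$remaining"
```

Note that only the first field is removed, so any later '|' in the header stays in place.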

Then re-ran.

> ref <- "cat9ref.fa"
> buildindex(basename="cat9_index",reference=ref)

The index was built, but it took 8.5 hours, so I am moving to our cluster now.

Thanks! Pete
