loading .bam file in CrispRVarients
1
0
Entering edit mode
@matthew-mccormack-2021
Last seen 9 months ago
United States

I am following the CrispRVarients User Guide and when I get to this command:

reference=system(sprintf("B621_279062w_CF5_TAIR10_sort.bam %s:%s-%s", seqnames(gdl)[1], start(gdl)[1], end(gdl)[1]), intern = TRUE) 

I always get this error message:

'CreateProcess' failed to run 'D:\NGS_data\WEN_CR~1\ANALYS~1\BWA\B621_2~2.BAM chr5:24858967-24859213'

I am running CrispRVariants on R 3.3.2,  x86_64-w64-mingw32, windows10, in RStudio.

I have tried adding full path to the .bam file with backslashes, double backslashes ( ...\\NGS_data\\Wen...), forward slashes and double forward slashes, but all give the same error. 

Matthew

crisprvariants • 1.1k views
ADD COMMENT
0
Entering edit mode
@helenlindsay-8595
Last seen 7.7 years ago
European Union

Hi Matthew,

This command in the User Guide uses the program samtools faidx to fetch the guide sequence from the reference genome.  The reference genome should be in .fasta format.  In your command above, you are using a .bam file of mapped reads rather than the reference genome.  The command is also missing the call to samtools.  So, if your reference genome is called genome.fasta, the command should look like this:

reference=system(sprintf("samtools faidx genome.fasta %s:%s-%s", seqnames(gdl)[1], start(gdl)[1], end(gdl)[1]), intern = TRUE) 

For this to work, you will need to have samtools installed.  You can use the R function file.exists() to check if your filenames are correct.

If you have mapped your reads to a single amplicon sequence rather than a full genome, an alternative to using samtools is to read in the amplicon sequence using Biostrings then select the reference sequence using substr(), e.g.

amplicon <- Biostrings::readDNAStringSet("genome.fasta")
reference <- substr(amplicon, guide_start, guide_end)

Or if you already know the sequence of the region you want to use, you could create the reference sequence directly, e.g. reference <- Biostrings::DNAString("AACCTTGG")

Hope this helps, and happy new year!

Helen

 

ADD COMMENT
0
Entering edit mode

samtools isn't available on Windows (or at least not as a part of the samtools distribution). Fasta files can be indexed with Rsamtools::indexFa(), and indexed fasta files can be read in using genomic ranges using getSeq(); see ?"getSeq,FaFile-method". Is this step an essential part of the work flow? If so, then either the code & vignette should be modified to use Rsamtools, or the package should be marked as not supported on windows. For the latter, the package maintainer (helen) should ask for guidance on the bioc-devel mailing list. There have been reports of problems with indexing (large) fasta files on windows using indexFa, so even the use of Rsamtools may be risky.

ADD REPLY

Login before adding your answer.

Traffic: 662 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6