0
2.7 years ago by
United States
Matthew McCormack180 wrote:

I am following the CrispRVarients User Guide and when I get to this command:

reference=system(sprintf("B621_279062w_CF5_TAIR10_sort.bam %s:%s-%s", seqnames(gdl)[1], start(gdl)[1], end(gdl)[1]), intern = TRUE)

I always get this error message:

'CreateProcess' failed to run 'D:\NGS_data\WEN_CR~1\ANALYS~1\BWA\B621_2~2.BAM chr5:24858967-24859213'

I am running CrispRVariants on R 3.3.2,  x86_64-w64-mingw32, windows10, in RStudio.

I have tried adding full path to the .bam file with backslashes, double backslashes ( ...\\NGS_data\\Wen...), forward slashes and double forward slashes, but all give the same error.

Matthew

crisprvariants • 502 views
modified 2.7 years ago by helen.lindsay0 • written 2.7 years ago by Matthew McCormack180
0
2.7 years ago by
European Union
helen.lindsay0 wrote:

Hi Matthew,

This command in the User Guide uses the program samtools faidx to fetch the guide sequence from the reference genome.  The reference genome should be in .fasta format.  In your command above, you are using a .bam file of mapped reads rather than the reference genome.  The command is also missing the call to samtools.  So, if your reference genome is called genome.fasta, the command should look like this:

reference=system(sprintf("samtools faidx genome.fasta %s:%s-%s", seqnames(gdl)[1], start(gdl)[1], end(gdl)[1]), intern = TRUE)

For this to work, you will need to have samtools installed.  You can use the R function file.exists() to check if your filenames are correct.

If you have mapped your reads to a single amplicon sequence rather than a full genome, an alternative to using samtools is to read in the amplicon sequence using Biostrings then select the reference sequence using substr(), e.g.

amplicon <- Biostrings::readDNAStringSet("genome.fasta")
reference <- substr(amplicon, guide_start, guide_end)


Or if you already know the sequence of the region you want to use, you could create the reference sequence directly, e.g. reference <- Biostrings::DNAString("AACCTTGG")

Hope this helps, and happy new year!

Helen

samtools isn't available on Windows (or at least not as a part of the samtools distribution). Fasta files can be indexed with Rsamtools::indexFa(), and indexed fasta files can be read in using genomic ranges using getSeq(); see ?"getSeq,FaFile-method". Is this step an essential part of the work flow? If so, then either the code & vignette should be modified to use Rsamtools, or the package should be marked as not supported on windows. For the latter, the package maintainer (helen) should ask for guidance on the bioc-devel mailing list. There have been reports of problems with indexing (large) fasta files on windows using indexFa, so even the use of Rsamtools may be risky.