loading .bam file in CrispRVarients

0

Entering edit mode

Matthew McCormack ▴ 180

@matthew-mccormack-2021

Last seen 9 weeks ago

United States

I am following the CrispRVarients User Guide and when I get to this command:

reference=system(sprintf("B621_279062w_CF5_TAIR10_sort.bam %s:%s-%s", seqnames(gdl)[1], start(gdl)[1], end(gdl)[1]), intern = TRUE)

I always get this error message:

'CreateProcess' failed to run 'D:\NGS_data\WEN_CR~1\ANALYS~1\BWA\B621_2~2.BAM chr5:24858967-24859213'

I am running CrispRVariants on R 3.3.2, x86_64-w64-mingw32, windows10, in RStudio.

I have tried adding full path to the .bam file with backslashes, double backslashes ( ...\\NGS_data\\Wen...), forward slashes and double forward slashes, but all give the same error.

Matthew

crisprvariants • 1.0k views

ADD COMMENT • link updated 7.3 years ago by helen.lindsay • 0 • written 7.3 years ago by Matthew McCormack ▴ 180

0

Entering edit mode

helen.lindsay • 0

@helenlindsay-8595

Last seen 7.1 years ago

European Union

Hi Matthew,

This command in the User Guide uses the program samtools faidx to fetch the guide sequence from the reference genome. The reference genome should be in .fasta format. In your command above, you are using a .bam file of mapped reads rather than the reference genome. The command is also missing the call to samtools. So, if your reference genome is called genome.fasta, the command should look like this:

reference=system(sprintf("samtools faidx genome.fasta %s:%s-%s", seqnames(gdl)[1], start(gdl)[1], end(gdl)[1]), intern = TRUE)

For this to work, you will need to have samtools installed. You can use the R function file.exists() to check if your filenames are correct.

If you have mapped your reads to a single amplicon sequence rather than a full genome, an alternative to using samtools is to read in the amplicon sequence using Biostrings then select the reference sequence using substr(), e.g.

amplicon <- Biostrings::readDNAStringSet("genome.fasta")
reference <- substr(amplicon, guide_start, guide_end)

Or if you already know the sequence of the region you want to use, you could create the reference sequence directly, e.g. reference <- Biostrings::DNAString("AACCTTGG")

Hope this helps, and happy new year!

Helen

ADD COMMENT • link 7.3 years ago helen.lindsay • 0

0

Entering edit mode

samtools isn't available on Windows (or at least not as a part of the samtools distribution). Fasta files can be indexed with Rsamtools::indexFa(), and indexed fasta files can be read in using genomic ranges using getSeq(); see ?"getSeq,FaFile-method". Is this step an essential part of the work flow? If so, then either the code & vignette should be modified to use Rsamtools, or the package should be marked as not supported on windows. For the latter, the package maintainer (helen) should ask for guidance on the bioc-devel mailing list. There have been reports of problems with indexing (large) fasta files on windows using indexFa, so even the use of Rsamtools may be risky.

ADD REPLY • link 7.3 years ago Martin Morgan 25k

Login before adding your answer.