Create a GmapGenome from a FaFile from AnnotationHub
1
0
Entering edit mode
Johannes Rainer ★ 2.1k
@johannes-rainer-6987
Last seen 4 weeks ago
Italy

Dear all,

I was just wondering, would it be possible to create a GmapGenome from one of the genomic FaFiles from AnnotationHub? Might be eventually nice to have that as an option for GmapGenome.

cheers, jo

gmapr gmapgenome annotationhub fafile • 1.4k views
ADD COMMENT
0
Entering edit mode

Judging from the help page of GmapGenome, is should be straightforward:

Creates a ‘GmapGenome’ corresponding to the ‘genome’ argument, which may be either a string identifier of the genome within ‘directory’, a ‘FastaFile’ or ‘DNAStringSet’ of the genome sequence, or a ‘BSgenome’ object.
ADD REPLY
1
Entering edit mode
@michael-lawrence-3846
Last seen 3.0 years ago
United States

This is pretty easy already. Just GmapGenome(getSeq(file), ...). I added something to devel for this to be automatic, but I have not tested it. Obviously, this is not very efficient, but it works. 

ADD COMMENT
0
Entering edit mode

Dear Michael,

yes, indeed, it's straight forward. Actually, I did already implement a method for FaFile... in case you would like to add that...

## Add additional parameter "seqnames" that allows to export only specified
## sequences from the FaFile such that the index is build on them.
setMethod("gmap_build", c("FaFile", "GmapGenome"),
          function(x, genome, seqnames=NULL, ...){
              if(is.na(index(x))){
                  message("Creating missing index...", appendLF=FALSE)
                  indexFa(x)
                  x <- FaFile(path(x))
                  message("done")
              }
              ## The question is whether we should restrict to some chromosomes?
              faIndex <- scanFaIndex(x)
              if(!is.null(seqnames)){
                  ## Restrict to specified seqnames.
                  seqnames <- as.character(seqnames)
                  notPresent <- !(seqnames %in% seqnames(faIndex))
                  if(any(notPresent))
                      stop(paste0("Provided seqnames ",
                                  paste(seqnames[notPresent], collapse=", "),
                                  " not present in the FaFile!"))
                  ## Subset the GRanges:
                  faIndex <- faIndex[seqnames(faIndex) %in% seqnames]
              }
              ## Extract the sequences from the FaFile as DNAStringSet and
              ## export them a temporary file.
              dds <- scanFa(x, param=faIndex)
              tmpfile <- tempfile()
              export(dds, con=tmpfile, format="fasta")
              gmap_build(tmpfile, genome, ...)
          })


This would require that you import in addition FaFile, scanFaIndex, scanFa, indexFa from Rsamtools.

Would be nice to have that (especially with the option to subset by seqnames... as usually the FaFiles, like "AH49186" from AnnotationHub with the genomic DNA sequence for GRCh38 contain a lot of additional sequences).

Thanks!

jo

ADD REPLY
0
Entering edit mode

Thanks for the suggestion. I changed it to allow the user to pass a custom param argument, so there is some control. 

ADD REPLY

Login before adding your answer.

Traffic: 498 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6