Create a GmapGenome from a FaFile from AnnotationHub
1
0
Entering edit mode
Johannes Rainer ★ 1.9k
@johannes-rainer-6987
Last seen 12 days ago
Italy

Dear all,

I was just wondering, would it be possible to create a GmapGenome from one of the genomic FaFiles from AnnotationHub? Might be eventually nice to have that as an option for GmapGenome.

cheers, jo

gmapr gmapgenome annotationhub fafile • 684 views
0
Entering edit mode

Judging from the help page of GmapGenome, is should be straightforward:

Creates a ‘GmapGenome’ corresponding to the ‘genome’ argument, which may be either a string identifier of the genome within ‘directory’, a ‘FastaFile’ or ‘DNAStringSet’ of the genome sequence, or a ‘BSgenome’ object.
1
Entering edit mode
@michael-lawrence-3846
Last seen 7 weeks ago
United States

This is pretty easy already. Just GmapGenome(getSeq(file), ...). I added something to devel for this to be automatic, but I have not tested it. Obviously, this is not very efficient, but it works.

0
Entering edit mode

Dear Michael,

yes, indeed, it's straight forward. Actually, I did already implement a method for FaFile... in case you would like to add that...

## Add additional parameter "seqnames" that allows to export only specified
## sequences from the FaFile such that the index is build on them.
setMethod("gmap_build", c("FaFile", "GmapGenome"),
function(x, genome, seqnames=NULL, ...){
if(is.na(index(x))){
message("Creating missing index...", appendLF=FALSE)
indexFa(x)
x <- FaFile(path(x))
message("done")
}
## The question is whether we should restrict to some chromosomes?
faIndex <- scanFaIndex(x)
if(!is.null(seqnames)){
## Restrict to specified seqnames.
seqnames <- as.character(seqnames)
notPresent <- !(seqnames %in% seqnames(faIndex))
if(any(notPresent))
stop(paste0("Provided seqnames ",
paste(seqnames[notPresent], collapse=", "),
" not present in the FaFile!"))
## Subset the GRanges:
faIndex <- faIndex[seqnames(faIndex) %in% seqnames]
}
## Extract the sequences from the FaFile as DNAStringSet and
## export them a temporary file.
dds <- scanFa(x, param=faIndex)
tmpfile <- tempfile()
export(dds, con=tmpfile, format="fasta")
gmap_build(tmpfile, genome, ...)
})

This would require that you import in addition FaFile, scanFaIndex, scanFa, indexFa from Rsamtools.

Would be nice to have that (especially with the option to subset by seqnames... as usually the FaFiles, like "AH49186" from AnnotationHub with the genomic DNA sequence for GRCh38 contain a lot of additional sequences).

Thanks!

jo

0
Entering edit mode

Thanks for the suggestion. I changed it to allow the user to pass a custom param argument, so there is some control.