Dear all,
I was just wondering, would it be possible to create a GmapGenome
from one of the genomic FaFiles
from AnnotationHub
? Might be eventually nice to have that as an option for GmapGenome
.
cheers, jo
Dear all,
I was just wondering, would it be possible to create a GmapGenome
from one of the genomic FaFiles
from AnnotationHub
? Might be eventually nice to have that as an option for GmapGenome
.
cheers, jo
This is pretty easy already. Just GmapGenome(getSeq(file), ...)
. I added something to devel for this to be automatic, but I have not tested it. Obviously, this is not very efficient, but it works.
Dear Michael,
yes, indeed, it's straight forward. Actually, I did already implement a method for FaFile
... in case you would like to add that...
## Add additional parameter "seqnames" that allows to export only specified ## sequences from the FaFile such that the index is build on them. setMethod("gmap_build", c("FaFile", "GmapGenome"), function(x, genome, seqnames=NULL, ...){ if(is.na(index(x))){ message("Creating missing index...", appendLF=FALSE) indexFa(x) x <- FaFile(path(x)) message("done") } ## The question is whether we should restrict to some chromosomes? faIndex <- scanFaIndex(x) if(!is.null(seqnames)){ ## Restrict to specified seqnames. seqnames <- as.character(seqnames) notPresent <- !(seqnames %in% seqnames(faIndex)) if(any(notPresent)) stop(paste0("Provided seqnames ", paste(seqnames[notPresent], collapse=", "), " not present in the FaFile!")) ## Subset the GRanges: faIndex <- faIndex[seqnames(faIndex) %in% seqnames] } ## Extract the sequences from the FaFile as DNAStringSet and ## export them a temporary file. dds <- scanFa(x, param=faIndex) tmpfile <- tempfile() export(dds, con=tmpfile, format="fasta") gmap_build(tmpfile, genome, ...) })
This would require that you import in addition FaFile
, scanFaIndex
, scanFa
, indexFa
from Rsamtools
.
Would be nice to have that (especially with the option to subset by seqnames... as usually the FaFiles, like "AH49186"
from AnnotationHub
with the genomic DNA sequence for GRCh38 contain a lot of additional sequences).
Thanks!
jo
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Judging from the help page of GmapGenome, is should be straightforward: