For the first question, you will use the GenomicRanges package and a package or other source with relevant reference genome, e.g.,
library(GenomicRanges)
library("BSgenome.Hsapiens.UCSC.hg38")
Create or input (e.g., with VariantExperiment::readVCF()
or rtracklayer::import()
) a GRanges
object representing the SNPs
snps <- GRanges(
c("chr1:1000000", "chr2:20000000", "chr3:30000000")
)
Use flank()
to define the regions of interest
flanks <- flank(snps, 100)
Retrieve the flanking sequences from the relevant genome
getSeq(BSgenome.Hsapiens.UCSC.hg38, flanks)
Packages with whole-genome sequences start with "BSgenome", so use
BiocManager::available("BSgenome")
to see available genomes, followed by BiocManager::install(full_package_name)
to install it.
Alternatively, search the AnnotationHub for whole genome sequences, especially from Ensembl
> library(AnnotationHub)
> hub = AnnotationHub()
> query(hub, c("ailMel1", "2bit", "release-96"))
AnnotationHub with 4 records
# snapshotDate(): 2019-07-10
# $dataprovider: Ensembl
# $species: Ailuropoda melanoleuca
# $rdataclass: TwoBitFile
# additional mcols(): taxonomyid, genome, description,
# coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
# rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH69768"]]'
title
AH69768 | Ailuropoda_melanoleuca.ailMel1.cdna.all.2bit
AH69769 | Ailuropoda_melanoleuca.ailMel1.dna_rm.toplevel.2bit
AH69770 | Ailuropoda_melanoleuca.ailMel1.dna_sm.toplevel.2bit
AH69771 | Ailuropoda_melanoleuca.ailMel1.ncrna.2bit
and use that
seq = hub[["AH69769"]]
getSeq(seq, flanks)