Question

Using injectSNPs with a custom set of SNPs

Stevie Pederson

Australia

Hi,

I'm sorry if this is a dumb question, but I have a set of SNPs obtained from a large cohort study that can't be released as a SNPlocs package. Is there any way (or alternative strategy) to use injectSNPs() to modify a BSgenome object without having to create a SNPlocs package? It's quite trivial to put this kind of information into a GRanges object, so I'd hoped that a method for GRanges objects might exist somewhere, but I haven't been able to find one yet.

I haven't been able to find any instructions for forging a local SNPlocs package either. Do these exist somewhere?

Thanks in advance,

Stevie

Tags: SNPlocs, BSgenome


Answer · 2023-10-20

Hi,

Unfortunately, injectSNPs() cannot handle a GRanges or GPos object at the moment.

However, as long as you have the positions and alleles of the SNPs for each chromosome, you should be able to make the substitutions yourself with replaceLetterAt() from the Biostrings package. For example, suppose your SNPs are stored in a GPos object, my_snps, similar to the one returned by snpsBySeqname() (i.e., with the alleles stored as IUPAC ambiguity codes in the alleles_as_ambig metadata column). Then you should be able to do something like this:

```r
library(BSgenome.Hsapiens.UCSC.hg38)  # also attaches BSgenome, Biostrings and GenomicRanges
genome <- BSgenome.Hsapiens.UCSC.hg38

## 'my_snps' is assumed to exist already: a GPos with IUPAC codes in
## mcols(my_snps)$alleles_as_ambig.
## Make sure 'my_snps' and 'genome' use the same chromosome naming convention:
seqinfo(genome)
seqinfo(my_snps)

## Inject SNPs in chr1:
chr1snps <- my_snps[seqnames(my_snps) == "chr1"]
chr1altered <- replaceLetterAt(genome$chr1, at=pos(chr1snps),
                               letter=mcols(chr1snps)$alleles_as_ambig)
```
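Because injectSNPs() won't take a GPos, the chr1 step has to be repeated for every chromosome that carries SNPs. The following is a minimal sketch of that loop, assuming my_snps has the shape described above (a GPos with IUPAC codes in an alleles_as_ambig metadata column, using the same seqnames style as the genome). The helper name inject_snps_gpos() is hypothetical and not part of any Bioconductor package; it only wraps the same replaceLetterAt() call and collects the altered chromosomes in a DNAStringSet.

```r
library(Biostrings)
library(GenomicRanges)

## Hypothetical helper, not part of any Bioconductor package.
## genome: a BSgenome or a named DNAStringSet (anything supporting genome[[name]]).
## snps:   a GPos with IUPAC codes in mcols(snps)$alleles_as_ambig, using the
##         same chromosome naming convention as 'genome'.
## Returns a DNAStringSet holding only the chromosomes that carry SNPs.
inject_snps_gpos <- function(genome, snps) {
    chroms <- unique(as.character(seqnames(snps)))
    altered <- lapply(chroms, function(chrom) {
        chrom_snps <- snps[seqnames(snps) == chrom]
        ## Same call as in the chr1 example, one chromosome at a time
        chrom_seq <- replaceLetterAt(genome[[chrom]],
                                     at = pos(chrom_snps),
                                     letter = mcols(chrom_snps)$alleles_as_ambig)
        DNAStringSet(chrom_seq)
    })
    ## Concatenate the per-chromosome DNAStringSets and label them
    result <- do.call(c, altered)
    names(result) <- chroms
    result
}
```

With the hg38 objects from the example, something like `hg38_alt <- inject_snps_gpos(genome, my_snps)` followed by `writeXStringSet(hg38_alt, "hg38_with_snps.fa")` would write the modified chromosomes to FASTA (the file name is only a placeholder). Each affected chromosome is loaded into memory in full, so this is considerably heavier than working on chr1 alone.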

See ?replaceLetterAt for more information about how to use replaceLetterAt().
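If the cohort SNPs start out as a plain table with REF and ALT columns rather than as a GPos, one way to get them into the shape assumed above is Biostrings' mergeIUPACLetters(), which collapses a set of DNA letters into a single IUPAC ambiguity code. This is a sketch under assumptions: the table itself, its column names (chrom, pos, ref, alt), 1-based positions, and single-letter alleles are all hypothetical.

```r
library(Biostrings)
library(GenomicRanges)

## Hypothetical input: one row per SNP, 1-based positions,
## single-letter REF/ALT alleles (column names are assumptions).
snp_table <- data.frame(
    chrom = c("chr1", "chr1", "chr2"),
    pos   = c(10L, 25L, 7L),
    ref   = c("A", "C", "G"),
    alt   = c("G", "T", "C")
)

## Collapse REF+ALT into the matching IUPAC ambiguity code (A+G -> R, etc.)
ambig <- mergeIUPACLetters(paste0(snp_table$ref, snp_table$alt))

## Build a GPos shaped like the one snpsBySeqname() returns
my_snps <- GPos(seqnames = snp_table$chrom, pos = snp_table$pos)
mcols(my_snps)$alleles_as_ambig <- ambig
```

Before injecting, it may be worth confirming that the REF column agrees with the reference genome, e.g. by comparing snp_table$ref with as.character(getSeq(genome, as(my_snps, "GRanges"))). Indels are out of scope: replaceLetterAt() only substitutes single letters, so rows with multi-letter REF alleles would need to be dropped first.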

As for creating your own SNPlocs package, the tools currently available for this are in the SNPlocsForge package. Unfortunately, they are hard to use and inefficient. They also only work with the huge JSON files provided by dbSNP, which require an enormous amount of computing power to parse. See some discussion about this here. In other words, the package would need a lot of improvements before anyone else could use it, which is why I haven't submitted it to Bioconductor yet.

Hope this helps,

H.