injectSNPs only for common variants
arubio • 0

Hi all, I am developing a method to automatically select primers to validate splicing events. I am using SNPlocs.Hsapiens.dbSNP144.GRCh37 and injectSNPs to identify regions with genomic variants (to avoid placing the primers on them). So far, so good.

However, the number of SNPs is very high (around 150 million), so it is almost impossible to find a sufficiently large region with no variants in which to place the primers. Is it possible to inject into the reference genome only the SNPs with a minor allele frequency (MAF) above a threshold (say 5%)? Is the MAF information available somewhere in the Bioconductor annotation data?

Thanks,

Angel

MAF is a population-specific parameter.  You may be able to get some information out of AnnotationHub; I can't verify as I have a terrible connection at the moment.

> library(AnnotationHub)
> ah <- AnnotationHub()
> query(ah, "Common SNPs")
AnnotationHub with 3 records
# snapshotDate(): 2016-10-11 
# $dataprovider: UCSC
# $species: Homo sapiens
# $rdataclass: GRanges
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH5105"]]' 


           title           
  AH5105 | Common SNPs(137)
  AH5108 | Common SNPs(135)
  AH5111 | Common SNPs(132)

It seems that UCSC has a table that should contain this information:

http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/snp141Common.sql

so you may be able to get useful statistics with rtracklayer. Another approach is to query the 1000 Genomes VCFs: VariantAnnotation::genotypeToSnpMatrix converts the genotypes to a SnpMatrix, and snpStats::col.summary then computes the MAF of each variant.
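The VCF route might look roughly like the sketch below. It is only an illustration: it reads the small example VCF that ships with VariantAnnotation (chr22.vcf.gz, which holds only a handful of samples, so the MAF estimates are crude) rather than a full 1000 Genomes file, and the 5% cutoff and the variable names are assumptions taken from the question, not part of either package.

```r
library(VariantAnnotation)
library(snpStats)

# Example VCF shipped with VariantAnnotation; swap in a 1000 Genomes VCF
# for the population of interest when doing this for real.
fl <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
vcf <- readVcf(fl, "hg19")

# Convert genotypes to a SnpMatrix; variants that are not simple
# biallelic SNVs are set to NA by the conversion.
snpmat <- genotypeToSnpMatrix(vcf)

# Per-variant summary statistics, including a MAF column.
stats <- col.summary(snpmat$genotypes)

# Keep variants whose MAF is at or above the (assumed) 5% threshold.
keep <- !is.na(stats$MAF) & stats$MAF >= 0.05
common_ids <- rownames(stats)[keep]

# Genomic positions of the common variants, e.g. to exclude from primers.
common_gr <- rowRanges(vcf)[rownames(vcf) %in% common_ids]
common_gr
```

Only the positions in common_gr would then need to be avoided (or masked) when choosing primer sites, instead of every dbSNP entry.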

Robert Castelo ★ 3.1k
Barcelona/Universitat Pompeu Fabra

Hi,

the MafDb.* annotation packages store MAF values from a number of sources:

MafDb.1Kgenomes.phase1.hs37d5
MafDb.1Kgenomes.phase3.hs37d5
MafDb.ESP6500SI.V2.SSA137.GRCh38
MafDb.ESP6500SI.V2.SSA137.hs37d5
MafDb.ExAC.r0.3.1.nonTCGA.snvs.hs37d5
MafDb.ExAC.r0.3.1.snvs.hs37d5

To access the values through these packages, first install the one you need, then load it and use the function mafByOverlaps() or mafById(). Type the following to see an example:

library(MafDb.1Kgenomes.phase3.hs37d5)
example(MafDb.1Kgenomes.phase3.hs37d5)
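Once MAF values have been looked up, selecting the common variants is a simple filter. The sketch below uses a made-up table in place of a real lookup so that it runs without the annotation package. The helper keep_common(), the column name "AF" and the toy rs identifiers are all hypothetical, so check the actual return value of mafByOverlaps() or mafById() in your installed version before adapting it.

```r
# Hypothetical helper (not part of any package): keep the rows whose minor
# allele frequency is at or above a threshold. Frequencies above 0.5 are
# folded, because in that case the other allele is the minor one.
keep_common <- function(maf_table, column = "AF", threshold = 0.05) {
  stopifnot(column %in% names(maf_table))
  af <- maf_table[[column]]
  maf <- pmin(af, 1 - af)
  maf_table[!is.na(maf) & maf >= threshold, , drop = FALSE]
}

# Toy input standing in for a MAF lookup result; the values are invented.
toy <- data.frame(
  varID = c("rsA", "rsB", "rsC", "rsD"),
  AF    = c(0.30, 0.01, 0.90, NA)
)

# rsB is rare and rsD has no frequency, so both are dropped.
common <- keep_common(toy)
common
```

The identifiers that survive the filter are the ones worth avoiding in primers; everything below the threshold, or without a recorded frequency, is ignored under this assumption.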

cheers,

robert.
