Overlap the snps
Entering edit mode
Last seen 7.0 years ago
Russian Federation

Hello! Could you help me please.

I use  predictCoding() to get information about nucleotide substitutions:

> nonsyn <- predictCoding(vcf_mod, txdb, Hsapiens)

Then I need to overlap the GRanges I've got from predictCoding() with the positition in
the SNPloc package:

> library(SNPlocs.Hsapiens.dbSNP.20100427)

Get the locations and alleles of all SNPs on chromosomes 1:

> ch1snps <- getSNPlocs("ch1")
> ch1gr <-getSNPlocs("ch1", as.GRanges=TRUE)

Overlap the snps with my ranges from predictCoding():

> mysnps <- GRanges("ch1", IRanges(865692:249150116, width=1))

When 865692 is the first start position for chr1 , 249150116 is the last start position for chr1.

> hits <- findOverlaps(mysnps, ch1gr)
> hits
Hits of length 1418032
queryLength: 242470673
subjectLength: 1422439

Pull out the snp that was hit:
> ch1gr[subjectHits(hits),]

GRanges with 1418032 ranges and 2 metadata columns

I don't understand which input is used for GRanges?I want to apply GRanges to my nonsyn.

my nonsyn table contains 20859 rows for all chromosomes.

i need to overlap only ranges for chr2 in nonsyn (2294 rows), but hits contain too many overlaps.

I know how overlap 1 range:

> mysnps <- GRanges("ch1", IRanges(865692, width=1))

but how overlape all ranges for chr1 from nonsyn?

Thank you!

overlap snps granges iranges • 1.2k views
Entering edit mode
Last seen 7 weeks ago
United States

If all you want to know is whether a coding variant is a SNP, then you might do this:

vr_mod <- as(vcf_mod, "VRanges")
vr_dbsnp <- readVcfAsVRanges("dbsnp.vcf", genome="hg19")
# may need to adapt your seqnameStyle here so that things match
# then
vr_mod$dbSNP <- vr_mod %in% vr_dbsnp
vr_coding <- predictCoding(vr_mod, txdb, Hsapiens)

Then you have your coding predictions and dbSNP concordance status all in one place. This does require downloading and reading the dbSNP VCF file. It would also be possible to convert the SNPlocs data to a VRanges, but it's a bit painful with the ambiguity codes. I think this part of Bioconductor needs some attention next release cycle.

Entering edit mode
Last seen 7 hours ago
Seattle, WA, United States

Hi Anastasiya,

Here is how you can proceed to map the genomic ranges returned by predictCoding() to the SNPs from the SNPlocs package.

nonsyn <- predictCoding(vcf_mod, txdb, Hsapiens)
ch1gr <- getSNPlocs("ch1", as.GRanges=TRUE)
seqlevelsStyle(ch1gr) <- "UCSC"
genome(ch1gr) <- "hg19"
hits <- findOverlaps(nonsyn, ch1gr)

Before we go further, we check that the mapping from nonsyn to ch1gr is not ambiguous, i.e. that the genomic ranges returned by predictCoding() are mapped to at most 1 SNP from the SNPlocs package:

stopifnot(sum(duplicated(queryHits(hits))) == 0)

Then we compute the mapping from nonsyn to ch1gr. This will be an integer vector parallel to nonsyn i.e. of the same length as nonsyn:

nonsyn_to_ch1gr <- as.integer(as(hits, "List"))

Note that if we didn't care about the risk of ambiguous mapping, we could have obtained nonsyn_to_ch1gr directly with findOverlaps(nonsyn, ch1gr, select="first")

Now we can use nonsyn_to_ch1gr to add RefSNP_id and alleles_as_ambig metadata column to nonsyn:

mcols(nonsyn)$RefSNP_id <- mcols(ch1gr)$RefSNP_id[nonsyn_to_ch1gr]
mcols(nonsyn)$alleles_as_ambig <- mcols(ch1gr)$alleles_as_ambig[nonsyn_to_ch1gr]

2 notes:

  1. Not all ranges in nonsyn will necessarily be mapped to a SNP from the SNPlocs package. In that case, the RefSNP_id and alleles_as_ambig metadata columns added to nonsyn will contain NAs.
  2. The alleles in the alleles_as_ambig metadata column are always reported with respect to the plus strand.

Hope this helps,




Entering edit mode

Thank you very much!It works)


Login before adding your answer.

Traffic: 244 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6