Question

SNPs in multiple locations

Lna • 0

Dear all,

I used the locateVariants() function in the VariantAnnotation package to annotate a big list of SNPs and I'm having some problems interpreting the results.

I checked for all available locations (intergenic, intron, coding, fiveUTR, threeUTR, promoter and splicesite) and I found out that one can pick any two of these locations and there are always some SNPs which are assigned to both categories, e.g. rs9778016 (chr1:996184) is annotated to be located in an intron as well as in an intergenic region. I'm not so sure what this means.

Is the reason that genes and gene predictions in the UCSC browser come from different sources and can therefore be annotated differently, so that the contradictory annotations refer to different sources? Or is it more likely that there are errors in the annotation?

In the case of rs9778016 I checked the output of the UCSC table browser. For chr1:996184 I obtained a table with intron regions for two genes (uc009vjs.1 and uc001acl.1). I would like to understand where on the UCSC page the information that the SNP is located in an intergenic region comes from. Is there any way to reproduce this directly by entering the SNP position into the browser? To me it just looks as if it is part of an intron.

Thank you for your help!

txdb.hsapiens.ucsc.hg19.knowngene ucsc annotation locatevariants variantannotation

updated 7.8 years ago by Valerie Obenchain • written 7.8 years ago by Lna

score 0 · Answer 1 · 2017-06-30

Hi,

On the locateVariants man page it says this under the 'Value' section:

A ‘GRanges’ object with a row for each variant-transcript match.

This means a single SNP may be classified as more than one feature and each feature is returned as a separate row. For your example it sounds like the SNP is in an intron region of a transcript but the transcript may not be part of a known gene. The term 'intergenic' means the range in 'query' does not fall in a gene range as defined by 'subject'.

I'm not sure which annotation you're using, so I'll use TxDb.Hsapiens.UCSC.hg19.knownGene as an example.

> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
> gr <- GRanges("chr1", IRanges(996184, width=1))

locateVariants() uses the extractors in the GenomicFeatures package to extract features and then finds overlaps between the given ranges and those features. Here we see the range falls in the introns of two transcripts

> findOverlaps(gr, intronsByTranscript(txdb))
Hits object with 2 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1          65
  [2]         1          66
  -------
  queryLength: 1 / subjectLength: 82960

but does not fall in any known gene ranges.

> findOverlaps(gr, genes(txdb))
Hits object with 0 hits and 0 metadata columns:
   queryHits subjectHits
   <integer>   <integer>
  -------
  queryLength: 1 / subjectLength: 23056
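
One way to see why the position has intron hits but no gene hit is to look at the gene IDs attached to the overlapping transcripts. The following is a minimal sketch, assuming TxDb.Hsapiens.UCSC.hg19.knownGene is installed; the variable names `tx` and `hits` are only illustrative. As far as I know, genes() only builds ranges from transcripts that carry a gene ID, so a transcript without one contributes no gene range.

```r
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
gr <- GRanges("chr1", IRanges(996184, width=1))

# All transcripts, with their names and any associated gene IDs
tx <- transcripts(txdb, columns=c("tx_id", "tx_name", "gene_id"))

# Keep only the transcripts whose span covers the SNP position
hits <- subsetByOverlaps(tx, gr)
hits

# Number of gene IDs attached to each overlapping transcript;
# 0 means the transcript is not assigned to a gene in this TxDb
elementNROWS(hits$gene_id)
```

If the overlapping transcripts show no gene ID, that would be consistent with the empty result from genes(txdb) above.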

Therefore the SNP would be classified as both 'intron' and 'intergenic'.
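
To inspect this on the locateVariants() result itself, it can help to look at the transcript and gene columns next to each location. This is a rough sketch, assuming the VariantAnnotation and TxDb.Hsapiens.UCSC.hg19.knownGene packages are installed; the object name `loc` is arbitrary, and the exact rows returned may depend on the package versions.

```r
library(VariantAnnotation)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
gr <- GRanges("chr1", IRanges(996184, width=1))

# One row per variant-transcript match, across all region types
loc <- locateVariants(gr, txdb, AllVariants())

# Columns showing which transcript and gene (if any) produced each row
mcols(loc)[, c("LOCATION", "QUERYID", "TXID", "GENEID")]

# Cross-tabulate query index against location to spot
# variants that are reported under more than one location
table(loc$QUERYID, loc$LOCATION)
```

With a long list of SNPs, the same table gives a quick overview of which variants appear in more than one category, and the TXID and GENEID columns show which transcript each row refers to.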

Valerie