Dear all,
I used the locateVariants() function in the VariantAnnotation package to annotate a big list of SNPs and I'm having some problems interpreting the results.
I checked all available locations (intergenic, intron, coding, fiveUTR, threeUTR, promoter and splicesite) and found that, for any two of these locations, there are always some SNPs that are assigned to both categories. For example, rs9778016 (chr1:996184) is annotated as being located in an intron as well as in an intergenic region. I'm not sure what this means.
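A minimal sketch of the cross-check described above, for anyone who wants to reproduce it. It is written in Python purely for illustration and does not call VariantAnnotation. The rows are a hypothetical stand-in for the (SNP, location) pairs in the locateVariants() output; only the two rs9778016 rows come from my actual results, and the rsEXAMPLE identifiers and the function names are invented.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (SNP id, location) rows standing in for locateVariants() output.
# Only the rs9778016 rows reflect the post; the rsEXAMPLE rows are invented.
rows = [
    ("rs9778016", "intron"),
    ("rs9778016", "intergenic"),
    ("rsEXAMPLE1", "coding"),
    ("rsEXAMPLE2", "intron"),
    ("rsEXAMPLE2", "intron"),   # same location reported twice (e.g. two transcripts)
    ("rsEXAMPLE3", "promoter"),
    ("rsEXAMPLE3", "fiveUTR"),
]


def locations_per_snp(rows):
    """Collect the set of distinct location labels reported for each SNP."""
    locs = defaultdict(set)
    for snp, location in rows:
        locs[snp].add(location)
    return dict(locs)


def cooccurring_pairs(rows):
    """Map each pair of location labels to the SNPs annotated with both."""
    pairs = defaultdict(set)
    for snp, locs in locations_per_snp(rows).items():
        # Sorting makes the pair key independent of row order.
        for pair in combinations(sorted(locs), 2):
            pairs[pair].add(snp)
    return dict(pairs)


pairs = cooccurring_pairs(rows)
for pair, snps in sorted(pairs.items()):
    print(pair, sorted(snps))
```

SNPs that are reported several times with the same location (rsEXAMPLE2 here) do not show up in any pair; only SNPs with two or more distinct labels do.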
Is the reason that genes and gene predictions in the UCSC browser come from different sources and can therefore be annotated differently, so that the contradicting annotations refer to different sources? Or is it more likely that there are some errors in the annotation?
In the case of rs9778016 I checked the output of the UCSC table browser. For chr1:996184 I obtained a table with intron regions for two genes (uc009vjs.1 and uc001acl.1). I would like to understand where on the UCSC page the information that the SNP is located in an intergenic region comes from. Is there any way I can reproduce this directly by entering the SNP position into the browser? To me it just looks as if it is part of an intron.
Thank you for your help!
Thank you very much for the detailed explanation. Sorry, I forgot to mention the annotation source: yes, I also used TxDb.Hsapiens.UCSC.hg19.knownGene. Now I understand how the output is generated based on the annotation package, but I still don't really understand how to relate it to the information on the UCSC page. I don't have much experience with the UCSC page, so maybe I am getting something wrong. For the position I mentioned (chr1:996184), the result is an intron region in the genes AK310350 and BC033949 on the UCSC Genes track in the genome browser (uc009vjs.1 and uc001acl.1 in the table browser). According to your example query, aren't these "known genes"? I would be really grateful if you could comment on that.