Search
Question: Gene symbol annotation not converted for intergenic SNPs with multiple genes
0
gravatar for noor.suaini
2.8 years ago by
noor.suaini0 wrote:

Hi,

I am new to R and am trying to use the Variant Annotation package to annotate my list of SNPs to genes within a distance of 50kb. For the intergenic SNPs, there are several gene values/IDs listed under the PRECEDEID and FOLLOWID columns. Symbol annotation doesn’t seem to work for these and I was wondering if anyone could suggest a solution to this? The intronic SNPs were successfully converted to gene symbols, only the intergenic ones with more than 1 gene were not successfully converted. I would also like to get the distance between the intergenic SNP and it's flanking genes. But i receive an error "Error: could not find function "elementNROWS". This was also the same when i tried elementLengths.  I have provided the codes used and session info below. Any help is appreciated, thank you!

ucscallchr <-read.csv(file = "UCSC output chb ceu all chr.csv")
target <- with(ucscallchr,
               GRanges( seqnames = Rle(chr),
                        ranges   = IRanges(start, end=end, names=rsid),
                        strand   = Rle(strand("*")) ) )

loc_all <- locateVariants(target, txdb, AllVariants(intergenic=IntergenicVariants(50000,50000)))
names(loc_all) <- NULL
out <- as.data.frame(loc_all)
out$names <- names(target)[ out$QUERYID ]
out <- out[ , c("names", "seqnames", "start", "end", "LOCATION", "GENEID", "PRECEDEID", "FOLLOWID")]
out <- unique(out)

#annotating
Symbol2id <- as.list( org.Hs.egSYMBOL2EG )
id2Symbol <- rep( names(Symbol2id), sapply(Symbol2id, length) )
names(id2Symbol) <- unlist(Symbol2id)

x <- unique( with(out, c(levels(GENEID), levels(PRECEDEID), levels(FOLLOWID))) )
table( x %in% names(id2Symbol) ) 

out$GENESYMBOL <- id2Symbol[ as.character(out$GENEID) ]
out$PRECEDESYMBOL <- id2Symbol[ as.character(out$PRECEDEID) ]
out$FOLLOWSYMBOL <- id2Symbol[ as.character(out$FOLLOWID) ]
head(out)

#intergenic SNPs

intergenic50 <- subset(out, LOCATION == 'intergenic')

p_ids <- unlist(intergenic50$PRECEDEID, use.names=FALSE)
exp_ranges <- rep(intergenic50, elementNROWS(intergenic50$PRECEDEID))
Error: could not find function "elementNROWS"

p_ids <- unlist(loc_all$PRECEDEID, use.names=FALSE) 
exp_ranges <- rep(loc_all,  elementLengths(loc_int$PRECEDEID))
Error in x[rep(seq_len(length(x)), ...)] : 
  error in evaluating the argument 'i' in selecting a method for function '[': Error in rep(seq_len(length(x)), ...) : invalid 'times' argument
SessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252    LC_MONETARY=English_Australia.1252
[4] LC_NUMERIC=C                       LC_TIME=English_Australia.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BSgenome.Scerevisiae.UCSC.sacCer1_1.4.0 BSgenome_1.38.0                        
 [3] rtracklayer_1.30.2                      BiocInstaller_1.20.1                   
 [5] org.Hs.eg.db_3.2.3                      RSQLite_1.0.0                          
 [7] DBI_0.3.1                               TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
 [9] GenomicFeatures_1.22.13                 AnnotationDbi_1.32.3                   
[11] VariantAnnotation_1.16.4                Rsamtools_1.22.0                       
[13] Biostrings_2.38.4                       XVector_0.10.0                         
[15] SummarizedExperiment_1.0.2              Biobase_2.30.0                         
[17] GenomicRanges_1.22.4                    GenomeInfoDb_1.6.3                     
[19] IRanges_2.4.8                           S4Vectors_0.8.11                       
[21] BiocGenerics_0.16.1                    

loaded via a namespace (and not attached):
 [1] GenomicAlignments_1.6.3 zlibbioc_1.16.0         BiocParallel_1.4.3      tools_3.2.3          [5] lambda.r_1.1.7          futile.logger_1.4.1     futile.options_1.0.0    bitops_1.0-6        [9] RCurl_1.95-4.8          biomaRt_2.26.1          XML_3.98-1.4           
ADD COMMENTlink modified 2.8 years ago by Valerie Obenchain ♦♦ 6.7k • written 2.8 years ago by noor.suaini0
0
gravatar for Valerie Obenchain
2.8 years ago by
Valerie Obenchain ♦♦ 6.7k
United States
Valerie Obenchain ♦♦ 6.7k wrote:

Hi Noor,

There is an example at the bottom of the ?locateVariants man page that shows how to annotate intergenic variants, see 'Variants in Intergenic Regions'. The code above is patterned off that example.

You mentioned off -line that this code was not written by you. The sessionInfo() is for R-release but the code uses elementNROWS() which is only available in devel. My guess is that whoever wrote the code was using devel. The example also works in release but you need to follow the example on the release man page.

library(VariantAnnotation)

?locateVariants

Walk through the example so you understand the steps. It can then be modified for your files/annotations above. Let me know if you have problems.

Valerie 

 

 

ADD COMMENTlink modified 2.8 years ago • written 2.8 years ago by Valerie Obenchain ♦♦ 6.7k

Hi Valerie,

Thanks for your reply. I have actually tried the commands in that section on how to annotate intergenic variants. While the commands did work, it did not actually solve my problem. Variants with more than 1 PRECEDEID/FOLLOWID still did not get converted to a symbol. For example, the PRECEDEID column had only 64902 listed while FOLLOWID column has c("134218", "55299", "5810") listed. 64902 was successfully converted to a symbol but not the c("134218", "55299", "5810"), it was converted to NA.

Is there anything else I could try? Thanks Valerie.

Noor

ADD REPLYlink written 2.8 years ago by noor.suaini0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 357 users visited in the last hour