Hi,

I have the following feature request for the VariantAnnotation package.

Currently, the function predictCoding() annotates the strand of variants within exons according to a given gene annotation. Would it be possible for the function locateVariants() in the VariantAnnotation package to also annotate the strand of intronic variants?

Introns are non-coding and are therefore not annotated by predictCoding(), but they are still stranded (GT-AG): each intron belongs to a transcript, and its splice-site signals are read on that transcript's strand.
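To illustrate why an intron has a strand, here is a small base-R sketch that guesses it from the canonical splice-site dinucleotides alone. It is for illustration only: `intronStrandFromMotif()` is a hypothetical helper (not part of VariantAnnotation), it assumes the intron sequence is given as read on the reference (+) strand, and it ignores non-canonical introns. The request itself is about taking the strand from the transcript annotation, not from the sequence.

```r
# Hypothetical helper, for illustration only (not part of VariantAnnotation):
# guess an intron's transcript strand from its canonical GT-AG splice sites.
# `seq` is the intron sequence as read on the reference (+) strand.
intronStrandFromMotif <- function(seq) {
  seq <- toupper(seq)
  n <- nchar(seq)
  if (n < 4) return("*")             # too short to hold both motifs
  left <- substr(seq, 1, 2)          # first two bases on the + strand
  right <- substr(seq, n - 1, n)     # last two bases on the + strand
  if (left == "GT" && right == "AG") {
    "+"   # GT...AG read directly: plus-strand intron
  } else if (left == "CT" && right == "AC") {
    "-"   # reverse complement of GT...AG: minus-strand intron
  } else {
    "*"   # non-canonical (e.g. GC-AG, AT-AC) or unrecognised
  }
}

intronStrandFromMotif("GTAAGTCCCTTTTAG")  # "+"
intronStrandFromMotif("CTAAAAGGACTTAC")   # "-"
```

The same GT-AG intron looks like CT...AC when a minus-strand transcript is read on the reference strand, which is why the strand cannot be left as "*" without losing information.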
Here is a code snippet that illustrates what I'm talking about (adapted from the vignette):
=================
library(VariantAnnotation)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
fl <- system.file("extdata", "chr22.vcf.gz",
package="VariantAnnotation")
vcf <- readVcf(fl, "hg19")
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
seqlevels(vcf) <- "chr22"
rd <- rowData(vcf)
loc <- locateVariants(rd, txdb, IntronVariants())
head(loc, n=3)
GRanges with 3 ranges and 7 metadata columns:
      seqnames               ranges strand | LOCATION   QUERYID      TXID     CDSID      GENEID
         <Rle>            <IRanges>  <Rle> | <factor> <integer> <integer> <integer> <character>
  [1]    chr22 [50300078, 50300078]      * |   intron         1     75253      <NA>       79087
  [2]    chr22 [50300086, 50300086]      * |   intron         2     75253      <NA>       79087
  [3]    chr22 [50300101, 50300101]      * |   intron         3     75253      <NA>       79087
            PRECEDEID        FOLLOWID
      <CharacterList> <CharacterList>
  [1]
  [2]
  [3]
  ---
  seqlengths:
   chr22
      NA
=================
That is, the strand column is set to "*" for the intronic variants, even though each of them falls within an intron of a specific transcript (TXID). I'd be happy for this new feature to go into the devel version, as is normal for new features.
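Until something like this is available, one possible workaround is to look up the strand of each hit's transcript through its TXID. This is only a sketch under assumptions: `strandFromTxid()` is a hypothetical helper (not part of VariantAnnotation), and it assumes the TXID values returned by locateVariants() are the same transcript IDs as the `tx_id` column returned by `transcripts(txdb)`. The Bioconductor lines at the end are left as untested comments.

```r
# Hypothetical helper (not part of VariantAnnotation): return the strand of
# the transcript each hit falls in. `txid` is the TXID column returned by
# locateVariants(); `txIds` and `txStrands` give the ID and strand of every
# transcript. Hits whose TXID is NA or not found get "*".
strandFromTxid <- function(txid, txIds, txStrands) {
  # compare as character so integer and character TXIDs both match
  idx <- match(as.character(txid), as.character(txIds))
  out <- as.character(txStrands)[idx]
  out[is.na(out)] <- "*"
  out
}

# Toy example with made-up IDs:
strandFromTxid(c(2L, NA, 1L), txIds = c(1L, 2L), txStrands = c("+", "-"))
# "-" "*" "+"

# With `txdb` and `loc` from the snippet above (untested sketch; needs the
# Bioconductor packages loaded there):
#   tx <- transcripts(txdb)   # metadata includes a tx_id column by default
#   strand(loc) <- strandFromTxid(mcols(loc)$TXID, mcols(tx)$tx_id, strand(tx))
```

Because locateVariants() returns one row per variant and transcript, a variant lying in introns of overlapping transcripts on opposite strands would get one row per strand rather than a single ambiguous value.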
Thanks!
Robert
PS: sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1
[2] GenomicFeatures_1.14.0
[3] AnnotationDbi_1.24.0
[4] Biobase_2.22.0
[5] VariantAnnotation_1.8.0
[6] Rsamtools_1.14.1
[7] Biostrings_2.30.0
[8] GenomicRanges_1.14.1
[9] XVector_0.2.0
[10] IRanges_1.20.0
[11] BiocGenerics_0.8.0
loaded via a namespace (and not attached):
 [1] biomaRt_2.18.0     bitops_1.0-6       BSgenome_1.30.0    DBI_0.2-7
 [5] RCurl_1.95-4.1     RSQLite_0.11.4     rtracklayer_1.22.0 stats4_3.0.2
 [9] tools_3.0.2        XML_3.95-0.2       zlibbioc_1.8.0