stranded intronic variants with VariantAnnotation::locateVariants()
@valerie-obenchain-4275
United States
This is implemented in v 1.9.7. locateVariants() now returns the strand of the subject that was hit except for IntergenicVariants().

The intergenic case returns multiple precede and follow gene id's. When 'ignore.strand=TRUE' genes on both strands are searched and the result can be a mixture of '+' and '-'. For this case the strand returned is '*'. When 'ignore.strand=FALSE' only genes on the same strand as the 'query' are searched so the return strand matches the query.

Valerie

On 10/18/2013 02:41 PM, Robert Castelo wrote:
> Great! thanks a lot Valerie!!
>
> robert.
>
> On 10/18/13 10:19 PM, Valerie Obenchain wrote:
>> Hi Robert,
>>
>> Yes, I can add that. I'll let you know when it's done.
>>
>> Valerie
>>
>> On 10/17/2013 04:01 AM, Robert Castelo wrote:
>>> hi,
>>>
>>> i have the following feature request for the VariantAnnotation package.
>>>
>>> currently, the function predictCoding() annotates the strand of variants
>>> within exons according to a given gene annotation. would it be possible
>>> that the function locateVariants() in the VariantAnnotation package
>>> annotates the strand for intronic variants?
>>>
>>> introns are non-coding, and therefore, not annotated with
>>> predictCoding(), but are stranded (GT-AG).
>>>
>>> here goes a code snippet that illustrates what i'm talking about
>>> (adapted from the vignette):
>>>
>>> =================
>>> library(VariantAnnotation)
>>> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>>>
>>> fl <- system.file("extdata", "chr22.vcf.gz",
>>>                   package="VariantAnnotation")
>>> vcf <- readVcf(fl, "hg19")
>>> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
>>> seqlevels(vcf) <- "chr22"
>>> rd <- rowData(vcf)
>>> loc <- locateVariants(rd, txdb, IntronVariants())
>>>
>>> head(loc, n=3)
>>> GRanges with 3 ranges and 7 metadata columns:
>>>       seqnames               ranges strand | LOCATION   QUERYID      TXID     CDSID      GENEID
>>>          <Rle>            <IRanges>  <Rle> | <factor> <integer> <integer> <integer> <character>
>>>   [1]    chr22 [50300078, 50300078]      * |   intron         1     75253      <NA>       79087
>>>   [2]    chr22 [50300086, 50300086]      * |   intron         2     75253      <NA>       79087
>>>   [3]    chr22 [50300101, 50300101]      * |   intron         3     75253      <NA>       79087
>>>             PRECEDEID        FOLLOWID
>>>       <CharacterList> <CharacterList>
>>>   [1]
>>>   [2]
>>>   [3]
>>>   ---
>>>   seqlengths:
>>>    chr22
>>>       NA
>>> =================
>>>
>>> i.e., the strand column is set to * for the intronic variants. it's ok
>>> if this new feature would be added to the devel version, as happens
>>> normally with new features.
>>>
>>>
>>> thanks!
>>> robert.
>>>
>>> ps: sessionInfo()
>>> R version 3.0.2 (2013-09-25)
>>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] parallel stats graphics grDevices utils datasets methods
>>> [8] base
>>>
>>> other attached packages:
>>> [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1
>>> [2] GenomicFeatures_1.14.0
>>> [3] AnnotationDbi_1.24.0
>>> [4] Biobase_2.22.0
>>> [5] VariantAnnotation_1.8.0
>>> [6] Rsamtools_1.14.1
>>> [7] Biostrings_2.30.0
>>> [8] GenomicRanges_1.14.1
>>> [9] XVector_0.2.0
>>> [10] IRanges_1.20.0
>>> [11] BiocGenerics_0.8.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] biomaRt_2.18.0 bitops_1.0-6 BSgenome_1.30.0 DBI_0.2-7
>>> [5] RCurl_1.95-4.1 RSQLite_0.11.4 rtracklayer_1.22.0 stats4_3.0.2
>>> [9] tools_3.0.2 XML_3.95-0.2 zlibbioc_1.8.0
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
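For anyone who wants to check the behaviour described in the reply at the top of this thread, here is a minimal sketch based on the snippet from the original request. It assumes a VariantAnnotation release that includes the change (v 1.9.7 or later) and, as a further assumption, a release recent enough that rowRanges() is the accessor for the variant ranges (the 2013 snippet used rowData()). The object names loc_intron, loc_inter_any and loc_inter_same are only illustrative.

```r
library(VariantAnnotation)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

# Same example data as in the original request
fl <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
vcf <- readVcf(fl, "hg19")
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
seqlevels(vcf) <- "chr22"

# Assumption: a recent VariantAnnotation, where rowRanges() is used
# instead of the older rowData() accessor
rd <- rowRanges(vcf)

# Intronic hits: per the reply, the strand column should carry the strand
# of the subject that was hit rather than '*'
loc_intron <- locateVariants(rd, txdb, IntronVariants())
table(as.character(strand(loc_intron)))

# Intergenic hits: compare the two 'ignore.strand' settings from the reply
loc_inter_any <- locateVariants(rd, txdb, IntergenicVariants(),
                                ignore.strand = TRUE)
loc_inter_same <- locateVariants(rd, txdb, IntergenicVariants(),
                                 ignore.strand = FALSE)
table(as.character(strand(loc_inter_any)))
table(as.character(strand(loc_inter_same)))
```

Tabulating the strand column is just a quick way to see which strand values come back for each region type; the counts themselves will depend on the installed annotation package version.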
Robert Castelo ★ 3.4k
@rcastelo
Barcelona/Universitat Pompeu Fabra
Wonderful!! thanks a lot!!

robert.

On 11/6/13 1:15 AM, Valerie Obenchain wrote:
> This is implemented in v 1.9.7. locateVariants() now returns the
> strand of the subject that was hit except for IntergenicVariants().
>
> The intergenic case returns multiple precede and follow gene id's.
> When 'ignore.strand=TRUE' genes on both strands are searched and the
> result can be a mixture of '+' and '-'. For this case the strand
> returned is '*'. When 'ignore.strand=FALSE' only genes on the same
> strand as the 'query' are searched so the return strand matches the
> query.
>
> Valerie