Question: VariantAnnotation ALT Field
6.6 years ago, Samuel Younkin wrote:
I have been looking at the VariantAnnotation vignette and have encountered something strange. The R code is below. See how the ALT field lists only ########. The vignette, however, correctly shows the alternate allele. The data file chr22.vcf.gz also correctly contains the alternate allele information.

Any suggestions?

Thanks.

Sam

~~

> library(VariantAnnotation)
> fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
> vcf <- readVcf(fl, "hg19")
> head( fixed(vcf), 3 )
GRanges with 3 ranges and 5 metadata columns:
              seqnames               ranges strand | paramRangeID
                 <Rle>            <IRanges>  <Rle> |     <factor>
    rs7410291       22 [50300078, 50300078]      * |         <NA>
  rs147922003       22 [50300086, 50300086]      * |         <NA>
  rs114143073       22 [50300101, 50300101]      * |         <NA>
                         REF                ALT      QUAL      FILTER
              <DNAStringSet> <DNAStringSetList> <numeric> <character>
    rs7410291              A           ########       100        PASS
  rs147922003              C           ########       100        PASS
  rs114143073              G           ########       100        PASS
  ---
  seqlengths:
   22
   NA
> sessionInfo()
R version 2.15.2 Patched (2012-10-28 r61038)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.iso885915       LC_NUMERIC=C
 [3] LC_TIME=en_US.iso885915        LC_COLLATE=en_US.iso885915
 [5] LC_MONETARY=en_US.iso885915    LC_MESSAGES=en_US.iso885915
 [7] LC_PAPER=C                     LC_NAME=C
 [9] LC_ADDRESS=C                   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
[1] VariantAnnotation_1.4.5 Rsamtools_1.10.2        Biostrings_2.26.2
[4] GenomicRanges_1.10.5    IRanges_1.16.4          BiocGenerics_0.4.0

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.20.3   Biobase_2.18.0         biomaRt_2.14.0
 [4] bitops_1.0-5           BSgenome_1.26.1        DBI_0.2-5
 [7] GenomicFeatures_1.10.1 parallel_2.15.2        RCurl_1.95-3
[10] RSQLite_0.11.2         rtracklayer_1.18.1     stats4_2.15.2
[13] tools_2.15.2           XML_3.95-0.1           zlibbioc_1.4.0
Answer: VariantAnnotation ALT Field
6.6 years ago, Paul Shannon wrote:
Hi Sam,

Here's a quick workaround:

  fixed(vcf)[ , c("REF", "ALT")]

The backstory is that the ALT field is a DNAStringSetList which, until very recently (the change is in bioc-devel), displayed itself, via its show method, as '######'. Since that was somewhat less than helpful, the latest version of VariantAnnotation displays the ALT sequence in a more natural way.

In the meantime, if you do not use bioc-devel, the explicit extraction of REF and ALT shown above should get you part of what you want.

- Paul

On Nov 21, 2012, at 6:50 AM, Samuel Younkin wrote:

> I have been looking at the VariantAnnotation vignette and have encountered something strange. [...] See how the ALT field lists only ########. [...]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
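For readers on a current Bioconductor release, the same idea can be sketched with the `ref()` and `alt()` accessors instead of subsetting `fixed(vcf)`. This is only a sketch under assumptions: it assumes a recent VariantAnnotation in which these accessors and `rownames()` on a VCF object behave as they do today, and the names `alt_text` and `alleles` are made up for illustration. It collapses each record's ALT alleles into a plain comma-separated string, so the result prints readably whatever the show method does.

```r
library(VariantAnnotation)

fl  <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
vcf <- readVcf(fl, "hg19")

# ALT is list-like: each record may carry more than one alternate allele.
alt_alleles <- alt(vcf)

# Collapse the alleles of each record into one comma-separated string.
# 'alt_text' is a hypothetical helper name, not part of the package.
alt_text <- vapply(
  seq_along(alt_alleles),
  function(i) paste(as.character(alt_alleles[[i]]), collapse = ","),
  character(1)
)

# Plain data.frame of record ID, REF and ALT as ordinary character columns.
alleles <- data.frame(
  id  = rownames(vcf),
  REF = unname(as.character(ref(vcf))),
  ALT = alt_text,
  stringsAsFactors = FALSE
)

head(alleles, 3)
```

Going through a plain `data.frame` loses the Biostrings classes, so this is meant for inspection rather than for further sequence manipulation.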
Also, Val has added to bioc-devel the ExpandedVCF class, which is like the existing VCF class except that it has one row per position+ALT combination: each row carries a single ALT, and a position can occur multiple times, once for each ALT. This simplifies the ALT column from a DNAStringSetList to a DNAStringSet, which is much easier to manipulate.

Michael

On Wed, Nov 21, 2012 at 9:19 AM, Paul Shannon <pshannon@fhcrc.org> wrote:

> Hi Sam,
>
> Here's a quick workaround:
>
> fixed(vcf)[ , c("REF", "ALT")]
>
> [...]
written 6.6 years ago by Michael Lawrence
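As a rough illustration of the one-row-per-ALT layout described in this reply, current VariantAnnotation releases provide `expand()`, which turns a VCF object into an ExpandedVCF. The sketch below assumes such a release (not the 1.4.5 version from the original session), and the variable name `exp_vcf` is just illustrative.

```r
library(VariantAnnotation)

fl  <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
vcf <- readVcf(fl, "hg19")

# In the collapsed VCF, ALT is list-like: one element per record,
# each element holding all alternate alleles of that record.
class(alt(vcf))

# expand() repeats a record once per alternate allele.
exp_vcf <- expand(vcf)
class(exp_vcf)

# After expansion every row has exactly one ALT, so the column is a
# flat set of sequences rather than a list of sets.
class(alt(exp_vcf))

# A record with several ALT alleles contributes several rows.
c(collapsed = nrow(vcf), expanded = nrow(exp_vcf))

head(as.character(alt(exp_vcf)), 3)
```

With a flat ALT column, per-allele operations such as `width(alt(exp_vcf))` apply directly, without looping over list elements.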