Printing Alt alleles using VariantAnnotation
Mark Dunning ★ 1.1k
@mark-dunning-3319
Last seen 22 months ago
Sheffield, UK
Hi all,

I am doing some processing of VCF files using the VariantAnnotation package, and eventually I want to write out a table that I can use with the ANNOVAR annotation tool (http://www.openbioinformatics.org/annovar/). The table needs to be in the form

CHR, Start, End, Ref, Alt

e.g.

1 55 55 T G
1 2646 2646 G A

I'm fine extracting the chromosome, start and end. To get the reference alleles I do:

> Ref <- as.data.frame(values(ref(vcf))[["REF"]])[,1]

But the Alt allele is a bit more complicated. If I do something like

> alternate = as.data.frame(unlist(values(fixed(vcf))[["ALT"]]))[,1]

the number of rows can be greater than the number of variants in the VCF file, especially for indels, where more than one alternate allele may be found. I can then no longer easily construct the data frame.

Is there an easy way to write all alternate alleles for the same position as a comma-separated string, so that entries in the table could be in the form

1 55 55 T G,C

(e.g. G and C alternate alleles were found for the SNP at chromosome 1: 55-55)?

Regards,

Mark

> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] VariantAnnotation_1.2.11 Rsamtools_1.8.6      Biostrings_2.24.1
[4] ggplot2_0.9.2.1          GenomicRanges_1.8.13 IRanges_1.14.4
[7] BiocGenerics_0.2.0

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.18.3  Biobase_2.16.0   biomaRt_2.12.0
 [4] bitops_1.0-4.1        BSgenome_1.24.0  colorspace_1.1-1
 [7] DBI_0.2-5             dichromat_1.2-4  digest_0.5.2
[10] GenomicFeatures_1.8.3 grid_2.15.1      gtable_0.1.1
[13] labeling_0.1          lattice_0.20-10  MASS_7.3-21
[16] Matrix_1.0-9          memoise_0.1      munsell_0.4
[19] plyr_1.7.1            proto_0.3-9.2    RColorBrewer_1.0-5
[22] RCurl_1.91-1          reshape2_1.2.1   RSQLite_0.11.2
[25] rtracklayer_1.16.3    scales_0.2.2     snpStats_1.6.0
[28] splines_2.15.1        stats4_2.15.1    stringr_0.6.1
[31] survival_2.36-14      tools_2.15.1     XML_3.9-4
[34] zlibbioc_1.2.0
@james-w-macdonald-5106
Last seen 4 days ago
United States
Hi Mark,

On 9/28/2012 6:06 AM, Mark Dunning wrote:
> But the Alt allele is a bit more complicated. If I do something like;
>
>> alternate = as.data.frame(unlist(values(fixed(vcf))[["ALT"]]))[,1]

How about

alternate <- sapply(values(fixed(vcf))[["ALT"]], paste, collapse = ",")

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
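The per-variant paste suggested above can be tried without a VCF file. The following is a minimal base R sketch, not code from the thread: alt_list, annovar_input and out_file are hypothetical names, the allele values are toy data, and a plain list of character vectors stands in for the ALT column. It also assumes that a tab-separated file with no header is acceptable as input to the downstream tool.

```r
# Toy stand-in for the ALT column: one character vector per variant.
# (Hypothetical data; a real ALT column would come from the VCF object.)
alt_list <- list(c("G", "C"), "A", c("T", "TA"))

# Collapse each variant's alternate alleles into a single string, so the
# result has exactly one element per variant.
alternate <- vapply(alt_list, paste, character(1), collapse = ",")

# Assemble the five-column table; the remaining columns are toy values too.
annovar_input <- data.frame(
  CHR   = c("1", "1", "2"),
  Start = c(55L, 2646L, 100L),
  End   = c(55L, 2646L, 100L),
  Ref   = c("T", "G", "T"),
  Alt   = alternate,
  stringsAsFactors = FALSE
)

# Write the table tab-separated, without header, quotes or row names.
out_file <- tempfile(fileext = ".txt")
write.table(annovar_input, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)
```

Because each variant contributes one string, the Alt column always has the same length as the other columns, which is what the unlist() approach loses when a variant has several alternate alleles.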
As a heads up, the behavior of the ref(), alt(), qual() and filt() accessors has changed in the devel version of VariantAnnotation. Now instead of

values(fixed(vcf))[["ALT"]]

you can simply call

alt(vcf)

This now returns the value itself instead of a GRanges with the value as an elementMetadata column. Hopefully this makes getting at these data easier.

Valerie

On 09/28/2012 07:44 AM, James W. MacDonald wrote:
> How about
>
> alternate <- sapply(values(fixed(vcf))[["ALT"]], paste, collapse = ",")