VariantAnnotation - extracting AD and PL fields
@lavinia-gordon-6486
Last seen 8.3 years ago
Dear All,

Using VariantAnnotation to parse a VCF file:

    ADvcf <- geno(vcf)$AD

How can I access these values?

    ADvcf[1:2,1:2]
                Sample1   Sample2
    chrM:72_T/C Integer,0 Integer,2
    chrM:73_G/A Integer,0 Integer,2

Ideally I'd like something like:

    ADvcf[1:2,1:2]
                Sample1 Sample2
    chrM:72_T/C 6,2     10,18
    chrM:73_G/A 5,40    0,23

Thank you.

Lavinia Gordon
Bioinformatics Manager
Australian Genome Research Facility Ltd
The Walter and Eliza Hall Institute
1G Royal Parade
Parkville VIC 3050
Australia

    sessionInfo()
    R version 3.0.3 (2014-03-06)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] parallel  stats     graphics  grDevices utils     datasets  methods
    [8] base

    other attached packages:
    [1] VariantAnnotation_1.8.13 Rsamtools_1.14.3         Biostrings_2.30.1
    [4] GenomicRanges_1.14.4     XVector_0.2.0            IRanges_1.20.7
    [7] BiocGenerics_0.8.0

    loaded via a namespace (and not attached):
     [1] AnnotationDbi_1.24.0   Biobase_2.22.0         biomaRt_2.18.0
     [4] bitops_1.0-6           BSgenome_1.30.0        DBI_0.2-7
     [7] GenomicFeatures_1.14.5 RCurl_1.95-4.1         RSQLite_0.11.4
    [10] rtracklayer_1.22.7     stats4_3.0.3           tools_3.0.3
    [13] XML_3.98-1.1           zlibbioc_1.8.0
VariantAnnotation • 1.3k views
@michael-lawrence-3846
Last seen 3.0 years ago
United States
Do you really just want to see the values, or is the goal to compute on them? That AD field seems a bit strange, because the first sample is missing values. By convention, AD has values for the ref and the alts, across all of the samples. In order to really work with these data, you'll need to convert that list matrix to a cube (a 3-D array) with NAs as padding. I wouldn't be able to suggest how without having my hands on the actual matrix.

If you really do just want to see them, you can do unstrsplit(CharacterList(AD), ",") in devel, or use rtracklayer:::pasteCollapse instead of unstrsplit in release. Then wrap that result back into a matrix.
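For readers without the devel packages, the display-only route can be approximated in base R. This is only a sketch: it swaps in paste() for unstrsplit() and rtracklayer:::pasteCollapse, and the AD object below is a made-up stand-in for geno(vcf)$AD, built from the values in the desired output in the question.

```r
# Toy stand-in for geno(vcf)$AD (an assumption, not real data):
# a matrix whose cells are integer vectors of allele depths.
AD <- matrix(
  list(c(6L, 2L), c(5L, 40L), c(10L, 18L), c(0L, 23L)),
  nrow = 2,
  dimnames = list(c("chrM:72_T/C", "chrM:73_G/A"), c("Sample1", "Sample2"))
)

# Collapse each cell to one comma-separated string. vapply() drops the
# matrix shape, so wrap the result back into a matrix with the same dimnames.
AD_chr <- matrix(
  vapply(AD, paste, character(1), collapse = ","),
  nrow = nrow(AD),
  dimnames = dimnames(AD)
)

print(AD_chr)
```

An empty cell (the "Integer,0" entries in the question) comes out as an empty string here, so missing values stay visible as blanks rather than raising an error.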
On 4/8/14 4:51 AM, Michael Lawrence wrote:
> In order to really work with these data, you'll need to convert that list
> matrix to a cube with NAs as padding. I wouldn't be able to suggest how
> without having my hands on the actual matrix.

VariantAnnotation has an internal function .matrixOfListsToArray that does exactly this. Maybe it should be exported?
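To make the "cube with NAs as padding" idea concrete, here is a minimal base R sketch. It is not the code of .matrixOfListsToArray, whose implementation is not shown in this thread; the helper name list_matrix_to_array and the toy AD matrix are hypothetical, and the sketch assumes every cell holds an integer vector.

```r
# Toy list-matrix (an assumption, not real data). The first Sample1 cell is
# empty, mimicking the missing values discussed above.
AD <- matrix(
  list(integer(0), c(5L, 40L), c(10L, 18L), c(0L, 23L)),
  nrow = 2,
  dimnames = list(c("chrM:72_T/C", "chrM:73_G/A"), c("Sample1", "Sample2"))
)

# Hypothetical helper: pad every cell with NA up to the longest cell, then
# arrange the result as a variants x samples x values array.
list_matrix_to_array <- function(x) {
  depth <- max(lengths(x), 1L)
  # Indexing past the end of a vector yields NA, which does the padding.
  padded <- vapply(x, function(v) v[seq_len(depth)], integer(depth))
  # Transposing puts the value index last, matching column-major array layout.
  arr <- array(t(padded), dim = c(dim(x), depth))
  if (!is.null(dimnames(x))) {
    dimnames(arr) <- c(dimnames(x), list(NULL))
  }
  arr
}

arr <- list_matrix_to_array(AD)
print(arr["chrM:73_G/A", "Sample1", ])
print(arr["chrM:72_T/C", "Sample1", ])
```

With the data in this shape, per-allele slices such as arr[, , 1] (ref depth) and arr[, , 2] (first alt depth) are ordinary matrices that can be used in vectorised arithmetic, with NA propagating for the missing cells.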
Looks like a useful function. Someone should optimize it though.

On Tue, Apr 8, 2014 at 8:52 AM, Stephanie M. Gogarten wrote:
> VariantAnnotation has an internal function .matrixOfListsToArray that does
> exactly this. Maybe it should be exported?