Complete variant toolbox: gmapR/VariantTools/VariantAnnotation
Thomas Girke ★ 1.7k
@thomas-girke-993
United States
Dear Michael and Valerie,

VariantTools and VariantAnnotation are awesome packages. To the best of my knowledge, VariantTools is currently the only Bioc/R package that performs variant calling, and it does this in a very nice way. With the available resources it is now straightforward to set up complete workflows for variant calling projects: (1) variant-aware read alignments with GSNAP from gmapR -> (2) variant calling/filtering with VariantTools -> (3) adding genomic context with VariantAnnotation. This is really amazing!!!

Here are a few questions related to both packages:

(1) For teaching purposes and other obvious reasons it would be useful if a Windows version of VariantTools were available (and perhaps for gmapR too). Installing the package (includes gmapR) from source works fine on both Linux and OS X, but not on Windows.

(2) The VRanges class is another great resource for filtering variant calls. What I was not able to locate, though, is a description/definition of the content of its different columns/components. Is something like this available somewhere?

(3) When annotating variants with utilities from VariantAnnotation, it would be useful to provide a convenience Summary Report function at the end of the workflow that exports the annotations to a file. A very common need here is to collapse the annotations for each variant onto a single line, so that one doesn't end up with annotation results of millions of lines, as is typical for many variant discovery projects. This also simplifies joins among different annotation instances because it maintains uniqueness among variant identifiers. This approach is often useful when comparing (joining) the variants among different genotypes (e.g. which variants are identical or unique among different mutants). An example solution is shown on slides 34-35 of this presentation:
http://faculty.ucr.edu/~tgirke/HTML_Presentations/Manuals/Workshop_Dec_12_16_2013/Rvarseq/Rvarseq.pdf

(4) predictCoding() reports the relative location where exactly a variant maps to an annotation range. It would be nice if locateVariants() could report the exact relative mapping locations too, e.g. variant chr1:1033_A/T maps to position x of 5'UTR. Perhaps this is already possible, but I couldn't figure out how to do it without reaching too far into my own hacking toolbox.

Thanks for providing these excellent resources and, most importantly, for your patience listening to these unsolicited questions.

Best,
Thomas

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] VariantTools_1.4.5 VariantAnnotation_1.8.7 Rsamtools_1.14.2
[4] Biostrings_2.30.1 GenomicRanges_1.14.3 XVector_0.2.0
[7] IRanges_1.20.6 BiocGenerics_0.8.0

loaded via a namespace (and not attached):
[1] AnnotationDbi_1.24.0 BatchJobs_1.1-1135 BBmisc_1.4
[4] Biobase_2.22.0 BiocParallel_0.4.1 biomaRt_2.18.0
[7] bitops_1.0-6 brew_1.0-6 BSgenome_1.30.0
[10] codetools_0.2-8 DBI_0.2-7 digest_0.6.3
[13] fail_1.2 foreach_1.4.1 GenomicFeatures_1.14.2
[16] gmapR_1.4.2 grid_3.0.2 iterators_1.0.6
[19] lattice_0.20-24 Matrix_1.1-0 plyr_1.8
[22] RCurl_1.95-4.1 RSQLite_0.11.4 rtracklayer_1.22.0
[25] sendmailR_1.1-2 stats4_3.0.2 tools_3.0.2
[28] XML_3.95-0.2 zlibbioc_1.8.0
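The one-line-per-variant collapse described in point (3) can be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the table `annot`, its column names, the gene labels, and the helper `collapse_unique()` are made up for the example, and it is not the code from the slides nor a function of VariantAnnotation.

```r
# Hypothetical long-format annotation table: one row per (variant, annotation) pair.
annot <- data.frame(
  variant  = c("chr1:1033_A/T", "chr1:1033_A/T", "chr2:500_G/C"),
  location = c("fiveUTR", "promoter", "coding"),
  gene     = c("geneA", "geneB", "geneC"),   # placeholder gene labels
  stringsAsFactors = FALSE
)

# Join the distinct values of a column into one space-separated string.
collapse_unique <- function(x) paste(unique(x), collapse = " ")

# Collapse to one row per variant, so the variant ID becomes a unique key.
report <- aggregate(
  annot[c("location", "gene")],
  by  = list(variant = annot$variant),
  FUN = collapse_unique
)

# Export the collapsed report as a tab-separated file.
out_file <- tempfile(fileext = ".tsv")
write.table(report, file = out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

Because `report` has exactly one row per variant, reports from different genotypes could then be joined on the `variant` column with `merge()`.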
Thomas Girke ★ 1.7k
@thomas-girke-993
United States
Hi Julian,

Yes, I have seen it, but I cannot find explanations for things like "n.read.pos", "mean.quality.ref", etc. In most cases I can guess what it is, but often I am not sure.

Thomas

On Sun, Dec 08, 2013 at 05:19:02PM +0000, Julian Gehring wrote:
> Hi Thomas,
>
> I'm not sure if I understood you correctly, but did you have a look at
> the 'VRanges' help (by calling '?VRanges')? This lists the different
> slots and gives a short explanation for each.
>
> Best wishes
> Julian
>
> On 12/08/2013 06:08 PM, Thomas Girke wrote:
> > (2) The VRanges class is another great resource for filtering variant calls.
> > What I was not able to locate though is a description/definition of the content
> > of its different columns/components. Is something like this available
> > somewhere?
Thomas Girke ★ 1.7k
@thomas-girke-993
United States
Hi Julian,

I certainly understand the difficulty of supporting all OSs for certain packages. If it were possible in this case, then it would certainly not be a waste of time.

The latter SNP example would remain ambiguous in its gene assignment, which is fine. Usually, we would just flag it that way.

Thomas

On Sun, Dec 08, 2013 at 05:45:51PM +0000, Julian Gehring wrote:
> Hi Thomas,
>
> > (1) For teaching purposes and other obvious reasons it would be useful if a
> > Windows version of VariantTools were available (and perhaps for gmapR too).
> > [...]
>
> Due to many differences between the operating systems, building a
> package like 'gmapR' (and every package that depends on it, like
> 'VariantTools') is often not possible for the windows OS. While Michael
> or Thomas Wu may know more about the details, I would doubt that these
> packages will be available for windows soon. As an alternative, the
> amazon bioconductor instances may be useful for you in this context.
>
> > (3) When annotation variants with utilities from VariantAnnotation, it would
> > useful to provide a convenience Summary Report function at the end of the
> > workflow that exports the annotations to a file.
> > [...]
>
> [...]
> Merging the consequences of a variant into a single line, as you have
> shown in your slides, may make it difficult to disentangle the
> relationship between the consequences. As an example, taking the last
> line from your presentation p. 35:
>
> ID: Chr5:6455_T/C
> Location: promoter coding
> Gene: AT5G01010 AT5G01015 AT5G01020
>
> Here, it is not possible anymore to relate the location of the variant
> to the affected gene. Out of interest, how are you dealing with this in
> your reports?
>
> Best wishes
> Julian
@michael-lawrence-3846
United States
On Sun, Dec 8, 2013 at 9:08 AM, Thomas Girke <thomas.girke@ucr.edu> wrote:
> [...]
> (1) For teaching purposes and other obvious reasons it would be useful if a
> Windows version of VariantTools were available (and perhaps for gmapR too).
> Installing the package (includes gmapR) from source works fine on both
> Linux and OS X, but not on Windows.

Julian has already helped answer some of these questions (thanks!).

For Windows support, I would need to talk to Tom about how far he could port the GMAP suite. VariantTools currently relies on bam_tally, but I've also written a simple function that generates a basic VRanges via Rsamtools::applyPileups. It will become part of VariantAnnotation. Many filters in VariantTools rely only on the basic read depth information, so I could make gmapR a Suggested dependency of VariantTools, and thus allow VariantTools to work on Windows. Tallying is a computationally intensive operation, so I'm guessing Windows users would mostly be using the downstream functionality.

Also interesting would be integration of the HDF5 representation, i.e., input/output to/from VRanges and generation via applyPileups. Does that already exist? And there's also the idea of storing the tallies as a tab-separated file with a Tabix index. The advantage is that it would rely only on Rsamtools.

> [...]
----- Original Message -----
> From: "Michael Lawrence" <lawrence.michael at gene.com>
> To: "Thomas Girke" <thomas.girke at ucr.edu>
> Cc: "Bioconductor mailing list" <bioconductor at stat.math.ethz.ch>
> Sent: Sunday, December 8, 2013 11:35:31 AM
> Subject: Re: [BioC] Complete variant toolbox: gmapR/VariantTools/VariantAnnotation
>
> [...]
> Julian has already helped answer some of these questions (thanks!). For
> Windows support, I would need to talk to Tom about how far he could port
> the GMAP suite. VariantTools currently relies on bam_tally, but I've also
> written a simple function that generates a basic VRanges via
> Rsamtools::applyPileups. It will become part of VariantAnnotation. Many
> filters in VariantTools just rely on the basic read depth information, so I
> could make gmapR a Suggested dependency of VariantTools, and thus allow
> VariantTools to work on Windows.

It would need to be an Enhances: dependency (with gmapR-specific functionality wrapped in if (require(gmapR))).

Dan

> Tallying is a computationally intensive operation, so I'm guessing
> Windows users would be using the downstream functionality.
> [...]
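The guard Dan describes, where gmapR-specific code only runs when gmapR is actually installed, could look roughly like the sketch below. It uses requireNamespace() rather than the require() mentioned above, since it checks for the package without attaching it. The helper `with_optional()` and the two backend labels are hypothetical names for illustration, not part of VariantTools or gmapR.

```r
# Run `with_pkg()` if the optional package is installed, otherwise `fallback()`.
# `with_optional`, `with_pkg`, and `fallback` are illustrative names only.
with_optional <- function(pkg, with_pkg, fallback) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    with_pkg()
  } else {
    fallback()
  }
}

# Pick a tallying backend depending on whether gmapR is available.
tally_backend <- with_optional(
  "gmapR",
  with_pkg = function() "gmapR",   # a real package would call its gmapR-based tallying here
  fallback = function() "pileup"   # ... and a pileup-based alternative here
)
```

On a machine without gmapR (for example a Windows installation), `tally_backend` ends up as "pileup", and the rest of the code can carry on with the fallback path.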
Julian Gehring ★ 1.3k
@julian-gehring-5818
Hi Thomas,

I'm not sure if I understood you correctly, but did you have a look at the 'VRanges' help (by calling '?VRanges')? This lists the different slots and gives a short explanation for each.

Best wishes
Julian

On 12/08/2013 06:08 PM, Thomas Girke wrote:
> (2) The VRanges class is another great resource for filtering variant calls.
> What I was not able to locate though is a description/definition of the content
> of its different columns/components. Is something like this available
> somewhere?
Julian Gehring ★ 1.3k
@julian-gehring-5818
Hi Thomas,

> (1) For teaching purposes and other obvious reasons it would be useful if a
> Windows version of VariantTools were available (and perhaps for gmapR too).
> Installing the package (includes gmapR) from source works fine on both Linux
> and OS X, but not on Windows.

Due to the many differences between the operating systems, building a package like 'gmapR' (and every package that depends on it, like 'VariantTools') is often not possible on Windows. While Michael or Thomas Wu may know more about the details, I doubt that these packages will be available for Windows soon. As an alternative, the Amazon Bioconductor instances may be useful for you in this context.

> (3) When annotation variants with utilities from VariantAnnotation, it would
> useful to provide a convenience Summary Report function at the end of the
> workflow that exports the annotations to a file.
> [...]
> An example solution is shown on slides 34-35 of this presentation:
> http://faculty.ucr.edu/~tgirke/HTML_Presentations/Manuals/Workshop_Dec_12_16_2013/Rvarseq/Rvarseq.pdf

The fact that one variant may have multiple consequences often makes it harder to report or post-process the results than it would be with a simple 1:1 mapping. Other software, such as ANNOVAR, has the concept of reporting the 'most severe' consequence, but what counts as most severe is not well defined, and this approach may miss interesting consequences.

Merging the consequences of a variant into a single line, as you have shown in your slides, may make it difficult to disentangle the relationship between the consequences. As an example, taking the last line from your presentation p. 35:

ID: Chr5:6455_T/C
Location: promoter coding
Gene: AT5G01010 AT5G01015 AT5G01020

Here, it is no longer possible to relate the location of the variant to the affected gene. Out of interest, how are you dealing with this in your reports?

Best wishes
Julian
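One way to keep the location-to-gene relationship in a one-line report is to paste each gene together with its own location before collapsing. The following base R lines are a minimal sketch of that idea, not what the slides do: the gene labels and the assignment of genes to locations are invented placeholders, precisely because the collapsed line above does not reveal the real assignment.

```r
# Hypothetical long-format annotations for a single variant.
# Which gene carries which location is made up for illustration.
annot <- data.frame(
  variant  = rep("Chr5:6455_T/C", 3),
  location = c("promoter", "promoter", "coding"),
  gene     = c("geneA", "geneB", "geneC"),
  stringsAsFactors = FALSE
)

# Build "gene:location" pairs first, then collapse the pairs per variant.
annot$pair <- paste(annot$gene, annot$location, sep = ":")
report <- aggregate(
  annot["pair"],
  by  = list(variant = annot$variant),
  FUN = paste, collapse = " "
)
```

The report still has one row per variant, but each consequence stays attached to its gene, and the pairs can be split apart again with `strsplit()` if needed.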
Julian Gehring ★ 1.3k
@julian-gehring-5818
Hi Thomas,

In this case, I assume that you are referring to columns added by functions of the 'VariantTools' and 'gmapR' packages. These are simply additional columns in a 'VRanges' object and not part of the standard 'VRanges' class. 'help("tallyVariants", package = "VariantTools")' goes a bit into the details.

Best wishes
Julian

On 12/08/2013 06:54 PM, Thomas Girke wrote:
> Hi Julian,
>
> Yes, I have seen it, but I cannot find explanations for things like
> "n.read.pos", "mean.quality.ref", etc. In most cases I can guess what
> it is but often I am not sure.
@valerie-obenchain-4275
United States
Hi Thomas,

On 12/08/2013 09:08 AM, Thomas Girke wrote:
> [...]
> (3) When annotation variants with utilities from VariantAnnotation, it would
> useful to provide a convenience Summary Report function at the end of the
> workflow that exports the annotations to a file.
> [...]
> An example solution is shown on slides 34-35 of this presentation:
> http://faculty.ucr.edu/~tgirke/HTML_Presentations/Manuals/Workshop_Dec_12_16_2013/Rvarseq/Rvarseq.pdf

The variantReport() and codingReport() functions look great. Would you be willing to contribute them to VariantAnnotation?

> (4) predictCoding() reports the relative location where exactly a variant maps
> to an annotation range. It would be nice if locateVariants() could report the
> exact relative mapping locations too, e.g. variant chr1:1033_A/T maps to
> position x of 5'UTR. Perhaps this is already possible but I couldn't figure
> out how to do it without reaching too far into my own hacking toolbox.

I could add a 'REFLOC' column to the output of locateVariants() that would essentially be the equivalent of 'CDSLOC' from predictCoding().

Valerie

> [...]

--
Valerie Obenchain
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B155
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: vobencha at fhcrc.org
Phone: (206) 667-3158
Fax: (206) 667-1319
Hi Valerie, cc Thomas,

Sorry for hijacking the thread regarding the request made below.

On 12/09/2013 09:07 PM, Valerie Obenchain wrote:
[...]
> I could add a 'REFLOC' column to the otuput of locateVariants() that
> would essentially be the "equivalent" to 'CDSLOC' from predictCoding().

For the purpose of ordering cDNA primers flanking variants that one may want to validate through Sanger sequencing, it is useful to have at hand the position of the variant relative to the beginning of the transcript (cDNA) in which it has been observed, that is, counted from the start of the transcript and not just from the start of the CDS.

Is this newer 'REFLOC' going to contain this position? If not, would it be possible to also get a column for that from the locateVariants() call (e.g., TXLOC)?

Thanks!!
Robert
Hi,

On 12/17/2013 09:40 AM, Robert Castelo wrote:
> [...]
> for the purpose of ordering cDNA primers flanking variants which one may
> want to validate through sanger sequencing, it is useful to have at hand
> the position of the variant with respect to the beginning of the
> transcript (cDNA) where it has been observed, thus not just from the
> beginning of the CDS but from the beginning of the transcript.
>
> is this newer 'REFLOC' going to contain this position? if not, would it
> be possible to get also a column for that from the locateVariants()
> call? (e.g., TXLOC)

Yes, I think it makes sense to have 'REFLOC' be the position in the reference starting from the beginning of the transcript. Unless others have different thoughts, this is what I'll go ahead with.

Valerie
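To make the requested quantity concrete, here is a small base R sketch of how a transcript-relative position can be derived from exon coordinates. It is only an illustration of the idea and not how VariantAnnotation computes or will compute 'REFLOC'; the function `tx_position()`, its arguments, and the exon coordinates are hypothetical.

```r
# Map a genomic position to a 1-based position within a spliced transcript.
# exon_start / exon_end: genomic exon coordinates (1-based, inclusive).
# strand: "+" or "-". Returns NA if the position does not fall in any exon.
tx_position <- function(pos, exon_start, exon_end, strand = "+") {
  # Put exons in transcription order: ascending on "+", descending on "-".
  ord <- order(exon_start, decreasing = (strand == "-"))
  exon_start <- exon_start[ord]
  exon_end   <- exon_end[ord]

  hit <- which(pos >= exon_start & pos <= exon_end)
  if (length(hit) == 0) return(NA_integer_)
  hit <- hit[1]

  # Bases contributed by exons transcribed before the one that was hit.
  widths   <- exon_end - exon_start + 1
  upstream <- sum(widths[seq_len(hit - 1)])

  # Offset inside the hit exon, counted from its 5' end.
  within <- if (strand == "+") pos - exon_start[hit] + 1 else exon_end[hit] - pos + 1
  as.integer(upstream + within)
}

# Made-up transcript with two exons: 101-200 and 301-400.
tx_position(310, c(101, 301), c(200, 400), "+")   # 110
tx_position(390, c(101, 301), c(200, 400), "-")   # 11
```

Counting from the transcript start rather than the CDS start is what distinguishes this from a 'CDSLOC'-style coordinate, which is the difference Robert asks about for primer design.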
Wei Shi ★ 3.6k
@wei-shi-2183
Australia/Melbourne
I just want to point out that the Rsubread package includes a SNP calling function called exactSNP.

Wei

On Dec 9, 2013, at 4:08 AM, Thomas Girke wrote:
> Dear Michael and Valerie,
>
> VariantTools and VariantAnnotation are awesome packages. To the best of my
> knowledge, VariantTools is currently the only Bioc/R package that performs
> variant calling and it does this in a very nice way.
> [...]