Dear Michael and Valerie,
VariantTools and VariantAnnotation are awesome packages. To the best of
my knowledge, VariantTools is currently the only Bioc/R package that
performs variant calling and it does this in a very nice way. With the
available resources it is now straightforward to set up complete
workflows for variant calling projects: (1) variant-aware read
alignments with GSNAP from gmapR -> (2) variant calling/filtering with
VariantTools -> (3) adding genomic context with VariantAnnotation. This
is really amazing!!!
Here are a few questions related to both packages:
(1) For teaching purposes and other obvious reasons it would be useful
if a Windows version of VariantTools were available (and perhaps for
gmapR too). Installing the package (includes gmapR) from source works
fine on both Linux and OS X, but not on Windows.
(2) The VRanges class is another great resource for filtering variant
calls. What I was not able to locate, though, is a description/definition
of the content of its different columns/components. Is something like
this available somewhere?
(3) When annotating variants with utilities from VariantAnnotation, it
would be useful to provide a convenience Summary Report function at the
end of the workflow that exports the annotations to a file. A very
common need here is to collapse the annotations for each variant onto a
single line so that one doesn't end up with annotation results of
millions of lines, as is typical for many variant discovery projects.
This also simplifies joins among different annotation instances because
it maintains uniqueness among variant identifiers. This approach is
often useful when comparing (joining) the variants among different
genotypes (e.g. which variants are identical or unique among different
mutants). An example solution is shown on slides 34-35 of this
presentation:
http://faculty.ucr.edu/~tgirke/HTML_Presentations/Manuals/Workshop_Dec_12_16_2013/Rvarseq/Rvarseq.pdf
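
To make the request in (3) concrete, here is a minimal base R sketch of
the kind of collapsing I have in mind. It is not the code from the
slides and it does not use any VariantAnnotation function; the table
`annot`, its column names, the helper `collapse` and the output file
name are all made up for illustration.

```r
# Hypothetical long-format annotation table: one row per
# variant/annotation pair, so a variant can occupy several rows.
annot <- data.frame(
  variant  = c("chr1:1033_A/T", "chr1:1033_A/T", "chr2:500_G/C"),
  location = c("fiveUTR", "promoter", "coding"),
  gene     = c("geneA", "geneB", "geneC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: join the unique values of a column into one
# comma-separated string.
collapse <- function(x) paste(unique(x), collapse = ",")

# One row per variant; every annotation column is collapsed.
report <- aggregate(annot[c("location", "gene")],
                    by = list(variant = annot$variant),
                    FUN = collapse)

# Export the collapsed report as a tab-delimited file (file name is an
# assumption).
out_file <- file.path(tempdir(), "variant_report.tsv")
write.table(report, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

Because each variant identifier then appears exactly once, reports from
different genotypes could be joined on the `variant` column with
merge().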
(4) predictCoding() reports the relative location where exactly a
variant maps to an annotation range. It would be nice if
locateVariants() could report the exact relative mapping locations too,
e.g. variant chr1:1033_A/T maps to position x of 5'UTR. Perhaps this is
already possible but I couldn't figure out how to do it without
reaching too far into my own hacking toolbox.
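
As a rough illustration of what I mean by "position x" in (4), here is
a base R sketch of the arithmetic. It assumes a single contiguous
feature (no splicing across exons) and 1-based inclusive coordinates.
The function name `relative_position` and the 5'UTR coordinates are
made up; this is not how locateVariants() works internally.

```r
# Hypothetical helper: position of a variant relative to the 5' end of
# the feature it falls in. On the minus strand the 5' end is the
# feature's 'end' coordinate, so the count runs backwards from there.
relative_position <- function(pos, start, end, strand) {
  stopifnot(all(pos >= start & pos <= end))
  ifelse(strand == "-", end - pos + 1L, pos - start + 1L)
}

# Made-up 5'UTR spanning chr1:1000-1100, variant at chr1:1033.
rel_plus  <- relative_position(1033L, 1000L, 1100L, "+")  # 34
rel_minus <- relative_position(1033L, 1000L, 1100L, "-")  # 68
```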
Thanks for providing these excellent resources and, most importantly,
for your patience in listening to these unsolicited questions.
Best,
Thomas
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] VariantTools_1.4.5 VariantAnnotation_1.8.7 Rsamtools_1.14.2
[4] Biostrings_2.30.1 GenomicRanges_1.14.3 XVector_0.2.0
[7] IRanges_1.20.6 BiocGenerics_0.8.0
loaded via a namespace (and not attached):
[1] AnnotationDbi_1.24.0 BatchJobs_1.1-1135 BBmisc_1.4
[4] Biobase_2.22.0 BiocParallel_0.4.1 biomaRt_2.18.0
[7] bitops_1.0-6 brew_1.0-6 BSgenome_1.30.0
[10] codetools_0.2-8 DBI_0.2-7 digest_0.6.3
[13] fail_1.2 foreach_1.4.1 GenomicFeatures_1.14.2
[16] gmapR_1.4.2 grid_3.0.2 iterators_1.0.6
[19] lattice_0.20-24 Matrix_1.1-0 plyr_1.8
[22] RCurl_1.95-4.1 RSQLite_0.11.4 rtracklayer_1.22.0
[25] sendmailR_1.1-2 stats4_3.0.2 tools_3.0.2
[28] XML_3.95-0.2 zlibbioc_1.8.0