Complete variant toolbox: gmapR/VariantTools/VariantAnnotation

Thomas Girke ★ 1.7k

@thomas-girke-993

United States

Thanks, Wei, for pointing this out, and sorry for missing this important utility. I will definitely give your exactSNP function a try. Associating Rsubread with the biocViews term "GeneticVariability" (or similar) would help users find it in the future.

Thanks,

Thomas

On Mon, Dec 09, 2013 at 02:12:23AM +0000, Wei Shi wrote:
> I just want to point out that the Rsubread package includes a SNP calling function called exactSNP.
>
> Wei
>
> On Dec 9, 2013, at 4:08 AM, Thomas Girke wrote:
>
> > Dear Michael and Valerie,
> >
> > VariantTools and VariantAnnotation are awesome packages. To the best of my
> > knowledge, VariantTools is currently the only Bioc/R package that performs
> > variant calling and it does this in a very nice way. With the available
> > resources it is now straightforward to set up complete workflows for variant
> > calling projects: (1) variant aware read alignments with GSNAP from gmapR ->
> > (2) variant calling/filtering with VariantTools -> (3) adding genomic context
> > with VariantAnnotation. This is really amazing!!!
> >
> > Here are a few questions related to both packages:
> >
> > (1) For teaching purposes and other obvious reasons it would be useful if a
> > Windows version of VariantTools were available (and perhaps for gmapR too).
> > Installing the package (includes gmapR) from source works fine on both Linux
> > and OS X, but not on Windows.
> >
> > (2) The VRanges class is another great resource for filtering variant calls.
> > What I was not able to locate though is a description/definition of the content
> > of its different columns/components. Is something like this available
> > somewhere?
> >
> > (3) When annotating variants with utilities from VariantAnnotation, it would
> > be useful to provide a convenience Summary Report function at the end of the
> > workflow that exports the annotations to a file. A very common need here is to
> > collapse the annotations for each variant on a single line so that one doesn't
> > end up with annotation results of millions of lines, as is typical for many
> > variant discovery projects. This also simplifies joins among different
> > annotation instances because it maintains uniqueness among variant identifiers.
> > This approach is often useful when comparing (joining) the variants among
> > different genotypes (e.g. which variants are identical or unique among
> > different mutants). An example solution is shown on slides 34-35 of this
> > presentation:
> > http://faculty.ucr.edu/~tgirke/HTML_Presentations/Manuals/Workshop_Dec_12_16_2013/Rvarseq/Rvarseq.pdf
> >
> > (4) predictCoding() reports the relative location where exactly a variant maps
> > to an annotation range. It would be nice if locateVariants() could report the
> > exact relative mapping locations too, e.g. variant chr1:1033_A/T maps to
> > position x of 5'UTR. Perhaps this is already possible but I couldn't figure
> > out how to do it without reaching too far into my own hacking toolbox.
> >
> > Thanks for providing these excellent resources and most importantly your patience
> > listening to these unsolicited questions.
> >
> > Best,
> >
> > Thomas
> >
> > > sessionInfo()
> > R version 3.0.2 (2013-09-25)
> > Platform: x86_64-apple-darwin10.8.0 (64-bit)
> >
> > locale:
> > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> >
> > attached base packages:
> > [1] parallel  stats  graphics  grDevices  utils  datasets  methods
> > [8] base
> >
> > other attached packages:
> > [1] VariantTools_1.4.5  VariantAnnotation_1.8.7  Rsamtools_1.14.2
> > [4] Biostrings_2.30.1  GenomicRanges_1.14.3  XVector_0.2.0
> > [7] IRanges_1.20.6  BiocGenerics_0.8.0
> >
> > loaded via a namespace (and not attached):
> > [1] AnnotationDbi_1.24.0  BatchJobs_1.1-1135  BBmisc_1.4
> > [4] Biobase_2.22.0  BiocParallel_0.4.1  biomaRt_2.18.0
> > [7] bitops_1.0-6  brew_1.0-6  BSgenome_1.30.0
> > [10] codetools_0.2-8  DBI_0.2-7  digest_0.6.3
> > [13] fail_1.2  foreach_1.4.1  GenomicFeatures_1.14.2
> > [16] gmapR_1.4.2  grid_3.0.2  iterators_1.0.6
> > [19] lattice_0.20-24  Matrix_1.1-0  plyr_1.8
> > [22] RCurl_1.95-4.1  RSQLite_0.11.4  rtracklayer_1.22.0
> > [25] sendmailR_1.1-2  stats4_3.0.2  tools_3.0.2
> > [28] XML_3.95-0.2  zlibbioc_1.8.0
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

BiocViews VariantAnnotation Annotation biocViews Rsubread VariantAnnotation gmapR BiocViews • 1.8k views

10.4 years ago • updated 10.3 years ago • Thomas Girke ★ 1.7k

Wei Shi ★ 3.6k

@wei-shi-2183

Australia/Melbourne/Olivia Newton-John …

Thanks Thomas, I have updated the biocViews terms for Rsubread to make it easier to find.

Best regards,

Wei

On Dec 9, 2013, at 3:47 PM, Thomas Girke wrote:
> Thanks Wei for pointing this out, and sorry for missing this important
> utility. I will definitely give your exactSNP function a try. Associating
> Rsubread with the biocViews term "GeneticVariability" (or similar) would
> help users find it in the future.
>
> Thanks,
>
> Thomas
>
> [...]

10.4 years ago • Wei Shi ★ 3.6k

Thomas Girke ★ 1.7k

@thomas-girke-993

United States

Hi Valerie,

Adding a 'REFLOC' column to the output of locateVariants() would address this need. Thanks for looking into this.

As for the need for a summary_var_report IN ADDITION TO a complete_var_report, the primitive approach used to create the results shown on the slides is here:
http://faculty.ucr.edu/~tgirke/HTML_Presentations/Manuals/Workshop_Dec_12_16_2013/Rvarseq/Rvarseq_Fct.R

Right now this is just a pointer to show students how this could be done rather than something I would consider even remotely a finished solution for a package. To achieve the latter, one definitely should look into how to get rid of some of the tapply steps. As an expert on the VCF and related classes, you might have much more elegant and efficient solutions to this?

Also, to address some of Julian's concerns related to ambiguous annotations: in the case of overlapping genes one would append/prepend (but only for those) the GENEID to the annotation feature names, e.g. coding_GENE1__coding_GENE2. The result will end up being a gene-centric rather than a transcript-centric report, meaning we are losing the assignment to specific transcript variants. In 90% of the use cases of our discovery-oriented VAR-Seq projects, gene resolution is sufficient here (e.g. supplement tables for publications or grant applications). If transcript resolution is needed, then users are usually happy to look the results up in the complete variant report. Alternatively, one could easily do the same on the transcript level, but here a summary report may quickly become too complex to be useful for practitioners. Perhaps a well designed Var Summary Report function would include a summary_mode argument where the user could decide whether to output a gene- or transcript-centric summary_var_report.

In general, this is obviously one of those tasks where it will be hard to reach consensus among biologists on what exactly the ideal VAR summary report should look like. However, tackling this problem at least somehow is extremely important, as for biologists this may be one of the most crucial features of any variant annotation tool. Most of them will not know how to get things from VRanges/GRanges/VCF objects into a file containing fewer than 100K lines that they can easily digest in a spreadsheet program and that is also supported in the supplement section of most scientific journals (usually limited to Excel).

Best,

Thomas

On Mon, Dec 09, 2013 at 08:07:34PM +0000, Valerie Obenchain wrote:
> Hi Thomas,
>
> On 12/08/2013 09:08 AM, Thomas Girke wrote:
> > [...]
> > (3) When annotating variants with utilities from VariantAnnotation, it would
> > be useful to provide a convenience Summary Report function at the end of the
> > workflow that exports the annotations to a file. A very common need here is to
> > collapse the annotations for each variant on a single line so that one doesn't
> > end up with annotation results of millions of lines, as is typical for many
> > variant discovery projects. This also simplifies joins among different
> > annotation instances because it maintains uniqueness among variant identifiers.
> > This approach is often useful when comparing (joining) the variants among
> > different genotypes (e.g. which variants are identical or unique among
> > different mutants). An example solution is shown on slides 34-35 of this
> > presentation:
> > http://faculty.ucr.edu/~tgirke/HTML_Presentations/Manuals/Workshop_Dec_12_16_2013/Rvarseq/Rvarseq.pdf
>
> The variantReport() and codingReport() functions look great. Would you
> be willing to contribute them to VariantAnnotation?
>
> > (4) predictCoding() reports the relative location where exactly a variant maps
> > to an annotation range. It would be nice if locateVariants() could report the
> > exact relative mapping locations too, e.g. variant chr1:1033_A/T maps to
> > position x of 5'UTR. Perhaps this is already possible but I couldn't figure
> > out how to do it without reaching too far into my own hacking toolbox.
>
> I could add a 'REFLOC' column to the output of locateVariants() that
> would essentially be the "equivalent" to 'CDSLOC' from predictCoding().
>
> Valerie
>
> > [...]
>
> --
> Valerie Obenchain
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B155
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: vobencha at fhcrc.org
> Phone: (206) 667-3158
> Fax: (206) 667-1319
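The collapsing step described in this post can be sketched in base R as follows. This is only a minimal illustration and is not taken from Rvarseq_Fct.R: the table `ann`, its column names (`VARID`, `LOCATION`, `GENEID`) and the function name `collapse_per_variant` are assumptions made for the example. It appends the gene ID to the feature name only for variants that hit more than one gene and then joins the labels with `__`, so each variant ends up on a single line.

```r
# Hypothetical long-format annotation table: one row per variant/annotation pair.
# Column names are assumptions for this sketch, not a VariantAnnotation schema.
ann <- data.frame(
  VARID    = c("chr1:1033_A/T", "chr1:1033_A/T", "chr1:1033_A/T", "chr2:500_G/C"),
  LOCATION = c("coding", "coding", "coding", "fiveUTR"),
  GENEID   = c("GENE1", "GENE1", "GENE2", "GENE3"),
  stringsAsFactors = FALSE
)

collapse_per_variant <- function(ann) {
  # Number of distinct genes overlapped by each variant
  ngenes <- tapply(ann$GENEID, ann$VARID, function(g) length(unique(g)))
  # Flag rows whose variant overlaps more than one gene
  ambiguous <- as.vector(ngenes[ann$VARID] > 1)
  # Append the gene ID to the feature name only for ambiguous variants
  label <- ifelse(ambiguous,
                  paste(ann$LOCATION, ann$GENEID, sep = "_"),
                  ann$LOCATION)
  # Collapse all distinct labels of a variant into a single string
  collapsed <- tapply(label, ann$VARID,
                      function(x) paste(unique(x), collapse = "__"))
  data.frame(VARID = names(collapsed),
             ANNOTATION = as.vector(collapsed),
             stringsAsFactors = FALSE)
}

summary_var_report <- collapse_per_variant(ann)
print(summary_var_report)
```

The resulting data frame has one row per variant identifier, so it could be written to a tab-delimited file with `write.table()` and joined against reports from other samples on `VARID`.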

10.4 years ago • Thomas Girke ★ 1.7k

Hi Thomas,

I agree with you that there is obviously no right or wrong on how to report the effect of variants - different solutions have their pros and cons.

> results up in the complete variant report. Alternatively, one could
> easily do the same on the transcript level, but here a summary report
> may become quickly too complex to be useful for practitioners. Perhaps a
> well designed Var Summary Report function would include a summary_mode
> argument where the user could decide whether to output a gene- or
> transcript-centric summary_var_report.

One could generalize it even more, e.g. for summarizing over affected proteins or any desired variable of interest. If I understood you correctly, one would need to pass the respective column name to the 'tapply' function.

Best wishes
Julian
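One way to read this suggestion is a summary function that takes the grouping column as an argument. The sketch below is hypothetical: `summarize_by`, its `by` and `id` arguments, and the example columns (`VARID`, `GENEID`, `TXID`) are assumptions for illustration and do not correspond to an existing VariantAnnotation function.

```r
# Collapse the values of an arbitrary column ('by') per variant identifier ('id').
# Function and argument names are hypothetical.
summarize_by <- function(ann, by = "GENEID", id = "VARID") {
  stopifnot(by %in% names(ann), id %in% names(ann))
  # The chosen column is passed to tapply; distinct values are joined per variant
  collapsed <- tapply(ann[[by]], ann[[id]],
                      function(x) paste(unique(x), collapse = "__"))
  out <- data.frame(names(collapsed), as.vector(collapsed),
                    stringsAsFactors = FALSE)
  names(out) <- c(id, by)
  out
}

# Hypothetical annotation table with gene and transcript identifiers
ann <- data.frame(
  VARID  = c("v1", "v1", "v2"),
  GENEID = c("GENE1", "GENE1", "GENE2"),
  TXID   = c("TX1", "TX2", "TX3"),
  stringsAsFactors = FALSE
)

gene_report <- summarize_by(ann, by = "GENEID")  # gene-centric summary
tx_report   <- summarize_by(ann, by = "TXID")    # transcript-centric summary
```

The same call would work for any other column, such as a protein identifier, as long as it is present in the annotation table.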

10.4 years ago • Julian Gehring ★ 1.3k

Thanks for the detail on the summary functions. I agree these would be useful to have. I'll put this on the TODO for this dev cycle.

Thanks.

Valerie

On 12/10/2013 09:09 AM, Thomas Girke wrote:
> [...]

10.4 years ago • Valerie Obenchain ★ 6.8k

Thomas Girke ★ 1.7k

@thomas-girke-993

United States

Sorry for not responding sooner. I agree, providing the variant mapping position relative to transcripts will be the most useful option, at least for features that are part of transcripts. For others that are not, e.g. intron and intergenic features, one could report the variant mappings relative to the start position of these features.

Thomas

On Thu, Dec 19, 2013 at 03:08:19AM +0000, Valerie Obenchain wrote:
> Hi,
>
> On 12/17/2013 09:40 AM, Robert Castelo wrote:
> > hi Valerie cc Thomas,
> >
> > sorry for hijacking the thread, regarding the request made below..
> >
> > On 12/09/2013 09:07 PM, Valerie Obenchain wrote:
> > [...]
> >> I could add a 'REFLOC' column to the output of locateVariants() that
> >> would essentially be the "equivalent" to 'CDSLOC' from predictCoding().
> >
> > for the purpose of ordering cDNA primers flanking variants which one may
> > want to validate through Sanger sequencing, it is useful to have at hand
> > the position of the variant with respect to the beginning of the
> > transcript (cDNA) where it has been observed, thus not just from the
> > beginning of the CDS but from the beginning of the transcript.
> >
> > is this newer 'REFLOC' going to contain this position? if not, would it
> > be possible to get also a column for that from the locateVariants()
> > call? (e.g., TXLOC)
>
> Yes, I think it makes sense to have 'REFLOC' be the position in the
> reference starting from the beginning of the transcript. Unless others
> have different thoughts this is what I'll go ahead with.
>
> Valerie
>
> > thanks!!
> > robert.
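To make the idea of a transcript-relative position concrete, here is a small base R sketch of the arithmetic. It is an illustration only and does not describe how 'REFLOC' is or will be implemented in VariantAnnotation. The function name `tx_relative_pos` and its arguments are hypothetical, and it assumes 1-based inclusive coordinates and non-overlapping exons belonging to a single transcript.

```r
# Position of a genomic coordinate counted from the 5' end of the spliced
# transcript (cDNA). Returns NA if the position falls outside all exons,
# e.g. in an intron. Name and arguments are hypothetical.
tx_relative_pos <- function(pos, exon_start, exon_end, strand = "+") {
  # Order exons 5' to 3' with respect to the transcript
  ord <- order(exon_start, decreasing = (strand == "-"))
  exon_start <- exon_start[ord]
  exon_end   <- exon_end[ord]
  # Find the exon containing the position
  hit <- which(pos >= exon_start & pos <= exon_end)
  if (length(hit) == 0) return(NA_integer_)
  hit <- hit[1]
  # Total length of all exons upstream of the hit exon
  upstream <- sum((exon_end - exon_start + 1)[seq_len(hit - 1)])
  # Offset within the hit exon, counted from its 5' end
  within <- if (strand == "+") pos - exon_start[hit] + 1 else exon_end[hit] - pos + 1
  as.integer(upstream + within)
}

# Two exons of 100 bp each: 100-199 and 300-399
tx_relative_pos(310, c(100, 300), c(199, 399), "+")  # 100 upstream bases + 11
tx_relative_pos(310, c(100, 300), c(199, 399), "-")  # counted from position 399
tx_relative_pos(250, c(100, 300), c(199, 399), "+")  # intronic, so NA
```

For intronic or intergenic variants, the analogous value would simply be the offset from the start of the containing feature, as suggested in the post.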

10.3 years ago • Thomas Girke ★ 1.7k
