Filtering BAM files by start position for VariantTools
1
0
Entering edit mode
@martin-morgan-1513
Last seen 20 hours ago
United States
On 07/16/2013 05:40 PM, Michael Lawrence wrote: > The necessary update to VariantTools will propagate soon to devel. Or you > can grab it from svn. This isn't building in devel; does it require something special with gmapR? Dan and I looked at this (??) for the conference, but were not quite able to master the automake foo required to get off the ground. Martin > > Michael > > > On Tue, Jul 16, 2013 at 3:27 PM, Taylor, Sean D <sdtaylor at="" fhcrc.org=""> wrote: > >> In order to work through some of the code, I installed the devel version >> of R and updated all the packages. Now when I run tallyVariants() I get the >> following error message:**** >> >> ** ** >> >> Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : **** >> >> object 'castList' not found**** >> >> ** ** >> >>> traceback()**** >> >> 12: get(name, envir = asNamespace(pkg), inherits = FALSE)**** >> >> 11: IRanges:::castList**** >> >> 10: safe_mclapply(ind, function(i, ...) {**** >> >> FUN(gr[i], ...)**** >> >> }, ...)**** >> >> 9: applyByChromosome(si, bam_tally_region, mc.cores = mc.cores)**** >> >> 8: .local(x, ...)**** >> >> 7: tallyVariants(x, tally.param)**** >> >> 6: tallyVariants(x, tally.param)**** >> >> 5: .local(x, ...)**** >> >> 4: callVariants(BamFile(x), ...)**** >> >> 3: callVariants(BamFile(x), ...)**** >> >> 2: callVariants(destination, tally.param)**** >> >> 1: callVariants(destination, tally.param)**** >> >> ** ** >> >> Perhaps this is something missing/changed in the devel version of IRanges? >> **** >> >> ** ** >> >> Updated sessionInfo() below:**** >> >> ** ** >> >> R version 3.0.1 (2013-05-16)**** >> >> Platform: x86_64-unknown-linux-gnu (64-bit)**** >> >> ** ** >> >> locale:**** >> >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C **** >> >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 **** >> >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 **** >> >> [7] LC_PAPER=C LC_NAME=C **** >> >> [9] LC_ADDRESS=C LC_TELEPHONE=C **** >> >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C **** >> >> ** ** >> >> attached base packages:**** >> >> [1] grid parallel stats graphics grDevices utils datasets * >> *** >> >> [8] methods base **** >> >> ** ** >> >> other attached packages:**** >> >> [1] BiocInstaller_1.11.3 latticeExtra_0.6-24 >> lattice_0.20-15 **** >> >> [4] RColorBrewer_1.0-5 genoPlotR_0.8 >> ade4_1.5-2 **** >> >> [7] gmapR_1.3.2 VariantTools_1.3.2 >> VariantAnnotation_1.7.34**** >> >> [10] Rsamtools_1.13.24 Biostrings_2.29.13 >> GenomicRanges_1.13.33 **** >> >> [13] XVector_0.1.0 IRanges_1.19.18 >> BiocGenerics_0.7.3 **** >> >> ** ** >> >> loaded via a namespace (and not attached):**** >> >> [1] AnnotationDbi_1.23.16 Biobase_2.20.0 biomaRt_2.17.2 >> **** >> >> [4] bitops_1.0-5 BSgenome_1.28.0 DBI_0.2-7 >> **** >> >> [7] GenomicFeatures_1.13.19 graph_1.39.3 >> Matrix_1.0-12 **** >> >> [10] RBGL_1.37.2 RCurl_1.95-4.1 >> RSQLite_0.11.4 **** >> >> [13] rtracklayer_1.20.2 stats4_3.0.1 >> tools_3.0.1 **** >> >> [16] XML_3.96-1.1 zlibbioc_1.6.0 **** >> >>> ** ** >> >> ** ** >> >> Thanks again,**** >> >> Sean**** >> >> ** ** >> >> *From:* Michael Lawrence [mailto:lawrence.michael at gene.com] >> *Sent:* Saturday, July 13, 2013 1:59 PM >> *To:* Taylor, Sean D >> *Cc:* Michael Lawrence; Pages, Herve; Obenchain, Valerie J; >> bioconductor at r-project.org >> >> *Subject:* Re: [BioC] Filtering BAM files by start position for >> VariantTools**** >> >> ** ** >> >> ** ** >> >> ** ** >> >> On Sat, Jul 13, 2013 at 1:50 PM, Taylor, Sean D <sdtaylor at="" fhcrc.org=""> >> wrote:**** >> >> Thanks Michael, **** >> >> This is an interesting idea. Usually, we resolve PCR/optical duplicates >> with the Picard MarkDuplicates command, which just chooses the read with >> the highest average quality. This approximates what you want, I think. *** >> * >> >> Is the Picard MarkDuplicates command run by default? Or is this a separate >> command from a different package?**** >> >> ** ** >> >> It's a separate package, basically the Java implementation of samtools; >> you might that first and see how it improves your error rates, before >> taking a more complicated approach.**** >> >> Also, I find it hard to believe that sequence errors would be causing >> you trouble at 10%, as long as you're filtering for quality and have decent >> coverage. PCR might in limited cases, and Picard effectively takes care of >> that.**** >> >> 10% is fine, but we have problems at 1% and lower. I think even the >> defaults in tallyVariants() use 1% as a cutoff if I?m not mistaken.**** >> >> **** >> >> ** ** >> >> It uses a ~4% cutoff. And yes, you'll start running into issues around 1%. >> Calling at such a low frequency is kind of ambitious.**** >> >> **** >> >> What you want is to specify the cycleBreaks argument to >> VariantTallyParam. It allows you to define bins by cycle. So if you made >> bins for all 100bp, you could do things like generate a consensus by read >> family. A family corresponding to position X would be cycle bin #1 at X, >> cycle bin #2 at X+1, etc. Then just pick the alt (or ref) with the highest >> count. Might be tricky to implement efficiently for the whole genome. Of >> course, this assumes you've got single end reads, otherwise this won't >> work, because the mate is not considered.**** >> >> Cool, I will try that out, that may be just what I was looking for. It >> might be tricky for the whole genome, but we are restricting ourselves to >> just the mitochondrial genome. Still gives us several thousand start sites >> to work from but should be easier than the whole genome. As for single end >> reads, can I just filter on the strand (i.e. ?+? or ?-?)? Or will I have to >> perform separate alignments for each of the paired reads?**** >> >> **** >> >> ** ** >> >> Now that I think about it, you should be fine just considering the >> alignments individually (as if they were unpaired).**** >> >> **** >> >> ** ** >> > > [[alternative HTML version deleted]] > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
Coverage Cancer cycle VariantTools Coverage Cancer cycle VariantTools • 1.7k views
ADD COMMENT
0
Entering edit mode
@taylor-sean-d-5800
Last seen 9.6 years ago
Martin, it always makes me feel better when I see you scratching your head too! ;) Sean Sent from my iPod On Aug 1, 2013, at 6:02 PM, "Martin Morgan" <mtmorgan at="" fhcrc.org=""> wrote: > On 07/16/2013 05:40 PM, Michael Lawrence wrote: >> The necessary update to VariantTools will propagate soon to devel. Or you >> can grab it from svn. > > This isn't building in devel; does it require something special with gmapR? Dan and I looked at this (??) for the conference, but were not quite able to master the automake foo required to get off the ground. > > Martin > >> >> Michael >> >> >> On Tue, Jul 16, 2013 at 3:27 PM, Taylor, Sean D <sdtaylor at="" fhcrc.org=""> wrote: >> >>> In order to work through some of the code, I installed the devel version >>> of R and updated all the packages. Now when I run tallyVariants() I get the >>> following error message:**** >>> >>> ** ** >>> >>> Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : **** >>> >>> object 'castList' not found**** >>> >>> ** ** >>> >>>> traceback()**** >>> >>> 12: get(name, envir = asNamespace(pkg), inherits = FALSE)**** >>> >>> 11: IRanges:::castList**** >>> >>> 10: safe_mclapply(ind, function(i, ...) {**** >>> >>> FUN(gr[i], ...)**** >>> >>> }, ...)**** >>> >>> 9: applyByChromosome(si, bam_tally_region, mc.cores = mc.cores)**** >>> >>> 8: .local(x, ...)**** >>> >>> 7: tallyVariants(x, tally.param)**** >>> >>> 6: tallyVariants(x, tally.param)**** >>> >>> 5: .local(x, ...)**** >>> >>> 4: callVariants(BamFile(x), ...)**** >>> >>> 3: callVariants(BamFile(x), ...)**** >>> >>> 2: callVariants(destination, tally.param)**** >>> >>> 1: callVariants(destination, tally.param)**** >>> >>> ** ** >>> >>> Perhaps this is something missing/changed in the devel version of IRanges? >>> **** >>> >>> ** ** >>> >>> Updated sessionInfo() below:**** >>> >>> ** ** >>> >>> R version 3.0.1 (2013-05-16)**** >>> >>> Platform: x86_64-unknown-linux-gnu (64-bit)**** >>> >>> ** ** >>> >>> locale:**** >>> >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C **** >>> >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 **** >>> >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 **** >>> >>> [7] LC_PAPER=C LC_NAME=C **** >>> >>> [9] LC_ADDRESS=C LC_TELEPHONE=C **** >>> >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C **** >>> >>> ** ** >>> >>> attached base packages:**** >>> >>> [1] grid parallel stats graphics grDevices utils datasets * >>> *** >>> >>> [8] methods base **** >>> >>> ** ** >>> >>> other attached packages:**** >>> >>> [1] BiocInstaller_1.11.3 latticeExtra_0.6-24 >>> lattice_0.20-15 **** >>> >>> [4] RColorBrewer_1.0-5 genoPlotR_0.8 >>> ade4_1.5-2 **** >>> >>> [7] gmapR_1.3.2 VariantTools_1.3.2 >>> VariantAnnotation_1.7.34**** >>> >>> [10] Rsamtools_1.13.24 Biostrings_2.29.13 >>> GenomicRanges_1.13.33 **** >>> >>> [13] XVector_0.1.0 IRanges_1.19.18 >>> BiocGenerics_0.7.3 **** >>> >>> ** ** >>> >>> loaded via a namespace (and not attached):**** >>> >>> [1] AnnotationDbi_1.23.16 Biobase_2.20.0 biomaRt_2.17.2 >>> **** >>> >>> [4] bitops_1.0-5 BSgenome_1.28.0 DBI_0.2-7 >>> **** >>> >>> [7] GenomicFeatures_1.13.19 graph_1.39.3 >>> Matrix_1.0-12 **** >>> >>> [10] RBGL_1.37.2 RCurl_1.95-4.1 >>> RSQLite_0.11.4 **** >>> >>> [13] rtracklayer_1.20.2 stats4_3.0.1 >>> tools_3.0.1 **** >>> >>> [16] XML_3.96-1.1 zlibbioc_1.6.0 **** >>> >>>> ** ** >>> >>> ** ** >>> >>> Thanks again,**** >>> >>> Sean**** >>> >>> ** ** >>> >>> *From:* Michael Lawrence [mailto:lawrence.michael at gene.com] >>> *Sent:* Saturday, July 13, 2013 1:59 PM >>> *To:* Taylor, Sean D >>> *Cc:* Michael Lawrence; Pages, Herve; Obenchain, Valerie J; >>> bioconductor at r-project.org >>> >>> *Subject:* Re: [BioC] Filtering BAM files by start position for >>> VariantTools**** >>> >>> ** ** >>> >>> ** ** >>> >>> ** ** >>> >>> On Sat, Jul 13, 2013 at 1:50 PM, Taylor, Sean D <sdtaylor at="" fhcrc.org=""> >>> wrote:**** >>> >>> Thanks Michael, **** >>> >>> This is an interesting idea. Usually, we resolve PCR/optical duplicates >>> with the Picard MarkDuplicates command, which just chooses the read with >>> the highest average quality. This approximates what you want, I think. *** >>> * >>> >>> Is the Picard MarkDuplicates command run by default? Or is this a separate >>> command from a different package?**** >>> >>> ** ** >>> >>> It's a separate package, basically the Java implementation of samtools; >>> you might that first and see how it improves your error rates, before >>> taking a more complicated approach.**** >>> >>> Also, I find it hard to believe that sequence errors would be causing >>> you trouble at 10%, as long as you're filtering for quality and have decent >>> coverage. PCR might in limited cases, and Picard effectively takes care of >>> that.**** >>> >>> 10% is fine, but we have problems at 1% and lower. I think even the >>> defaults in tallyVariants() use 1% as a cutoff if I?m not mistaken.**** >>> >>> **** >>> >>> ** ** >>> >>> It uses a ~4% cutoff. And yes, you'll start running into issues around 1%. >>> Calling at such a low frequency is kind of ambitious.**** >>> >>> **** >>> >>> What you want is to specify the cycleBreaks argument to >>> VariantTallyParam. It allows you to define bins by cycle. So if you made >>> bins for all 100bp, you could do things like generate a consensus by read >>> family. A family corresponding to position X would be cycle bin #1 at X, >>> cycle bin #2 at X+1, etc. Then just pick the alt (or ref) with the highest >>> count. Might be tricky to implement efficiently for the whole genome. Of >>> course, this assumes you've got single end reads, otherwise this won't >>> work, because the mate is not considered.**** >>> >>> Cool, I will try that out, that may be just what I was looking for. It >>> might be tricky for the whole genome, but we are restricting ourselves to >>> just the mitochondrial genome. Still gives us several thousand start sites >>> to work from but should be easier than the whole genome. As for single end >>> reads, can I just filter on the strand (i.e. ?+? or ?-?)? Or will I have to >>> perform separate alignments for each of the paired reads?**** >>> >>> **** >>> >>> ** ** >>> >>> Now that I think about it, you should be fine just considering the >>> alignments individually (as if they were unpaired).**** >>> >>> **** >>> >>> ** ** >> >> [[alternative HTML version deleted]] >> >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > > -- > Computational Biology / Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. > PO Box 19024 Seattle, WA 98109 > > Location: Arnold Building M1 B861 > Phone: (206) 667-2793
ADD COMMENT
0
Entering edit mode
I will fix this today out of necessity to get it building on one of our clusters. It's not going to be fun. On Fri, Aug 2, 2013 at 7:08 AM, Taylor, Sean D <sdtaylor@fhcrc.org> wrote: > Martin, it always makes me feel better when I see you scratching your head > too! ;) > > Sean > > Sent from my iPod > > On Aug 1, 2013, at 6:02 PM, "Martin Morgan" <mtmorgan@fhcrc.org> wrote: > > > On 07/16/2013 05:40 PM, Michael Lawrence wrote: > >> The necessary update to VariantTools will propagate soon to devel. Or > you > >> can grab it from svn. > > > > This isn't building in devel; does it require something special with > gmapR? Dan and I looked at this (??) for the conference, but were not quite > able to master the automake foo required to get off the ground. > > > > Martin > > > >> > >> Michael > >> > >> > >> On Tue, Jul 16, 2013 at 3:27 PM, Taylor, Sean D <sdtaylor@fhcrc.org> > wrote: > >> > >>> In order to work through some of the code, I installed the devel > version > >>> of R and updated all the packages. Now when I run tallyVariants() I > get the > >>> following error message:**** > >>> > >>> ** ** > >>> > >>> Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : **** > >>> > >>> object 'castList' not found**** > >>> > >>> ** ** > >>> > >>>> traceback()**** > >>> > >>> 12: get(name, envir = asNamespace(pkg), inherits = FALSE)**** > >>> > >>> 11: IRanges:::castList**** > >>> > >>> 10: safe_mclapply(ind, function(i, ...) {**** > >>> > >>> FUN(gr[i], ...)**** > >>> > >>> }, ...)**** > >>> > >>> 9: applyByChromosome(si, bam_tally_region, mc.cores = mc.cores)**** > >>> > >>> 8: .local(x, ...)**** > >>> > >>> 7: tallyVariants(x, tally.param)**** > >>> > >>> 6: tallyVariants(x, tally.param)**** > >>> > >>> 5: .local(x, ...)**** > >>> > >>> 4: callVariants(BamFile(x), ...)**** > >>> > >>> 3: callVariants(BamFile(x), ...)**** > >>> > >>> 2: callVariants(destination, tally.param)**** > >>> > >>> 1: callVariants(destination, tally.param)**** > >>> > >>> ** ** > >>> > >>> Perhaps this is something missing/changed in the devel version of > IRanges? > >>> **** > >>> > >>> ** ** > >>> > >>> Updated sessionInfo() below:**** > >>> > >>> ** ** > >>> > >>> R version 3.0.1 (2013-05-16)**** > >>> > >>> Platform: x86_64-unknown-linux-gnu (64-bit)**** > >>> > >>> ** ** > >>> > >>> locale:**** > >>> > >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C **** > >>> > >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 **** > >>> > >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 **** > >>> > >>> [7] LC_PAPER=C LC_NAME=C **** > >>> > >>> [9] LC_ADDRESS=C LC_TELEPHONE=C **** > >>> > >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C **** > >>> > >>> ** ** > >>> > >>> attached base packages:**** > >>> > >>> [1] grid parallel stats graphics grDevices utils > datasets * > >>> *** > >>> > >>> [8] methods base **** > >>> > >>> ** ** > >>> > >>> other attached packages:**** > >>> > >>> [1] BiocInstaller_1.11.3 latticeExtra_0.6-24 > >>> lattice_0.20-15 **** > >>> > >>> [4] RColorBrewer_1.0-5 genoPlotR_0.8 > >>> ade4_1.5-2 **** > >>> > >>> [7] gmapR_1.3.2 VariantTools_1.3.2 > >>> VariantAnnotation_1.7.34**** > >>> > >>> [10] Rsamtools_1.13.24 Biostrings_2.29.13 > >>> GenomicRanges_1.13.33 **** > >>> > >>> [13] XVector_0.1.0 IRanges_1.19.18 > >>> BiocGenerics_0.7.3 **** > >>> > >>> ** ** > >>> > >>> loaded via a namespace (and not attached):**** > >>> > >>> [1] AnnotationDbi_1.23.16 Biobase_2.20.0 biomaRt_2.17.2 > >>> **** > >>> > >>> [4] bitops_1.0-5 BSgenome_1.28.0 DBI_0.2-7 > >>> **** > >>> > >>> [7] GenomicFeatures_1.13.19 graph_1.39.3 > >>> Matrix_1.0-12 **** > >>> > >>> [10] RBGL_1.37.2 RCurl_1.95-4.1 > >>> RSQLite_0.11.4 **** > >>> > >>> [13] rtracklayer_1.20.2 stats4_3.0.1 > >>> tools_3.0.1 **** > >>> > >>> [16] XML_3.96-1.1 zlibbioc_1.6.0 **** > >>> > >>>> ** ** > >>> > >>> ** ** > >>> > >>> Thanks again,**** > >>> > >>> Sean**** > >>> > >>> ** ** > >>> > >>> *From:* Michael Lawrence [mailto:lawrence.michael@gene.com] > >>> *Sent:* Saturday, July 13, 2013 1:59 PM > >>> *To:* Taylor, Sean D > >>> *Cc:* Michael Lawrence; Pages, Herve; Obenchain, Valerie J; > >>> bioconductor@r-project.org > >>> > >>> *Subject:* Re: [BioC] Filtering BAM files by start position for > >>> VariantTools**** > >>> > >>> ** ** > >>> > >>> ** ** > >>> > >>> ** ** > >>> > >>> On Sat, Jul 13, 2013 at 1:50 PM, Taylor, Sean D <sdtaylor@fhcrc.org> > >>> wrote:**** > >>> > >>> Thanks Michael, **** > >>> > >>> This is an interesting idea. Usually, we resolve PCR/optical duplicates > >>> with the Picard MarkDuplicates command, which just chooses the read > with > >>> the highest average quality. This approximates what you want, I think. > *** > >>> * > >>> > >>> Is the Picard MarkDuplicates command run by default? Or is this a > separate > >>> command from a different package?**** > >>> > >>> ** ** > >>> > >>> It's a separate package, basically the Java implementation of samtools; > >>> you might that first and see how it improves your error rates, before > >>> taking a more complicated approach.**** > >>> > >>> Also, I find it hard to believe that sequence errors would be > causing > >>> you trouble at 10%, as long as you're filtering for quality and have > decent > >>> coverage. PCR might in limited cases, and Picard effectively takes > care of > >>> that.**** > >>> > >>> 10% is fine, but we have problems at 1% and lower. I think even the > >>> defaults in tallyVariants() use 1% as a cutoff if I’m not mistaken.**** > >>> > >>> **** > >>> > >>> ** ** > >>> > >>> It uses a ~4% cutoff. And yes, you'll start running into issues around > 1%. > >>> Calling at such a low frequency is kind of ambitious.**** > >>> > >>> **** > >>> > >>> What you want is to specify the cycleBreaks argument to > >>> VariantTallyParam. It allows you to define bins by cycle. So if you > made > >>> bins for all 100bp, you could do things like generate a consensus by > read > >>> family. A family corresponding to position X would be cycle bin #1 at > X, > >>> cycle bin #2 at X+1, etc. Then just pick the alt (or ref) with the > highest > >>> count. Might be tricky to implement efficiently for the whole genome. > Of > >>> course, this assumes you've got single end reads, otherwise this won't > >>> work, because the mate is not considered.**** > >>> > >>> Cool, I will try that out, that may be just what I was looking for. It > >>> might be tricky for the whole genome, but we are restricting ourselves > to > >>> just the mitochondrial genome. Still gives us several thousand start > sites > >>> to work from but should be easier than the whole genome. As for single > end > >>> reads, can I just filter on the strand (i.e. ‘+’ or ‘-‘)? Or will I > have to > >>> perform separate alignments for each of the paired reads?**** > >>> > >>> **** > >>> > >>> ** ** > >>> > >>> Now that I think about it, you should be fine just considering the > >>> alignments individually (as if they were unpaired).**** > >>> > >>> **** > >>> > >>> ** ** > >> > >> [[alternative HTML version deleted]] > >> > >> > >> > >> _______________________________________________ > >> Bioconductor mailing list > >> Bioconductor@r-project.org > >> https://stat.ethz.ch/mailman/listinfo/bioconductor > >> Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > -- > > Computational Biology / Fred Hutchinson Cancer Research Center > > 1100 Fairview Ave. N. > > PO Box 19024 Seattle, WA 98109 > > > > Location: Arnold Building M1 B861 > > Phone: (206) 667-2793 > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
On Fri, Aug 2, 2013 at 7:27 AM, Michael Lawrence <lawrence.michael at="" gene.com=""> wrote: > I will fix this today out of necessity to get it building on one of our > clusters. It's not going to be fun. > Thanks! If the solution could work with standard versions of automake available on Ubuntu 12.04LTS that would be great, and mean that our build systems would not need any special tweaking. Dan > > On Fri, Aug 2, 2013 at 7:08 AM, Taylor, Sean D <sdtaylor at="" fhcrc.org=""> wrote: >> >> Martin, it always makes me feel better when I see you scratching your head >> too! ;) >> >> Sean >> >> Sent from my iPod >> >> On Aug 1, 2013, at 6:02 PM, "Martin Morgan" <mtmorgan at="" fhcrc.org=""> wrote: >> >> > On 07/16/2013 05:40 PM, Michael Lawrence wrote: >> >> The necessary update to VariantTools will propagate soon to devel. Or >> >> you >> >> can grab it from svn. >> > >> > This isn't building in devel; does it require something special with >> > gmapR? Dan and I looked at this (??) for the conference, but were not quite >> > able to master the automake foo required to get off the ground. >> > >> > Martin >> > >> >> >> >> Michael >> >> >> >> >> >> On Tue, Jul 16, 2013 at 3:27 PM, Taylor, Sean D <sdtaylor at="" fhcrc.org=""> >> >> wrote: >> >> >> >>> In order to work through some of the code, I installed the devel >> >>> version >> >>> of R and updated all the packages. Now when I run tallyVariants() I >> >>> get the >> >>> following error message:**** >> >>> >> >>> ** ** >> >>> >> >>> Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : **** >> >>> >> >>> object 'castList' not found**** >> >>> >> >>> ** ** >> >>> >> >>>> traceback()**** >> >>> >> >>> 12: get(name, envir = asNamespace(pkg), inherits = FALSE)**** >> >>> >> >>> 11: IRanges:::castList**** >> >>> >> >>> 10: safe_mclapply(ind, function(i, ...) {**** >> >>> >> >>> FUN(gr[i], ...)**** >> >>> >> >>> }, ...)**** >> >>> >> >>> 9: applyByChromosome(si, bam_tally_region, mc.cores = mc.cores)**** >> >>> >> >>> 8: .local(x, ...)**** >> >>> >> >>> 7: tallyVariants(x, tally.param)**** >> >>> >> >>> 6: tallyVariants(x, tally.param)**** >> >>> >> >>> 5: .local(x, ...)**** >> >>> >> >>> 4: callVariants(BamFile(x), ...)**** >> >>> >> >>> 3: callVariants(BamFile(x), ...)**** >> >>> >> >>> 2: callVariants(destination, tally.param)**** >> >>> >> >>> 1: callVariants(destination, tally.param)**** >> >>> >> >>> ** ** >> >>> >> >>> Perhaps this is something missing/changed in the devel version of >> >>> IRanges? >> >>> **** >> >>> >> >>> ** ** >> >>> >> >>> Updated sessionInfo() below:**** >> >>> >> >>> ** ** >> >>> >> >>> R version 3.0.1 (2013-05-16)**** >> >>> >> >>> Platform: x86_64-unknown-linux-gnu (64-bit)**** >> >>> >> >>> ** ** >> >>> >> >>> locale:**** >> >>> >> >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C **** >> >>> >> >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 **** >> >>> >> >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 **** >> >>> >> >>> [7] LC_PAPER=C LC_NAME=C **** >> >>> >> >>> [9] LC_ADDRESS=C LC_TELEPHONE=C **** >> >>> >> >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C **** >> >>> >> >>> ** ** >> >>> >> >>> attached base packages:**** >> >>> >> >>> [1] grid parallel stats graphics grDevices utils >> >>> datasets * >> >>> *** >> >>> >> >>> [8] methods base **** >> >>> >> >>> ** ** >> >>> >> >>> other attached packages:**** >> >>> >> >>> [1] BiocInstaller_1.11.3 latticeExtra_0.6-24 >> >>> lattice_0.20-15 **** >> >>> >> >>> [4] RColorBrewer_1.0-5 genoPlotR_0.8 >> >>> ade4_1.5-2 **** >> >>> >> >>> [7] gmapR_1.3.2 VariantTools_1.3.2 >> >>> VariantAnnotation_1.7.34**** >> >>> >> >>> [10] Rsamtools_1.13.24 Biostrings_2.29.13 >> >>> GenomicRanges_1.13.33 **** >> >>> >> >>> [13] XVector_0.1.0 IRanges_1.19.18 >> >>> BiocGenerics_0.7.3 **** >> >>> >> >>> ** ** >> >>> >> >>> loaded via a namespace (and not attached):**** >> >>> >> >>> [1] AnnotationDbi_1.23.16 Biobase_2.20.0 biomaRt_2.17.2 >> >>> **** >> >>> >> >>> [4] bitops_1.0-5 BSgenome_1.28.0 DBI_0.2-7 >> >>> **** >> >>> >> >>> [7] GenomicFeatures_1.13.19 graph_1.39.3 >> >>> Matrix_1.0-12 **** >> >>> >> >>> [10] RBGL_1.37.2 RCurl_1.95-4.1 >> >>> RSQLite_0.11.4 **** >> >>> >> >>> [13] rtracklayer_1.20.2 stats4_3.0.1 >> >>> tools_3.0.1 **** >> >>> >> >>> [16] XML_3.96-1.1 zlibbioc_1.6.0 **** >> >>> >> >>>> ** ** >> >>> >> >>> ** ** >> >>> >> >>> Thanks again,**** >> >>> >> >>> Sean**** >> >>> >> >>> ** ** >> >>> >> >>> *From:* Michael Lawrence [mailto:lawrence.michael at gene.com] >> >>> *Sent:* Saturday, July 13, 2013 1:59 PM >> >>> *To:* Taylor, Sean D >> >>> *Cc:* Michael Lawrence; Pages, Herve; Obenchain, Valerie J; >> >>> bioconductor at r-project.org >> >>> >> >>> *Subject:* Re: [BioC] Filtering BAM files by start position for >> >>> VariantTools**** >> >>> >> >>> ** ** >> >>> >> >>> ** ** >> >>> >> >>> ** ** >> >>> >> >>> On Sat, Jul 13, 2013 at 1:50 PM, Taylor, Sean D <sdtaylor at="" fhcrc.org=""> >> >>> wrote:**** >> >>> >> >>> Thanks Michael, **** >> >>> >> >>> This is an interesting idea. Usually, we resolve PCR/optical >> >>> duplicates >> >>> with the Picard MarkDuplicates command, which just chooses the read >> >>> with >> >>> the highest average quality. This approximates what you want, I think. >> >>> *** >> >>> * >> >>> >> >>> Is the Picard MarkDuplicates command run by default? Or is this a >> >>> separate >> >>> command from a different package?**** >> >>> >> >>> ** ** >> >>> >> >>> It's a separate package, basically the Java implementation of >> >>> samtools; >> >>> you might that first and see how it improves your error rates, before >> >>> taking a more complicated approach.**** >> >>> >> >>> Also, I find it hard to believe that sequence errors would be >> >>> causing >> >>> you trouble at 10%, as long as you're filtering for quality and have >> >>> decent >> >>> coverage. PCR might in limited cases, and Picard effectively takes >> >>> care of >> >>> that.**** >> >>> >> >>> 10% is fine, but we have problems at 1% and lower. I think even the >> >>> defaults in tallyVariants() use 1% as a cutoff if I?m not >> >>> mistaken.**** >> >>> >> >>> **** >> >>> >> >>> ** ** >> >>> >> >>> It uses a ~4% cutoff. And yes, you'll start running into issues around >> >>> 1%. >> >>> Calling at such a low frequency is kind of ambitious.**** >> >>> >> >>> **** >> >>> >> >>> What you want is to specify the cycleBreaks argument to >> >>> VariantTallyParam. It allows you to define bins by cycle. So if you >> >>> made >> >>> bins for all 100bp, you could do things like generate a consensus by >> >>> read >> >>> family. A family corresponding to position X would be cycle bin #1 at >> >>> X, >> >>> cycle bin #2 at X+1, etc. Then just pick the alt (or ref) with the >> >>> highest >> >>> count. Might be tricky to implement efficiently for the whole genome. >> >>> Of >> >>> course, this assumes you've got single end reads, otherwise this won't >> >>> work, because the mate is not considered.**** >> >>> >> >>> Cool, I will try that out, that may be just what I was looking for. It >> >>> might be tricky for the whole genome, but we are restricting ourselves >> >>> to >> >>> just the mitochondrial genome. Still gives us several thousand start >> >>> sites >> >>> to work from but should be easier than the whole genome. As for single >> >>> end >> >>> reads, can I just filter on the strand (i.e. ?+? or ?-?)? Or will I >> >>> have to >> >>> perform separate alignments for each of the paired reads?**** >> >>> >> >>> **** >> >>> >> >>> ** ** >> >>> >> >>> Now that I think about it, you should be fine just considering the >> >>> alignments individually (as if they were unpaired).**** >> >>> >> >>> **** >> >>> >> >>> ** ** >> >> >> >> [[alternative HTML version deleted]] >> >> >> >> >> >> >> >> _______________________________________________ >> >> Bioconductor mailing list >> >> Bioconductor at r-project.org >> >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> >> Search the archives: >> >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > >> > >> > -- >> > Computational Biology / Fred Hutchinson Cancer Research Center >> > 1100 Fairview Ave. N. >> > PO Box 19024 Seattle, WA 98109 >> > >> > Location: Arnold Building M1 B861 >> > Phone: (206) 667-2793 > >
ADD REPLY
0
Entering edit mode
The problem is pretty simple, the mtimes are arbitrary from any VCS, and autotools of course relies on those. When the same version of the tools are not available, things break. Ideally, the user would not need autotools AT ALL to build gmapR, but without mtime preservation, we can't avoid it. There are some sub-optimal solutions to this problem, like telling svn on the client side to use commit times for the mtimes (requires everyone to modify their svn config), or tell configure to ignore the mtimes for automake-generated files (would be a pain for the maintainers at least, who need the files regenerated when changed). The final one is to pick some version of autotools that everyone has and generate everything with that version, living with the autotools requirement and inefficiencies of arbitrary regeneration. We can try the last one. The cluster has automake 1.10; how about you guys? On Fri, Aug 2, 2013 at 10:18 AM, Dan Tenenbaum <dtenenba@fhcrc.org> wrote: > On Fri, Aug 2, 2013 at 7:27 AM, Michael Lawrence > <lawrence.michael@gene.com> wrote: > > I will fix this today out of necessity to get it building on one of our > > clusters. It's not going to be fun. > > > > Thanks! If the solution could work with standard versions of automake > available on Ubuntu 12.04LTS that would be great, and mean that our > build systems would not need any special tweaking. > > Dan > > > > > > On Fri, Aug 2, 2013 at 7:08 AM, Taylor, Sean D <sdtaylor@fhcrc.org> > wrote: > >> > >> Martin, it always makes me feel better when I see you scratching your > head > >> too! ;) > >> > >> Sean > >> > >> Sent from my iPod > >> > >> On Aug 1, 2013, at 6:02 PM, "Martin Morgan" <mtmorgan@fhcrc.org> wrote: > >> > >> > On 07/16/2013 05:40 PM, Michael Lawrence wrote: > >> >> The necessary update to VariantTools will propagate soon to devel. Or > >> >> you > >> >> can grab it from svn. > >> > > >> > This isn't building in devel; does it require something special with > >> > gmapR? Dan and I looked at this (??) for the conference, but were not > quite > >> > able to master the automake foo required to get off the ground. > >> > > >> > Martin > >> > > >> >> > >> >> Michael > >> >> > >> >> > >> >> On Tue, Jul 16, 2013 at 3:27 PM, Taylor, Sean D <sdtaylor@fhcrc.org> > >> >> wrote: > >> >> > >> >>> In order to work through some of the code, I installed the devel > >> >>> version > >> >>> of R and updated all the packages. Now when I run tallyVariants() I > >> >>> get the > >> >>> following error message:**** > >> >>> > >> >>> ** ** > >> >>> > >> >>> Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : > **** > >> >>> > >> >>> object 'castList' not found**** > >> >>> > >> >>> ** ** > >> >>> > >> >>>> traceback()**** > >> >>> > >> >>> 12: get(name, envir = asNamespace(pkg), inherits = FALSE)**** > >> >>> > >> >>> 11: IRanges:::castList**** > >> >>> > >> >>> 10: safe_mclapply(ind, function(i, ...) {**** > >> >>> > >> >>> FUN(gr[i], ...)**** > >> >>> > >> >>> }, ...)**** > >> >>> > >> >>> 9: applyByChromosome(si, bam_tally_region, mc.cores = mc.cores)**** > >> >>> > >> >>> 8: .local(x, ...)**** > >> >>> > >> >>> 7: tallyVariants(x, tally.param)**** > >> >>> > >> >>> 6: tallyVariants(x, tally.param)**** > >> >>> > >> >>> 5: .local(x, ...)**** > >> >>> > >> >>> 4: callVariants(BamFile(x), ...)**** > >> >>> > >> >>> 3: callVariants(BamFile(x), ...)**** > >> >>> > >> >>> 2: callVariants(destination, tally.param)**** > >> >>> > >> >>> 1: callVariants(destination, tally.param)**** > >> >>> > >> >>> ** ** > >> >>> > >> >>> Perhaps this is something missing/changed in the devel version of > >> >>> IRanges? > >> >>> **** > >> >>> > >> >>> ** ** > >> >>> > >> >>> Updated sessionInfo() below:**** > >> >>> > >> >>> ** ** > >> >>> > >> >>> R version 3.0.1 (2013-05-16)**** > >> >>> > >> >>> Platform: x86_64-unknown-linux-gnu (64-bit)**** > >> >>> > >> >>> ** ** > >> >>> > >> >>> locale:**** > >> >>> > >> >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C **** > >> >>> > >> >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 **** > >> >>> > >> >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 **** > >> >>> > >> >>> [7] LC_PAPER=C LC_NAME=C **** > >> >>> > >> >>> [9] LC_ADDRESS=C LC_TELEPHONE=C **** > >> >>> > >> >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C **** > >> >>> > >> >>> ** ** > >> >>> > >> >>> attached base packages:**** > >> >>> > >> >>> [1] grid parallel stats graphics grDevices utils > >> >>> datasets * > >> >>> *** > >> >>> > >> >>> [8] methods base **** > >> >>> > >> >>> ** ** > >> >>> > >> >>> other attached packages:**** > >> >>> > >> >>> [1] BiocInstaller_1.11.3 latticeExtra_0.6-24 > >> >>> lattice_0.20-15 **** > >> >>> > >> >>> [4] RColorBrewer_1.0-5 genoPlotR_0.8 > >> >>> ade4_1.5-2 **** > >> >>> > >> >>> [7] gmapR_1.3.2 VariantTools_1.3.2 > >> >>> VariantAnnotation_1.7.34**** > >> >>> > >> >>> [10] Rsamtools_1.13.24 Biostrings_2.29.13 > >> >>> GenomicRanges_1.13.33 **** > >> >>> > >> >>> [13] XVector_0.1.0 IRanges_1.19.18 > >> >>> BiocGenerics_0.7.3 **** > >> >>> > >> >>> ** ** > >> >>> > >> >>> loaded via a namespace (and not attached):**** > >> >>> > >> >>> [1] AnnotationDbi_1.23.16 Biobase_2.20.0 biomaRt_2.17.2 > >> >>> **** > >> >>> > >> >>> [4] bitops_1.0-5 BSgenome_1.28.0 DBI_0.2-7 > >> >>> **** > >> >>> > >> >>> [7] GenomicFeatures_1.13.19 graph_1.39.3 > >> >>> Matrix_1.0-12 **** > >> >>> > >> >>> [10] RBGL_1.37.2 RCurl_1.95-4.1 > >> >>> RSQLite_0.11.4 **** > >> >>> > >> >>> [13] rtracklayer_1.20.2 stats4_3.0.1 > >> >>> tools_3.0.1 **** > >> >>> > >> >>> [16] XML_3.96-1.1 zlibbioc_1.6.0 **** > >> >>> > >> >>>> ** ** > >> >>> > >> >>> ** ** > >> >>> > >> >>> Thanks again,**** > >> >>> > >> >>> Sean**** > >> >>> > >> >>> ** ** > >> >>> > >> >>> *From:* Michael Lawrence [mailto:lawrence.michael@gene.com] > >> >>> *Sent:* Saturday, July 13, 2013 1:59 PM > >> >>> *To:* Taylor, Sean D > >> >>> *Cc:* Michael Lawrence; Pages, Herve; Obenchain, Valerie J; > >> >>> bioconductor@r-project.org > >> >>> > >> >>> *Subject:* Re: [BioC] Filtering BAM files by start position for > >> >>> VariantTools**** > >> >>> > >> >>> ** ** > >> >>> > >> >>> ** ** > >> >>> > >> >>> ** ** > >> >>> > >> >>> On Sat, Jul 13, 2013 at 1:50 PM, Taylor, Sean D <sdtaylor@fhcrc.org> > > >> >>> wrote:**** > >> >>> > >> >>> Thanks Michael, **** > >> >>> > >> >>> This is an interesting idea. Usually, we resolve PCR/optical > >> >>> duplicates > >> >>> with the Picard MarkDuplicates command, which just chooses the read > >> >>> with > >> >>> the highest average quality. This approximates what you want, I > think. > >> >>> *** > >> >>> * > >> >>> > >> >>> Is the Picard MarkDuplicates command run by default? Or is this a > >> >>> separate > >> >>> command from a different package?**** > >> >>> > >> >>> ** ** > >> >>> > >> >>> It's a separate package, basically the Java implementation of > >> >>> samtools; > >> >>> you might that first and see how it improves your error rates, > before > >> >>> taking a more complicated approach.**** > >> >>> > >> >>> Also, I find it hard to believe that sequence errors would be > >> >>> causing > >> >>> you trouble at 10%, as long as you're filtering for quality and have > >> >>> decent > >> >>> coverage. PCR might in limited cases, and Picard effectively takes > >> >>> care of > >> >>> that.**** > >> >>> > >> >>> 10% is fine, but we have problems at 1% and lower. I think even the > >> >>> defaults in tallyVariants() use 1% as a cutoff if I’m not > >> >>> mistaken.**** > >> >>> > >> >>> **** > >> >>> > >> >>> ** ** > >> >>> > >> >>> It uses a ~4% cutoff. And yes, you'll start running into issues > around > >> >>> 1%. > >> >>> Calling at such a low frequency is kind of ambitious.**** > >> >>> > >> >>> **** > >> >>> > >> >>> What you want is to specify the cycleBreaks argument to > >> >>> VariantTallyParam. It allows you to define bins by cycle. So if you > >> >>> made > >> >>> bins for all 100bp, you could do things like generate a consensus by > >> >>> read > >> >>> family. A family corresponding to position X would be cycle bin #1 > at > >> >>> X, > >> >>> cycle bin #2 at X+1, etc. Then just pick the alt (or ref) with the > >> >>> highest > >> >>> count. Might be tricky to implement efficiently for the whole > genome. > >> >>> Of > >> >>> course, this assumes you've got single end reads, otherwise this > won't > >> >>> work, because the mate is not considered.**** > >> >>> > >> >>> Cool, I will try that out, that may be just what I was looking for. > It > >> >>> might be tricky for the whole genome, but we are restricting > ourselves > >> >>> to > >> >>> just the mitochondrial genome. Still gives us several thousand start > >> >>> sites > >> >>> to work from but should be easier than the whole genome. As for > single > >> >>> end > >> >>> reads, can I just filter on the strand (i.e. ‘+’ or ‘-‘)? Or will I > >> >>> have to > >> >>> perform separate alignments for each of the paired reads?**** > >> >>> > >> >>> **** > >> >>> > >> >>> ** ** > >> >>> > >> >>> Now that I think about it, you should be fine just considering the > >> >>> alignments individually (as if they were unpaired).**** > >> >>> > >> >>> **** > >> >>> > >> >>> ** ** > >> >> > >> >> [[alternative HTML version deleted]] > >> >> > >> >> > >> >> > >> >> _______________________________________________ > >> >> Bioconductor mailing list > >> >> Bioconductor@r-project.org > >> >> https://stat.ethz.ch/mailman/listinfo/bioconductor > >> >> Search the archives: > >> >> http://news.gmane.org/gmane.science.biology.informatics.conductor > >> > > >> > > >> > -- > >> > Computational Biology / Fred Hutchinson Cancer Research Center > >> > 1100 Fairview Ave. N. > >> > PO Box 19024 Seattle, WA 98109 > >> > > >> > Location: Arnold Building M1 B861 > >> > Phone: (206) 667-2793 > > > > > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
On Fri, Aug 2, 2013 at 11:12 AM, Michael Lawrence <lawrence.michael at="" gene.com=""> wrote: > The problem is pretty simple, the mtimes are arbitrary from any VCS, and > autotools of course relies on those. When the same version of the tools are > not available, things break. Ideally, the user would not need autotools AT > ALL to build gmapR, but without mtime preservation, we can't avoid it. > > There are some sub-optimal solutions to this problem, like telling svn on > the client side to use commit times for the mtimes (requires everyone to > modify their svn config), or tell configure to ignore the mtimes for > automake-generated files (would be a pain for the maintainers at least, who > need the files regenerated when changed). The final one is to pick some > version of autotools that everyone has and generate everything with that > version, living with the autotools requirement and inefficiencies of > arbitrary regeneration. > > We can try the last one. The cluster has automake 1.10; how about you guys? > > 1.11.3. Somehow in the last go-round I thought you were using a higher version. Dan > > > > On Fri, Aug 2, 2013 at 10:18 AM, Dan Tenenbaum <dtenenba at="" fhcrc.org=""> wrote: >> >> On Fri, Aug 2, 2013 at 7:27 AM, Michael Lawrence >> <lawrence.michael at="" gene.com=""> wrote: >> > I will fix this today out of necessity to get it building on one of our >> > clusters. It's not going to be fun. >> > >> >> Thanks! If the solution could work with standard versions of automake >> available on Ubuntu 12.04LTS that would be great, and mean that our >> build systems would not need any special tweaking. >> >> Dan >> >> >> > >> > On Fri, Aug 2, 2013 at 7:08 AM, Taylor, Sean D <sdtaylor at="" fhcrc.org=""> >> > wrote: >> >> >> >> Martin, it always makes me feel better when I see you scratching your >> >> head >> >> too! ;) >> >> >> >> Sean >> >> >> >> Sent from my iPod >> >> >> >> On Aug 1, 2013, at 6:02 PM, "Martin Morgan" <mtmorgan at="" fhcrc.org=""> wrote: >> >> >> >> > On 07/16/2013 05:40 PM, Michael Lawrence wrote: >> >> >> The necessary update to VariantTools will propagate soon to devel. >> >> >> Or >> >> >> you >> >> >> can grab it from svn. >> >> > >> >> > This isn't building in devel; does it require something special with >> >> > gmapR? Dan and I looked at this (??) for the conference, but were not >> >> > quite >> >> > able to master the automake foo required to get off the ground. >> >> > >> >> > Martin >> >> > >> >> >> >> >> >> Michael >> >> >> >> >> >> >> >> >> On Tue, Jul 16, 2013 at 3:27 PM, Taylor, Sean D <sdtaylor at="" fhcrc.org=""> >> >> >> wrote: >> >> >> >> >> >>> In order to work through some of the code, I installed the devel >> >> >>> version >> >> >>> of R and updated all the packages. Now when I run tallyVariants() I >> >> >>> get the >> >> >>> following error message:**** >> >> >>> >> >> >>> ** ** >> >> >>> >> >> >>> Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : >> >> >>> **** >> >> >>> >> >> >>> object 'castList' not found**** >> >> >>> >> >> >>> ** ** >> >> >>> >> >> >>>> traceback()**** >> >> >>> >> >> >>> 12: get(name, envir = asNamespace(pkg), inherits = FALSE)**** >> >> >>> >> >> >>> 11: IRanges:::castList**** >> >> >>> >> >> >>> 10: safe_mclapply(ind, function(i, ...) {**** >> >> >>> >> >> >>> FUN(gr[i], ...)**** >> >> >>> >> >> >>> }, ...)**** >> >> >>> >> >> >>> 9: applyByChromosome(si, bam_tally_region, mc.cores = mc.cores)**** >> >> >>> >> >> >>> 8: .local(x, ...)**** >> >> >>> >> >> >>> 7: tallyVariants(x, tally.param)**** >> >> >>> >> >> >>> 6: tallyVariants(x, tally.param)**** >> >> >>> >> >> >>> 5: .local(x, ...)**** >> >> >>> >> >> >>> 4: callVariants(BamFile(x), ...)**** >> >> >>> >> >> >>> 3: callVariants(BamFile(x), ...)**** >> >> >>> >> >> >>> 2: callVariants(destination, tally.param)**** >> >> >>> >> >> >>> 1: callVariants(destination, tally.param)**** >> >> >>> >> >> >>> ** ** >> >> >>> >> >> >>> Perhaps this is something missing/changed in the devel version of >> >> >>> IRanges? >> >> >>> **** >> >> >>> >> >> >>> ** ** >> >> >>> >> >> >>> Updated sessionInfo() below:**** >> >> >>> >> >> >>> ** ** >> >> >>> >> >> >>> R version 3.0.1 (2013-05-16)**** >> >> >>> >> >> >>> Platform: x86_64-unknown-linux-gnu (64-bit)**** >> >> >>> >> >> >>> ** ** >> >> >>> >> >> >>> locale:**** >> >> >>> >> >> >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C **** >> >> >>> >> >> >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 **** >> >> >>> >> >> >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 **** >> >> >>> >> >> >>> [7] LC_PAPER=C LC_NAME=C **** >> >> >>> >> >> >>> [9] LC_ADDRESS=C LC_TELEPHONE=C **** >> >> >>> >> >> >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C **** >> >> >>> >> >> >>> ** ** >> >> >>> >> >> >>> attached base packages:**** >> >> >>> >> >> >>> [1] grid parallel stats graphics grDevices utils >> >> >>> datasets * >> >> >>> *** >> >> >>> >> >> >>> [8] methods base **** >> >> >>> >> >> >>> ** ** >> >> >>> >> >> >>> other attached packages:**** >> >> >>> >> >> >>> [1] BiocInstaller_1.11.3 latticeExtra_0.6-24 >> >> >>> lattice_0.20-15 **** >> >> >>> >> >> >>> [4] RColorBrewer_1.0-5 genoPlotR_0.8 >> >> >>> ade4_1.5-2 **** >> >> >>> >> >> >>> [7] gmapR_1.3.2 VariantTools_1.3.2 >> >> >>> VariantAnnotation_1.7.34**** >> >> >>> >> >> >>> [10] Rsamtools_1.13.24 Biostrings_2.29.13 >> >> >>> GenomicRanges_1.13.33 **** >> >> >>> >> >> >>> [13] XVector_0.1.0 IRanges_1.19.18 >> >> >>> BiocGenerics_0.7.3 **** >> >> >>> >> >> >>> ** ** >> >> >>> >> >> >>> loaded via a namespace (and not attached):**** >> >> >>> >> >> >>> [1] AnnotationDbi_1.23.16 Biobase_2.20.0 biomaRt_2.17.2 >> >> >>> **** >> >> >>> >> >> >>> [4] bitops_1.0-5 BSgenome_1.28.0 DBI_0.2-7 >> >> >>> **** >> >> >>> >> >> >>> [7] GenomicFeatures_1.13.19 graph_1.39.3 >> >> >>> Matrix_1.0-12 **** >> >> >>> >> >> >>> [10] RBGL_1.37.2 RCurl_1.95-4.1 >> >> >>> RSQLite_0.11.4 **** >> >> >>> >> >> >>> [13] rtracklayer_1.20.2 stats4_3.0.1 >> >> >>> tools_3.0.1 **** >> >> >>> >> >> >>> [16] XML_3.96-1.1 zlibbioc_1.6.0 **** >> >> >>> >> >> >>>> ** ** >> >> >>> >> >> >>> ** ** >> >> >>> >> >> >>> Thanks again,**** >> >> >>> >> >> >>> Sean**** >> >> >>> >> >> >>> ** ** >> >> >>> >> >> >>> *From:* Michael Lawrence [mailto:lawrence.michael at gene.com] >> >> >>> *Sent:* Saturday, July 13, 2013 1:59 PM >> >> >>> *To:* Taylor, Sean D >> >> >>> *Cc:* Michael Lawrence; Pages, Herve; Obenchain, Valerie J; >> >> >>> bioconductor at r-project.org >> >> >>> >> >> >>> *Subject:* Re: [BioC] Filtering BAM files by start position for >> >> >>> VariantTools**** >> >> >>> >> >> >>> ** ** >> >> >>> >> >> >>> ** ** >> >> >>> >> >> >>> ** ** >> >> >>> >> >> >>> On Sat, Jul 13, 2013 at 1:50 PM, Taylor, Sean D >> >> >>> <sdtaylor at="" fhcrc.org=""> >> >> >>> wrote:**** >> >> >>> >> >> >>> Thanks Michael, **** >> >> >>> >> >> >>> This is an interesting idea. Usually, we resolve PCR/optical >> >> >>> duplicates >> >> >>> with the Picard MarkDuplicates command, which just chooses the read >> >> >>> with >> >> >>> the highest average quality. This approximates what you want, I >> >> >>> think. >> >> >>> *** >> >> >>> * >> >> >>> >> >> >>> Is the Picard MarkDuplicates command run by default? Or is this a >> >> >>> separate >> >> >>> command from a different package?**** >> >> >>> >> >> >>> ** ** >> >> >>> >> >> >>> It's a separate package, basically the Java implementation of >> >> >>> samtools; >> >> >>> you might that first and see how it improves your error rates, >> >> >>> before >> >> >>> taking a more complicated approach.**** >> >> >>> >> >> >>> Also, I find it hard to believe that sequence errors would be >> >> >>> causing >> >> >>> you trouble at 10%, as long as you're filtering for quality and >> >> >>> have >> >> >>> decent >> >> >>> coverage. PCR might in limited cases, and Picard effectively takes >> >> >>> care of >> >> >>> that.**** >> >> >>> >> >> >>> 10% is fine, but we have problems at 1% and lower. I think even the >> >> >>> defaults in tallyVariants() use 1% as a cutoff if I?m not >> >> >>> mistaken.**** >> >> >>> >> >> >>> **** >> >> >>> >> >> >>> ** ** >> >> >>> >> >> >>> It uses a ~4% cutoff. And yes, you'll start running into issues >> >> >>> around >> >> >>> 1%. >> >> >>> Calling at such a low frequency is kind of ambitious.**** >> >> >>> >> >> >>> **** >> >> >>> >> >> >>> What you want is to specify the cycleBreaks argument to >> >> >>> VariantTallyParam. It allows you to define bins by cycle. So if you >> >> >>> made >> >> >>> bins for all 100bp, you could do things like generate a consensus >> >> >>> by >> >> >>> read >> >> >>> family. A family corresponding to position X would be cycle bin #1 >> >> >>> at >> >> >>> X, >> >> >>> cycle bin #2 at X+1, etc. Then just pick the alt (or ref) with the >> >> >>> highest >> >> >>> count. Might be tricky to implement efficiently for the whole >> >> >>> genome. >> >> >>> Of >> >> >>> course, this assumes you've got single end reads, otherwise this >> >> >>> won't >> >> >>> work, because the mate is not considered.**** >> >> >>> >> >> >>> Cool, I will try that out, that may be just what I was looking for. >> >> >>> It >> >> >>> might be tricky for the whole genome, but we are restricting >> >> >>> ourselves >> >> >>> to >> >> >>> just the mitochondrial genome. Still gives us several thousand >> >> >>> start >> >> >>> sites >> >> >>> to work from but should be easier than the whole genome. As for >> >> >>> single >> >> >>> end >> >> >>> reads, can I just filter on the strand (i.e. ?+? or ?-?)? Or will I >> >> >>> have to >> >> >>> perform separate alignments for each of the paired reads?**** >> >> >>> >> >> >>> **** >> >> >>> >> >> >>> ** ** >> >> >>> >> >> >>> Now that I think about it, you should be fine just considering the >> >> >>> alignments individually (as if they were unpaired).**** >> >> >>> >> >> >>> **** >> >> >>> >> >> >>> ** ** >> >> >> >> >> >> [[alternative HTML version deleted]] >> >> >> >> >> >> >> >> >> >> >> >> _______________________________________________ >> >> >> Bioconductor mailing list >> >> >> Bioconductor at r-project.org >> >> >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> >> >> Search the archives: >> >> >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> > >> >> > >> >> > -- >> >> > Computational Biology / Fred Hutchinson Cancer Research Center >> >> > 1100 Fairview Ave. N. >> >> > PO Box 19024 Seattle, WA 98109 >> >> > >> >> > Location: Arnold Building M1 B861 >> >> > Phone: (206) 667-2793 >> > >> > > >
ADD REPLY
0
Entering edit mode
I am completely baffled. Usually automake is used once to generate portable Makefiles etc. on a single machine. These files are then distributed. No user should ever have to run automake thermselves; it is only re-run when the source files are modified. Kasper On Fri, Aug 2, 2013 at 2:36 PM, Dan Tenenbaum <dtenenba@fhcrc.org> wrote: > On Fri, Aug 2, 2013 at 11:12 AM, Michael Lawrence > <lawrence.michael@gene.com> wrote: > > The problem is pretty simple, the mtimes are arbitrary from any VCS, and > > autotools of course relies on those. When the same version of the tools > are > > not available, things break. Ideally, the user would not need autotools > AT > > ALL to build gmapR, but without mtime preservation, we can't avoid it. > > > > There are some sub-optimal solutions to this problem, like telling svn on > > the client side to use commit times for the mtimes (requires everyone to > > modify their svn config), or tell configure to ignore the mtimes for > > automake-generated files (would be a pain for the maintainers at least, > who > > need the files regenerated when changed). The final one is to pick some > > version of autotools that everyone has and generate everything with that > > version, living with the autotools requirement and inefficiencies of > > arbitrary regeneration. > > > > We can try the last one. The cluster has automake 1.10; how about you > guys? > > > > > > 1.11.3. Somehow in the last go-round I thought you were using a higher > version. > > Dan > > > > > > > > > > On Fri, Aug 2, 2013 at 10:18 AM, Dan Tenenbaum <dtenenba@fhcrc.org> > wrote: > >> > >> On Fri, Aug 2, 2013 at 7:27 AM, Michael Lawrence > >> <lawrence.michael@gene.com> wrote: > >> > I will fix this today out of necessity to get it building on one of > our > >> > clusters. It's not going to be fun. > >> > > >> > >> Thanks! If the solution could work with standard versions of automake > >> available on Ubuntu 12.04LTS that would be great, and mean that our > >> build systems would not need any special tweaking. > >> > >> Dan > >> > >> > >> > > >> > On Fri, Aug 2, 2013 at 7:08 AM, Taylor, Sean D <sdtaylor@fhcrc.org> > >> > wrote: > >> >> > >> >> Martin, it always makes me feel better when I see you scratching your > >> >> head > >> >> too! ;) > >> >> > >> >> Sean > >> >> > >> >> Sent from my iPod > >> >> > >> >> On Aug 1, 2013, at 6:02 PM, "Martin Morgan" <mtmorgan@fhcrc.org> > wrote: > >> >> > >> >> > On 07/16/2013 05:40 PM, Michael Lawrence wrote: > >> >> >> The necessary update to VariantTools will propagate soon to devel. > >> >> >> Or > >> >> >> you > >> >> >> can grab it from svn. > >> >> > > >> >> > This isn't building in devel; does it require something special > with > >> >> > gmapR? Dan and I looked at this (??) for the conference, but were > not > >> >> > quite > >> >> > able to master the automake foo required to get off the ground. > >> >> > > >> >> > Martin > >> >> > > >> >> >> > >> >> >> Michael > >> >> >> > >> >> >> > >> >> >> On Tue, Jul 16, 2013 at 3:27 PM, Taylor, Sean D < > sdtaylor@fhcrc.org> > >> >> >> wrote: > >> >> >> > >> >> >>> In order to work through some of the code, I installed the devel > >> >> >>> version > >> >> >>> of R and updated all the packages. Now when I run > tallyVariants() I > >> >> >>> get the > >> >> >>> following error message:**** > >> >> >>> > >> >> >>> ** ** > >> >> >>> > >> >> >>> Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : > >> >> >>> **** > >> >> >>> > >> >> >>> object 'castList' not found**** > >> >> >>> > >> >> >>> ** ** > >> >> >>> > >> >> >>>> traceback()**** > >> >> >>> > >> >> >>> 12: get(name, envir = asNamespace(pkg), inherits = FALSE)**** > >> >> >>> > >> >> >>> 11: IRanges:::castList**** > >> >> >>> > >> >> >>> 10: safe_mclapply(ind, function(i, ...) {**** > >> >> >>> > >> >> >>> FUN(gr[i], ...)**** > >> >> >>> > >> >> >>> }, ...)**** > >> >> >>> > >> >> >>> 9: applyByChromosome(si, bam_tally_region, mc.cores = > mc.cores)**** > >> >> >>> > >> >> >>> 8: .local(x, ...)**** > >> >> >>> > >> >> >>> 7: tallyVariants(x, tally.param)**** > >> >> >>> > >> >> >>> 6: tallyVariants(x, tally.param)**** > >> >> >>> > >> >> >>> 5: .local(x, ...)**** > >> >> >>> > >> >> >>> 4: callVariants(BamFile(x), ...)**** > >> >> >>> > >> >> >>> 3: callVariants(BamFile(x), ...)**** > >> >> >>> > >> >> >>> 2: callVariants(destination, tally.param)**** > >> >> >>> > >> >> >>> 1: callVariants(destination, tally.param)**** > >> >> >>> > >> >> >>> ** ** > >> >> >>> > >> >> >>> Perhaps this is something missing/changed in the devel version of > >> >> >>> IRanges? > >> >> >>> **** > >> >> >>> > >> >> >>> ** ** > >> >> >>> > >> >> >>> Updated sessionInfo() below:**** > >> >> >>> > >> >> >>> ** ** > >> >> >>> > >> >> >>> R version 3.0.1 (2013-05-16)**** > >> >> >>> > >> >> >>> Platform: x86_64-unknown-linux-gnu (64-bit)**** > >> >> >>> > >> >> >>> ** ** > >> >> >>> > >> >> >>> locale:**** > >> >> >>> > >> >> >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C **** > >> >> >>> > >> >> >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 **** > >> >> >>> > >> >> >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 **** > >> >> >>> > >> >> >>> [7] LC_PAPER=C LC_NAME=C **** > >> >> >>> > >> >> >>> [9] LC_ADDRESS=C LC_TELEPHONE=C **** > >> >> >>> > >> >> >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C **** > >> >> >>> > >> >> >>> ** ** > >> >> >>> > >> >> >>> attached base packages:**** > >> >> >>> > >> >> >>> [1] grid parallel stats graphics grDevices utils > >> >> >>> datasets * > >> >> >>> *** > >> >> >>> > >> >> >>> [8] methods base **** > >> >> >>> > >> >> >>> ** ** > >> >> >>> > >> >> >>> other attached packages:**** > >> >> >>> > >> >> >>> [1] BiocInstaller_1.11.3 latticeExtra_0.6-24 > >> >> >>> lattice_0.20-15 **** > >> >> >>> > >> >> >>> [4] RColorBrewer_1.0-5 genoPlotR_0.8 > >> >> >>> ade4_1.5-2 **** > >> >> >>> > >> >> >>> [7] gmapR_1.3.2 VariantTools_1.3.2 > >> >> >>> VariantAnnotation_1.7.34**** > >> >> >>> > >> >> >>> [10] Rsamtools_1.13.24 Biostrings_2.29.13 > >> >> >>> GenomicRanges_1.13.33 **** > >> >> >>> > >> >> >>> [13] XVector_0.1.0 IRanges_1.19.18 > >> >> >>> BiocGenerics_0.7.3 **** > >> >> >>> > >> >> >>> ** ** > >> >> >>> > >> >> >>> loaded via a namespace (and not attached):**** > >> >> >>> > >> >> >>> [1] AnnotationDbi_1.23.16 Biobase_2.20.0 > biomaRt_2.17.2 > >> >> >>> **** > >> >> >>> > >> >> >>> [4] bitops_1.0-5 BSgenome_1.28.0 DBI_0.2-7 > >> >> >>> **** > >> >> >>> > >> >> >>> [7] GenomicFeatures_1.13.19 graph_1.39.3 > >> >> >>> Matrix_1.0-12 **** > >> >> >>> > >> >> >>> [10] RBGL_1.37.2 RCurl_1.95-4.1 > >> >> >>> RSQLite_0.11.4 **** > >> >> >>> > >> >> >>> [13] rtracklayer_1.20.2 stats4_3.0.1 > >> >> >>> tools_3.0.1 **** > >> >> >>> > >> >> >>> [16] XML_3.96-1.1 zlibbioc_1.6.0 **** > >> >> >>> > >> >> >>>> ** ** > >> >> >>> > >> >> >>> ** ** > >> >> >>> > >> >> >>> Thanks again,**** > >> >> >>> > >> >> >>> Sean**** > >> >> >>> > >> >> >>> ** ** > >> >> >>> > >> >> >>> *From:* Michael Lawrence [mailto:lawrence.michael@gene.com] > >> >> >>> *Sent:* Saturday, July 13, 2013 1:59 PM > >> >> >>> *To:* Taylor, Sean D > >> >> >>> *Cc:* Michael Lawrence; Pages, Herve; Obenchain, Valerie J; > >> >> >>> bioconductor@r-project.org > >> >> >>> > >> >> >>> *Subject:* Re: [BioC] Filtering BAM files by start position for > >> >> >>> VariantTools**** > >> >> >>> > >> >> >>> ** ** > >> >> >>> > >> >> >>> ** ** > >> >> >>> > >> >> >>> ** ** > >> >> >>> > >> >> >>> On Sat, Jul 13, 2013 at 1:50 PM, Taylor, Sean D > >> >> >>> <sdtaylor@fhcrc.org> > >> >> >>> wrote:**** > >> >> >>> > >> >> >>> Thanks Michael, **** > >> >> >>> > >> >> >>> This is an interesting idea. Usually, we resolve PCR/optical > >> >> >>> duplicates > >> >> >>> with the Picard MarkDuplicates command, which just chooses the > read > >> >> >>> with > >> >> >>> the highest average quality. This approximates what you want, I > >> >> >>> think. > >> >> >>> *** > >> >> >>> * > >> >> >>> > >> >> >>> Is the Picard MarkDuplicates command run by default? Or is this a > >> >> >>> separate > >> >> >>> command from a different package?**** > >> >> >>> > >> >> >>> ** ** > >> >> >>> > >> >> >>> It's a separate package, basically the Java implementation of > >> >> >>> samtools; > >> >> >>> you might that first and see how it improves your error rates, > >> >> >>> before > >> >> >>> taking a more complicated approach.**** > >> >> >>> > >> >> >>> Also, I find it hard to believe that sequence errors would be > >> >> >>> causing > >> >> >>> you trouble at 10%, as long as you're filtering for quality and > >> >> >>> have > >> >> >>> decent > >> >> >>> coverage. PCR might in limited cases, and Picard effectively > takes > >> >> >>> care of > >> >> >>> that.**** > >> >> >>> > >> >> >>> 10% is fine, but we have problems at 1% and lower. I think even > the > >> >> >>> defaults in tallyVariants() use 1% as a cutoff if I’m not > >> >> >>> mistaken.**** > >> >> >>> > >> >> >>> **** > >> >> >>> > >> >> >>> ** ** > >> >> >>> > >> >> >>> It uses a ~4% cutoff. And yes, you'll start running into issues > >> >> >>> around > >> >> >>> 1%. > >> >> >>> Calling at such a low frequency is kind of ambitious.**** > >> >> >>> > >> >> >>> **** > >> >> >>> > >> >> >>> What you want is to specify the cycleBreaks argument to > >> >> >>> VariantTallyParam. It allows you to define bins by cycle. So if > you > >> >> >>> made > >> >> >>> bins for all 100bp, you could do things like generate a consensus > >> >> >>> by > >> >> >>> read > >> >> >>> family. A family corresponding to position X would be cycle bin > #1 > >> >> >>> at > >> >> >>> X, > >> >> >>> cycle bin #2 at X+1, etc. Then just pick the alt (or ref) with > the > >> >> >>> highest > >> >> >>> count. Might be tricky to implement efficiently for the whole > >> >> >>> genome. > >> >> >>> Of > >> >> >>> course, this assumes you've got single end reads, otherwise this > >> >> >>> won't > >> >> >>> work, because the mate is not considered.**** > >> >> >>> > >> >> >>> Cool, I will try that out, that may be just what I was looking > for. > >> >> >>> It > >> >> >>> might be tricky for the whole genome, but we are restricting > >> >> >>> ourselves > >> >> >>> to > >> >> >>> just the mitochondrial genome. Still gives us several thousand > >> >> >>> start > >> >> >>> sites > >> >> >>> to work from but should be easier than the whole genome. As for > >> >> >>> single > >> >> >>> end > >> >> >>> reads, can I just filter on the strand (i.e. ‘+’ or ‘-‘)? Or > will I > >> >> >>> have to > >> >> >>> perform separate alignments for each of the paired reads?**** > >> >> >>> > >> >> >>> **** > >> >> >>> > >> >> >>> ** ** > >> >> >>> > >> >> >>> Now that I think about it, you should be fine just considering > the > >> >> >>> alignments individually (as if they were unpaired).**** > >> >> >>> > >> >> >>> **** > >> >> >>> > >> >> >>> ** ** > >> >> >> > >> >> >> [[alternative HTML version deleted]] > >> >> >> > >> >> >> > >> >> >> > >> >> >> _______________________________________________ > >> >> >> Bioconductor mailing list > >> >> >> Bioconductor@r-project.org > >> >> >> https://stat.ethz.ch/mailman/listinfo/bioconductor > >> >> >> Search the archives: > >> >> >> http://news.gmane.org/gmane.science.biology.informatics.conductor > >> >> > > >> >> > > >> >> > -- > >> >> > Computational Biology / Fred Hutchinson Cancer Research Center > >> >> > 1100 Fairview Ave. N. > >> >> > PO Box 19024 Seattle, WA 98109 > >> >> > > >> >> > Location: Arnold Building M1 B861 > >> >> > Phone: (206) 667-2793 > >> > > >> > > > > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
I should clarify that this does not affect users of the tarball. Just people who grab it from svn, which screws up the mtimes. Those are typically developers, so it usually doesn't matter. Anyway, the problem should now be resolved by using the automake maintainer mode macro which lets me pass --disable-maintainer-mode to the gstruct/gmap configure scripts. That configures the makefiles to ignore the dependencies of the maintainer generated files. The latest revision in svn (1.3.4) should build for people. On Fri, Aug 2, 2013 at 12:25 PM, Kasper Daniel Hansen < kasperdanielhansen@gmail.com> wrote: > I am completely baffled. Usually automake is used once to generate > portable Makefiles etc. on a single machine. These files are then > distributed. No user should ever have to run automake thermselves; it is > only re-run when the source files are modified. > > Kasper > > > On Fri, Aug 2, 2013 at 2:36 PM, Dan Tenenbaum <dtenenba@fhcrc.org> wrote: > >> On Fri, Aug 2, 2013 at 11:12 AM, Michael Lawrence >> <lawrence.michael@gene.com> wrote: >> > The problem is pretty simple, the mtimes are arbitrary from any VCS, and >> > autotools of course relies on those. When the same version of the tools >> are >> > not available, things break. Ideally, the user would not need autotools >> AT >> > ALL to build gmapR, but without mtime preservation, we can't avoid it. >> > >> > There are some sub-optimal solutions to this problem, like telling svn >> on >> > the client side to use commit times for the mtimes (requires everyone to >> > modify their svn config), or tell configure to ignore the mtimes for >> > automake-generated files (would be a pain for the maintainers at least, >> who >> > need the files regenerated when changed). The final one is to pick some >> > version of autotools that everyone has and generate everything with that >> > version, living with the autotools requirement and inefficiencies of >> > arbitrary regeneration. >> > >> > We can try the last one. The cluster has automake 1.10; how about you >> guys? >> > >> > >> >> 1.11.3. Somehow in the last go-round I thought you were using a higher >> version. >> >> Dan >> >> >> > >> > >> > >> > On Fri, Aug 2, 2013 at 10:18 AM, Dan Tenenbaum <dtenenba@fhcrc.org> >> wrote: >> >> >> >> On Fri, Aug 2, 2013 at 7:27 AM, Michael Lawrence >> >> <lawrence.michael@gene.com> wrote: >> >> > I will fix this today out of necessity to get it building on one of >> our >> >> > clusters. It's not going to be fun. >> >> > >> >> >> >> Thanks! If the solution could work with standard versions of automake >> >> available on Ubuntu 12.04LTS that would be great, and mean that our >> >> build systems would not need any special tweaking. >> >> >> >> Dan >> >> >> >> >> >> > >> >> > On Fri, Aug 2, 2013 at 7:08 AM, Taylor, Sean D <sdtaylor@fhcrc.org> >> >> > wrote: >> >> >> >> >> >> Martin, it always makes me feel better when I see you scratching >> your >> >> >> head >> >> >> too! ;) >> >> >> >> >> >> Sean >> >> >> >> >> >> Sent from my iPod >> >> >> >> >> >> On Aug 1, 2013, at 6:02 PM, "Martin Morgan" <mtmorgan@fhcrc.org> >> wrote: >> >> >> >> >> >> > On 07/16/2013 05:40 PM, Michael Lawrence wrote: >> >> >> >> The necessary update to VariantTools will propagate soon to >> devel. >> >> >> >> Or >> >> >> >> you >> >> >> >> can grab it from svn. >> >> >> > >> >> >> > This isn't building in devel; does it require something special >> with >> >> >> > gmapR? Dan and I looked at this (??) for the conference, but were >> not >> >> >> > quite >> >> >> > able to master the automake foo required to get off the ground. >> >> >> > >> >> >> > Martin >> >> >> > >> >> >> >> >> >> >> >> Michael >> >> >> >> >> >> >> >> >> >> >> >> On Tue, Jul 16, 2013 at 3:27 PM, Taylor, Sean D < >> sdtaylor@fhcrc.org> >> >> >> >> wrote: >> >> >> >> >> >> >> >>> In order to work through some of the code, I installed the >> devel >> >> >> >>> version >> >> >> >>> of R and updated all the packages. Now when I run >> tallyVariants() I >> >> >> >>> get the >> >> >> >>> following error message:**** >> >> >> >>> >> >> >> >>> ** ** >> >> >> >>> >> >> >> >>> Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : >> >> >> >>> **** >> >> >> >>> >> >> >> >>> object 'castList' not found**** >> >> >> >>> >> >> >> >>> ** ** >> >> >> >>> >> >> >> >>>> traceback()**** >> >> >> >>> >> >> >> >>> 12: get(name, envir = asNamespace(pkg), inherits = FALSE)**** >> >> >> >>> >> >> >> >>> 11: IRanges:::castList**** >> >> >> >>> >> >> >> >>> 10: safe_mclapply(ind, function(i, ...) {**** >> >> >> >>> >> >> >> >>> FUN(gr[i], ...)**** >> >> >> >>> >> >> >> >>> }, ...)**** >> >> >> >>> >> >> >> >>> 9: applyByChromosome(si, bam_tally_region, mc.cores = >> mc.cores)**** >> >> >> >>> >> >> >> >>> 8: .local(x, ...)**** >> >> >> >>> >> >> >> >>> 7: tallyVariants(x, tally.param)**** >> >> >> >>> >> >> >> >>> 6: tallyVariants(x, tally.param)**** >> >> >> >>> >> >> >> >>> 5: .local(x, ...)**** >> >> >> >>> >> >> >> >>> 4: callVariants(BamFile(x), ...)**** >> >> >> >>> >> >> >> >>> 3: callVariants(BamFile(x), ...)**** >> >> >> >>> >> >> >> >>> 2: callVariants(destination, tally.param)**** >> >> >> >>> >> >> >> >>> 1: callVariants(destination, tally.param)**** >> >> >> >>> >> >> >> >>> ** ** >> >> >> >>> >> >> >> >>> Perhaps this is something missing/changed in the devel version >> of >> >> >> >>> IRanges? >> >> >> >>> **** >> >> >> >>> >> >> >> >>> ** ** >> >> >> >>> >> >> >> >>> Updated sessionInfo() below:**** >> >> >> >>> >> >> >> >>> ** ** >> >> >> >>> >> >> >> >>> R version 3.0.1 (2013-05-16)**** >> >> >> >>> >> >> >> >>> Platform: x86_64-unknown-linux-gnu (64-bit)**** >> >> >> >>> >> >> >> >>> ** ** >> >> >> >>> >> >> >> >>> locale:**** >> >> >> >>> >> >> >> >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C **** >> >> >> >>> >> >> >> >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 **** >> >> >> >>> >> >> >> >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 **** >> >> >> >>> >> >> >> >>> [7] LC_PAPER=C LC_NAME=C **** >> >> >> >>> >> >> >> >>> [9] LC_ADDRESS=C LC_TELEPHONE=C **** >> >> >> >>> >> >> >> >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C **** >> >> >> >>> >> >> >> >>> ** ** >> >> >> >>> >> >> >> >>> attached base packages:**** >> >> >> >>> >> >> >> >>> [1] grid parallel stats graphics grDevices utils >> >> >> >>> datasets * >> >> >> >>> *** >> >> >> >>> >> >> >> >>> [8] methods base **** >> >> >> >>> >> >> >> >>> ** ** >> >> >> >>> >> >> >> >>> other attached packages:**** >> >> >> >>> >> >> >> >>> [1] BiocInstaller_1.11.3 latticeExtra_0.6-24 >> >> >> >>> lattice_0.20-15 **** >> >> >> >>> >> >> >> >>> [4] RColorBrewer_1.0-5 genoPlotR_0.8 >> >> >> >>> ade4_1.5-2 **** >> >> >> >>> >> >> >> >>> [7] gmapR_1.3.2 VariantTools_1.3.2 >> >> >> >>> VariantAnnotation_1.7.34**** >> >> >> >>> >> >> >> >>> [10] Rsamtools_1.13.24 Biostrings_2.29.13 >> >> >> >>> GenomicRanges_1.13.33 **** >> >> >> >>> >> >> >> >>> [13] XVector_0.1.0 IRanges_1.19.18 >> >> >> >>> BiocGenerics_0.7.3 **** >> >> >> >>> >> >> >> >>> ** ** >> >> >> >>> >> >> >> >>> loaded via a namespace (and not attached):**** >> >> >> >>> >> >> >> >>> [1] AnnotationDbi_1.23.16 Biobase_2.20.0 >> biomaRt_2.17.2 >> >> >> >>> **** >> >> >> >>> >> >> >> >>> [4] bitops_1.0-5 BSgenome_1.28.0 DBI_0.2-7 >> >> >> >>> **** >> >> >> >>> >> >> >> >>> [7] GenomicFeatures_1.13.19 graph_1.39.3 >> >> >> >>> Matrix_1.0-12 **** >> >> >> >>> >> >> >> >>> [10] RBGL_1.37.2 RCurl_1.95-4.1 >> >> >> >>> RSQLite_0.11.4 **** >> >> >> >>> >> >> >> >>> [13] rtracklayer_1.20.2 stats4_3.0.1 >> >> >> >>> tools_3.0.1 **** >> >> >> >>> >> >> >> >>> [16] XML_3.96-1.1 zlibbioc_1.6.0 **** >> >> >> >>> >> >> >> >>>> ** ** >> >> >> >>> >> >> >> >>> ** ** >> >> >> >>> >> >> >> >>> Thanks again,**** >> >> >> >>> >> >> >> >>> Sean**** >> >> >> >>> >> >> >> >>> ** ** >> >> >> >>> >> >> >> >>> *From:* Michael Lawrence [mailto:lawrence.michael@gene.com] >> >> >> >>> *Sent:* Saturday, July 13, 2013 1:59 PM >> >> >> >>> *To:* Taylor, Sean D >> >> >> >>> *Cc:* Michael Lawrence; Pages, Herve; Obenchain, Valerie J; >> >> >> >>> bioconductor@r-project.org >> >> >> >>> >> >> >> >>> *Subject:* Re: [BioC] Filtering BAM files by start position for >> >> >> >>> VariantTools**** >> >> >> >>> >> >> >> >>> ** ** >> >> >> >>> >> >> >> >>> ** ** >> >> >> >>> >> >> >> >>> ** ** >> >> >> >>> >> >> >> >>> On Sat, Jul 13, 2013 at 1:50 PM, Taylor, Sean D >> >> >> >>> <sdtaylor@fhcrc.org> >> >> >> >>> wrote:**** >> >> >> >>> >> >> >> >>> Thanks Michael, **** >> >> >> >>> >> >> >> >>> This is an interesting idea. Usually, we resolve PCR/optical >> >> >> >>> duplicates >> >> >> >>> with the Picard MarkDuplicates command, which just chooses the >> read >> >> >> >>> with >> >> >> >>> the highest average quality. This approximates what you want, I >> >> >> >>> think. >> >> >> >>> *** >> >> >> >>> * >> >> >> >>> >> >> >> >>> Is the Picard MarkDuplicates command run by default? Or is this >> a >> >> >> >>> separate >> >> >> >>> command from a different package?**** >> >> >> >>> >> >> >> >>> ** ** >> >> >> >>> >> >> >> >>> It's a separate package, basically the Java implementation of >> >> >> >>> samtools; >> >> >> >>> you might that first and see how it improves your error rates, >> >> >> >>> before >> >> >> >>> taking a more complicated approach.**** >> >> >> >>> >> >> >> >>> Also, I find it hard to believe that sequence errors would >> be >> >> >> >>> causing >> >> >> >>> you trouble at 10%, as long as you're filtering for quality and >> >> >> >>> have >> >> >> >>> decent >> >> >> >>> coverage. PCR might in limited cases, and Picard effectively >> takes >> >> >> >>> care of >> >> >> >>> that.**** >> >> >> >>> >> >> >> >>> 10% is fine, but we have problems at 1% and lower. I think even >> the >> >> >> >>> defaults in tallyVariants() use 1% as a cutoff if I’m not >> >> >> >>> mistaken.**** >> >> >> >>> >> >> >> >>> **** >> >> >> >>> >> >> >> >>> ** ** >> >> >> >>> >> >> >> >>> It uses a ~4% cutoff. And yes, you'll start running into issues >> >> >> >>> around >> >> >> >>> 1%. >> >> >> >>> Calling at such a low frequency is kind of ambitious.**** >> >> >> >>> >> >> >> >>> **** >> >> >> >>> >> >> >> >>> What you want is to specify the cycleBreaks argument to >> >> >> >>> VariantTallyParam. It allows you to define bins by cycle. So if >> you >> >> >> >>> made >> >> >> >>> bins for all 100bp, you could do things like generate a >> consensus >> >> >> >>> by >> >> >> >>> read >> >> >> >>> family. A family corresponding to position X would be cycle bin >> #1 >> >> >> >>> at >> >> >> >>> X, >> >> >> >>> cycle bin #2 at X+1, etc. Then just pick the alt (or ref) with >> the >> >> >> >>> highest >> >> >> >>> count. Might be tricky to implement efficiently for the whole >> >> >> >>> genome. >> >> >> >>> Of >> >> >> >>> course, this assumes you've got single end reads, otherwise this >> >> >> >>> won't >> >> >> >>> work, because the mate is not considered.**** >> >> >> >>> >> >> >> >>> Cool, I will try that out, that may be just what I was looking >> for. >> >> >> >>> It >> >> >> >>> might be tricky for the whole genome, but we are restricting >> >> >> >>> ourselves >> >> >> >>> to >> >> >> >>> just the mitochondrial genome. Still gives us several thousand >> >> >> >>> start >> >> >> >>> sites >> >> >> >>> to work from but should be easier than the whole genome. As for >> >> >> >>> single >> >> >> >>> end >> >> >> >>> reads, can I just filter on the strand (i.e. ‘+’ or ‘-‘)? Or >> will I >> >> >> >>> have to >> >> >> >>> perform separate alignments for each of the paired reads?**** >> >> >> >>> >> >> >> >>> **** >> >> >> >>> >> >> >> >>> ** ** >> >> >> >>> >> >> >> >>> Now that I think about it, you should be fine just considering >> the >> >> >> >>> alignments individually (as if they were unpaired).**** >> >> >> >>> >> >> >> >>> **** >> >> >> >>> >> >> >> >>> ** ** >> >> >> >> >> >> >> >> [[alternative HTML version deleted]] >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> _______________________________________________ >> >> >> >> Bioconductor mailing list >> >> >> >> Bioconductor@r-project.org >> >> >> >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> >> >> >> Search the archives: >> >> >> >> >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >> > >> >> >> > >> >> >> > -- >> >> >> > Computational Biology / Fred Hutchinson Cancer Research Center >> >> >> > 1100 Fairview Ave. N. >> >> >> > PO Box 19024 Seattle, WA 98109 >> >> >> > >> >> >> > Location: Arnold Building M1 B861 >> >> >> > Phone: (206) 667-2793 >> >> > >> >> > >> > >> > >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
fwiw, automake1.14 works, while 1.10 did not, but is the default on most modern Linuces I probably should have mentioned that this was a pain but overcome by running autoconf on my end... I was excited enough about VariantTools to bite the bullet and patch it. oopsie On Fri, Aug 2, 2013 at 11:12 AM, Michael Lawrence <lawrence.michael@gene.com> wrote: > The problem is pretty simple, the mtimes are arbitrary from any VCS, and > autotools of course relies on those. When the same version of the tools are > not available, things break. Ideally, the user would not need autotools AT > ALL to build gmapR, but without mtime preservation, we can't avoid it. > > There are some sub-optimal solutions to this problem, like telling svn on > the client side to use commit times for the mtimes (requires everyone to > modify their svn config), or tell configure to ignore the mtimes for > automake-generated files (would be a pain for the maintainers at least, who > need the files regenerated when changed). The final one is to pick some > version of autotools that everyone has and generate everything with that > version, living with the autotools requirement and inefficiencies of > arbitrary regeneration. > > We can try the last one. The cluster has automake 1.10; how about you guys? > > > > > > On Fri, Aug 2, 2013 at 10:18 AM, Dan Tenenbaum <dtenenba@fhcrc.org> wrote: > > > On Fri, Aug 2, 2013 at 7:27 AM, Michael Lawrence > > <lawrence.michael@gene.com> wrote: > > > I will fix this today out of necessity to get it building on one of our > > > clusters. It's not going to be fun. > > > > > > > Thanks! If the solution could work with standard versions of automake > > available on Ubuntu 12.04LTS that would be great, and mean that our > > build systems would not need any special tweaking. > > > > Dan > > > > > > > > > > On Fri, Aug 2, 2013 at 7:08 AM, Taylor, Sean D <sdtaylor@fhcrc.org> > > wrote: > > >> > > >> Martin, it always makes me feel better when I see you scratching your > > head > > >> too! ;) > > >> > > >> Sean > > >> > > >> Sent from my iPod > > >> > > >> On Aug 1, 2013, at 6:02 PM, "Martin Morgan" <mtmorgan@fhcrc.org> > wrote: > > >> > > >> > On 07/16/2013 05:40 PM, Michael Lawrence wrote: > > >> >> The necessary update to VariantTools will propagate soon to devel. > Or > > >> >> you > > >> >> can grab it from svn. > > >> > > > >> > This isn't building in devel; does it require something special with > > >> > gmapR? Dan and I looked at this (??) for the conference, but were > not > > quite > > >> > able to master the automake foo required to get off the ground. > > >> > > > >> > Martin > > >> > > > >> >> > > >> >> Michael > > >> >> > > >> >> > > >> >> On Tue, Jul 16, 2013 at 3:27 PM, Taylor, Sean D < > sdtaylor@fhcrc.org> > > >> >> wrote: > > >> >> > > >> >>> In order to work through some of the code, I installed the devel > > >> >>> version > > >> >>> of R and updated all the packages. Now when I run tallyVariants() > I > > >> >>> get the > > >> >>> following error message:**** > > >> >>> > > >> >>> ** ** > > >> >>> > > >> >>> Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : > > **** > > >> >>> > > >> >>> object 'castList' not found**** > > >> >>> > > >> >>> ** ** > > >> >>> > > >> >>>> traceback()**** > > >> >>> > > >> >>> 12: get(name, envir = asNamespace(pkg), inherits = FALSE)**** > > >> >>> > > >> >>> 11: IRanges:::castList**** > > >> >>> > > >> >>> 10: safe_mclapply(ind, function(i, ...) {**** > > >> >>> > > >> >>> FUN(gr[i], ...)**** > > >> >>> > > >> >>> }, ...)**** > > >> >>> > > >> >>> 9: applyByChromosome(si, bam_tally_region, mc.cores = > mc.cores)**** > > >> >>> > > >> >>> 8: .local(x, ...)**** > > >> >>> > > >> >>> 7: tallyVariants(x, tally.param)**** > > >> >>> > > >> >>> 6: tallyVariants(x, tally.param)**** > > >> >>> > > >> >>> 5: .local(x, ...)**** > > >> >>> > > >> >>> 4: callVariants(BamFile(x), ...)**** > > >> >>> > > >> >>> 3: callVariants(BamFile(x), ...)**** > > >> >>> > > >> >>> 2: callVariants(destination, tally.param)**** > > >> >>> > > >> >>> 1: callVariants(destination, tally.param)**** > > >> >>> > > >> >>> ** ** > > >> >>> > > >> >>> Perhaps this is something missing/changed in the devel version of > > >> >>> IRanges? > > >> >>> **** > > >> >>> > > >> >>> ** ** > > >> >>> > > >> >>> Updated sessionInfo() below:**** > > >> >>> > > >> >>> ** ** > > >> >>> > > >> >>> R version 3.0.1 (2013-05-16)**** > > >> >>> > > >> >>> Platform: x86_64-unknown-linux-gnu (64-bit)**** > > >> >>> > > >> >>> ** ** > > >> >>> > > >> >>> locale:**** > > >> >>> > > >> >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C **** > > >> >>> > > >> >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 **** > > >> >>> > > >> >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 **** > > >> >>> > > >> >>> [7] LC_PAPER=C LC_NAME=C **** > > >> >>> > > >> >>> [9] LC_ADDRESS=C LC_TELEPHONE=C **** > > >> >>> > > >> >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C **** > > >> >>> > > >> >>> ** ** > > >> >>> > > >> >>> attached base packages:**** > > >> >>> > > >> >>> [1] grid parallel stats graphics grDevices utils > > >> >>> datasets * > > >> >>> *** > > >> >>> > > >> >>> [8] methods base **** > > >> >>> > > >> >>> ** ** > > >> >>> > > >> >>> other attached packages:**** > > >> >>> > > >> >>> [1] BiocInstaller_1.11.3 latticeExtra_0.6-24 > > >> >>> lattice_0.20-15 **** > > >> >>> > > >> >>> [4] RColorBrewer_1.0-5 genoPlotR_0.8 > > >> >>> ade4_1.5-2 **** > > >> >>> > > >> >>> [7] gmapR_1.3.2 VariantTools_1.3.2 > > >> >>> VariantAnnotation_1.7.34**** > > >> >>> > > >> >>> [10] Rsamtools_1.13.24 Biostrings_2.29.13 > > >> >>> GenomicRanges_1.13.33 **** > > >> >>> > > >> >>> [13] XVector_0.1.0 IRanges_1.19.18 > > >> >>> BiocGenerics_0.7.3 **** > > >> >>> > > >> >>> ** ** > > >> >>> > > >> >>> loaded via a namespace (and not attached):**** > > >> >>> > > >> >>> [1] AnnotationDbi_1.23.16 Biobase_2.20.0 biomaRt_2.17.2 > > >> >>> **** > > >> >>> > > >> >>> [4] bitops_1.0-5 BSgenome_1.28.0 DBI_0.2-7 > > >> >>> **** > > >> >>> > > >> >>> [7] GenomicFeatures_1.13.19 graph_1.39.3 > > >> >>> Matrix_1.0-12 **** > > >> >>> > > >> >>> [10] RBGL_1.37.2 RCurl_1.95-4.1 > > >> >>> RSQLite_0.11.4 **** > > >> >>> > > >> >>> [13] rtracklayer_1.20.2 stats4_3.0.1 > > >> >>> tools_3.0.1 **** > > >> >>> > > >> >>> [16] XML_3.96-1.1 zlibbioc_1.6.0 **** > > >> >>> > > >> >>>> ** ** > > >> >>> > > >> >>> ** ** > > >> >>> > > >> >>> Thanks again,**** > > >> >>> > > >> >>> Sean**** > > >> >>> > > >> >>> ** ** > > >> >>> > > >> >>> *From:* Michael Lawrence [mailto:lawrence.michael@gene.com] > > >> >>> *Sent:* Saturday, July 13, 2013 1:59 PM > > >> >>> *To:* Taylor, Sean D > > >> >>> *Cc:* Michael Lawrence; Pages, Herve; Obenchain, Valerie J; > > >> >>> bioconductor@r-project.org > > >> >>> > > >> >>> *Subject:* Re: [BioC] Filtering BAM files by start position for > > >> >>> VariantTools**** > > >> >>> > > >> >>> ** ** > > >> >>> > > >> >>> ** ** > > >> >>> > > >> >>> ** ** > > >> >>> > > >> >>> On Sat, Jul 13, 2013 at 1:50 PM, Taylor, Sean D < > sdtaylor@fhcrc.org > > > > > >> >>> wrote:**** > > >> >>> > > >> >>> Thanks Michael, **** > > >> >>> > > >> >>> This is an interesting idea. Usually, we resolve PCR/optical > > >> >>> duplicates > > >> >>> with the Picard MarkDuplicates command, which just chooses the > read > > >> >>> with > > >> >>> the highest average quality. This approximates what you want, I > > think. > > >> >>> *** > > >> >>> * > > >> >>> > > >> >>> Is the Picard MarkDuplicates command run by default? Or is this a > > >> >>> separate > > >> >>> command from a different package?**** > > >> >>> > > >> >>> ** ** > > >> >>> > > >> >>> It's a separate package, basically the Java implementation of > > >> >>> samtools; > > >> >>> you might that first and see how it improves your error rates, > > before > > >> >>> taking a more complicated approach.**** > > >> >>> > > >> >>> Also, I find it hard to believe that sequence errors would be > > >> >>> causing > > >> >>> you trouble at 10%, as long as you're filtering for quality and > have > > >> >>> decent > > >> >>> coverage. PCR might in limited cases, and Picard effectively takes > > >> >>> care of > > >> >>> that.**** > > >> >>> > > >> >>> 10% is fine, but we have problems at 1% and lower. I think even > the > > >> >>> defaults in tallyVariants() use 1% as a cutoff if I’m not > > >> >>> mistaken.**** > > >> >>> > > >> >>> **** > > >> >>> > > >> >>> ** ** > > >> >>> > > >> >>> It uses a ~4% cutoff. And yes, you'll start running into issues > > around > > >> >>> 1%. > > >> >>> Calling at such a low frequency is kind of ambitious.**** > > >> >>> > > >> >>> **** > > >> >>> > > >> >>> What you want is to specify the cycleBreaks argument to > > >> >>> VariantTallyParam. It allows you to define bins by cycle. So if > you > > >> >>> made > > >> >>> bins for all 100bp, you could do things like generate a consensus > by > > >> >>> read > > >> >>> family. A family corresponding to position X would be cycle bin #1 > > at > > >> >>> X, > > >> >>> cycle bin #2 at X+1, etc. Then just pick the alt (or ref) with the > > >> >>> highest > > >> >>> count. Might be tricky to implement efficiently for the whole > > genome. > > >> >>> Of > > >> >>> course, this assumes you've got single end reads, otherwise this > > won't > > >> >>> work, because the mate is not considered.**** > > >> >>> > > >> >>> Cool, I will try that out, that may be just what I was looking > for. > > It > > >> >>> might be tricky for the whole genome, but we are restricting > > ourselves > > >> >>> to > > >> >>> just the mitochondrial genome. Still gives us several thousand > start > > >> >>> sites > > >> >>> to work from but should be easier than the whole genome. As for > > single > > >> >>> end > > >> >>> reads, can I just filter on the strand (i.e. ‘+’ or ‘-‘)? Or will > I > > >> >>> have to > > >> >>> perform separate alignments for each of the paired reads?**** > > >> >>> > > >> >>> **** > > >> >>> > > >> >>> ** ** > > >> >>> > > >> >>> Now that I think about it, you should be fine just considering the > > >> >>> alignments individually (as if they were unpaired).**** > > >> >>> > > >> >>> **** > > >> >>> > > >> >>> ** ** > > >> >> > > >> >> [[alternative HTML version deleted]] > > >> >> > > >> >> > > >> >> > > >> >> _______________________________________________ > > >> >> Bioconductor mailing list > > >> >> Bioconductor@r-project.org > > >> >> https://stat.ethz.ch/mailman/listinfo/bioconductor > > >> >> Search the archives: > > >> >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > >> > > > >> > > > >> > -- > > >> > Computational Biology / Fred Hutchinson Cancer Research Center > > >> > 1100 Fairview Ave. N. > > >> > PO Box 19024 Seattle, WA 98109 > > >> > > > >> > Location: Arnold Building M1 B861 > > >> > Phone: (206) 667-2793 > > > > > > > > > > [[alternative HTML version deleted]] > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > -- *A model is a lie that helps you see the truth.* * * Howard Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 516 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6