readPileup() Error
1
0
Entering edit mode
Ron Ophir ▴ 60
@ron-ophir-3919
Last seen 9.6 years ago
Dear BMMs, The session information is listed below. When I read a pileup file (readPileup) I got as an output of samtools 0.1.18, I get the following error Snp = readPileup(pileupFile,variant="SNP") Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : scan() expected 'an integer', got '^<,' ---------------------------------------------------------------------- -- ------ The pileup format in the example of readPileup() function is 6 33701 A G 21 21 12 35 ,,,,$,,,,.,,,,.,..,..g.,,gGggggGgggg ??@AA3>=3((>A?<-9<-8?9A>87;%6148+=2 The pileup format at samtools web pages is seq1 272 T 24 ,.$.....,,.,.,...,,,.,..^+. <<<+;<<<<<<<<<<<=<;<;7<& and in my pileup is Cs1,6RhaT 332 A 9 ,,,..^;,^E,^Z.^Z. DCCFFCDCC It seems to me that the error is obvious, however I don't know whether it is a bug or a feature. If it is a feature how to run readPileup correctly to be able reading latest pileup format? Thanks, Ron Abbreviation BMM bioconductor mailing-list member sessionInfo() R version 2.14.0 (2011-10-31) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] Rsamtools_1.6.1 Biostrings_2.22.0 GenomicRanges_1.6.2 [4] IRanges_1.12.1 loaded via a namespace (and not attached): [1] bitops_1.0-4.1 BSgenome_1.22.0 RCurl_1.7-0 rtracklayer_1.14.1 [5] tools_2.14.0 XML_3.4-3 zlibbioc_1.0.0 This mail was sent via Mail-SeCure System.
• 1.3k views
ADD COMMENT
0
Entering edit mode
@vincent-j-carey-jr-4
Last seen 5 weeks ago
United States
I think I can confirm this using the ex1.pileup in the examples folder of the samtools svn distro, revision 998, which says it is 0.1.18 -- BUT READ ON > Z = readPileup("ex1.pileup", variant="SNP") Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : scan() expected 'an integer', got '^~.' Enter a frame number, or 0 to exit 1: readPileup("ex1.pileup", variant = "SNP") 2: readPileup("ex1.pileup", variant = "SNP") 3: callGeneric(conn, ...) 4: eval(call, sys.frame(sys.parent())) 5: eval(expr, envir, enclos) 6: readPileup(conn, variant = "SNP") 7: readPileup(conn, variant = "SNP") 8: .local(file, ...) 9: .readPileup_SNP(file = file, ..., variant = variant) 10: .readPileup_table(file, colClasses, ...) 11: read.table(conn, colClasses = colClasses, col.names = names(colClasses), se 12: scan(file = file, what = what, sep = sep, quote = quote, dec = dec, nmax = I believe the readPileup function was a very early contribution as Rsamtools emerged. Note that its man page references pileup -cv as the generator for input, but that samtools command is removed in favor of mpileup. So at best readPileup is retained for legacy applications. Now, for pileup analysis with Rsamtools, you should use the PileupFile() interface direct to BAM and indices, and applyPileups() to extract the base frequency or other functionals of pileups. Do not create the pileups separately with Rsamtools, applyPileups() will compute them internally. On Thu, Nov 10, 2011 at 3:10 AM, Ron Ophir <ron@volcani.agri.gov.il> wrote: > Dear BMMs, > > The session information is listed below. > When I read a pileup file (readPileup) I got as an output of samtools > 0.1.18, I get the following error > > Snp = readPileup(pileupFile,variant="SNP") > > Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, > na.strings, : > scan() expected 'an integer', got '^<,' > > -------------------------------------------------------------------- ---- > ------ > > The pileup format in the example of readPileup() function is > > 6 33701 A G 21 21 12 35 > ,,,,$,,,,.,,,,.,..,..g.,,gGggggGgggg > ??@AA3>=3((>A?<-9<-8?9A>87;%6148+=2 > > The pileup format at samtools web pages is > > seq1 272 T 24 ,.$.....,,.,.,...,,,.,..^+. <<<+;<<<<<<<<<<<=<;<;7<& > > and in my pileup is > > Cs1,6RhaT 332 A 9 ,,,..^;,^E,^Z.^Z. > DCCFFCDCC > > It seems to me that the error is obvious, however I don't know whether > it is a bug or a feature. If it is a feature how to run readPileup > correctly to be able reading latest pileup format? > > Thanks, > Ron > > Abbreviation > > BMM bioconductor mailing-list member > > > sessionInfo() > R version 2.14.0 (2011-10-31) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C > [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 > [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > > other attached packages: > [1] Rsamtools_1.6.1 Biostrings_2.22.0 GenomicRanges_1.6.2 > [4] IRanges_1.12.1 > > loaded via a namespace (and not attached): > [1] bitops_1.0-4.1 BSgenome_1.22.0 RCurl_1.7-0 > rtracklayer_1.14.1 > [5] tools_2.14.0 XML_3.4-3 zlibbioc_1.0.0 > > > This mail was sent via Mail-SeCure System. > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
On 11/10/2011 02:41 AM, Vincent Carey wrote: > I think I can confirm this using the ex1.pileup in the examples folder of > the samtools svn distro, revision 998, > which says it is 0.1.18 -- BUT READ ON > >> Z = readPileup("ex1.pileup", variant="SNP") > Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, > : > scan() expected 'an integer', got '^~.' > > Enter a frame number, or 0 to exit > > 1: readPileup("ex1.pileup", variant = "SNP") > 2: readPileup("ex1.pileup", variant = "SNP") > 3: callGeneric(conn, ...) > 4: eval(call, sys.frame(sys.parent())) > 5: eval(expr, envir, enclos) > 6: readPileup(conn, variant = "SNP") > 7: readPileup(conn, variant = "SNP") > 8: .local(file, ...) > 9: .readPileup_SNP(file = file, ..., variant = variant) > 10: .readPileup_table(file, colClasses, ...) > 11: read.table(conn, colClasses = colClasses, col.names = > names(colClasses), se > 12: scan(file = file, what = what, sep = sep, quote = quote, dec = dec, > nmax = > > > I believe the readPileup function was a very early contribution as > Rsamtools emerged. Note that its man page references pileup -cv as the > generator for input, but that samtools command is removed in favor of > mpileup. So at best readPileup is retained for legacy applications. > > Now, for pileup analysis with Rsamtools, you should use the PileupFile() > interface direct to BAM and indices, and applyPileups() to extract the base > frequency or other functionals of pileups. Do not create the pileups > separately with Rsamtools, applyPileups() will compute them internally. > > On Thu, Nov 10, 2011 at 3:10 AM, Ron Ophir<ron at="" volcani.agri.gov.il=""> wrote: > >> Dear BMMs, >> >> The session information is listed below. >> When I read a pileup file (readPileup) I got as an output of samtools >> 0.1.18, I get the following error >> >> Snp = readPileup(pileupFile,variant="SNP") >> >> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, >> na.strings, : >> scan() expected 'an integer', got '^<,' >> >> ------------------------------------------------------------------- ----- >> ------ >> >> The pileup format in the example of readPileup() function is >> >> 6 33701 A G 21 21 12 35 >> ,,,,$,,,,.,,,,.,..,..g.,,gGggggGgggg >> ??@AA3>=3((>A?<-9<-8?9A>87;%6148+=2 >> >> The pileup format at samtools web pages is >> >> seq1 272 T 24 ,.$.....,,.,.,...,,,.,..^+.<<<+;<<<<<<<<<<<=<;<;7<& >> >> and in my pileup is >> >> Cs1,6RhaT 332 A 9 ,,,..^;,^E,^Z.^Z. >> DCCFFCDCC >> >> It seems to me that the error is obvious, however I don't know whether >> it is a bug or a feature. If it is a feature how to run readPileup >> correctly to be able reading latest pileup format? Hi Ron, Vince -- Not exactly sure where this stands. From Ron's post and as Vince indicates I think the problem is that Ron's pileup file does not conform to what readPileups requires (produced with samtools pileup -cv ...), and furthermore that this file format is not produced by current samtools. Vince mentions applyPileup; another route from mpileup is the -g option to generate BCF files, and then indexBcf / scanBcf. For instance in the samtools/examples folder > library(Rsamtools) > indexBcf("ex1.bcf") > res = scanBcf("ex1.bcf") The output format is 'scan'-like so not 100% user friendly; see example(scanBcf). Vince mentions ex1.pileup from samtools as a reproducible example. This is generated by running make in samtools/examples, and with that svn revision produces a zero-length file (since pileup is no longer supported). Instructions to create the file are still in samtools/examples/Makefile, but use the options -cf so would not be expected to produce a readPileup-compatible file. There is example(readPileup) which points to 'pileup.txt' and which parses without error. library(Rsamtools) example(readPileup) readPileup(fl, variant="SNP") Martin >> >> Thanks, >> Ron >> >> Abbreviation >> >> BMM bioconductor mailing-list member >> >> >> sessionInfo() >> R version 2.14.0 (2011-10-31) >> Platform: x86_64-unknown-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C >> [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 >> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 >> [7] LC_PAPER=C LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> >> other attached packages: >> [1] Rsamtools_1.6.1 Biostrings_2.22.0 GenomicRanges_1.6.2 >> [4] IRanges_1.12.1 >> >> loaded via a namespace (and not attached): >> [1] bitops_1.0-4.1 BSgenome_1.22.0 RCurl_1.7-0 >> rtracklayer_1.14.1 >> [5] tools_2.14.0 XML_3.4-3 zlibbioc_1.0.0 >> >> >> This mail was sent via Mail-SeCure System. >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: M1-B861 Telephone: 206 667-2793
ADD REPLY
0
Entering edit mode
Thanks Martin and Vincent for your responses, To summarize readPileup() function related to an old format of samtools which is obsolete. There are two alternative pathways: 1. is self parsing Bam file using applyPileups(bamFile,FUN=parsingFun(),param) function. This way is good for more than one purposes of bam alignment. For example having the coverage of each site on the reference. 2. using the scanBcf() function that let me fetching variation (snp and indels), which are imputed by bcftools. This gave an ideas what would be my next steps. Ron -----Original Message----- From: Martin Morgan [mailto:mtmorgan@fhcrc.org] Sent: Friday, November 11, 2011 5:28 PM To: Vincent Carey Cc: Ron Ophir; bioconductor at r-project.org Subject: Re: [BioC] readPileup() Error On 11/10/2011 02:41 AM, Vincent Carey wrote: > I think I can confirm this using the ex1.pileup in the examples folder of > the samtools svn distro, revision 998, > which says it is 0.1.18 -- BUT READ ON > >> Z = readPileup("ex1.pileup", variant="SNP") > Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, > : > scan() expected 'an integer', got '^~.' > > Enter a frame number, or 0 to exit > > 1: readPileup("ex1.pileup", variant = "SNP") > 2: readPileup("ex1.pileup", variant = "SNP") > 3: callGeneric(conn, ...) > 4: eval(call, sys.frame(sys.parent())) > 5: eval(expr, envir, enclos) > 6: readPileup(conn, variant = "SNP") > 7: readPileup(conn, variant = "SNP") > 8: .local(file, ...) > 9: .readPileup_SNP(file = file, ..., variant = variant) > 10: .readPileup_table(file, colClasses, ...) > 11: read.table(conn, colClasses = colClasses, col.names = > names(colClasses), se > 12: scan(file = file, what = what, sep = sep, quote = quote, dec = dec, > nmax = > > > I believe the readPileup function was a very early contribution as > Rsamtools emerged. Note that its man page references pileup -cv as the > generator for input, but that samtools command is removed in favor of > mpileup. So at best readPileup is retained for legacy applications. > > Now, for pileup analysis with Rsamtools, you should use the PileupFile() > interface direct to BAM and indices, and applyPileups() to extract the base > frequency or other functionals of pileups. Do not create the pileups > separately with Rsamtools, applyPileups() will compute them internally. > > On Thu, Nov 10, 2011 at 3:10 AM, Ron Ophir<ron at="" volcani.agri.gov.il=""> wrote: > >> Dear BMMs, >> >> The session information is listed below. >> When I read a pileup file (readPileup) I got as an output of samtools >> 0.1.18, I get the following error >> >> Snp = readPileup(pileupFile,variant="SNP") >> >> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, >> na.strings, : >> scan() expected 'an integer', got '^<,' >> >> ---------------------------------------------------------------------- -- >> ------ >> >> The pileup format in the example of readPileup() function is >> >> 6 33701 A G 21 21 12 35 >> ,,,,$,,,,.,,,,.,..,..g.,,gGggggGgggg >> ??@AA3>=3((>A?<-9<-8?9A>87;%6148+=2 >> >> The pileup format at samtools web pages is >> >> seq1 272 T 24 ,.$.....,,.,.,...,,,.,..^+.<<<+;<<<<<<<<<<<=<;<;7<& >> >> and in my pileup is >> >> Cs1,6RhaT 332 A 9 ,,,..^;,^E,^Z.^Z. >> DCCFFCDCC >> >> It seems to me that the error is obvious, however I don't know whether >> it is a bug or a feature. If it is a feature how to run readPileup >> correctly to be able reading latest pileup format? Hi Ron, Vince -- Not exactly sure where this stands. From Ron's post and as Vince indicates I think the problem is that Ron's pileup file does not conform to what readPileups requires (produced with samtools pileup -cv ...), and furthermore that this file format is not produced by current samtools. Vince mentions applyPileup; another route from mpileup is the -g option to generate BCF files, and then indexBcf / scanBcf. For instance in the samtools/examples folder > library(Rsamtools) > indexBcf("ex1.bcf") > res = scanBcf("ex1.bcf") The output format is 'scan'-like so not 100% user friendly; see example(scanBcf). Vince mentions ex1.pileup from samtools as a reproducible example. This is generated by running make in samtools/examples, and with that svn revision produces a zero-length file (since pileup is no longer supported). Instructions to create the file are still in samtools/examples/Makefile, but use the options -cf so would not be expected to produce a readPileup-compatible file. There is example(readPileup) which points to 'pileup.txt' and which parses without error. library(Rsamtools) example(readPileup) readPileup(fl, variant="SNP") Martin >> >> Thanks, >> Ron >> >> Abbreviation >> >> BMM bioconductor mailing-list member >> >> >> sessionInfo() >> R version 2.14.0 (2011-10-31) >> Platform: x86_64-unknown-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C >> [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 >> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 >> [7] LC_PAPER=C LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> >> other attached packages: >> [1] Rsamtools_1.6.1 Biostrings_2.22.0 GenomicRanges_1.6.2 >> [4] IRanges_1.12.1 >> >> loaded via a namespace (and not attached): >> [1] bitops_1.0-4.1 BSgenome_1.22.0 RCurl_1.7-0 >> rtracklayer_1.14.1 >> [5] tools_2.14.0 XML_3.4-3 zlibbioc_1.0.0 >> >> >> This mail was sent via Mail-SeCure System. >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: M1-B861 Telephone: 206 667-2793 This mail was received via Mail-SeCure System. ----- No virus found in this message. Checked by AVG - www.avg.com 11/11/11 This mail was sent via Mail-SeCure System.
ADD REPLY

Login before adding your answer.

Traffic: 592 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6