FastqStreamer
Marcus Davy ▴ 390
@marcus-davy-5153
Hi,

I have had a look at FastqStreamer to stream in successive subsets of a Fastq file. My question is whether you can change the number of records to stream on the fly, rather than having to stream 'n' records each time. For example, I might want to pull in the records corresponding to each Illumina tile, using indices fetched from the Fastq header information, or just fetch a certain tile with a record index range m:n which does not necessarily start at m=1 within the Fastq file.

    sp <- SolexaPath(system.file('extdata', package='ShortRead'))
    fl <- file.path(analysisPath(sp), "s_1_sequence.txt")
    length(readFastq(fl))
    [1] 256

    ## This fails as n is expected to be a constant amount of streamed records
    f <- FastqStreamer(fl, c(100, 50, 100, 6))
    Error in FastqStreamer(fl, c(100, 50, 100, 6)) :
      'n' must be finite and >= 0

To fetch a certain tile, can you alter the 'added' field position, similar to 'seek' in perl, so you can grab only that index range of the Fastq file without having to go through a while loop?

    f <- FastqStreamer(fl, 50)
    print(f)
    class: FastqStreamer
    file: s_1_sequence.txt
    status: n=50 current=0 added=0 total=0  ## <- I want to change the 'current/added' fields

cheers,

Marcus
@martin-morgan-1513
On 05/24/2012 05:19 PM, Marcus Davy wrote:
> My question is whether you can change the number of records to stream on
> the fly rather than having to stream 'n' records each time.
>
> For example, I might want to pull in records corresponding to each Illumina
> tile from the indices fetched within the Fastq header information,

Hi Marcus -- this isn't possible at the moment, but I'm giving this (and the ability to pull out specific id's) some thought. Along the lines of an IRanges() argument with start and end being the parts of the fastq file to retrieve, and with 'yield' returning the next range's worth of data.

Martin

--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
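For the per-tile case, the start and end of each range could be worked out before streaming. The following is only a rough sketch in base R: it assumes the tile number of every record has already been parsed from the headers into a vector (the `tile` vector here is made-up illustration data, not taken from the example file) and that the records of each tile are contiguous in the file.

```r
# Hypothetical input: the tile number of each record, in file order.
# In practice this would be parsed from the Fastq header lines.
tile <- c(1, 1, 1, 2, 2, 5, 5, 5, 5)

# Collapse consecutive identical tile numbers into runs.
runs <- rle(tile)

# 1-based, inclusive record indices covered by each run.
end <- cumsum(runs$lengths)
start <- end - runs$lengths + 1

tile_ranges <- data.frame(tile = runs$values, start = start, end = end)
print(tile_ranges)
```

If a tile's records were scattered through the file, each tile would show up as several runs, so the contiguity assumption is worth checking first.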
Hi Martin,

thanks for looking into this. I think it would enhance FastqStreamer's flexibility to be able to fetch any specified ranges of a Fastq file.

The IRanges approach is similar to my thoughts, with width by default (either constant 'n' or variable length using vector recycling), or start and end indexes selected.

cheers,

Marcus
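To make the "width by default" idea concrete, here is a minimal base R sketch of how a vector of chunk sizes maps onto start and end record indices. The helper name `widths_to_ranges` is hypothetical and not part of ShortRead; the widths are the ones from the original question.

```r
# Hypothetical helper (not part of ShortRead): turn per-chunk record counts
# into 1-based, inclusive start/end record indices of back-to-back chunks.
widths_to_ranges <- function(widths) {
    stopifnot(is.numeric(widths), all(is.finite(widths)), all(widths > 0))
    end <- cumsum(widths)
    start <- end - widths + 1
    data.frame(start = start, end = end, width = widths)
}

# The chunk sizes from the original question, covering all 256 records.
rng <- widths_to_ranges(c(100, 50, 100, 6))
print(rng)
```

A constant 'n' is then just the special case where all widths are equal.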
On 05/25/2012 02:41 PM, Marcus Davy wrote:
> The IRanges approach is similar to my thoughts, with width by default
> (either constant 'n' or variable length using vector recycling), or
> start, and end indexes selected.

I updated ShortRead 1.15.7 in devel to allow FastqStreamer to accept an IRanges object and yield() corresponding records in the fastq file; see ?FastqStreamer.

Martin
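For anyone finding this later, usage along these lines should work with ShortRead 1.15.7 or later. This is a sketch rather than an authoritative recipe: the ranges reuse the chunk sizes from the original question, and ?FastqStreamer remains the reference for the exact requirements on the ranges (as far as I know they must have non-zero width and must not overlap).

```r
library(ShortRead)
library(IRanges)

# Example Fastq file shipped with ShortRead (256 records).
sp <- SolexaPath(system.file("extdata", package = "ShortRead"))
fl <- file.path(analysisPath(sp), "s_1_sequence.txt")

# 1-based record ranges to retrieve, one range per yield().
rng <- IRanges(start = c(1, 101, 151, 251), width = c(100, 50, 100, 6))

f <- FastqStreamer(fl, rng)
sizes <- integer()
while (length(fq <- yield(f))) {
    # 'fq' holds the records of the current range; do per-range work here.
    sizes <- c(sizes, length(fq))
}
close(f)

print(sizes)
```

To fetch a single tile only, pass a single range, e.g. `IRanges(start = m, end = n)` with your own m and n, and call yield() once.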