Entering edit mode
Marcus Davy
▴
390
@marcus-davy-5153
Last seen 6.7 years ago
Hi,
there appears to be a problem in FastqSampler() which seems to sample
the
first read at random, but then the next n-1 reads
are always the same, so the next sample obtained with yield() is not
independent. Is this a bug, or my code implementation wrong?
## From example(FastqSampler)
sp <- SolexaPath(system.file('extdata', package='ShortRead'))
fl <- file.path(analysisPath(sp), "s_1_sequence.txt")
f <- FastqFile(fl)
rfq <- readFastq(f)
set.seed(1)
f <- FastqSampler(fl, 50, readerBlockSize=1e8, verbose=TRUE)
s1 <- yield(f) # sample of size n=50
set.seed(2)
f <- FastqSampler(fl, 50, readerBlockSize=1e8, verbose=TRUE)
s2 <- yield(f) # independent sample of size 50
## Intersection length between samples always 50-1
length(intersect(id(s1), id(s2)))
## First read is different, the next 2:n reads are the same
head(sread(s1))
head(sread(s2))
close(f)
> head(sread(s1))
A DNAStringSet instance of length 6
width seq
[1] 36 GTCTGGAAACGTACGGATTGTTCAGTAACTTTACTC # Different
[2] 36 GATTTCTTACCTATTAGTGGTTGAACAGCATCGGAC #Same
[3] 36 GCGGTGGTCTATAGTGTTATTAATATCAATTTGGGT # ...
[4] 36 GTTACCATGATGTTATTTCTTCATTTGGAGGTAAAA # ...
[5] 36 GTATGTTTCTCCTGCTTATCACCTTCTTGAAGGCTT # ...
[6] 36 GTTCTCTAAAAACCATTTTTCGTCCCCTTCGGGGCG Same
> head(sread(s2))
A DNAStringSet instance of length 6
width seq
[1] 36 GTTTACGAATTAAATCGAAGTGGACTGCTGGGGGGA # Different
[2] 36 GATTTCTTACCTATTAGTGGTTGAACAGCATCGGAC #Same
[3] 36 GCGGTGGTCTATAGTGTTATTAATATCAATTTGGGT # ...
[4] 36 GTTACCATGATGTTATTTCTTCATTTGGAGGTAAAA # ...
[5] 36 GTATGTTTCTCCTGCTTATCACCTTCTTGAAGGCTT # ...
[6] 36 GTTCTCTAAAAACCATTTTTCGTCCCCTTCGGGGCG #Same
Marcus
> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ShortRead_1.14.2 latticeExtra_0.6-19 RColorBrewer_1.0-5
[4] Rsamtools_1.8.4 lattice_0.20-6 Biostrings_2.24.1
[7] GenomicRanges_1.8.3 IRanges_1.14.2 BiocGenerics_0.2.0
loaded via a namespace (and not attached):
[1] Biobase_2.16.0 bitops_1.0-4.1 grid_2.15.0 hwriter_1.3
stats4_2.15.0
[6] tools_2.15.0 zlibbioc_1.2.0
[[alternative HTML version deleted]]