Search
Question: FastqSampler with trimmed paired-end reads
1
gravatar for Arne Muller
2.4 years ago by
Arne Muller20
United States
Arne Muller20 wrote:

Hello,

I came across a problems with FastqSampler from the ShortRead package. I get non-matching pairs/mates (reads that don;t belong to each other)  when doing:

f1 <- FastqSampler("file1.fastq", n=1e6)
f2 <- FastqSampler("file2.fastq", n=1e6)

set.seed(123L); p1 <- yield(f1)
set.seed(123L); p2 <- yield(f2)

(as suggested by Martin in https://www.biostars.org/p/6544/ )

The problem is that my reads are already post-processed (our core facility already removed a random 0-8 base oligo from the reads that was used to generate cluster diversity), and this results in reads of randomly different lengths in the two mate files, though both files have the same number of reads in the same order.

It seems that a random file pointer is used from which the next read entry is accessed in both paired end files, and if reads have randomly different length this results in non-matching pairs.

What's the best way to sample such paired-end files?

 

    thanks for your help,

 

    Arne

ADD COMMENTlink modified 2.4 years ago by Martin Morgan ♦♦ 20k • written 2.4 years ago by Arne Muller20
0
gravatar for Martin Morgan
2.4 years ago by
Martin Morgan ♦♦ 20k
United States
Martin Morgan ♦♦ 20k wrote:

You're right, the sampler has a subtle assumption that the records have identical length; I'll try to fix that. A different solution is

require(GenomicFiles)
n <- 1e6
set.seed(123L);
​p1 <- reduceByYield(FastqStreamer("file1.fastq", n),
                    YIELD = ShortRead::yield, 
                    REDUCE = GenomicFiles::REDUCEsampler(n))

but this requires ShortRead version 1.27.5; for earlier versions, copy and paste into your R session after requiring the ShortRead package

setReplaceMethod("[", c("ShortReadQ", "ANY", "missing", "ShortReadQ"),
    function(x, i, j, ..., value)
{
    sread <- sread(x); sread[i] <- sread(value)
    quality <- quality(quality(x)); quality[i] <- quality(quality(value))
    id <- id(x); id[i] <- id(value)
    initialize(x, sread=sread, id=id,
               quality=initialize(quality(x), quality=quality))
})

 

ADD COMMENTlink modified 2.4 years ago • written 2.4 years ago by Martin Morgan ♦♦ 20k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 121 users visited in the last hour