kent.riemondy (@kentriemondy-14219), Denver, University of Colorado Anschutz…
    I'm working with Nanopore direct RNA-seq data which generates FASTQs with U bases. Is there a Bioconductor package that supports reading these files? I've tried using methods from ShortRead, but get errors due to the U bases. Thanks in advance.
suppressPackageStartupMessages(library(ShortRead))
fq <- tempfile()
u_fq_txt <- paste(c("@readid", 
                     "UCGA",
                     "+",
                     "]]]]"), 
                   collapse = "\n")
writeLines(u_fq_txt, fq)
strm <- FastqStreamer(fq)
yield(strm)
#> Error in x$yield(...): _DNAencode(): invalid DNAString input character: 'U' (byte value 85)
readFastq(fq)
#> Error: Input/Output
#>   file(s):
#>     /var/folders/r9/g3c47jrj40gc14d8qsqx7src0000gn/T//RtmpwZ8EOi/file4dee4320c214
#>   message: invalid character '
t_fq_txt <- paste(c("@readid", 
                    "TCGA",
                    "+",
                    "]]]]"), 
                  collapse = "\n")
writeLines(t_fq_txt, fq)
strm <- FastqStreamer(fq)
yield(strm)
#> class: ShortReadQ
#> length: 1 reads; width: 4 cycles
readFastq(fq)
#> class: ShortReadQ
#> length: 1 reads; width: 4 cycles
unlink(fq)

The original use case was to concatenate multiple FASTQs into a single FASTQ, while also converting the U's to T's for compatibility with downstream tools. The streaming functionality of `FastqStreamer` seemed like a good approach to keep the memory usage low while converting each FASTQ. I could use Unix tools (e.g. `cat` and `awk`), but was curious to see how to do it using Bioconductor tools.

Your response helped point me in the right direction. I used a `BStringSet` initially to allow for U's to be converted to T's, and subsequently coerced it to a `DNAStringSet`. I could then generate a `ShortReadQ` object and write records to disk with `writeFastq`. Probably not the most efficient, but it worked for my initial use case.

Here's my approach, which allows for limiting the number of lines read at a time, in case it is useful to anyone.
Created on 2022-04-20 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)
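
The steps described above (read a limited number of lines, hold the sequences in a `BStringSet`, convert U to T, coerce to `DNAStringSet`, build a `ShortReadQ`, append with `writeFastq`) could be sketched roughly as follows. This is a minimal illustration and not the original poster's code: the helper name `convert_u_fastq` and its `n_records` argument are hypothetical, and it assumes every record spans exactly four lines and that U bases are upper case.

```r
suppressPackageStartupMessages(library(ShortRead))

# Hypothetical helper (not from the original post): reads 4-line FASTQ
# records in chunks, converts U to T, and appends the result to `out_file`.
convert_u_fastq <- function(in_files, out_file, n_records = 100000) {
  for (f in in_files) {
    con <- file(f, open = "r")
    tryCatch({
      repeat {
        # Read at most n_records records (4 lines each) per iteration
        lines <- readLines(con, n = 4 * n_records)
        if (length(lines) == 0) break
        stopifnot(length(lines) %% 4 == 0)  # assumes strict 4-line records

        idx   <- seq(1, length(lines), by = 4)
        ids   <- BStringSet(sub("^@", "", lines[idx]))
        # BStringSet accepts any character, so U bases are allowed here
        seqs  <- BStringSet(lines[idx + 1])
        seqs  <- DNAStringSet(chartr("U", "T", seqs))
        quals <- FastqQuality(lines[idx + 3])

        # mode = "a" appends each chunk to the same output file
        writeFastq(ShortReadQ(sread = seqs, quality = quals, id = ids),
                   out_file, mode = "a", compress = FALSE)
      }
    }, finally = close(con))
  }
  invisible(out_file)
}

# Usage: concatenate two copies of a small U-containing FASTQ
fq_in <- tempfile(fileext = ".fastq")
writeLines(c("@readid", "UCGA", "+", "]]]]"), fq_in)
fq_out <- tempfile(fileext = ".fastq")
convert_u_fastq(c(fq_in, fq_in), fq_out, n_records = 1)
readFastq(fq_out)
```

Reading with `readLines` rather than `FastqStreamer` sidesteps the DNA decoding step that rejects `U`, at the cost of handling the record layout by hand. A smaller `n_records` lowers peak memory use but increases the number of write calls.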