Hello,
I am looking to read every sequence of a fastq file, insert a specific set of nucleotides at the beginning of each sequence. Then save that file into a new fastq file.
I am using the shortRead package for that, but am not sure how to overwrite the new sequence. This is what I have so far.
library("ShortRead") input_1 <- readFastq("Sample_1.fastq") #input forward read output_1 <- input_1 #making a copy
insertNuc <- "TTTTT" #to be inserted into sequence iCount = 1 while(iCount <= length(input_1)) { seq1 <- sread(input_1)[iCount] #read sequence one by one seq1New <- paste(insertNuc,seq1,sep="") #insert nucleotides sread(output_1)[iCount] <- seq1New # write back new sequence to fastq file ???? iCount <- iCount + 1 } # write fastq writeFastq(output_1, destination, "output_1")
I need help with
1) write back new sequence to fastq file in same order (basically I need to replace old sequence with new sequence)
2) I am using loops to read through entire fastq file. If anyone knows a faster way to do this, with apply() or anything other packages , I would appreciate any tips on this.
Thanks, K
Wow...incredible !
Thank you every much , I will try this out.
Is there any specific reason you replaced the quality of the sequence in addition to the sequence ? I'm guessing its just an example. ?
Thanks !!
Fastq files require that the width of the sequence is the same as the width of the quality string, so the quality strings have to be transformed with the same geometry as the sequences.
Didn't know that, thank you for the explanation ,
Hello,
So I tried out the example and it works perfectly. But when I try out this exact same code on my input fastq file, this is the error I get when I try : readFastq(fout)
Because its something to do with the line "fout = tempfile()", I tried this:
The code now runs fine, but there is no output into "Abc.fastq" file. Any idea why this is happening ?
Thank you,
Do you mean that the example runs successfully _and_ there is an output file, but your actual data file does not produce an output file? The code to reduceByYield returns the number of records processed; does that number correspond to the number of records in the input file?
Likely 'gibberish' is because the file is compressed. See
?writeFastq
and add 'compress=FALSE
' towriteFastq(x, fout, "a", compress=FALSE)
.'list()` means that no records were processed. Once a stream is 'used', then subsequent calls to
yield()
produce no output. I revised my response to include a more fool-proof way of always using a fresh stream withreduceByYield(FastqStreamer(fl), YIELD, MAP, fout=fout)
. If it still indicates 'list()', then I guessyield(FastqStreamer(fl))
produces a ShortReadQ object with 0 records? If so, I don't know what the problem is; maybe you could try with a subset of your data,tmp = tempfile(); writeLines(readLines(fl, 1000), tmp); fout=tempfile(); reduceByYield(FastqStreamer(tmp), YIELD, MAP)
.Thank you, that worked.
Also closing the file connection also helped.