Question: Convert a FASTQ file to FASTA using the ShortRead package
3.6 years ago by
United States
kelvinfrog750 wrote:

This might be a very simple question. I have some Illumina FASTQ files and I want to convert them to FASTA. Can someone show me how to do that? When I use writeFasta(sample_1.fq), readFastq(sample_1.fq), or sread(sample_1.fq), I get error messages like these:

Error in writeFasta(sample_1.fq, file) :
error in evaluating the argument 'object' in selecting a method for function 'writeFasta': Error: object 'sample_1.fq' not found

error in evaluating the argument 'dirPath' in selecting a method for function 'readFastq': Error: object 'sample_1.fq' not found

error in evaluating the argument 'object' in selecting a method for function 'sread': Error: object 'sample_1.fq' not found

Do I need to convert the fastq file to something first before I can convert to fasta? Thanks in advance.

modified 3.6 years ago by Martin Morgan ♦♦ 22k • written 3.6 years ago by kelvinfrog750
3.6 years ago by
United States
James W. MacDonald48k wrote:

Whenever you get an error like

Error: object 'sample_1.fq' not found

it means that R can't find that file. It can't find it because the file isn't in R's working directory, so you have to tell R where it is (otherwise it only looks in the working directory). The help page for readFastq() says this:

Usage:

## S4 method for signature 'character'
readFastq(dirPath, pattern=character(0), ..., withIds=TRUE)

writeFastq(object, file, mode="w", full=FALSE, compress=TRUE, ...)

Arguments:

dirPath: A character vector (or other object; see methods defined on
this generic) giving the directory path (relative or
absolute) or single file name of FASTQ files to be read.

pattern: The (‘grep’-style) pattern describing file names to be read.
The default (‘character(0)’) results in (attempted) input of
all files in the directory.

So you either have to have the FASTQ file in your working directory, or you need to specify the directory and then give a grep-style pattern so readFastq() can find the file. So something like

fstq <- readFastq(<path to file dir>, "sample_1.fq")

would work. You could also just wrap it all up in one call

writeFasta(readFastq(<path to file dir>, "sample_1.fq"), "sample_1.fa")

Without having any timings for this sort of thing, I have no idea how efficient this would be on a large file, but I would imagine something like seqtk would be faster (you can find it using Google).
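For anyone who wants to try the directory-plus-pattern form end to end, here is a minimal sketch. It assumes ShortRead is installed; the two-read FASTQ file, the temporary directory and the variable names are made up for illustration and are not from the original post.

```r
library(ShortRead)

# Hypothetical two-read FASTQ file written to a temporary directory.
fq_dir  <- tempfile("fastq_demo_")
dir.create(fq_dir)
fq_file <- file.path(fq_dir, "sample_1.fq")
writeLines(c(
    "@read1", "ACGTACGTAC", "+", "IIIIIIIIII",
    "@read2", "TTGGCCAATT", "+", "IIIIIHHHHH"
), fq_file)

# Directory plus pattern, as in the help page excerpt above.
# The pattern is a regular expression, so the dot is escaped and the
# end of the file name is anchored.
fstq <- readFastq(fq_dir, pattern = "sample_1\\.fq$")

# Write the reads (ids and sequences, no qualities) as FASTA.
fa_file <- file.path(fq_dir, "sample_1.fa")
writeFasta(fstq, fa_file)

# Inspect the result.
readLines(fa_file)
```

Because the pattern is a regular expression, an unescaped "sample_1.fq" would also match names such as "sample_1xfq"; that rarely matters in practice, but escaping the dot avoids surprises in a crowded directory.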

3.6 years ago by
United States
James W. MacDonald48k wrote:

Ugh. I missed something in your post.

The error you are getting is because you are passing in what R thinks is an object name, when you should in fact be passing in a character string.

Does

fstq <- readFastq("sample_1.fq")

work?
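The difference between an object name and a character string can be seen in base R alone. This is only an illustrative sketch; the variable names path and msg are made up here, and nothing in it depends on ShortRead.

```r
# Base R only; no ShortRead needed to see the difference.
path <- "sample_1.fq"            # a character string held in a variable

# A quoted name is a string; R does not look it up as a variable.
is.character("sample_1.fq")      # TRUE

# An unquoted name is a symbol; R tries to find an object with that name
# and signals an "object ... not found" error when there is none.
msg <- tryCatch(
    sample_1.fq,
    error = function(e) conditionMessage(e)
)
msg

# Whether the file itself is present is a separate question:
file.exists(path)
```

So readFastq(path) and readFastq("sample_1.fq") both pass a string, while readFastq(sample_1.fq) fails before ShortRead ever looks at the disk.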

3.6 years ago by
United States
kelvinfrog750 wrote:

Oh, it works now. Thanks a lot. BTW, do you know if R can do FASTQ quality filtering on Illumina sequences? The one I have been using is the FASTQ quality trimmer from FASTX. I wonder if R has its own function.

Try ShortRead::trim<tab> to see some possibilities, e.g., trimTails, trimTailw. How specifically were you hoping to trim?

You mean you wonder if the ShortRead package has a function for doing FASTQ quality filtering on Illumina sequences, right? Did you read the ShortRead vignette? I think it deals a lot with read quality. As long as you can load your reads into a ShortRead object, it shouldn't really matter where the reads are coming from.

H.

Great. I will look into that. Thanks.
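As a rough sketch of the trimming functions mentioned above, the following applies trimTailw() to a tiny made-up read and then drops reads that end up too short. It assumes ShortRead is installed; the example read, the quality threshold "4", the window settings and the length cutoff of 10 are arbitrary choices for illustration, not recommendations.

```r
library(ShortRead)

# Hypothetical one-read FASTQ file whose 3' end has low-quality bases ("#").
fq_file <- tempfile(fileext = ".fq")
writeLines(c("@read1", "ACGTACGTACGTACGT", "+", "IIIIIIIIIIII####"), fq_file)
fq <- readFastq(fq_file)

# Sliding-window tail trimming: cut the read where at least k = 2 bases in a
# window of 2 * halfwidth + 1 positions have quality at or below "4".
trimmed <- trimTailw(fq, k = 2, a = "4", halfwidth = 2)

# Drop reads that became too short (the cutoff of 10 is arbitrary).
trimmed <- trimmed[width(trimmed) >= 10]

# Compare read lengths before and after trimming.
width(fq)
width(trimmed)
```

The trimmed object is still a ShortReadQ, so it can be passed to writeFastq() or writeFasta() like any other set of reads.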

3.6 years ago by
Martin Morgan ♦♦ 22k
United States
Martin Morgan ♦♦ 22k wrote:

If the files are large then you'll want to 'stream' over them.

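library(ShortRead)

fin = file.choose()          # pick the FASTQ file interactively
fout = "my.fasta"
fq = FastqStreamer(fin)      # reads the file in chunks
repeat {
    sr = yield(fq)           # next chunk of reads
    if (length(sr) == 0) break
    writeFasta(sr, fout, mode="a")   # append the chunk to the FASTA file
}
close(fq)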

I think this will not be particularly slow. A variant is

library(GenomicFiles)

YIELD = function(X, ...) yield(X)
MAP = function(X, ..., FOUT) {
    writeFasta(X, FOUT, "a")
    message(length(X))
    length(X)
}

fq = FastqStreamer(fin, 100)
reduceByYield(fq, YIELD, MAP, FOUT=fout)