Convert FASTQ file to FASTA using the ShortRead package
@kelvinfrog75-7494

This might be a very simple question. I have some Illumina FASTQ files that I want to convert to FASTA. Can someone show me how to do the conversion? When I call writeFasta(sample_1.fq), readFastq(sample_1.fq), or sread(sample_1.fq), I get error messages like these:

Error in writeFasta(sample_1.fq, file) : 
  error in evaluating the argument 'object' in selecting a method for function 'writeFasta': Error: object 'sample_1.fq' not found

Error in readFastq(sample_1.fq) : 
  error in evaluating the argument 'dirPath' in selecting a method for function 'readFastq': Error: object 'sample_1.fq' not found

Error in sread(sample_1.fq) : 
  error in evaluating the argument 'object' in selecting a method for function 'sread': Error: object 'sample_1.fq' not found

Do I need to convert the FASTQ file to something else first before I can convert it to FASTA? Thanks in advance.

@james-w-macdonald-5106

Whenever you get an error like

Error: object 'sample_1.fq' not found

it means that R can't find that file. The reason is that the file isn't in R's working directory, so you have to tell R where it is (otherwise it only looks in the working directory). The help page for readFastq() says this:

Usage:

     readFastq(dirPath, pattern=character(0), ...)
     ## S4 method for signature 'character'
     readFastq(dirPath, pattern=character(0), ..., withIds=TRUE)
     
     writeFastq(object, file, mode="w", full=FALSE, compress=TRUE, ...)
     
Arguments:

 dirPath: A character vector (or other object; see methods defined on
          this generic) giving the directory path (relative or
          absolute) or single file name of FASTQ files to be read.

 pattern: The (‘grep’-style) pattern describing file names to be read.
          The default (‘character(0)’) results in (attempted) input of
          all files in the directory.

So you either have to have the FASTQ file in your working directory, or you need to specify the directory and then give a grep-style pattern so readFastq() can find the file. So something like

fstq <- readFastq(<path to file dir>, "sample_1.fq")

would work. You could also just wrap it all up in one call

writeFasta(readFastq(<path to file dir>, "sample_1.fq"), "sample_1.fa")

I don't have any timings for this sort of thing, so I have no idea how efficient it would be on a large file. I would imagine something like seqtk would be faster (you can find it with Google).
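Because the pattern argument is grep-style, an unescaped "." matches any character, so it can be worth checking which files a pattern selects before reading them. The sketch below uses only base R and assumes the pattern is matched the way list.files() matches it; the directory and file names are made up for illustration.

```r
# Hypothetical demo directory holding three empty files.
demo_dir <- tempfile("fastq_demo")
dir.create(demo_dir)
file.create(file.path(demo_dir, c("sample_1.fq", "sample_1xfq", "sample_10.fq")))

# Unescaped ".": matches sample_1.fq but also sample_1xfq.
loose_matches <- list.files(demo_dir, pattern = "sample_1.fq")

# Anchored pattern with an escaped ".": matches only sample_1.fq.
strict_matches <- list.files(demo_dir, pattern = "^sample_1\\.fq$")

print(loose_matches)
print(strict_matches)
```

If the strict pattern returns exactly one file name, the same directory and pattern can then be tried with readFastq().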

@james-w-macdonald-5106

Ugh. I missed something in your post.

The error you are getting is because you are passing in what R thinks is an object name, when you should in fact be passing in a character string.

Does

fstq <- readFastq("sample_1.fq")

work?
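The difference between an object name and a character string can be seen without ShortRead at all. This is a small base R sketch; basename() is used only because it is a harmless function that takes a file path, and it is not part of the original question.

```r
# Unquoted: R treats sample_1.fq as the name of a variable and cannot find it.
unquoted_result <- tryCatch(
    basename(sample_1.fq),
    error = function(e) conditionMessage(e)
)
print(unquoted_result)

# Quoted: the same characters are now a length-one character vector,
# which is the kind of value a file-path argument expects.
quoted_result <- basename("sample_1.fq")
print(quoted_result)
```

The first call produces the same "object not found" message as in the question, while the second simply returns the file name.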

@kelvinfrog75-7494

Oh, it works now. Thanks a lot. BTW, do you know if R can do FASTQ quality filtering on Illumina sequences? I have been using the FASTQ quality trimmer from FASTX, and I wonder if R has its own function.


Try ShortRead::trim<tab> to see some possibilities, e.g., trimTails, trimTailw. How specifically were you hoping to trim?
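As a rough sketch of what trimTails() and trimTailw() do, the example below builds two made-up reads in memory and trims them. The read sequences, quality strings, and thresholds (k, a, halfwidth) are illustrative assumptions, not recommended settings.

```r
library(ShortRead)

# Two hypothetical reads; read1 ends in low-quality ("#") bases.
reads <- ShortReadQ(
    sread = DNAStringSet(c("ACGTACGTAC", "GGGGCCCCAA")),
    quality = FastqQuality(BStringSet(c("IIIIIIII##", "IIIIIIIIII"))),
    id = BStringSet(c("read1", "read2"))
)

# Trim the tail once k = 2 bases at or below quality letter "4" are seen.
trimmed_tails <- trimTails(reads, k = 2, a = "4")

# Sliding-window variant: window of 2 * halfwidth + 1 bases.
trimmed_window <- trimTailw(reads, k = 2, a = "4", halfwidth = 2)

print(width(reads))
print(width(trimmed_tails))
print(width(trimmed_window))
```

Comparing the widths before and after trimming is a quick way to see how aggressive a given setting is.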


You mean you wonder if the ShortRead package has a function for doing FASTQ quality filtering on Illumina sequences, right? Did you read the ShortRead vignette? I think it deals a lot with read quality. As long as you can load your reads into a ShortRead object, it shouldn't really matter where the reads are coming from.

H.


Great. I will look into that. Thanks.

@martin-morgan-1513

If the files are large then you'll want to 'stream' over them.

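library(ShortRead)

fin = file.choose()                  # pick the FASTQ file interactively
fout = "my.fasta"                    # chunks are appended, so start without an existing my.fasta
fq = FastqStreamer(fin)              # iterate over the file in chunks
repeat {
    sr = yield(fq)                   # next chunk of reads
    if (length(sr) == 0) break       # no reads left
    writeFasta(sr, fout, mode="a")   # append this chunk to the FASTA file
}
close(fq)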

I think this will not be particularly slow. A variant is

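library(ShortRead)
library(GenomicFiles)

# YIELD fetches the next chunk; MAP appends it to FOUT and returns its size
YIELD = function(X, ...) yield(X)
MAP = function(X, ..., FOUT) {
    writeFasta(X, FOUT, mode="a")
    message(length(X))
    length(X)
}

fq = FastqStreamer(fin, 100)               # 100 records per chunk
reduceByYield(fq, YIELD, MAP, FOUT=fout)   # chunk sizes are combined by the default REDUCE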
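For intuition about what the chunked loops above are doing, here is a base R sketch of the same idea. It is not how ShortRead implements streaming, and it assumes every FASTQ record is exactly four lines (header, sequence, "+", quality); the function name and demo reads are hypothetical.

```r
# Convert FASTQ to FASTA one chunk at a time, so memory use stays bounded.
fastq_to_fasta_chunked <- function(fastq_path, fasta_path, records_per_chunk = 1000L) {
    input <- file(fastq_path, open = "r")
    on.exit(close(input), add = TRUE)
    output <- file(fasta_path, open = "w")
    on.exit(close(output), add = TRUE)

    total <- 0L
    repeat {
        lines <- readLines(input, n = 4L * records_per_chunk)
        if (length(lines) == 0L) break            # end of file
        headers <- lines[seq(1L, length(lines), by = 4L)]
        sequences <- lines[seq(2L, length(lines), by = 4L)]
        # Swap the leading "@" of each FASTQ header for the FASTA ">",
        # then interleave headers and sequences.
        fasta <- as.vector(rbind(sub("^@", ">", headers), sequences))
        writeLines(fasta, output)
        total <- total + length(headers)
    }
    total                                          # number of records written
}

# Usage on a tiny made-up file, one record per chunk.
demo_fastq <- tempfile(fileext = ".fq")
writeLines(c("@read1", "ACGT", "+", "IIII", "@read2", "GGCC", "+", "IIII"), demo_fastq)
demo_fasta <- tempfile(fileext = ".fa")
records_written <- fastq_to_fasta_chunked(demo_fastq, demo_fasta, records_per_chunk = 1L)
print(readLines(demo_fasta))
```

For real data the ShortRead versions are the safer choice, since they also handle compressed input and validate the records.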