Question: convert fastq file to fasta using shortread package
kelvinfrog75 (United States) wrote 2.7 years ago:

This might be a very simple question. I have some Illumina FASTQ files that I want to convert to FASTA. Can someone show me how to do that? When I call writeFasta(sample_1.fq), readFastq(sample_1.fq), or sread(sample_1.fq), I get error messages like these:

Error in writeFasta(sample_1.fq, file) : 
  error in evaluating the argument 'object' in selecting a method for function 'writeFasta': Error: object 'sample_1.fq' not found

Error in readFastq(sample_1.fq) : 
  error in evaluating the argument 'dirPath' in selecting a method for function 'readFastq': Error: object 'sample_1.fq' not found

Error in sread(sample_1.fq) : 
  error in evaluating the argument 'object' in selecting a method for function 'sread': Error: object 'sample_1.fq' not found

Do I need to convert the fastq file to something first before I can convert to fasta? Thanks in advance.

James W. MacDonald (United States) wrote 2.7 years ago:

Whenever you get an error like

Error: object 'sample_1.fq' not found

it means that R can't find that file. And the reason it can't find it is because the file isn't in R's working directory, so you have to tell R where it is (it only looks in the working dir otherwise). The help page for readFastq() says this:

Usage:

     readFastq(dirPath, pattern=character(0), ...)
     ## S4 method for signature 'character'
     readFastq(dirPath, pattern=character(0), ..., withIds=TRUE)
     
     writeFastq(object, file, mode="w", full=FALSE, compress=TRUE, ...)
     
Arguments:

 dirPath: A character vector (or other object; see methods defined on
          this generic) giving the directory path (relative or
          absolute) or single file name of FASTQ files to be read.

 pattern: The (‘grep’-style) pattern describing file names to be read.
          The default (‘character(0)’) results in (attempted) input of
          all files in the directory.

So you either have to have the FASTQ file in your working directory, or you need to specify the directory, and then give a grep style pattern so readFastq() can find the file. So something like

fstq <- readFastq(<path to file dir>, "sample_1.fq")

would work. You could also just wrap it all up in one call

writeFasta(readFastq(<path to file dir>, "sample_1.fq"), "sample_1.fa")

I don't have any timings for this sort of thing, so I have no idea how efficient it would be on a large file. But I would imagine a dedicated tool like seqtk would be faster (you can find that using Google).
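For anyone who wants to try the directory-plus-pattern form end to end, here is a minimal sketch. The reads, the temporary directory and the variable names (fq_dir, fa_path) are made up for illustration, and the anchored pattern is just one way to write it. It assumes ShortRead is installed and skips itself otherwise.

```r
# Sketch only: the reads below are made up, and the guard skips the example
# when ShortRead is not installed.
if (requireNamespace("ShortRead", quietly = TRUE)) {
    library(ShortRead)

    # Write a two-record FASTQ file into a fresh temporary directory.
    fq_dir <- tempfile("fastq_demo_")
    dir.create(fq_dir)
    writeLines(c(
        "@read1", "ACGTACGT", "+", "IIIIIIII",
        "@read2", "TTGGCCAA", "+", "IIIIIII#"
    ), file.path(fq_dir, "sample_1.fq"))

    # dirPath is the directory; pattern is a regular expression, so the dot
    # is escaped and the name anchored to avoid matching other files.
    reads <- readFastq(fq_dir, pattern = "^sample_1\\.fq$")

    # Write the same reads back out as FASTA next to the input.
    fa_path <- file.path(fq_dir, "sample_1.fa")
    writeFasta(reads, fa_path)
    print(readLines(fa_path))
}
```

Because the pattern is treated as a regular expression, an unescaped "sample_1.fq" would also match a name like "sample_1xfq", which is why the sketch anchors and escapes it.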

James W. MacDonald (United States) wrote 2.7 years ago:

Ugh. I missed something in your post.

The error you are getting is because you are passing in what R thinks is an object name, when you should in fact be passing in a character string.

Does

fstq <- readFastq("sample_1.fq")

work?
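
The difference between a bare name and a quoted string can be seen in plain R, without ShortRead. This is only a sketch: the variable names file_name and msg are made up, and the exact wording of the message depends on the locale.

```r
# A quoted name is an ordinary character string.
file_name <- "sample_1.fq"

# A bare name is looked up as an R object. No object called sample_1.fq
# exists in this session, so evaluation fails before any function gets a
# chance to open a file.
msg <- tryCatch(
    {
        sample_1.fq
        "no error"
    },
    error = function(e) conditionMessage(e)
)
print(msg)

# The string itself is valid regardless of whether the file exists.
print(is.character(file_name))

# Whether the file is in the working directory is a separate question.
print(file.exists(file.path(getwd(), file_name)))
```

So the "object not found" error says nothing about the file on disk; it only says that R tried to evaluate an unquoted name.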

kelvinfrog75 (United States) wrote 2.7 years ago:

Oh, it works now. Thanks a lot. BTW, do you know if R can do FASTQ quality filtering on Illumina sequences? The one I have been using is the fastq quality trimmer from FASTX. I wonder if R has its own function.


Try ShortRead::trim<tab> to see some possibilities, e.g., trimTails, trimTailw. How specifically were you hoping to trim?
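
As a rough illustration of what trimTails does, here is a small sketch. The read, the values k = 1 and a = "#", and the variable names are arbitrary choices for the example, not a recommendation; see ?trimTails for how k and a are interpreted. It assumes ShortRead is installed and skips itself otherwise.

```r
# Sketch only: one made-up read whose last three bases have low quality.
if (requireNamespace("ShortRead", quietly = TRUE)) {
    library(ShortRead)

    fq_path <- tempfile(fileext = ".fq")
    writeLines(c("@read1", "ACGTACGTAC", "+", "IIIIIII###"), fq_path)
    reads <- readFastq(fq_path)

    # k: number of failing letters needed to trigger trimming.
    # a: quality letter at or below which a base counts as failing.
    trimmed <- trimTails(reads, k = 1, a = "#")

    # Compare read lengths before and after trimming.
    print(width(reads))
    print(width(trimmed))
}
```

trimTailw takes an additional halfwidth argument and applies the same kind of threshold over a sliding window.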

written 2.7 years ago by Martin Morgan

You mean you wonder if the ShortRead package has a function for doing FASTQ quality filtering on Illumina sequences, right? Did you read the ShortRead vignette? I think it deals a lot with read quality. As long as you can load your reads into a ShortRead object, it shouldn't really matter where the reads are coming from.

H.

written 2.7 years ago by Hervé Pagès

Great. I will look into that. Thanks.

written 2.7 years ago by kelvinfrog75
Martin Morgan (United States) wrote 2.7 years ago:

If the files are large then you'll want to 'stream' over them.

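library(ShortRead)

fin = file.choose()                  # pick the FASTQ file interactively
fout = "my.fasta"
fq = FastqStreamer(fin)              # read the file in chunks, not all at once
repeat {
    sr = yield(fq)                   # next chunk of records
    if (length(sr) == 0) break       # an empty chunk means end of file
    writeFasta(sr, fout, mode="a")   # append so earlier chunks are kept
}
close(fq)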

I think this will not be particularly slow. A variant is

library(GenomicFiles)

YIELD = function(X, ...) yield(X)
MAP = function(X, ..., FOUT) {
    writeFasta(X, FOUT, "a")
    message(length(X))
    length(X)
}

fq = FastqStreamer(fin, 100)
reduceByYield(fq, YIELD, MAP, FOUT=fout)