Question

No method for coercing this S4 class to a vector, as trying to import my own data rather than using SRA in the workflow RnaSeqGeneEdgeRQL

1

Entering edit mode

Raito92 ▴ 60

@raito92-20399

Last seen 3.6 years ago

Italy

Hello again, I'm working on the RnaSeqGeneEdgeRQL workflow at the moment, trying to edit it so that it uses my own data rather than importing published RNASeq data from SRA...

I was wondering if there is any easier way to do it.

The original instructions to create the vector were: (If I got it right, basically, I have to act so that my all.fastq object contains my own data rather than using SRA ones...)

   ##Just creating a targets file 

    targetsFile <- system.file("extdata", "targets.txt",
                               package="RnaSeqGeneEdgeRQL")
    targets <- read.delim(targetsFile, stringsAsFactors=FALSE)
    targets

##Retrieving data


    for (sra in targets$SRA) system(paste("fastq-dump", sra)) 
    all.fastq <- paste0(targets$SRA, ".fastq")

 ##Here is the next original function which requires the all.fastq object

    all.bam <- sub(".fastq", ".bam", all.fastq)
    align(index="mm10", readfile1=all.fastq, input_format="FASTQ", output_file=all.bam)

I looked for a proper solution online, then I tried with the readFastq function from ShortRead package, as follows:

fastqPath <- list.files("/home/genomica/DATA/dwarf/fastq", pattern = ".fastq$", full = TRUE)
reads <- readFastq(fastqPath)

And then:

all.bam <- sub(".fastq", ".bam", reads)
align(index="FASTA", readfile1=reads, input_format="FASTQ", 
      output_file=all.bam)

That's what I did, also visualizing the contents of the reads and fastqPath objects.

https://i.ibb.co/z22bggQ/Screen-Supporto.png

It looks as if I managed to create the desider object, but in the wrong format (belonging to the ShortRead class rather than the required vector). As a matter of fact, when trying to go further, I get the No method for coercing this S4 class to a vector

How can I overcome this issue and use the pipeline with a proper vector object containing local data?

Here is my sessioninfo() output:

https://i.ibb.co/3dmsRrK/Session-Info1.png

https://i.ibb.co/R4q9LWS/Session-Info2.png

rnaseqgeneedgerql shortread Rsubread • 3.6k views

ADD COMMENT • link updated 6.8 years ago by Aspie96 ▴ 60 • written 6.8 years ago by Raito92 ▴ 60

4

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 1 hour ago

United States

I am not sure why you are using the ShortRead package here. If you look at the help for align, it says this:

readfile1: a character vector including names of files that include
          sequence reads to be aligned. For paired-end reads, this
          gives the list of files including first reads in each
          library. File format is FASTQ/FASTA by default. See
           input_format  option for more supported formats.

If you were to look at your reads object it would say something like this

> reads
class: ShortReadQ
length: 256 reads; width: 36 cycles

Which is obviously not a character vector of names of files! Whereas your fastqPath object should be a character vector of file names.

ADD COMMENT • link 6.8 years ago James W. MacDonald 68k

0

Entering edit mode

Hello, first of all thanks for your answer! It's my first time ever using R and a programming language in general, so I may be missing some basic concepts about the required objects.

May you suggest me the right code to go ahead and make the workflow acquire my own data rather than using other scripts from other packages to do the job?

Thanks in advance!

ADD REPLY • link 6.8 years ago Raito92 ▴ 60

0

Entering edit mode

Solved, I just had to pass the fastqPath rather than the reads object. XD Sorry.

ADD REPLY • link 6.8 years ago Raito92 ▴ 60

score 4 · Accepted Answer · 2019-04-07

Hi, Raito.

First, you should call:

buildindex(basename = "index", reference="./Data.fna")

This generate the needed index files and the input must be in FASTA format. We will need these files later. The basename parameter ("index") indicates the prefix of the files.

Then, you call:

fastqPath <- list.files("./fastq", pattern="\\.fastq$", full=TRUE)
fastqPath

This creates an array of all names of all files in the "fastq" directory whose extension is ".fastq".

Then you need to call:

all.bam <- sub("\\.fastq$", ".bam", fastqPath)
all.bam

The array all.bam is identical to fastqPath, except that the extension of the files is ".bam" instead of ".fastq".

Then you can call:

align(index="index", readfile1=fastqPath, input_format="FASTQ", output_file=all.bam)

It is important to understand the meaning of the parameters:

index is the prefix of the index files.
readfile1 is the array of input files (in our case ".fastq" files in the "fastq" directory).
input_format is the format of input files.
output_file is the list of output files. In our case, each output file has the name of an input file, with the exception of the extension (which is ".bam").

The problem with your code is that it doesn't use an array as readfile1 parameter, as it should be.

The full code is:

library(Rsubread)

buildindex(basename = "index", reference="./fastq/GCF_002742605.1_O_europaea_v1_genomic.fna")

fastqPath <- list.files("./fastq", pattern="\\.fastq$", full=TRUE)
fastqPath

all.bam <- sub("\\.fastq$", ".bam", fastqPath)
all.bam
align(index="index", readfile1=fastqPath, input_format="FASTQ", output_file=all.bam)

score 3 · Accepted Answer · 2019-04-04

Dear Raito,

The RnaSeqGeneEdgeRQL workflow is really intended to teach statistical analysis of the RNA-seq counts rather than teaching the alignment steps, although it does give complete code for all the steps.

It seems from your posts that you haven't succeeded yet in doing any alignments, and I'm guessing that you haven't built a genome index yet either. So you are getting stuck at the very first step, which in many ways is the hardest. You can't run align until you have first succeeded in downloading a reference genome and running buildindex.

You might like to see a simple reproducible example of building an index and doing an alignment here: http://bioinf.wehi.edu.au/RNAseqCaseStudy/ . The example uses limma to do the differential expression analysis, but it could just as well have been edgeR.

Alternatively, I get the impression that you are not familiar with R, and in this case you might be more comfortable with doing things at the Unix prompt instead of using R. Here are some simple Subread examples using Unix command syntax: http://bioinf.wehi.edu.au/subread/ and http://bioinf.wehi.edu.au/featureCounts/

Also see the Subread home page at http://subread.sourceforge.net/ and the full Subread Users Guide at http://bioinf.wehi.edu.au/subread-package/SubreadUsersGuide.pdf

Finally let me say it important to read the documentation for Bioconductor software. If you type help("align") it will tell you everything about the function, as James has hinted to you. With Bioconductor, searching for answers online is not the best way to proceed, except perhaps for searching this Support forum itself.