Question: No method for coercing this S4 class to a vector, as trying to import my own data rather than using SRA in the workflow RnaSeqGeneEdgeRQL
1
7 weeks ago by
Raito9240
Italy
Raito9240 wrote:

Hello again, I'm working on the RnaSeqGeneEdgeRQL workflow at the moment, trying to edit it so that it uses my own data rather than importing published RNASeq data from SRA...

I was wondering if there is any easier way to do it.

The original instructions to create the vector were: (If I got it right, basically, I have to act so that my all.fastq object contains my own data rather than using SRA ones...)

   ##Just creating a targets file

targetsFile <- system.file("extdata", "targets.txt",
package="RnaSeqGeneEdgeRQL")
targets

##Retrieving data

for (sra in targets$SRA) system(paste("fastq-dump", sra)) all.fastq <- paste0(targets$SRA, ".fastq")

##Here is the next original function which requires the all.fastq object

all.bam <- sub(".fastq", ".bam", all.fastq)


I looked for a proper solution online, then I tried with the readFastq function from ShortRead package, as follows:

fastqPath <- list.files("/home/genomica/DATA/dwarf/fastq", pattern = ".fastq", full = TRUE) reads <- readFastq(fastqPath)  And then: all.bam <- sub(".fastq", ".bam", reads) align(index="FASTA", readfile1=reads, input_format="FASTQ", output_file=all.bam)  That's what I did, also visualizing the contents of the reads and fastqPath objects. https://i.ibb.co/z22bggQ/Screen-Supporto.png It looks as if I managed to create the desider object, but in the wrong format (belonging to the ShortRead class rather than the required vector). As a matter of fact, when trying to go further, I get the No method for coercing this S4 class to a vector How can I overcome this issue and use the pipeline with a proper vector object containing local data? Here is my sessioninfo() output: https://i.ibb.co/3dmsRrK/Session-Info1.png https://i.ibb.co/R4q9LWS/Session-Info2.png ADD COMMENTlink modified 6 weeks ago by Aspie9660 • written 7 weeks ago by Raito9240 Answer: No method for coercing this S4 class to a vector, as trying to import my own dat 4 6 weeks ago by Aspie9660 Aspie9660 wrote: Hi, Raito. First, you should call: buildindex(basename = "index", reference="./Data.fna")  This generate the needed index files and the input must be in FASTA format. We will need these files later. The basename parameter ("index") indicates the prefix of the files. Then, you call: fastqPath <- list.files("./fastq", pattern="\\.fastq", full=TRUE)
fastqPath


This creates an array of all names of all files in the "fastq" directory whose extension is ".fastq".

Then you need to call:

all.bam <- sub("\\.fastq", ".bam", fastqPath) all.bam  The array all.bam is identical to fastqPath, except that the extension of the files is ".bam" instead of ".fastq". Then you can call: align(index="index", readfile1=fastqPath, input_format="FASTQ", output_file=all.bam)  It is important to understand the meaning of the parameters: • index is the prefix of the index files. • readfile1 is the array of input files (in our case ".fastq" files in the "fastq" directory). • input_format is the format of input files. • output_file is the list of output files. In our case, each output file has the name of an input file, with the exception of the extension (which is ".bam"). The problem with your code is that it doesn't use an array as readfile1 parameter, as it should be. The full code is: library(Rsubread) buildindex(basename = "index", reference="./fastq/GCF_002742605.1_O_europaea_v1_genomic.fna") fastqPath <- list.files("./fastq", pattern="\\.fastq", full=TRUE)
fastqPath

all.bam <- sub("\\.fastq\$", ".bam", fastqPath)
all.bam

Answer: No method for coercing this S4 class to a vector, as trying to import my own dat
3
7 weeks ago by
Gordon Smyth37k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth37k wrote:

Dear Raito,

The RnaSeqGeneEdgeRQL workflow is really intended to teach statistical analysis of the RNA-seq counts rather than teaching the alignment steps, although it does give complete code for all the steps.

It seems from your posts that you haven't succeeded yet in doing any alignments, and I'm guessing that you haven't built a genome index yet either. So you are getting stuck at the very first step, which in many ways is the hardest. You can't run align until you have first succeeded in downloading a reference genome and running buildindex.

You might like to see a simple reproducible example of building an index and doing an alignment here: http://bioinf.wehi.edu.au/RNAseqCaseStudy/ . The example uses limma to do the differential expression analysis, but it could just as well have been edgeR.

Alternatively, I get the impression that you are not familiar with R, and in this case you might be more comfortable with doing things at the Unix prompt instead of using R. Here are some simple Subread examples using Unix command syntax: http://bioinf.wehi.edu.au/subread/ and http://bioinf.wehi.edu.au/featureCounts/

Finally let me say it important to read the documentation for Bioconductor software. If you type help("align") it will tell you everything about the function, as James has hinted to you. With Bioconductor, searching for answers online is not the best way to proceed, except perhaps for searching this Support forum itself.

Hello, thanks for you answer and suggestions! Yes, I'm not familiar with R and it's my first time running a workflow.

I succedeed in building the genome index, because I already had a reference genome to use and I got no errors after the process.

However, I was unable to go ahead and make the alignments. This happened because I was unable to adapt the workflow code so that it uses my own RNASeq data rather than downloading from SRA, I know it may be very basic but I really don't know how to edit the code to do so. Can you help me here? Thanks in advance.

I solved the problem!

Answer: No method for coercing this S4 class to a vector, as trying to import my own dat
4
7 weeks ago by
United States
James W. MacDonald50k wrote:

I am not sure why you are using the ShortRead package here. If you look at the help for align, it says this:

readfile1: a character vector including names of files that include
gives the list of files including first reads in each
library. File format is FASTQ/FASTA by default. See
input_format  option for more supported formats.


If you were to look at your reads object it would say something like this

> reads
length: 256 reads; width: 36 cycles


Which is obviously not a character vector of names of files! Whereas your fastqPath object should be a character vector of file names.

Hello, first of all thanks for your answer! It's my first time ever using R and a programming language in general, so I may be missing some basic concepts about the required objects.

May you suggest me the right code to go ahead and make the workflow acquire my own data rather than using other scripts from other packages to do the job?