Question: Using RNA-Seq Workflow using Real Life FASTQ data - and Mouse Reference Genome
4.6 years ago by
Australia
greg.mulhearn0 wrote:

I am a new user to R and Bioconductor, so please excuse my inexperience.

I have installed R (Ver: 3.1.3) and Bioconductor (Ver: 3.0) on a Windows-7, 64-bit system, as per the instructions on the Bioconductor website.

I then followed the instructions in the workflow document:  "RNA-Seq workflow: gene-level exploratory analysis and differential expression" from the website, using the sample data that was supplied with the workflow.  I was able to proceed through the whole workflow and completed all the steps.

I am now trying to go through the RNA-Seq workflow again, but this time using some real life experimental data that we have generated, rather than using the sample experimental data provided with the workflow.

We have some ".fastq" and ".fastq.gz" files. I have read the sections at the beginning of the RNA-Seq workflow document about how to start with FASTQ files, align them to a reference genome, and get to a point where I would have real life data in a format similar to the sample experimental data supplied with the RNA-Seq workflow. I am having trouble understanding how to do this from the instructions in the workflow. I think I need to generate .BAM files, but I am unclear how to do this.

I would appreciate any advice on how to get started with our FASTQ file(s) and be able to use these to work through the RNA-Seq workflow.

Our FASTQ files are generated from experiments with mice.  I am also unsure as to where to find a "Mouse reference Genome" to align to.  The RNA-Seq workflow sample data is aligned to a Human Reference Genome.  Any help with this would also be appreciated.

NOTE: I upgraded R from Ver 3.1.3 to Ver: 3.2.0, but when I tried to install the "rnaseqGene" workflow, it said that this is not available for R Ver: 3.2.0.  So I have reverted back to the older environment using R Ver: 3.1.3 and Bioconductor Ver: 3.0.

The message I got when trying to install "rnaseqGene" workflow under R Ver: 3.2.0 is shown below:

> workflowInstall("rnaseqGene")
Installing package into ‘C:/Users/gmul3410/Documents/R/win-library/3.2’
(as ‘lib’ is unspecified)
Warning: unable to access index for repository http://bioconductor.org/packages/3.1/workflows/src/contrib
Warning: unable to access index for repository http://bioconductor.org/packages/3.1/workflows/bin/windows/contrib/3.2
Warning message:
package ‘rnaseqGene’ is not available (for R version 3.2.0)

Thanks for any help,

Greg

modified 4.6 years ago by James W. MacDonald52k • written 4.6 years ago by greg.mulhearn0
Answer: Using RNA-Seq Workflow using Real Life FASTQ data - and Mouse Reference Genome
4.6 years ago by
Michael Love26k
United States
Michael Love26k wrote:

Hi Greg,

There's really a lot to cover here, getting from FASTQ files to aligned reads (BAM files). Note that all the steps used to align the reads in this dataset are also documented in the vignette for the airway package which is linked from the workflow.
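
To give a rough idea of what the FASTQ-to-BAM step can look like without leaving R, here is a minimal sketch using the Rsubread package. This is an assumption swapped in for illustration, not the aligner documented in the airway vignette, and you would need to check that Rsubread is available for your platform. The file names, the index name "mm10_index", and the helpers bam_name_for() and align_fastq() are all hypothetical. The mouse genome FASTA (for example an mm10/GRCm38 build) is something you would have to download yourself, and the reference and the annotation need to use the same chromosome naming.

```r
# Hypothetical helper: map a FASTQ file name to a BAM file name,
# e.g. "reads/sample1.fastq.gz" -> "sample1.bam".
bam_name_for <- function(fastq) {
  sub("\\.(fastq|fq)(\\.gz)?$", ".bam", basename(fastq))
}

# Hypothetical wrapper: align single-end FASTQ files against an existing
# Rsubread index and return the paths of the BAM files it asked for.
align_fastq <- function(fastq_files, index_basename, out_dir = ".") {
  if (!requireNamespace("Rsubread", quietly = TRUE)) {
    stop("The Rsubread package is not installed.")
  }
  bam_files <- file.path(out_dir, bam_name_for(fastq_files))
  Rsubread::align(index = index_basename,
                  readfile1 = fastq_files,
                  output_file = bam_files)
  bam_files
}

# Usage sketch (not run here; all paths are placeholders):
if (FALSE) {
  # One-off step: build an index from a mouse genome FASTA file.
  Rsubread::buildindex(basename = "mm10_index", reference = "mm10.fa")

  bams <- align_fastq(c("sample1.fastq.gz", "sample2.fastq"), "mm10_index")

  # Count reads per gene using Rsubread's built-in mm10 annotation.
  fc <- Rsubread::featureCounts(files = bams, annot.inbuilt = "mm10")
  head(fc$counts)
}
```

The count matrix in fc$counts would play roughly the same role as the counts you get from the workflow's sample data, although the workflow itself gets there from BAM files via summarizeOverlaps. Treat this as a starting point to discuss with someone who has done alignment before, not as a recommended pipeline.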

I'd highly recommend speaking to someone locally at your institute who has experience with alignment (as well as data processing and storage advice) and who can offer practical advice.

You can also get a head start by watching the videos in our ongoing, free online course on RNA-seq analysis (of human and model organisms), which covers alignment among other things:

https://www.edx.org/course/case-study-rna-seq-data-analysis-harvardx-ph525-5x

Answer: Using RNA-Seq Workflow using Real Life FASTQ data - and Mouse Reference Genome
4.6 years ago by
United States
James W. MacDonald52k wrote:

To add to Michael's comments, you will be much better served by using a Linux-based OS. While you CAN get things to compile and run on Windows, it's much easier to do things on a Linux box (or even Mac OS, although I still think it is easier to get Real Work(tm) done on an OS that isn't primarily designed to make life easier for Hoi Polloi - start flame war in 3, 2, 1...).

If you have lots of RAM on your Windows box, you might consider setting up for dual boot, or maybe a Docker image, or spin up an AMI. I would second Michael's recommendation to find a local person who can help. Learning R and Bioconductor and Linux stuff and the intricacies of RNA-Seq all at the same time is a pretty daunting task.