Question

Using the RNA-Seq Workflow with Real-Life FASTQ Data and a Mouse Reference Genome


greg.mulhearn • 0

@gregmulhearn-7639


Australia

I am a new user to R and Bioconductor, so please excuse my inexperience.

I have installed R (version 3.1.3) and Bioconductor (version 3.0) on a 64-bit Windows 7 system, following the instructions on the Bioconductor website.

I then followed the instructions in the workflow document: "RNA-Seq workflow: gene-level exploratory analysis and differential expression" from the website, using the sample data that was supplied with the workflow. I was able to proceed through the whole workflow and completed all the steps.

I am now trying to go through the RNA-Seq workflow again, but this time using real-life experimental data that we have generated, rather than the sample data provided with the workflow.

We have some ".fastq" and ".fastq.gz" files. I have read the sections at the beginning of the RNA-Seq workflow document about how to start from FASTQ files, align them to a reference genome, and end up with real-life data in a format similar to the sample data supplied with the workflow. I am having trouble understanding how to do this from the instructions in the workflow. I think I need to generate BAM files, but I am not sure how to do this.
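A minimal sketch of the FASTQ-to-BAM step, assuming the Bioconductor aligner Rsubread (not mentioned in this thread, and not necessarily the aligner used for the workflow's sample data), single-end reads, a Linux or macOS machine, and an index built beforehand with `Rsubread::buildindex()`. The index name `mm10_index`, the directories `fastq/` and `bam/`, and the helper functions are hypothetical; argument names follow recent Rsubread releases and may differ in older ones.

```r
# Hypothetical helpers: align .fastq / .fastq.gz files to a prebuilt Rsubread
# index, producing one BAM file per input file (single-end reads assumed).

# Choose Rsubread's input format from the file extension.
fastq_input_format <- function(path) {
  if (grepl("\\.gz$", path, ignore.case = TRUE)) "gzFASTQ" else "FASTQ"
}

# Derive the output BAM path: "fastq/sample1.fastq.gz" -> "bam/sample1.bam".
bam_output_name <- function(path, out_dir = "bam") {
  sample <- sub("\\.(fastq|fq)(\\.gz)?$", "", basename(path), ignore.case = TRUE)
  file.path(out_dir, paste0(sample, ".bam"))
}

# Align each FASTQ file in turn; returns the BAM paths.
align_fastq_files <- function(fastq_files, index = "mm10_index", out_dir = "bam") {
  dir.create(out_dir, showWarnings = FALSE)
  bam_files <- vapply(fastq_files, bam_output_name, character(1),
                      out_dir = out_dir, USE.NAMES = FALSE)
  for (i in seq_along(fastq_files)) {
    Rsubread::align(index = index,
                    readfile1 = fastq_files[i],
                    type = "rna",  # RNA-seq reads
                    input_format = fastq_input_format(fastq_files[i]),
                    output_format = "BAM",
                    output_file = bam_files[i])
  }
  bam_files
}

# Usage (not run here; needs Rsubread, the index and the FASTQ files):
# fastq_files <- list.files("fastq", pattern = "\\.(fastq|fq)(\\.gz)?$",
#                           full.names = TRUE)
# bam_files <- align_fastq_files(fastq_files)
```

Mixed compressed and uncompressed inputs are handled by picking the input format per file rather than relying on a single default.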

I would appreciate any advice on how to get started with our FASTQ files and use them to work through the RNA-Seq workflow.

Our FASTQ files were generated from experiments with mice. I am also unsure where to find a mouse reference genome to align to. The RNA-Seq workflow sample data are aligned to a human reference genome. Any help with this would also be appreciated.
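Mouse reference genomes (GRCm38, called mm10 at UCSC, was current when this was asked) are distributed by Ensembl and UCSC as FASTA sequence plus gene annotation. One Bioconductor route is sketched below, assuming the packages BSgenome.Mmusculus.UCSC.mm10, TxDb.Mmusculus.UCSC.mm10.knownGene, Biostrings, GenomeInfoDb, GenomicFeatures and Rsubread are installed. Taking the sequence and the gene models from matching UCSC packages keeps chromosome names ("chr1", "chr2", ...) consistent between the aligned reads and the annotation. File names and helper functions are hypothetical.

```r
# Hypothetical helpers for preparing a mouse (UCSC mm10) reference.

# Keep assembled chromosomes (chr1..chr19, chrX, chrY, chrM) and drop
# unplaced or random contigs such as "chrUn_..." or "chr1_..._random".
keep_main_chromosomes <- function(chroms) {
  chroms[grepl("^chr([0-9]+|X|Y|M)$", chroms)]
}

# Write the mm10 genome to FASTA and build an Rsubread index from it.
# Needs several GB of memory and disk space.
build_mouse_index <- function(fasta_path = "mm10.fa", index = "mm10_index") {
  genome <- BSgenome.Mmusculus.UCSC.mm10::BSgenome.Mmusculus.UCSC.mm10
  chroms <- keep_main_chromosomes(GenomeInfoDb::seqnames(genome))
  seqs <- Biostrings::getSeq(genome, chroms)  # DNAStringSet, one per chromosome
  names(seqs) <- chroms                        # FASTA headers = chromosome names
  Biostrings::writeXStringSet(seqs, fasta_path)
  Rsubread::buildindex(basename = index, reference = fasta_path)
  invisible(index)
}

# Gene models with the same "chr1"-style names, as exons grouped by gene
# (a GRangesList).
mouse_exons_by_gene <- function() {
  txdb <- TxDb.Mmusculus.UCSC.mm10.knownGene::TxDb.Mmusculus.UCSC.mm10.knownGene
  GenomicFeatures::exonsBy(txdb, by = "gene")
}

# Usage (not run here; downloads and indexing are large):
# build_mouse_index()
# ebg <- mouse_exons_by_gene()
```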

NOTE: I upgraded R from version 3.1.3 to 3.2.0, but when I tried to install the "rnaseqGene" workflow, I was told that it is not available for R 3.2.0. So I have reverted to the older environment, R 3.1.3 with Bioconductor 3.0.

The message I got when trying to install the "rnaseqGene" workflow under R 3.2.0 is shown below:

>
> workflowInstall("rnaseqGene")
Installing package into ‘C:/Users/gmul3410/Documents/R/win-library/3.2’
(as ‘lib’ is unspecified)
Warning: unable to access index for repository http://bioconductor.org/packages/3.1/workflows/src/contrib
Warning: unable to access index for repository http://bioconductor.org/packages/3.1/workflows/bin/windows/contrib/3.2
Warning message:
package ‘rnaseqGene’ is not available (for R version 3.2.0)
>

Thanks for any help,

Greg



score 0 · Answer 1 · 2015-05-11

Hi Greg,

There's really a lot to cover here in getting from FASTQ files to aligned reads (BAM files). Note that all the steps used to align the reads in this dataset are also documented in the vignette of the airway package, which is linked from the workflow.
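Once BAM files exist, the workflow's next step is to count reads per gene, which it does with `summarizeOverlaps()` from GenomicAlignments, producing a SummarizedExperiment like the sample data. A minimal sketch for mouse data, assuming single-end, unstranded reads aligned to UCSC mm10 and the packages GenomicFeatures, Rsamtools, GenomicAlignments, DESeq2 and TxDb.Mmusculus.UCSC.mm10.knownGene; the BAM paths, condition labels and helper names are hypothetical. (The workflow's airway reads were paired-end, so it used `singleEnd = FALSE` and `fragments = TRUE`.)

```r
# Hypothetical helpers: BAM files -> gene counts -> DESeqDataSet.

# Sample table from BAM paths: "bam/ctrl1.bam" -> sample "ctrl1".
make_sample_table <- function(bam_files, condition) {
  data.frame(sample = sub("\\.bam$", "", basename(bam_files)),
             condition = factor(condition),
             bam = bam_files,
             stringsAsFactors = FALSE)
}

# Count reads overlapping the union of each gene's exons, then wrap for DESeq2.
count_and_make_dds <- function(sample_table) {
  txdb <- TxDb.Mmusculus.UCSC.mm10.knownGene::TxDb.Mmusculus.UCSC.mm10.knownGene
  ebg <- GenomicFeatures::exonsBy(txdb, by = "gene")
  bams <- Rsamtools::BamFileList(sample_table$bam, yieldSize = 2000000)
  se <- GenomicAlignments::summarizeOverlaps(features = ebg, reads = bams,
                                             mode = "Union",
                                             singleEnd = TRUE,
                                             ignore.strand = TRUE)
  colnames(se) <- sample_table$sample
  se$condition <- sample_table$condition
  DESeq2::DESeqDataSet(se, design = ~ condition)
}

# Usage (not run here; needs the packages and real BAM files):
# st <- make_sample_table(
#   c("bam/ctrl1.bam", "bam/ctrl2.bam", "bam/trt1.bam", "bam/trt2.bam"),
#   c("control", "control", "treated", "treated"))
# dds <- count_and_make_dds(st)
```

From `dds` onward, the remaining steps of the workflow apply unchanged.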

I'd highly recommend speaking to someone locally at your institute who has experience with alignment (as well as data processing and storage advice) and who can offer practical advice.

You can also get a head start by watching the videos in our ongoing, free online course on RNA-seq analysis (of human and model organisms), which covers alignment among other things:

https://www.edx.org/course/case-study-rna-seq-data-analysis-harvardx-ph525-5x

score 0 · Answer 2 · 2015-05-12

To add to Michael's comments, you will be much better served by using a Linux-based OS. While you CAN get things to compile and run on Windows, it's much easier to do things on a Linux box (or even Mac OS, although I still think it is easier to get Real Work(tm) done on an OS that isn't primarily designed to make life easier for the hoi polloi - start flame war in 3, 2, 1...).

If you have lots of RAM on your Windows box, you might consider setting up a dual-boot system, using a Docker image, or spinning up an AMI (Amazon Machine Image). I would second Michael's recommendation to find a local person who can help. Learning R, Bioconductor, Linux, and the intricacies of RNA-Seq all at the same time is a pretty daunting task.