Start to End RNA-Seq with R Bioconductor

Hi Community,

I apologise if this question is too simple and the answer can be found elsewhere. I am a biologist and new to Bioconductor. I would like to do a start-to-end RNA-Seq analysis using Bioconductor and R, in an R Markdown notebook, so that it is reproducible for other lab members and for publication. We have deposited our RNA-Seq reads, as obtained from the sequencing machine, directly in GEO.

Most of the tutorials I've found start from a count table and explain differential expression packages and techniques. However, the earlier steps are not explained (reads to counts: QC, read trimming, mapping with HISAT or similar, counting features from BAM/SAM files, and summarising counts to genes). I understand that other tools, such as command-line (bash) tools, help with those earlier steps. My questions are:

  1. Is there a Bioconductor tutorial that starts from the very beginning, reading the data from GEO/SRA and processing the reads into gene counts? Ideally using only Bioconductor packages.

  2. If not, is there a tutorial (in R or R Markdown) that calls all the necessary external tools? (Mostly Bioconductor packages, but calling other software from R where no Bioconductor package is available.)

  3. If not, is there a tutorial for a reproducible RNA-Seq analysis using bash and R, from sequencing reads to differential expression (using a mix of command-line tools and then switching to Bioconductor at the end)? Are there similar tutorials, a reference book, or recipes for other techniques (scRNA-seq, ChIP-seq, etc.)? I am always looking for everything explained in a single tutorial, from reading and counting reads through to differential expression.

As an example, I am looking for something similar to this: the most newbie-friendly tutorial I found is the Galaxy Training Material, which lays out three steps (1. reads to counts, 2. counts to genes, and 3. genes to pathways) and develops each in detail, but of course in the Galaxy GUI. I would like something like this, step by step, but using R/Bioconductor (or, where not available, other packages).

Thank you very much!!

Bioconductor Workflow

Just my two cents, but don't get into the habit of doing NGS preprocessing in R. A few packages wrap existing command-line tools, but a) they are not available for every required step (such as trimming, FastQC, or aligners other than Subread), and b) this does not scale well, because you cannot easily orchestrate whole workflows from inside R. You also still have to set up and compile software outside of R, so I would definitely do the whole preprocessing outside of it. I recommend using existing pipelines, be it snakePipes or Nextflow/nf-core ones, and then reading the required data (usually just the counts) into R for the downstream analysis. The RNA-seq workflow from Bioconductor is (if you ask me) up to date and also covers quantification with tools other than traditional aligners, so it is worth taking a look.

United States

There are a couple of workflows. Here is one based on DESeq2, which is a bit dated now, and one based on edgeR, which might be a bit dated as well. There is also systemPipeR, which might be of interest to you.


I consider the edgeR one still current, except that it uses mm10 instead of the new mm39 mouse genome build released this year. It is the only one of the workflows that is done entirely in R, including the alignment. The same edgeR workflow, run on the latest R and Bioconductor packages, is available here.
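A minimal sketch of what the all-R route looks like (Rsubread for indexing, alignment and counting, then handing the counts to edgeR), assuming single-end RNA-seq reads in FASTQ files, a mouse reference genome FASTA, and Rsubread's inbuilt mm10 gene annotation. The helper names `bam_path_for` and `align_and_count` and any file paths are hypothetical; this is an illustration, not the published workflow's code.

```r
library(Rsubread)
library(edgeR)

# Hypothetical helper: derive a BAM output path from each FASTQ path
bam_path_for <- function(fastq_files) {
  bam_files <- sub("\\.f(ast)?q(\\.gz)?$", ".bam", fastq_files)
  # Stop if a name did not match the pattern, so a FASTQ is never overwritten
  stopifnot(all(bam_files != fastq_files))
  bam_files
}

# Hypothetical helper: reference FASTA + single-end FASTQs -> edgeR DGEList
align_and_count <- function(genome_fasta, fastq_files,
                            index_name = "genome_index",
                            annotation = "mm10", nthreads = 4) {
  # 1. Build a Subread index for the reference genome (needed only once)
  buildindex(basename = index_name, reference = genome_fasta)

  # 2. Align each FASTQ file to the index, writing one BAM file per sample
  bam_files <- bam_path_for(fastq_files)
  align(index = index_name, readfile1 = fastq_files,
        output_file = bam_files, nthreads = nthreads)

  # 3. Count reads per gene with Rsubread's inbuilt annotation;
  #    the annotation must match the genome build used for the index
  fc <- featureCounts(files = bam_files, annot.inbuilt = annotation,
                      nthreads = nthreads)

  # 4. Wrap the gene-level count matrix for downstream edgeR analysis
  DGEList(counts = fc$counts,
          genes = fc$annotation[, c("GeneID", "Length")])
}
```

For example, `y <- align_and_count("genome.fa", c("s1.fastq.gz", "s2.fastq.gz"))` would return a `DGEList`. Paired-end data would additionally need `readfile2` in `align()` and `isPairedEnd = TRUE` in `featureCounts()`.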

swbarnes2
San Diego

I'm not sure what alignment options there are in pure R; Rsubread might be it.

But once you have gene counts, edgeR or DESeq2 are what everyone uses for differential gene expression.
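Starting from a gene count matrix, an edgeR analysis can be sketched as follows. The counts here are simulated (not real data): 1000 genes, two groups of three samples, with the first 50 genes spiked up four-fold in the treated group. The sketch uses edgeR's quasi-likelihood F-test (`glmQLFit`/`glmQLFTest`); DESeq2 would be an equally common alternative and is not shown.

```r
library(edgeR)

set.seed(1)
# Simulated example data: 1000 genes x 6 samples, 3 per group
n_genes <- 1000
group <- factor(rep(c("control", "treated"), each = 3))
counts <- matrix(rnbinom(n_genes * 6, mu = 100, size = 10), nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("sample", 1:6)))
# Make the first 50 genes four times higher in the treated samples
counts[1:50, group == "treated"] <- counts[1:50, group == "treated"] * 4

# Build the DGEList and drop genes with too few counts to be informative
y <- DGEList(counts = counts, group = group)
keep <- filterByExpr(y)
y <- y[keep, , keep.lib.sizes = FALSE]

# TMM normalisation, dispersion estimation, quasi-likelihood GLM fit
y <- calcNormFactors(y)
design <- model.matrix(~ group)
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)

# Test the treated-vs-control coefficient and collect all genes, ranked
res <- glmQLFTest(fit, coef = 2)
top <- topTags(res, n = Inf)$table
head(top)
```

With real data, `counts` would instead come from a counting step such as `featureCounts()` or from a pipeline's count table read into R.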

