Question: .fastq to .txt conversion for EdgeR package and merging two paired end sequence files
hamidrezarazzaghian (Canada), 3.4 years ago, wrote:

Dear all,

I am a post-doc at the University of British Columbia, Canada, and I'm pretty new to RNA-seq data analysis. I want to do TMM normalization on my RNA-seq data using the edgeR package in R. I have two questions:

1) How can I convert .fastq files to .txt files so that I can feed them into the edgeR package?

2) My RNA-seq data are paired-end .fastq files. What quality control should I do on them, and how should I merge them together prior to analysis?

Thanks for the help,

Hamid

Tags: normalization, edger, fastq, tmm, txt
Answer: .fastq to .txt conversion for EdgeR package and merging two paired end sequence
James W. MacDonald (United States), 3.4 years ago, wrote:

You don't feed FASTQ files to edgeR. You first have to align against the genome of your species and then get counts per gene, which is what you then feed into edgeR. For that you could use something like the Rsubread package. It has a User's guide, so I would start there.
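
To make the ".txt" part concrete: what goes into edgeR is a table of read counts, one row per gene and one column per sample, not sequence data. Below is a minimal Python sketch of writing such a tab-delimited file, assuming the per-sample counts have already been produced by an aligner plus a counting step. The function name, gene IDs, sample names, and numbers are all made up for illustration.

```python
import csv
import io


def write_count_table(per_sample_counts, handle):
    """Write a genes-by-samples table of integer counts as tab-delimited text."""
    samples = sorted(per_sample_counts)
    # Union of all gene IDs seen in any sample.
    genes = sorted({g for counts in per_sample_counts.values() for g in counts})
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene_id"] + samples)
    for gene in genes:
        # A gene missing from a sample simply has zero reads there.
        writer.writerow([gene] + [per_sample_counts[s].get(gene, 0) for s in samples])


# Hypothetical counts; real ones come from a read-counting tool.
example_counts = {
    "control_1": {"geneA": 120, "geneB": 0, "geneC": 15},
    "treated_1": {"geneA": 95, "geneC": 40},
}

buffer = io.StringIO()
write_count_table(example_counts, buffer)
count_table_text = buffer.getvalue()
print(count_table_text)
```

A file in this shape can be read in R with read.delim() and the resulting matrix passed to edgeR's DGEList().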

Answer: .fastq to .txt conversion for EdgeR package and merging two paired end sequence
hamidrezarazzaghian (Canada), 3.4 years ago, wrote:

Thanks, James, for the fast reply. Unfortunately, Rsubread is not available for Windows-based R. Do you know of any other package for this purpose?

Thanks


As Martin noted, you can use Rbowtie, but that is for the original bowtie aligner, which doesn't do gapped alignments. If you are doing RNA-Seq you probably want bowtie2, which does do gapped alignments. You can run bowtie2 on Windows, so that is probably the best bet, but you have to run it from the command line, not from within R.

Most aligners assume you are using some sort of Linux variant, so you are sort of hamstrung by the fact that you are on Windows. But Linux is free after all, and it's relatively simple to set up a dual-boot Ubuntu/Windows OS on your comp, so if you are serious that might be something to consider.

One thing about kallisto and sleuth (and salmon or sailfish and sleuth while we are at it). These packages are intended to make comparisons at the transcript level, rather than the gene level. Since part of the alignment process is to infer which transcript a read came from, there is additional uncertainty in your count measurement that you have to account for when fitting a model. This has two downsides. First, that additional uncertainty has a cost, which is reduction in power to detect differences. Second, you shouldn't use something like edgeR or DESeq2 for transcript-level counts because the model they fit doesn't account for that uncertainty, so you have to use something like sleuth (either Lior Pachter's version or the patched version from Rob Patro's group) to fit the model. And sleuth is just a github package now, so you are pretty much on your own if you want to go that route.

As an (apparent) beginner, you are probably better off just getting bowtie2 and going from there.

written 3.4 years ago by James W. MacDonald

AFAIK "gapped alignments" in bowtie2 means indels, not junctions, so bowtie2 is not suited for RNA-seq. The original bowtie only supported mismatches, no indels.

H.

written 3.4 years ago by Hervé Pagès
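
The distinction above is visible in the CIGAR string of each aligned read in a SAM/BAM file: `I` and `D` mark small insertions and deletions, while `N` marks a skipped region of the reference, which is how splice-aware aligners record an intron. Here is a small Python sketch of telling the two apart. The function name and the example CIGAR strings are made up for illustration.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def summarize_gaps(cigar):
    """Return (indel bases, spliced junctions) for one CIGAR string."""
    indel_bases = 0
    junctions = 0
    for length, op in CIGAR_OP.findall(cigar):
        if op in "ID":        # insertion or deletion relative to the reference
            indel_bases += int(length)
        elif op == "N":       # skipped reference region, e.g. an intron
            junctions += 1
    return indel_bases, junctions


print(summarize_gaps("50M2D48M"))     # indel only: (2, 0)
print(summarize_gaps("40M1200N60M"))  # one junction: (0, 1)
```

A read that crosses an exon-exon junction needs the `N` operation to align in one piece against the genome, which is why support for indels alone is not enough for RNA-seq.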

Hi Hervé,

Thanks for pointing that out. I naively thought that 'gapped alignment' was more or less a consistently applied term, but obviously not so much.

written 3.4 years ago by James W. MacDonald

The Rbowtie package wraps an (older?) version of the Bowtie aligner, but most people probably run alignment tools outside R. The airway vignette and the differential expression workflow describe overall approaches that go from FASTQ to count matrices via whole-genome alignment. kallisto is a different, fast, though not cross-platform, approach; see SummarizedExperiment::readKallisto() in addition to the GitHub sleuth package.

The poster has FASTQ files but needs alignment (BAM) files before trying to count reads; b.nota's efforts would only be relevant after alignment. Ways to summarize aligned reads to counts across platforms and in R include bamsignals or perhaps GenomicAlignments::summarizeOverlaps().

written 3.4 years ago by Martin Morgan
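
Conceptually, summarizing aligned reads to counts means asking, for each read, which gene's coordinates it overlaps. The following is a deliberately simplified Python sketch of that idea, assuming a single chromosome, no strand information, half-open intervals, and a rule that discards reads hitting zero or several genes. The coordinates and gene names are invented; real tools also handle strand, exon structure, and paired reads.

```python
from collections import Counter


def count_reads_per_gene(genes, reads):
    """genes: {name: (start, end)}; reads: list of (start, end); half-open coords."""
    counts = Counter({name: 0 for name in genes})
    for r_start, r_end in reads:
        # Two half-open intervals overlap when each starts before the other ends.
        hits = [name for name, (g_start, g_end) in genes.items()
                if r_start < g_end and g_start < r_end]
        if len(hits) == 1:  # skip ambiguous (multi-gene) and unassigned reads
            counts[hits[0]] += 1
    return dict(counts)


# Hypothetical gene models and aligned read positions.
genes = {"geneA": (100, 500), "geneB": (450, 900)}
reads = [(120, 170), (430, 480), (600, 650), (950, 1000)]

gene_counts = count_reads_per_gene(genes, reads)
print(gene_counts)
```

Here the second read overlaps both genes and the fourth overlaps neither, so only two of the four reads are counted.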

I once counted the reads in R with a self-made script using the libraries IRanges, GenomicRanges, and Rsamtools. However, if you are pretty new to RNA-seq, I would not recommend trying this yourself. It was pretty hard to do.

I think the easiest way for you to get your counts is to install a virtual machine with Ubuntu and try featureCounts in Rsubread there.

modified 3.4 years ago • written 3.4 years ago by b.nota
Answer: .fastq to .txt conversion for EdgeR package and merging two paired end sequence
hamidrezarazzaghian (Canada), 3.4 years ago, wrote:

Thanks everyone for all the help.
