Question: GC content correction in salmon/deseq2 workflow for single-end RNAseq data
14 days ago, maya.kappil wrote:

Hello,

My understanding is that it is possible to add a flag for GC content correction when quantifying reads with Salmon, but that this is really meant for paired-end data. What options are there for GC content correction on single-end RNA-seq data, and to what extent would such a correction be recommended prior to differential gene expression (DGE) analysis?

To this end, I was wondering whether it makes sense, or would be recommended, to perform GC content correction using the EDASeq R package prior to normalization and differential gene expression analysis using the DESeq2 workflow.

In a way, I was thinking that since each gene is compared to itself across samples in DGE analysis, the GC content differences across genes may not play an important role, but was not sure if this is correct.

Many thanks!

Maya

14 days ago, Michael Love (United States) wrote:

You are correct that the GC content flag in Salmon relies on paired-end reads to determine the fragment sequence content. There was some effort to extend to single-end reads but I don't know how far that got in testing.

Are you interested in splicing, or just DGE? 

Also, I recommend always running FastQC followed by MultiQC on every RNA-seq dataset. The MultiQC report includes a plot of the GC content curves for all samples. If the curves look similar and representative, then for DGE you could skip GC content correction. If they look different, then for DGE you could use EDASeq or cqn to create normalizationFactors for DESeq2 (again, with paired-end data it's easier to use Salmon's GC correction). The DESeq2 vignette has some code showing how to add these.
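
To make the normalization-factor step a bit more concrete, here is a minimal sketch of the arithmetic only. It is written in plain Python rather than R and is not any package's API. It assumes you already have a genes x samples matrix of log-scale offsets (for example from cqn or EDASeq; the sign convention can differ between packages, so check the vignette). It converts the offsets into multiplicative factors and centers each gene's row so that its geometric mean is 1. The function name and the toy numbers are hypothetical.

```python
import math

def centered_norm_factors(log_offsets):
    """Convert log-scale offsets (genes x samples) into normalization
    factors whose geometric mean is 1 within each gene (row)."""
    factors = []
    for row in log_offsets:
        row_mean = sum(row) / len(row)  # mean on the log scale
        # Subtracting the log-scale mean divides by the geometric mean.
        factors.append([math.exp(v - row_mean) for v in row])
    return factors

# Toy offsets for 2 genes x 3 samples (made-up numbers, assumption).
toy_offsets = [
    [0.10, -0.20, 0.40],
    [1.00, 1.00, 1.00],
]
norm_factors = centered_norm_factors(toy_offsets)
```

A gene whose offset is the same in every sample (the second row) ends up with factors of 1, so only sample-to-sample differences within a gene are carried into the analysis. In R, a matrix like this would then be assigned to the dataset as its normalizationFactors, as described in the DESeq2 vignette.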

Great, thanks! I do have the MultiQC output and will review the GC content curves. Right now I'm doing DGE, but I would also like to look at differential transcript usage using something like the DRIMSeq package.

Reply written 14 days ago by maya.kappil

If the GC curves look similar across samples and are roughly representative of the transcriptome (e.g. there are some 30% GC reads and some 70% GC reads), you can skip GC content correction. I didn't do extensive testing of the single-end version of the GC correction.
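
As a rough illustration of what "similar across samples" could mean numerically, the sketch below bins per-read GC fractions into a curve for each sample and compares two curves with a total variation distance. This is an assumption for illustration, not how FastQC or MultiQC compute or compare their plots, and the thread gives no cutoff for "similar", so none is suggested here. The function names and toy reads are hypothetical.

```python
def gc_histogram(reads, bins=20):
    """Fraction of reads falling into each GC-content bin."""
    counts = [0] * bins
    for seq in reads:
        seq = seq.upper()
        called = sum(seq.count(b) for b in "ACGT")
        if called == 0:
            continue  # skip reads with no called bases
        gc = (seq.count("G") + seq.count("C")) / called
        # A GC fraction of exactly 1.0 goes into the last bin.
        counts[min(int(gc * bins), bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

def curve_distance(h1, h2):
    """Total variation distance between two GC curves
    (0 = identical, 1 = no overlap)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))

# Toy reads (made up): samples A and B match, sample C is GC-rich only.
sample_a = ["ACGTACGT", "GGGCCCAT", "ATATATAT", "GCGCGCGC"]
sample_b = ["ACGTACGT", "GGGCCCAT", "ATATATAT", "GCGCGCGC"]
sample_c = ["GCGCGCGC", "GGGCCCGG", "CCGGCCGG", "GCGCGGCC"]

d_ab = curve_distance(gc_histogram(sample_a), gc_histogram(sample_b))
d_ac = curve_distance(gc_histogram(sample_a), gc_histogram(sample_c))
```

Here the distance between A and B is zero, while C, which contains only GC-rich reads, sits far from A. On real data, looking at the MultiQC plot directly is likely simpler than computing anything like this.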

Reply written 14 days ago by Michael Love