Question

GC content correction in salmon/deseq2 workflow for single-end RNAseq data

maya.kappil ▴ 30

@mayakappil-18569

Last seen 6.4 years ago

Hello,

My understanding is that salmon offers a flag for GC content correction when quantifying reads, but that this is really meant for paired-end data. What options are there for performing GC content correction on single-end RNA-seq data, and to what extent is such a correction recommended prior to differential gene expression (DGE) analysis?

To this end, would it make sense (or be recommended) to perform GC content correction using the EDASeq R package prior to normalization and differential gene expression analysis using the DESeq2 workflow?

In a way, I was thinking that since each gene is compared to itself across samples in DGE analysis, the GC content differences across genes may not play an important role, but was not sure if this is correct.

Many thanks!

Maya

salmon edaseq rna-seq deseq2 • 2.8k views

updated 7.2 years ago by Michael Love 43k • written 7.2 years ago by maya.kappil ▴ 30

score 1 · Accepted Answer · 2018-11-28

Michael Love 43k

@mikelove

Last seen 3 days ago

United States

You are correct that the GC content flag in Salmon relies on paired-end reads to determine the fragment sequence content. There was some effort to extend it to single-end reads, but I don't know how far that got in testing.

Are you interested in splicing, or just DGE?

Also, I recommend always running FastQC followed by MultiQC for all RNA-seq datasets. MultiQC has a module that will plot the GC content curves for all samples. If the curves look similar and representative, then for DGE you could skip GC content correction. If they look different, then for DGE you could use EDASeq or cqn to create normalizationFactors for DESeq2 (again, with paired-end data, it's easier to use Salmon's GC correction). The DESeq2 vignette has some code showing how to add these.
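To make the idea of gene-by-sample normalization factors concrete, here is a minimal arithmetic sketch in Python. It is not the DESeq2, EDASeq, or cqn API. It only illustrates one step that the DESeq2 vignette describes for user-supplied factors: dividing each gene's factors by their geometric mean across samples, so that normalized counts stay on roughly the same scale as the raw counts. The function name `center_by_row_geometric_mean` and the factor values are hypothetical.

```python
import math


def center_by_row_geometric_mean(factors):
    """Divide each row (gene) by its geometric mean across samples.

    `factors` is a list of rows, one row per gene and one value per sample.
    After centering, the values in each row have a geometric mean of 1.
    """
    centered = []
    for row in factors:
        geometric_mean = math.exp(sum(math.log(x) for x in row) / len(row))
        centered.append([x / geometric_mean for x in row])
    return centered


# Hypothetical gene-by-sample factors (rows: genes, columns: samples).
raw_factors = [
    [0.5, 1.0, 2.0],  # gene with a sample-specific bias
    [2.0, 2.0, 2.0],  # gene with the same factor in every sample
]
centered = center_by_row_geometric_mean(raw_factors)
```

A gene whose factor is identical in every sample ends up with all ones after centering, so only the between-sample differences in the factors carry over into the analysis.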

7.2 years ago • Michael Love 43k

Great, thanks! I do have the MultiQC output and will review the GC content curves in it. Right now, I'm doing DGE but would also like to look at differential transcript usage using something like the DRIMSeq package.

7.2 years ago • maya.kappil ▴ 30

If the GC curves look similar across samples and are roughly representative of the transcriptome (e.g. there are some reads at 30% GC and some at 70% GC), you can skip GC content correction. I didn't do extensive testing of the single-end version.
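A small worked example shows why similar GC curves across samples make the correction unnecessary for DGE. The sketch assumes a hypothetical log-linear bias model, in which observed counts equal true abundance times a multiplier that depends on the gene's GC fraction and a per-sample slope. The model, the function names, and all numbers are assumptions chosen for illustration, not how Salmon, EDASeq, or cqn model the bias.

```python
import math


def gc_bias(gc, slope):
    """Hypothetical log-linear GC bias: a multiplier on the expected count."""
    return math.exp(slope * (gc - 0.5))


def log2_fold_change(true_a, true_b, gc, slope_a, slope_b):
    """Observed log2 fold change (sample B over sample A) for one gene."""
    observed_a = true_a * gc_bias(gc, slope_a)
    observed_b = true_b * gc_bias(gc, slope_b)
    return math.log2(observed_b / observed_a)


# One gene with 70% GC whose true abundance doubles from sample A to sample B.
# Same bias curve in both samples: the multiplier cancels in the ratio.
same_curve = log2_fold_change(100.0, 200.0, gc=0.7, slope_a=2.0, slope_b=2.0)

# Different bias curves: the multipliers no longer cancel.
diff_curve = log2_fold_change(100.0, 200.0, gc=0.7, slope_a=2.0, slope_b=-2.0)
```

With the shared curve the observed log2 fold change equals the true value of 1, because the gene is compared to itself and its GC multiplier divides out. With opposite slopes the estimate is pulled away from 1, which is the situation where sample-specific normalization factors become worthwhile.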

7.2 years ago • Michael Love 43k