GC content correction in salmon/deseq2 workflow for single-end RNAseq data
maya.kappil ▴ 30
@mayakappil-18569

Hello,

My understanding is that Salmon offers a flag for GC content correction when quantifying reads, but that it is really meant for paired-end data. What options are there for GC content correction on single-end RNA-seq data, and to what extent is such a correction recommended prior to differential gene expression analysis?

To this end, I was wondering whether it makes sense, or would be recommended, to perform GC content correction using the EDASeq R package prior to normalization and differential gene expression analysis with the DESeq2 workflow.

In a way, I was thinking that since each gene is compared to itself across samples in DGE analysis, the GC content differences across genes may not play an important role, but was not sure if this is correct.
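
To make that reasoning concrete, here is a minimal sketch in plain Python (not part of the Salmon/DESeq2 workflow itself). All numbers are hypothetical, and the multiplicative bias model is an assumption made only for illustration. It shows that a GC effect which is the same for a gene in every sample cancels out of that gene's log fold change, while an effect that differs between samples does not.

```python
import math

def log2_fold_change(count_a, count_b):
    """log2 ratio of one gene's count in sample B over sample A."""
    return math.log2(count_b / count_a)

# Hypothetical true abundance of one gene: 2-fold higher in sample B.
true_a, true_b = 100.0, 200.0

# Case 1: the GC effect depends only on the gene, so both samples
# see the same multiplicative factor and it cancels in the ratio.
gene_bias = 0.6
lfc_gene_only = log2_fold_change(true_a * gene_bias, true_b * gene_bias)

# Case 2: the GC effect on this gene differs between samples
# (hypothetical factors), so the ratio is distorted.
bias_a, bias_b = 0.6, 0.9
lfc_sample_specific = log2_fold_change(true_a * bias_a, true_b * bias_b)

print("gene-only bias:      ", round(lfc_gene_only, 3))
print("sample-specific bias:", round(lfc_sample_specific, 3))
```

Under these assumptions the first case recovers the true 2-fold change, and the second reports roughly a 3-fold change, which is why the question becomes whether the GC dependence varies from sample to sample.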

Many thanks!

Maya

salmon edaseq rna-seq deseq2 • 2.2k views
@mikelove

You are correct that the GC content flag in Salmon relies on paired-end reads to determine the fragment sequence content. There was some effort to extend it to single-end reads, but I don't know how far that got in testing.

Are you interested in splicing, or just DGE? 

Also, I recommend always running FastQC followed by MultiQC on all RNA-seq datasets. They have a module that will plot the GC content curves for all samples. If the curves look similar and representative, then for DGE you could skip GC content correction. If they look different, then for DGE you could use EDASeq or cqn to create normalizationFactors for DESeq2 (again, with paired-end data, it's easier to use Salmon's GC correction). The DESeq2 vignette has some code showing how to add these.
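
As a rough illustration of what a gene-by-sample matrix of normalization factors looks like, here is a small Python sketch of the arithmetic only. The actual workflow is in R, and the vignette code is the thing to follow. The offsets below are made-up numbers, exponentiating them directly is an assumption (the sign convention of the offset differs between packages, so check the vignette for cqn versus EDASeq), and `center_rows_geometric` is a hypothetical helper name. The centering step mirrors, as I understand it, the vignette's advice to divide each row by its geometric mean so that normalized counts stay on the scale of the original counts.

```python
import math

def center_rows_geometric(factors):
    """Divide each gene's row by its geometric mean, so that the
    geometric mean of every row becomes 1."""
    centered = []
    for row in factors:
        geo_mean = math.exp(sum(math.log(f) for f in row) / len(row))
        centered.append([f / geo_mean for f in row])
    return centered

# Hypothetical log-scale offsets: 2 genes (rows) x 3 samples (columns).
offsets = [
    [0.10, -0.20, 0.40],
    [-0.30, 0.00, 0.30],
]

# Assumption: factors are exp(offset); flip the sign if the package
# that produced the offsets uses the opposite convention.
factors = [[math.exp(o) for o in row] for row in offsets]
norm_factors = center_rows_geometric(factors)

for row in norm_factors:
    print([round(f, 3) for f in row])
```

In R, a matrix like this is what gets assigned to the dataset's normalizationFactors before running DESeq2, in place of the usual per-sample size factors.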


Great, thanks!   I do have the MultiQC output and will review them for the GC content curves. Right now, I'm doing DGE but would also like to look at differential transcript usage using something like the DRIMSeq package. 


If the GC curves look similar across samples and are roughly representative of the transcriptome (e.g. there are some 30% GC reads and some 70% GC reads), you can skip GC content correction. I didn't do extensive testing of the single-end version.
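
If you want a number to go with the visual check, one possible sketch is below. It is plain Python on toy reads, not how FastQC or MultiQC compute their plots. The binning, the use of total variation distance as a similarity measure, and the function names are all assumptions for illustration; the thread itself only suggests looking at the curves.

```python
def gc_bin(read, n_bins=10):
    """Return the index of the GC-content bin for one non-empty read."""
    gc_count = sum(1 for base in read.upper() if base in "GC")
    # Integer arithmetic avoids floating-point edge cases at bin borders.
    return min((gc_count * n_bins) // len(read), n_bins - 1)

def gc_histogram(reads, n_bins=10):
    """Fraction of reads falling into each GC-content bin."""
    counts = [0] * n_bins
    for read in reads:
        counts[gc_bin(read, n_bins)] += 1
    return [c / len(reads) for c in counts]

def total_variation(p, q):
    """Total variation distance between two histograms (0 = identical)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Toy reads standing in for three samples (hypothetical data).
sample_a = ["ATATATATGC", "GCGCGCGCAT", "ATGCATGCAT"]
sample_b = ["ATATATATGC", "GCGCGCGCAT", "ATGCATGCAT"]
sample_c = ["GCGCGCGCGC", "GCGCGCGCAT", "GGCCGGCCAT"]

hist_a, hist_b, hist_c = (gc_histogram(s) for s in (sample_a, sample_b, sample_c))
tv_ab = total_variation(hist_a, hist_b)
tv_ac = total_variation(hist_a, hist_c)
print("A vs B:", round(tv_ab, 3))
print("A vs C:", round(tv_ac, 3))
```

Here samples A and B have identical GC profiles while sample C is shifted toward high GC, so the A vs C distance is much larger. What counts as "different enough" to warrant correction is a judgment call that this sketch does not settle.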
