Normalisation - DESeq2 vs StringTie-Ballgown
JindrichK • 0


I have a normalisation question. I am seeing inconsistencies between mRNA abundance estimates from DESeq2 and StringTie-Ballgown. I get that there are many differences between the two, but if you consider one gene that has one transcript, and use the same BAM input file, the main difference between the two algorithms is the normalisation - correct?

Attached is the bamCoverage track for such a gene.

[Image: Read Coverage]

Below are the FPKM values estimated by DESeq2 (gene level) and StringTie-Ballgown (transcript level); the commands used are at the end of this post:

                        AMP (blue track)    DLM (green track)
    FPKM by Ballgown    40.6                5.1
    FPKM by DESeq2      21.3                13.1

The fold change between the two conditions according to StringTie (about 8-fold, versus about 1.6-fold from DESeq2) is much closer to what you see on the pileup. Is that because StringTie and bamCoverage use the same kind of normalisation algorithm? And if so, which is closer to the "biological truth", DESeq2 or StringTie/read coverage?


Commands used:

StringTie

    stringtie -e -B -G ${GTF} -o transcripts.gtf -A gene_abundances.tsv input.rmdup.bam

DESeq2 (using featureCounts counts)

    featureCounts -T $threads -p -F GTF -t exon -g gene_id -s 2 -a ${GTF} -o out.featurecount input.rmdup.bam

FPKM values calculated in DESeq2 with:

    fpkmNormalisedCounts <-, robust =TRUE))

Bigwig

    bamCoverage -b input.rmdup.bam --ignoreDuplicates --effectiveGenomeSize 142573017 --normalizeUsing RPKM --filterRNAstrand forward -of bigwig -o


The fpkm function in DESeq2 uses whatever gene length you provide. So it's not a question of StringTie vs DESeq2, but of featureCounts vs StringTie. You can import StringTie data directly into DESeq2 using tximport (it supports type="stringtie"), which would give a 1-to-1 comparison.


Thanks for the fast reply Michael. I understand that, but I'm not concerned about getting different values. I'm concerned that I'm getting different fold changes (the gene/transcript length would be the same for both conditions - AMP vs DLM).

I'm still confused as to why the DESeq2 FPKM values don't match the read coverage. I guess I'm going back to: which is closer to the biological truth?


FPKM is the read count scaled by gene length and library size. StringTie and featureCounts don't agree on gene length. Then the DESeq2 part is just library size. If you use robust=FALSE you get the classic division by the total sum of counts. If you use robust=TRUE we use the size factors to provide a better estimate of library size than the total sum. So you have a variety of different components and ways to compute FPKM here.
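To make the components above concrete, here is a minimal Python sketch of the two library-size choices. It is not DESeq2's code: the count table, gene names and lengths are made up for illustration, and the "robust" branch is a simplified median-of-ratios estimate rescaled by the geometric mean of the column sums, which is only assumed to approximate what robust=TRUE does. It shows that gene length cancels out of a within-gene fold change, while the library-size estimate does not.

```python
import math
from statistics import median

# Hypothetical toy counts: rows are genes, columns are samples [AMP, DLM].
# "dominant" soaks up most reads in the second sample and inflates its total.
COUNTS = {
    "geneA": [100, 100],
    "geneB": [200, 200],
    "geneC": [300, 300],
    "target": [80, 40],
    "dominant": [1000, 10000],
}
# Hypothetical gene lengths in base pairs.
LENGTH_BP = {"geneA": 1000, "geneB": 2000, "geneC": 1500,
             "target": 2000, "dominant": 1000}


def column_sums(counts):
    """Total counts per sample."""
    n = len(next(iter(counts.values())))
    return [sum(row[j] for row in counts.values()) for j in range(n)]


def median_of_ratios(counts):
    """Size factor per sample: median of count / per-gene geometric mean."""
    n = len(next(iter(counts.values())))
    rows = [r for r in counts.values() if all(c > 0 for c in r)]
    geo = [math.exp(sum(math.log(c) for c in r) / n) for r in rows]
    return [median(r[j] / g for r, g in zip(rows, geo)) for j in range(n)]


def fpkm(counts, lengths_bp, robust):
    """FPKM = count / (library size in millions) / (gene length in kb)."""
    sums = column_sums(counts)
    if robust:
        # Assumption: size factors rescaled to the scale of the column sums.
        centre = math.exp(sum(math.log(s) for s in sums) / len(sums))
        lib = [sf * centre for sf in median_of_ratios(counts)]
    else:
        lib = sums
    return {
        gene: [c / (lib[j] / 1e6) / (lengths_bp[gene] / 1e3)
               for j, c in enumerate(row)]
        for gene, row in counts.items()
    }


def fold_change(values):
    """First sample over second sample."""
    return values[0] / values[1]


if __name__ == "__main__":
    for robust in (False, True):
        fc = fold_change(fpkm(COUNTS, LENGTH_BP, robust)["target"])
        print(f"robust={robust}: target fold change = {fc:.2f}")
```

In this toy table the target gene's fold change is 2 with the median-of-ratios library size and about 12.7 with the total-sum library size, and doubling every gene length leaves both values unchanged. That is only an illustration of the mechanism, not a diagnosis of the numbers in the question.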

