Differential expression analysis of already normalized sequence counts
mariakarand

Hi,

We have been using DESeq2 for differential expression analysis previously. On the dataset I'm working on (microRNA expression), I don't want to use the normalization method implemented in the DESeq2 package, so I have normalized the dataset myself. Is there any way to use DESeq2 for differential expression analysis of an already normalized dataset? If not, does Bioconductor have any other R packages I could use for this purpose?

Thanks for any answers,

Maria

differential expression normalization deseq2

Why would you want to avoid DESeq2's normalising method? It is good.


Because I'm not really looking at differential expression. I'm looking at microRNA stability in formalin-fixed paraffin-embedded tissue compared to matched fresh frozen tissue. I fear that the normalization method in DESeq2 makes assumptions based on biological differential expression between conditions, and those are not assumptions I can make when studying the stability of microRNA in dead tissue.

@mikelove

You can use any normalization you like, but you still need to start with the unnormalized counts. Build the dds as usual and then assign your size factors with sizeFactors(dds) <- x before calling DESeq().


Forgive me, but I'm not a bioinformatician, so I have to ask you to clarify:

I have performed a total count normalization (I divided each count by the total count in that library and multiplied it by the average total count across all libraries). Will using the sizeFactors function give exactly the same normalization?
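A quick way to see whether the two are equivalent is the following Python sketch with made-up counts. The helper names are hypothetical and are not DESeq2 functions. DESeq2's normalized counts are the raw counts divided by each sample's size factor. A size factor of s_j = N_j / mean(N), where N_j is the total count of sample j, should therefore reproduce the manual procedure exactly. In R, x would presumably be built as x <- colSums(counts(dds)); x <- x / mean(x), but that line is an untested assumption here.

```python
# Sketch (not DESeq2 code): total-count normalization written as size factors.
# Assumption: counts[i][j] is the raw count of feature i in sample j.


def library_sizes(counts):
    """Total count (library size) of each sample."""
    return [sum(row[j] for row in counts) for j in range(len(counts[0]))]


def total_count_size_factors(counts):
    """Hypothetical helper: s_j = N_j / mean(N)."""
    libs = library_sizes(counts)
    mean_lib = sum(libs) / len(libs)
    return [n / mean_lib for n in libs]


def manual_total_count(counts):
    """The manual procedure: count / library total * average library total."""
    libs = library_sizes(counts)
    mean_lib = sum(libs) / len(libs)
    return [[row[j] / libs[j] * mean_lib for j in range(len(libs))] for row in counts]


def divide_by_size_factors(counts, sf):
    """Normalized counts: each raw count divided by its sample's size factor."""
    return [[row[j] / sf[j] for j in range(len(sf))] for row in counts]


# Made-up counts: 3 features (rows) x 3 samples (columns).
counts = [
    [10, 20, 30],
    [90, 180, 70],
    [100, 0, 200],
]
manual = manual_total_count(counts)
via_sf = divide_by_size_factors(counts, total_count_size_factors(counts))
print(manual)
print(via_sf)  # same values, up to floating-point rounding
```

The scaling is the same either way. DESeq2 uses the size factors inside its model rather than testing the divided counts, so the equivalence holds for the normalization, not for the statistics computed afterwards.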


Total count normalization is a notoriously bad library size adjustment.

See for example Fig 5 here:

http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-11-94

Or Fig 1 here:

https://genomebiology.biomedcentral.com/articles/10.1186/gb-2010-11-3-r25

You can either use DESeq2's built-in library size adjustment, which is far more robust than total count and similar to TMM, or you can use whatever normalization you like, following my advice above:

1) build a dds with unnormalized counts
2) compute your library size adjustment, x, however you like. This should be a numeric vector with one value per sample, roughly centered around 1 and with all values > 0
3) store this using sizeFactors(dds) <- x
4) use DESeq() as normal
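The requirements in step 2 can be checked before running sizeFactors(dds) <- x. The minimal Python sketch below uses a hypothetical helper, prepare_size_factors, that is not part of DESeq2. Rescaling to a geometric mean of 1 is one assumed way of making x "roughly centered around 1". DESeq2 does not demand this exact centering.

```python
import math


def prepare_size_factors(x, n_samples):
    """Hypothetical helper: check a custom library-size adjustment.

    Enforces one value per sample and all values finite and > 0, then
    rescales so the geometric mean of the factors is 1.
    """
    x = [float(v) for v in x]
    if len(x) != n_samples:
        raise ValueError(f"expected {n_samples} size factors, got {len(x)}")
    if not all(math.isfinite(v) and v > 0 for v in x):
        raise ValueError("size factors must be finite and greater than 0")
    geo_mean = math.exp(sum(math.log(v) for v in x) / len(x))
    return [v / geo_mean for v in x]


# Usage with made-up values for three samples.
x = prepare_size_factors([0.8, 1.1, 1.3], 3)
print([round(v, 3) for v in x])
```

Dividing every size factor by the same constant only rescales all normalized counts together, so it should not change comparisons between samples.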

You are the data analyst, so ultimately the choice is up to you, but I would read over those papers, paying particular attention to why total count normalization performs poorly.
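The following toy Python sketch illustrates why total count can mislead. It uses made-up counts: five features are unchanged, and one very abundant feature rises 50-fold in sample 2. The sketch contrasts total-count factors with a re-implementation of the median-of-ratios idea behind DESeq2's default size factors. It is not DESeq2's code, and the helper names are illustrative.

```python
import math
import statistics


def median_of_ratios(counts):
    """Median-of-ratios size factors (sketch of the idea, not DESeq2's code).

    For each feature, take its geometric mean across samples; a sample's size
    factor is the median over features of count / geometric mean. Features
    with a zero in any sample are skipped (their geometric mean is 0).
    """
    n = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo = [sum(math.log(c) for c in row) / n for row in usable]
    factors = []
    for j in range(n):
        ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo)]
        factors.append(math.exp(statistics.median(ratios)))
    return factors


def total_count(counts):
    """Total-count size factors: library size / mean library size."""
    libs = [sum(row[j] for row in counts) for j in range(len(counts[0]))]
    mean_lib = sum(libs) / len(libs)
    return [n / mean_lib for n in libs]


# Made-up counts: rows are features, columns are samples 1 and 2.
counts = [[100, 100]] * 5 + [[100, 5000]]

print("total count:     ", total_count(counts))       # roughly [0.197, 1.803]
print("median of ratios:", median_of_ratios(counts))  # [1.0, 1.0]
```

With total count, the single changing feature inflates sample 2's factor about 9-fold, so the five unchanged features would look strongly down. The median-of-ratios factors are unaffected because the median feature is unchanged.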


Thank you so much. I will try this.

I'm aware that total count normalization is highly discouraged by bioinformaticians. However, from what I can tell, DESeq2's normalization method is based on the assumption that most genes are not DE, and I'm not sure I can make that assumption for my dataset. I am looking at miRNA isolated from formalin-fixed paraffin-embedded tissue (FFPET) and comparing miRNA stability in FFPET to matched fresh frozen tissue. In other words, this is not really biological differential expression, but rather differential miRNA stability.

Anyway, I'm trying out different things and I will look into those articles. Thank you so much for your help :)

Maria


If most or all genes are DE, then you probably need to come up with an external way to normalize; it is nearly impossible to do this computationally using only the data you have. This is a very difficult challenge.


Sorry for reviving this old post, but I am dealing with a similar problem. Maybe the question can be phrased as follows: given that DESeq2 assumes that most features (genes) tested are NOT changing (not DE), how can the program be used when MOST, if not all, loci of interest are turned off in the test sample compared to the control? In the case of the miRNAs above, for example, they have (or could have) all been degraded. Another case is when an essential protein required for their stability has been affected, so that all levels are down. A control for this would be a knock-down of, say, Argonaute proteins, without which miRNA levels in general could be very low. Can DESeq2 deal with this scenario? Would the outliers that somehow survive the treatment come up as 'enhanced' in DESeq2?

As a wet-lab scientist, I see such 'negative' controls as normal and required; they show on a gel that things have been turned down. But when one needs to back this up with numbers, say by using DESeq2, data from such an approach seem difficult to incorporate. How can this best be done when comparing many samples, some of which have no miRNAs (according to the gels)? Should those samples be left out of the DESeq2 analysis (which would defeat its purpose, wouldn't it?)

Thanks for any ideas how to tackle this.

(Note: the only raw counts in the data set are for miRNAs, as obtained from HTSeq after sRNA-seq.)
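The toy Python sketch below shows what a median-of-ratios normalization computed over every feature would do in this scenario. Median of ratios is the idea behind DESeq2's default size factors. The sketch re-implements that idea and is not DESeq2's code. The counts are made up: four miRNAs drop 10-fold in the knock-down and one "survivor" is unchanged.

```python
import math
import statistics


def median_of_ratios(counts):
    """Median-of-ratios size factors over every feature (sketch, not DESeq2 code)."""
    n = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo = [sum(math.log(c) for c in row) / n for row in usable]
    factors = []
    for j in range(n):
        ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo)]
        factors.append(math.exp(statistics.median(ratios)))
    return factors


# Made-up counts: column 0 = control, column 1 = knock-down.
# Rows 0-3 drop 10-fold; row 4 is unchanged.
counts = [[1000, 100]] * 4 + [[1000, 1000]]
sf = median_of_ratios(counts)
normalized = [[row[j] / sf[j] for j in range(2)] for row in counts]
for row in normalized:
    print([round(v, 1) for v in row])
```

Because the median feature is assumed to be unchanged, the size factors absorb the global drop. The four degraded miRNAs come out equal in both samples, and the unchanged miRNA appears about 10-fold higher in the knock-down.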


If the majority of the features change, and in one direction, you would need to use controlGenes to specify either a set of plausible unchanging genes or the technical spike-in sequences that were quantified. These should then be used instead of the endogenous genes to perform normalization.
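The effect of restricting normalization to control features can be sketched in Python with made-up counts. The size factors are estimated only from a chosen subset of rows and then applied to all rows. The sketch re-implements the median-of-ratios idea and is not DESeq2's code. The spike-ins are assumed to have been added in equal amounts to every sample, which is what makes them usable as controls.

```python
import math
import statistics


def median_of_ratios(counts, control_rows=None):
    """Median-of-ratios size factors (sketch, not DESeq2 code).

    If control_rows is given, only those rows (e.g. spike-ins, or features
    assumed not to change) are used to estimate the factors, which are then
    meant to be applied to every row.
    """
    rows = counts if control_rows is None else [counts[i] for i in control_rows]
    n = len(counts[0])
    usable = [row for row in rows if all(c > 0 for c in row)]
    log_geo = [sum(math.log(c) for c in row) / n for row in usable]
    factors = []
    for j in range(n):
        ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo)]
        factors.append(math.exp(statistics.median(ratios)))
    return factors


# Made-up counts: column 0 = control, column 1 = knock-down.
# Rows 0-4: endogenous miRNAs (four drop 10-fold, one unchanged).
# Rows 5-6: spike-ins, assumed added in equal amounts to every sample.
counts = [[1000, 100]] * 4 + [[1000, 1000]] + [[500, 500]] * 2
spike_ins = [5, 6]

sf_all = median_of_ratios(counts)
sf_ctrl = median_of_ratios(counts, control_rows=spike_ins)
normalized = [[row[j] / sf_ctrl[j] for j in range(2)] for row in counts]
print("all features:  ", sf_all)
print("spike-ins only:", sf_ctrl)
print("miRNA 0 after spike-in normalization:", normalized[0])
```

Here the spike-in factors are [1.0, 1.0], so the 10-fold drop is preserved, whereas the all-feature factors absorb it. In DESeq2 itself this presumably corresponds to something like dds <- estimateSizeFactors(dds, controlGenes = idx), with idx indexing the spike-in rows. Check the estimateSizeFactors help page for the exact form.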


Hi Michael, thanks for the quick response. Yes, I will look into that; setting 'controlGenes' is not an option I had discovered yet (being an absolute beginner ;-).

Rob