Differential expression analysis of already normalized sequence counts
mariakarand ▴ 20
@mariakarand-8971
Last seen 8.4 years ago
Norway

Hi,

We have previously used DESeq2 for differential expression analysis. For the dataset I'm working on now (microRNA expression), I don't want to use the normalization method implemented in the DESeq2 package, so I have normalized the dataset myself. Is there any way to use DESeq2 for differential expression analysis of already normalized datasets? If not, does Bioconductor have any other R packages I could use for this purpose?

Thanks for any answers,

Maria

differential expression normalization deseq2 • 4.6k views

Why would you want to avoid DESeq2's normalization method? It is good.

Because I'm not really looking at differential expression; I'm looking at microRNA stability in formalin-fixed paraffin-embedded tissue compared to matched fresh frozen tissue. I fear that the normalization method in DESeq2 makes assumptions that are based on biological differential expression between conditions, and those are not assumptions I can make when studying the stability of microRNA in dead tissue.

@mikelove
Last seen 2 days ago
United States

You can use any normalization you like, but you still need to start with the unnormalized counts. Build the dds as usual and then assign your size factors with sizeFactors(dds) <- x before calling DESeq().

Thanks Michael for your reply.

Forgive me, but I'm not a bioinformatician, so I have to ask you to clarify:

I have performed a total count normalization (divided each count by the total count in that library and multiplied it by the average total count across all libraries). Will using the sizeFactors function do the exact same normalization?

Total count normalization is a notoriously bad library size adjustment.

See for example Fig 5 here:

http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-11-94

Or Fig 1 here:

https://genomebiology.biomedcentral.com/articles/10.1186/gb-2010-11-3-r25

You can either use DESeq2's built-in library size adjustment, which is far more robust than total count and similar to TMM, or you can use whatever normalization you like, following my advice above:

1) build a dds with unnormalized counts
2) compute your library size adjustment, x, however you like. This should be a numeric vector with one value per sample, roughly centered around 1 and with all values > 0
3) store this using sizeFactors(dds) <- x
4) use DESeq() as normal

You are the data analyst so ultimately the choice is up to you, but I would read over those papers and specifically why total count normalization performs poorly.
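
As a rough illustration of the four steps above, here is a minimal sketch in R. The simulated counts from makeExampleDESeqDataSet() and the names raw_counts, coldata, lib_size and x are placeholders, not anything from the original dataset. Total count scaling is used for x only because it is the normalization discussed in this thread; any positive vector with one value per sample could be substituted. A matched design would probably also need a pairing term in the design formula, which is left out here.

```r
library(DESeq2)

# Simulated stand-in for a real matrix of raw (unnormalized) counts.
set.seed(1)
raw_counts <- counts(makeExampleDESeqDataSet(n = 500, m = 6))
coldata <- data.frame(
  condition = factor(rep(c("fresh_frozen", "FFPE"), each = 3),
                     levels = c("fresh_frozen", "FFPE")),
  row.names = colnames(raw_counts)
)

# 1) Build the dds with unnormalized counts.
dds <- DESeqDataSetFromMatrix(countData = raw_counts,
                              colData = coldata,
                              design = ~ condition)

# 2) Compute a library size adjustment (here: total count, scaled so the
#    values are centered around 1). Assumption: swap in your own method.
lib_size <- colSums(raw_counts)
x <- lib_size / mean(lib_size)

# 3) Store it as the size factors.
sizeFactors(dds) <- x

# 4) Run DESeq() as normal; it uses the size factors already stored.
dds <- DESeq(dds)
res <- results(dds)

# Normalized counts are the raw counts divided by the size factors, which is
# the same as count / library total * mean library total.
by_hand <- sweep(raw_counts, 2, lib_size, "/") * mean(lib_size)
stopifnot(isTRUE(all.equal(counts(dds, normalized = TRUE), by_hand)))
```

The final check is meant to show that dividing by x reproduces the "divide by the library total, multiply by the average total" calculation, while DESeq2 itself still works from the raw counts.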

Thank you so much. I will try this.

I'm aware that total count normalization is highly discouraged by bioinformaticians. However, from what I can tell, DESeq2's normalization method is based on the assumption that most genes are not DE, and I'm not so sure I can make that assumption for my dataset. I am looking at miRNA isolated from formalin-fixed paraffin-embedded tissue (FFPET) and comparing the miRNA stability in FFPET to matched fresh frozen tissue. In other words: this is not really biological differential expression, but rather differential miRNA stability.

Anyway, I'm trying out different things and I will look into those articles. Thank you so much for your help :)

Maria

If you have most or all genes DE then you probably need to come up with an external way to normalize, that is, it is near impossible to do this on the computer using the data you have. This is a very difficult challenge.
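
To make the difficulty concrete, here is a small base-R sketch of median-of-ratios size factors, the kind of estimator DESeq2's default normalization is built on. The function name median_of_ratios and the toy counts are invented for illustration, and this is a simplified rewrite rather than DESeq2's own code. In the toy data every feature in the second sample is at exactly half the level of the first.

```r
# Median-of-ratios size factors, written out in base R for illustration.
median_of_ratios <- function(counts) {
  log_geo_mean <- rowMeans(log(counts))   # log geometric mean per feature
  keep <- is.finite(log_geo_mean)         # drops features with any zero count
  apply(counts[keep, , drop = FALSE], 2, function(k) {
    exp(median(log(k) - log_geo_mean[keep]))
  })
}

# Hypothetical toy data: 200 features, each at half the level in "trt".
base <- 10 * (1:200)
counts <- cbind(ctrl = base, trt = base / 2)

sf <- median_of_ratios(counts)
normalized <- sweep(counts, 2, sf, "/")

# After normalization the two samples are indistinguishable.
stopifnot(isTRUE(all.equal(normalized[, "ctrl"], normalized[, "trt"])))
```

Because every feature shifts by the same factor, the estimator attributes the whole shift to library size and divides it away. From the counts alone, a global loss of miRNA looks the same as a sample that was sequenced less deeply, which is why some external reference is needed.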

Sorry for reviving this old post, but I am dealing with a similar problem. Maybe the question can be phrased as follows: given that DESeq2 assumes that most features (genes) tested are NOT changing (DE), how can the program be used when MOST, if not all, loci of interest are turned off in the test sample compared to the control? For example, in the miRNA case above, they have been, or could have been, all degraded. Or, in another case, an essential protein required for their stability has been affected so that all levels are down. A control for this would be a knock-down of, say, Argonaute proteins, without which miRNA levels in general could be very low. Can DESeq2 deal with this scenario? Would the outliers that somehow survive the treatment come up as 'enhanced' in DESeq2?

As a wet-lab scientist, I consider such 'negative' controls normal and required; they would show on a gel that things have been turned down. But when one needs to back this up with numbers, say by using DESeq2, data from such an approach seem difficult to incorporate. How can this best be done when one aims to compare a lot of samples, some of which have no miRNAs (according to the gels)? Should those samples be left out of the DESeq2 analysis (which would defeat its purpose, wouldn't it)?

Thanks for any ideas how to tackle this.

(Note: the only raw counts in the data set are for miRNAs, as obtained from HTSeq after sRNA-seq.)

If the majority of the features change and in one direction, you would need to use controlGenes to either specify a set of plausible un-changing genes or to specify the technical spike-in sequences that were quantified, and that should be used instead of endogenous genes to perform normalization.
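
A minimal sketch of what this could look like, assuming the control features are rows of the same count matrix. controlGenes is an argument of estimateSizeFactors() that takes a numeric or logical row index. The simulated dataset and the choice of the first 20 rows as "spike-ins" are purely hypothetical.

```r
library(DESeq2)

# Simulated stand-in for a real dataset of raw counts.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 6)

# Hypothetical: pretend the first 20 rows are spike-ins or features that are
# known not to change. In real data, index them by name or position.
is_control <- seq_len(nrow(dds)) <= 20

# Estimate size factors from the control rows only.
dds <- estimateSizeFactors(dds, controlGenes = is_control)
sizeFactors(dds)

# DESeq() keeps the size factors that are already stored on the object.
dds <- DESeq(dds)
res <- results(dds)
```

The quality of the result depends entirely on the control set: too few control rows, or controls that do in fact change, would give unreliable size factors.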

Hi Michael, thanks for the quick response. Yes, I will look into that; setting 'controlGenes' is not an option I had discovered yet (being an absolute beginner ;-).

Rob
