Question: Differential expression analysis of already normalized sequence counts
mariakarand wrote:

Hi,

We have previously used DESeq2 for differential expression analysis. For the dataset I'm working on now (microRNA expression), I don't want to use the normalization method implemented in the DESeq2 package, so I have normalized the dataset myself. Is there any way to use DESeq2 for differential expression analysis of an already normalized dataset? If not, does Bioconductor have any other R packages I could use for this purpose?

Thanks for any answers,

Maria

modified 3.3 years ago by Michael Love • written 3.3 years ago by mariakarand

Why would you want to avoid DESeq2's normalization method? It is good.

written 3.3 years ago by chris86

Because I'm not really looking at differential expression. I'm looking at microRNA stability in formalin-fixed paraffin-embedded tissue compared to matched fresh frozen tissue. I fear that the normalization method in DESeq2 makes assumptions based on biological differential expression between conditions, and those are not assumptions I can make when studying the stability of microRNA in dead tissue.

written 3.3 years ago by mariakarand
Answer: Differential expression analysis of already normalized sequence counts
Michael Love wrote:

You can use any normalization you like, but you still need to start with the unnormalized counts. Build the dds as usual and then assign your size factors with sizeFactors(dds) <- x before calling DESeq().

modified 3.3 years ago • written 3.3 years ago by Michael Love

Thanks Michael for your reply.

Forgive me, but I'm not a bioinformatician, so I have to ask you to clarify:

I have performed a total count normalization (divided each count by the total count in that library and multiplied it by the average total count across all libraries). Will assigning size factors with sizeFactors() do the exact same normalization?

written 3.3 years ago by mariakarand

Total count normalization is a notoriously bad library size adjustment.

See for example Fig 5 here:

http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-11-94

Or Fig 1 here:

https://genomebiology.biomedcentral.com/articles/10.1186/gb-2010-11-3-r25

You can either use DESeq2's built-in library size adjustment, which is far more robust than total count and similar to TMM, or you can use whatever normalization you like, following my advice above:

1) build a dds with unnormalized counts
2) compute your library size adjustment, x, however you like. This should be a numeric vector with one value per sample, roughly centered around 1 and with all values > 0
3) store this using sizeFactors(dds) <- x
4) use DESeq() as normal
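
A rough sketch of steps 1-4 (an illustration, not code from the original reply). It uses total-count size factors only because that is the normalization discussed in this thread, not as a recommendation. The helper name total_count_size_factors and the toy matrix are made up for the example; the DESeq2 part runs on simulated counts from makeExampleDESeqDataSet() and is skipped if DESeq2 is not installed.

```r
# Hypothetical helper: library size relative to the mean library size.
# The result has one value per sample, all > 0, with mean exactly 1.
total_count_size_factors <- function(raw_counts) {
  lib_size <- colSums(raw_counts)
  lib_size / mean(lib_size)
}

# Made-up raw counts: 3 miRNAs (rows) x 4 samples (columns).
raw <- matrix(c(10, 20,  70,
                30, 60, 210,
                 5, 10,  35,
                20, 40, 140),
              nrow = 3,
              dimnames = list(paste0("miR", 1:3), paste0("s", 1:4)))

x <- total_count_size_factors(raw)

# Manual total-count normalization as described in the thread:
# divide by the library total, multiply by the mean library total.
manual <- sweep(raw, 2, colSums(raw), "/") * mean(colSums(raw))

# Dividing the raw counts by the size factors gives the same values.
via_sf <- sweep(raw, 2, x, "/")
stopifnot(isTRUE(all.equal(manual, via_sf)))

# Steps 1-4 with DESeq2, on simulated data (skipped if DESeq2 is missing).
if (requireNamespace("DESeq2", quietly = TRUE)) {
  library(DESeq2)
  sim <- makeExampleDESeqDataSet(n = 500, m = 6)  # simulated raw counts

  # 1) build a dds with unnormalized counts
  dds <- DESeqDataSetFromMatrix(countData = counts(sim),
                                colData   = colData(sim),
                                design    = ~ condition)
  # 2) compute the library size adjustment
  sf <- total_count_size_factors(counts(dds))
  # 3) store it; DESeq() will then use these instead of estimating its own
  sizeFactors(dds) <- sf
  # 4) run DESeq() as normal
  dds <- DESeq(dds)
  res <- results(dds)
}
```

Because DESeq2's normalized counts are the raw counts divided by the size factors, size factors computed this way reproduce the same scaling as the manual total-count normalization, while the model itself is still fitted on the raw counts.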

You are the data analyst, so ultimately the choice is up to you, but I would read over those papers, paying particular attention to why total count normalization performs poorly.

written 3.3 years ago by Michael Love

Thank you so much. I will try this.

I'm aware that total count normalization is highly discouraged by bioinformaticians. However, from what I can tell, DESeq's normalization method is based on the assumption that most genes are not DE, and I'm not sure that assumption holds for my dataset. I am looking at miRNA isolated from formalin-fixed paraffin-embedded tissue (FFPET) and comparing the miRNA stability in FFPET to matched fresh frozen tissue. In other words, this is not really biological differential expression but rather differential miRNA stability.

Anyway, I'm trying out different things and I will look into those articles. Thank you so much for your help :)

Maria

written 3.3 years ago by mariakarand

If most or all genes are DE, then you probably need to come up with an external way to normalize. It is nearly impossible to do this computationally using only the data you have. This is a very difficult challenge.

written 3.3 years ago by Michael Love