DESeq2: design = ~1 For Normalisation Only.
Entering edit mode
Last seen 18 months ago


My question follows on from two related questions ( and

I use DESeq2 in order to get normalized counts from raw TCGA-HTseq Counts.

My method is simple, put the data through DESeq2 - using design = ~ 1 (which gives a warning).

exptDesign_TCGA = data.frame(
  row.names = colnames(matrix_TCGA_data),
  condition = sample_tsv$Project.ID)

dds <- DESeqDataSetFromMatrix(
  countData = matrix_TCGA_data,
  colData = exptDesign_TCGA,
  design = ~ 1)

Then I do all of my analysis (clustering, survival, gene expression heat maps) on the normalised counts from Deseq2.

However I am running into problems with huge numbers of samples and computational power - directly related to the following post:

I know Michael Love suggests using Limma-Voom, but I have tried that - run my analysis and it is essentially scales the counts by gene across all samples, not what I want at all, as I lose the relative expression of one gene to another gene in the same sample. I need each sample to be normalised by the DESeq2 method.

In his response, Michael Love has also suggested this approach -

rowSums(counts(dds,normalized=TRUE) >= 10) >= 5).

However this does not make sense - getting normalized counts is what is taking too long - as in order to perform the counts function with normalized=T - you have to first run the DeSeq2 function in the first place to run (which estimates dispersion and size factors and is incredibly time consuming).

So, can I please have clarification on the following:

a) Was the above response meant to suggest I take the raw counts and cut out all low reads - then perform Deseq2 on this slim data set.

keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep, ]
#differential analysis -DESeq()- is then run on this object
analysisObject_TCGA <- DESeq(dds, parallel = T)

b) Is there another way to do this? Should I just work with TCGA raw counts and not normalise other than scaling prior to clustering and survival analysis? - Is this wrong?

c) Is my use of Limma-Voom incorrect - as the results appear to be simply gene scaling. Should I transpose my counts matrix and re-run Limma-Voom?

Any help or advice would be much appreciated.

Kind Regards, Alex

deseq2 limma normalization tcga rnaseq • 231 views
Entering edit mode
Last seen 4 hours ago
United States

When I try to get normalized counts (counts scaled by size factor), I get the following error message:

> dds <- makeExampleDESeqDataSet()
> x <- counts(dds, normalized=TRUE)
Error in .local(object, ...) :
  first calculate size factors, add normalizationFactors, or set normalized=FALSE
Calls: counts -> counts -> .local

So first you can just estimate the size factors, like it asked for:

> dds <- estimateSizeFactors(dds)
> x <- counts(dds, normalized=TRUE)

If you are going to use limma-voom, I'd recommend to use their pipeline from the beginning, with the raw counts.

From my perspective, the DESeq/DESeq2 and edgeR/limma-voom library size estimation procedures produce very similar scaled counts, so I wouldn't say that you have to use DESeq2 for library size scaling.

Entering edit mode

This works perfectly, a bit silly I didn't realise it just needed size factors, many thanks. Was finished in under 5 minutes on ~10,000 samples.

Thanks again, Alex


Login before adding your answer.

Traffic: 506 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6