Question

DESeq2: design = ~1 For Normalisation Only.

0

Entering edit mode

Alex Greenshields-Watson • 0

@alex-greenshields-watson-22187

Last seen 4.4 years ago

Cardiff

Hi,

My question follows on from two related questions (https://support.bioconductor.org/p/79209/ and https://support.bioconductor.org/p/98476/).

I use DESeq2 in order to get normalized counts from raw TCGA-HTseq Counts.

My method is simple, put the data through DESeq2 - using design = ~ 1 (which gives a warning). https://support.bioconductor.org/p/79209/

exptDesign_TCGA = data.frame(
  row.names = colnames(matrix_TCGA_data),
  condition = sample_tsv$Project.ID)

dds <- DESeqDataSetFromMatrix(
  countData = matrix_TCGA_data,
  colData = exptDesign_TCGA,
  design = ~ 1)

Then I do all of my analysis (clustering, survival, gene expression heat maps) on the normalised counts from Deseq2.

However I am running into problems with huge numbers of samples and computational power - directly related to the following post: https://support.bioconductor.org/p/98476/

I know Michael Love suggests using Limma-Voom, but I have tried that - run my analysis and it is essentially scales the counts by gene across all samples, not what I want at all, as I lose the relative expression of one gene to another gene in the same sample. I need each sample to be normalised by the DESeq2 method.

In his response, Michael Love has also suggested this approach - https://support.bioconductor.org/p/98476/


rowSums(counts(dds,normalized=TRUE) >= 10) >= 5).

However this does not make sense - getting normalized counts is what is taking too long - as in order to perform the counts function with normalized=T - you have to first run the DeSeq2 function in the first place to run (which estimates dispersion and size factors and is incredibly time consuming).

So, can I please have clarification on the following:

a) Was the above response meant to suggest I take the raw counts and cut out all low reads - then perform Deseq2 on this slim data set.


keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep, ]
#differential analysis -DESeq()- is then run on this object
analysisObject_TCGA <- DESeq(dds, parallel = T)

b) Is there another way to do this? Should I just work with TCGA raw counts and not normalise other than scaling prior to clustering and survival analysis? - Is this wrong?

c) Is my use of Limma-Voom incorrect - as the results appear to be simply gene scaling. Should I transpose my counts matrix and re-run Limma-Voom?

Any help or advice would be much appreciated.

Kind Regards, Alex

deseq2 limma normalization tcga rnaseq • 2.3k views

ADD COMMENT • link updated 4.5 years ago by Michael Love 41k • written 4.5 years ago by Alex Greenshields-Watson • 0

score 2 · Accepted Answer · 2019-10-22

When I try to get normalized counts (counts scaled by size factor), I get the following error message:

> dds <- makeExampleDESeqDataSet()
> x <- counts(dds, normalized=TRUE)
Error in .local(object, ...) :
  first calculate size factors, add normalizationFactors, or set normalized=FALSE
Calls: counts -> counts -> .local

So first you can just estimate the size factors, like it asked for:

> dds <- estimateSizeFactors(dds)
> x <- counts(dds, normalized=TRUE)

If you are going to use limma-voom, I'd recommend to use their pipeline from the beginning, with the raw counts.

From my perspective, the DESeq/DESeq2 and edgeR/limma-voom library size estimation procedures produce very similar scaled counts, so I wouldn't say that you have to use DESeq2 for library size scaling.