DESeq2: design = ~1 For Normalisation Only.
1
0
Entering edit mode
@alex-greenshields-watson-22187
Last seen 5.1 years ago
Cardiff

Hi,

My question follows on from two related questions (https://support.bioconductor.org/p/79209/ and https://support.bioconductor.org/p/98476/).

I use DESeq2 in order to get normalized counts from raw TCGA-HTseq Counts.

My method is simple, put the data through DESeq2 - using design = ~ 1 (which gives a warning). https://support.bioconductor.org/p/79209/

exptDesign_TCGA = data.frame(
  row.names = colnames(matrix_TCGA_data),
  condition = sample_tsv$Project.ID)

dds <- DESeqDataSetFromMatrix(
  countData = matrix_TCGA_data,
  colData = exptDesign_TCGA,
  design = ~ 1)

Then I do all of my analysis (clustering, survival, gene expression heat maps) on the normalised counts from Deseq2.

However I am running into problems with huge numbers of samples and computational power - directly related to the following post: https://support.bioconductor.org/p/98476/

I know Michael Love suggests using Limma-Voom, but I have tried that - run my analysis and it is essentially scales the counts by gene across all samples, not what I want at all, as I lose the relative expression of one gene to another gene in the same sample. I need each sample to be normalised by the DESeq2 method.

In his response, Michael Love has also suggested this approach - https://support.bioconductor.org/p/98476/


rowSums(counts(dds,normalized=TRUE) >= 10) >= 5).

However this does not make sense - getting normalized counts is what is taking too long - as in order to perform the counts function with normalized=T - you have to first run the DeSeq2 function in the first place to run (which estimates dispersion and size factors and is incredibly time consuming).

So, can I please have clarification on the following:

a) Was the above response meant to suggest I take the raw counts and cut out all low reads - then perform Deseq2 on this slim data set.


keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep, ]
#differential analysis -DESeq()- is then run on this object
analysisObject_TCGA <- DESeq(dds, parallel = T)


b) Is there another way to do this? Should I just work with TCGA raw counts and not normalise other than scaling prior to clustering and survival analysis? - Is this wrong?

c) Is my use of Limma-Voom incorrect - as the results appear to be simply gene scaling. Should I transpose my counts matrix and re-run Limma-Voom?

Any help or advice would be much appreciated.

Kind Regards, Alex

deseq2 limma normalization tcga rnaseq • 3.1k views
ADD COMMENT
2
Entering edit mode
@mikelove
Last seen 16 hours ago
United States

When I try to get normalized counts (counts scaled by size factor), I get the following error message:

> dds <- makeExampleDESeqDataSet()
> x <- counts(dds, normalized=TRUE)
Error in .local(object, ...) :
  first calculate size factors, add normalizationFactors, or set normalized=FALSE
Calls: counts -> counts -> .local

So first you can just estimate the size factors, like it asked for:

> dds <- estimateSizeFactors(dds)
> x <- counts(dds, normalized=TRUE)

If you are going to use limma-voom, I'd recommend to use their pipeline from the beginning, with the raw counts.

From my perspective, the DESeq/DESeq2 and edgeR/limma-voom library size estimation procedures produce very similar scaled counts, so I wouldn't say that you have to use DESeq2 for library size scaling.

ADD COMMENT
0
Entering edit mode

This works perfectly, a bit silly I didn't realise it just needed size factors, many thanks. Was finished in under 5 minutes on ~10,000 samples.

Thanks again, Alex

ADD REPLY

Login before adding your answer.

Traffic: 684 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6