DESeq2: design = ~1 For Normalisation Only.
1
0
Entering edit mode
@alex-greenshields-watson-22187
Last seen 21 months ago
Cardiff

Hi,

My question follows on from two related questions (https://support.bioconductor.org/p/79209/ and https://support.bioconductor.org/p/98476/).

I use DESeq2 in order to get normalized counts from raw TCGA-HTseq Counts.

My method is simple, put the data through DESeq2 - using design = ~ 1 (which gives a warning). https://support.bioconductor.org/p/79209/

exptDesign_TCGA = data.frame(
row.names = colnames(matrix_TCGA_data),
condition = sample_tsv\$Project.ID)

dds <- DESeqDataSetFromMatrix(
countData = matrix_TCGA_data,
colData = exptDesign_TCGA,
design = ~ 1)


Then I do all of my analysis (clustering, survival, gene expression heat maps) on the normalised counts from Deseq2.

However I am running into problems with huge numbers of samples and computational power - directly related to the following post: https://support.bioconductor.org/p/98476/

I know Michael Love suggests using Limma-Voom, but I have tried that - run my analysis and it is essentially scales the counts by gene across all samples, not what I want at all, as I lose the relative expression of one gene to another gene in the same sample. I need each sample to be normalised by the DESeq2 method.

In his response, Michael Love has also suggested this approach - https://support.bioconductor.org/p/98476/


rowSums(counts(dds,normalized=TRUE) >= 10) >= 5).



However this does not make sense - getting normalized counts is what is taking too long - as in order to perform the counts function with normalized=T - you have to first run the DeSeq2 function in the first place to run (which estimates dispersion and size factors and is incredibly time consuming).

So, can I please have clarification on the following:

a) Was the above response meant to suggest I take the raw counts and cut out all low reads - then perform Deseq2 on this slim data set.


keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep, ]
#differential analysis -DESeq()- is then run on this object
analysisObject_TCGA <- DESeq(dds, parallel = T)



b) Is there another way to do this? Should I just work with TCGA raw counts and not normalise other than scaling prior to clustering and survival analysis? - Is this wrong?

c) Is my use of Limma-Voom incorrect - as the results appear to be simply gene scaling. Should I transpose my counts matrix and re-run Limma-Voom?

Any help or advice would be much appreciated.

Kind Regards, Alex

deseq2 limma normalization tcga rnaseq • 271 views
2
Entering edit mode
@mikelove
Last seen 6 hours ago
United States

When I try to get normalized counts (counts scaled by size factor), I get the following error message:

> dds <- makeExampleDESeqDataSet()
> x <- counts(dds, normalized=TRUE)
Error in .local(object, ...) :
first calculate size factors, add normalizationFactors, or set normalized=FALSE
Calls: counts -> counts -> .local


So first you can just estimate the size factors, like it asked for:

> dds <- estimateSizeFactors(dds)
> x <- counts(dds, normalized=TRUE)


If you are going to use limma-voom, I'd recommend to use their pipeline from the beginning, with the raw counts.

From my perspective, the DESeq/DESeq2 and edgeR/limma-voom library size estimation procedures produce very similar scaled counts, so I wouldn't say that you have to use DESeq2 for library size scaling.

0
Entering edit mode

This works perfectly, a bit silly I didn't realise it just needed size factors, many thanks. Was finished in under 5 minutes on ~10,000 samples.

Thanks again, Alex

Traffic: 249 users visited in the last hour
FAQ
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.