Question: DESeq2 for survival analysis
24 months ago, array chip wrote:

Dear all,

I am new to RNA-seq analysis with Bioconductor. Can I use DESeq2 directly to perform survival analysis (e.g. Cox regression)? If not, am I right that I should apply the rlog or VST transformation to the count data and then run Cox regression separately?

My second question: I have an RNA-seq dataset with 1000 samples (20000 genes), and running rlog on it seems prohibitive because it runs out of memory. Any suggestions for getting around that?

Are there any other DE analysis packages (edgeR, limma, etc.) that can perform survival analysis in addition to fitting linear models?

Thanks!

John


Thank you Michael! This is very helpful.

Michael, just to make sure: if I want to use SAMseq, I should use raw counts without any normalization or transformation (not even sequencing depth adjustment), just like with DESeq(), correct?

Yes, raw counts.

24 months ago, Michael Love (United States) wrote:

We do not have a Cox PH regression model built into DESeq2. You could use variance stabilized counts for downstream methods.
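As a rough illustration of what "downstream" could look like, here is a minimal sketch of per-gene univariate Cox regression using the survival package. It is not something DESeq2 provides. The matrix `mat` is simulated here so the example runs on its own; with real data it would be the variance stabilized matrix (see the commented line). The names `time`, `status`, and `fit_gene` are hypothetical.

```r
library(survival)

# Simulated stand-in for variance stabilized data (genes x samples).
# With DESeq2 loaded, the real matrix would come from something like:
#   mat <- assay(vst(dds, blind = TRUE))
set.seed(42)
n_genes <- 200
n_samples <- 60
mat <- matrix(rnorm(n_genes * n_samples, mean = 8, sd = 1),
              nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

# Hypothetical clinical data: follow-up time and event indicator (1 = event, 0 = censored).
time <- rexp(n_samples, rate = 0.1)
status <- rbinom(n_samples, size = 1, prob = 0.7)

# Fit one univariate Cox model for a gene; return the log hazard ratio and Wald p-value.
fit_gene <- function(expr, time, status) {
  fit <- coxph(Surv(time, status) ~ expr)
  coefs <- summary(fit)$coefficients
  c(logHR = coefs[1, "coef"], pvalue = coefs[1, "Pr(>|z|)"])
}

# Apply the model to every row (gene), then adjust for multiple testing.
res <- as.data.frame(t(apply(mat, 1, fit_gene, time = time, status = status)))
res$padj <- p.adjust(res$pvalue, method = "BH")
head(res[order(res$pvalue), ])
```

In a real analysis you would probably add clinical covariates to the model formula; the sketch only shows the basic per-gene loop.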

Another option, if you have sufficient sample size, is to use the survival approach implemented in the SAMseq function (for this approach you should provide raw counts, not transformed by DESeq2 functions):

https://www.rdocumentation.org/packages/samr/versions/2.0/topics/SAMseq
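For orientation, a call to SAMseq with a survival response might look roughly like the sketch below. The count matrix and survival data are simulated purely so the example is self-contained; with a DESeqDataSet the raw counts would come from `counts(dds, normalized = FALSE)`. The object names (`raw_counts`, `time`, `status`) are hypothetical, and the `nperms` and `fdr.output` values are just the documented defaults.

```r
library(samr)

# Simulated raw counts (genes x samples); no normalization or transformation applied.
set.seed(1)
n_genes <- 500
n_samples <- 40
raw_counts <- matrix(rnbinom(n_genes * n_samples, mu = 100, size = 2),
                     nrow = n_genes,
                     dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

# Hypothetical survival data: follow-up time and censoring status (1 = event, 0 = censored).
time <- rexp(n_samples, rate = 0.1)
status <- rbinom(n_samples, size = 1, prob = 0.7)

# Survival-type SAMseq analysis on the raw counts.
samfit <- SAMseq(x = raw_counts, y = time, censoring.status = status,
                 resp.type = "Survival",
                 geneid = rownames(raw_counts), genenames = rownames(raw_counts),
                 nperms = 100, fdr.output = 0.20)

# Genes called significant at the requested FDR (may be empty for null data like this).
samfit$siggenes.table
```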

In the vignette and in the workflow, we recommend using the VST for datasets with many samples, and not attempting the rlog.

For 1000+ samples, even the VST might take a long time. You might instead use:

library(DESeq2)
dds <- estimateSizeFactors(dds)  # per-sample size factors for sequencing depth
ntd <- normTransform(dds)        # log2(normalized count + 1)

which simply applies a log2(x+1) transformation to the normalized counts. You can raise the pseudocount (e.g. to 5 or 10) to produce more shrinkage of the log counts.
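To see what the pseudocount does, here is a small base-R sketch on made-up normalized counts (the values and the helper `log_pc` are hypothetical). It compares a 4-fold difference at low counts and at high counts for pseudocounts of 1, 5, and 10. In DESeq2 itself, the pseudocount is set through the `pc` argument of `normTransform`, as in the commented line at the end.

```r
# Hypothetical normalized counts: a 4-fold difference at low and at high expression.
norm_counts <- c(low_a = 1, low_b = 4, high_a = 1000, high_b = 4000)

# Same form as the transformation described above: log2(normalized count + pseudocount).
log_pc <- function(x, pc) log2(x + pc)

# For each pseudocount, compute the log2 difference within the low and high pairs.
pcs <- c(1, 5, 10)
diffs <- t(sapply(pcs, function(pc) {
  y <- log_pc(norm_counts, pc)
  c(pc = pc,
    low_diff = unname(y["low_b"] - y["low_a"]),
    high_diff = unname(y["high_b"] - y["high_a"]))
}))
print(round(diffs, 3))

# With DESeq2 loaded and size factors estimated, the corresponding call would be:
#   ntd <- normTransform(dds, pc = 5)
```

The low-count difference shrinks toward zero as the pseudocount grows, while the high-count difference stays close to 2, which is the shrinkage effect mentioned above.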