Question

Which is the best type of data for correlation or survival analysis?

0

Yang Shi ▴ 10

@ea61ff7a

Zheng Zhou

Dear Community,

Here are my questions:

(1) Which kind of data is best for correlation or survival analysis, e.g., DESeq2-normalised counts, TPM or FPKM?

(2) Which kind of normalised count data could be used for my desired analysis?

i. RSEM expected_count (DESeq2 standardized)

This kind of data can be downloaded from UCSC Xena (https://xenabrowser.net/datapages/?dataset=TCGA-GTEx-TARGET-gene-exp-counts.deseq2-normalized.log2&host=https%3A%2F%2Ftoil.xenahubs.net&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443)

ii. Counts transformed with the following DESeq2 code

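library(DESeq2)
# exprSet: raw count matrix; metadata: sample table containing a 'group' column
dds <- DESeqDataSetFromMatrix(countData = exprSet, colData = metadata, design = ~ group)
# Keep only genes with more than one read in total across all samples
dds <- dds[rowSums(counts(dds)) > 1, ]
# Variance stabilizing transformation, not blind to the design
vsd <- vst(dds, blind = FALSE)
expr.normalised <- as.data.frame(assay(vsd))  # used for correlation or survival analysis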

TPM RNASeq RSEM DESeq2 • 2.2k views

ADD COMMENT • link updated 20 months ago by Michael Love 41k • written 20 months ago by Yang Shi ▴ 10

score 2 · Accepted Answer · 2022-08-06

2

ATpoint ★ 4.0k

@atpoint-13662

Germany

Asked before: DESeq2 for survival analysis

ADD COMMENT • link 20 months ago ATpoint ★ 4.0k

0

Hi ATpoint, thanks for your reply and for pointing me to the thread. I still have some questions:

(1) dds <- estimateSizeFactors(dds); ntd <- normTransform(dds) is suggested there instead of the vst transformation because it runs faster. Can the two methods be used interchangeably for correlation or survival analysis?

(2) Is the vst transformation code shown above correct?

(3) Should a z-score transformation be applied after the vst transformation for correlation and survival analysis?

(4) Which is the best type of data for correlation and survival analysis?

Thanks again! I look forward to your reply.

ADD REPLY • link 20 months ago Yang Shi ▴ 10

2

I don't know what is optimal for survival analysis. We basically have our transformations and the motivation for them (see the workflow for detailed discussion). But it's up to you which you use for what application.

I do _not_ recommend z-score after VST. The whole point of the VST is to stabilize the features so that they are all on a comparable scale and you haven't inflated the noise in the data. Dividing by SD undoes that.
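The point about dividing by the SD can be illustrated with a small base-R sketch. This is not DESeq2 code: `vsd_like` is a hypothetical stand-in for `assay(vsd)`, with one simulated gene that varies strongly across samples and one that only carries small noise.

```r
set.seed(42)
n_samples <- 50

# Hypothetical stand-in for assay(vsd): rows are genes, columns are samples.
vsd_like <- rbind(
  signal_gene = rnorm(n_samples, mean = 10, sd = 2.0),  # large between-sample variation
  noise_gene  = rnorm(n_samples, mean = 10, sd = 0.1)   # almost flat, small noise only
)

# Per-gene SD on the (simulated) stabilized scale
sd_before <- apply(vsd_like, 1, sd)

# Row-wise z-score: scale() standardizes columns, so transpose before and after
z <- t(scale(t(vsd_like)))
sd_after <- apply(z, 1, sd)

print(rbind(sd_before, sd_after))
```

After the z-score both genes have an SD of 1, so the small fluctuations of the nearly flat gene are stretched to the same range as the gene with large between-sample variation.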

ADD REPLY • link 20 months ago Michael Love 41k

0

Thanks for your reply and your excellent package! Could you please tell me the difference between vst transformed data and the "vtd" produced by the following code? Sorry, I'm new to bioinformatics analysis.

(1) dds <- estimateSizeFactors(dds); ntd <- normTransform(dds); expr.normalised <- as.data.frame(assay(ntd))

(2) vsd <- vst(dds, blind = FALSE); expr.normalised <- as.data.frame(assay(vsd))

Furthermore, can I run a differential expression analysis on these two kinds of data using wilcox.test? If not, are there any other methods that could be used? Or does it make no sense to call DEGs from these two kinds of data?

Besides, do these data need to be quantile normalized after that? Here are two boxplots of the two kinds of expr.normalised data.

Thanks so much! Looking forward to your reply.

[Image: boxplots of the two kinds of expr.normalised data]

ADD REPLY • link 20 months ago Yang Shi ▴ 10

0

VST is the variance stabilizing transformation, and in the code it produces vsd, a variance stabilized dataset. VST data = vsd.
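For contrast, here is a rough base-R sketch of what the other object, ntd, contains. It assumes that normTransform with its defaults returns log2(normalized count + 1) and that the size factors come from the median-of-ratios method; the counts are simulated Poisson data, not real RNA-seq, and none of this calls DESeq2 itself.

```r
set.seed(1)
n_samples <- 12

# Simulated counts (hypothetical data): 200 low-count and 200 high-count genes
low    <- matrix(rpois(200 * n_samples, lambda = 5),    nrow = 200)
high   <- matrix(rpois(200 * n_samples, lambda = 1000), nrow = 200)
counts <- rbind(low, high)

# Median-of-ratios size factors; genes with a zero in any sample are skipped
log_geo_means <- rowMeans(log(counts))
usable <- is.finite(log_geo_means)
size_factors <- apply(counts, 2, function(x) {
  exp(median(log(x[usable]) - log_geo_means[usable]))
})

# log2(normalized count + 1): the assumed content of assay(ntd)
norm_counts <- sweep(counts, 2, size_factors, "/")
ntd_like <- log2(norm_counts + 1)

# Per-gene SD on the log2 scale, averaged within each expression class
row_sd  <- apply(ntd_like, 1, sd)
sd_low  <- mean(row_sd[1:200])
sd_high <- mean(row_sd[201:400])
print(c(low_count_genes = sd_low, high_count_genes = sd_high))
```

In this simulation the low-count genes show a much larger SD on the log2 scale than the high-count genes, even though no gene differs between samples. That dependence of the variance on the mean is what the VST is meant to remove.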

can I run a differential expression analysis on these two kinds of data using wilcox.test?

No, see here:

https://bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#exploratory-analysis-and-visualization

We in general recommend using DESeq() for differential expression, not Wilcoxon on VST data.

Besides, do these data need to be quantile normalized after that?

No, that's not part of our default pipeline.

ADD REPLY • link 20 months ago Michael Love 41k