Which is the best type of data for correlation or survival analysis?
Yang Shi

Dear Communities,

Here are my questions:

(1) Which kind of data is best for correlation or survival analysis: DESeq2-normalised counts, TPM, or FPKM?

(2) Which kind of normalised count data could be used for my desired analysis?

i. RSEM expected_count (DESeq2 normalised)

This kind of data can be fetched from UCSC Xena (https://xenabrowser.net/datapages/?dataset=TCGA-GTEx-TARGET-gene-exp-counts.deseq2-normalized.log2&host=https%3A%2F%2Ftoil.xenahubs.net&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443)

ii. Transformation with the following DESeq2 code

library(DESeq2)

dds <- DESeqDataSetFromMatrix(countData = exprSet, colData = metadata, design = ~ group)
dds <- dds[rowSums(counts(dds)) > 1, ]  # drop genes with almost no counts
vsd <- vst(dds, blind = FALSE)
expr.normalised <- as.data.frame(assay(vsd))  # used for correlation or survival analysis
TPM RNASeq RSEM DESeq2
ATpoint

Asked before: DESeq2 for survival analysis


Hi ATpoint, thanks for your reply and for pointing me to that thread. But I still have some questions:

(1) In that thread, dds <- estimateSizeFactors(dds); ntd <- normTransform(dds) was suggested instead of the vst transformation because it is faster. May I ask whether the two methods can be substituted for each other in correlation or survival analysis?

(2) Is the vst transformation code shown above correct?

(3) Should a z-score transformation be applied after the vst transformation for correlation and survival analysis?

(4) Which is the best type of data for correlation and survival analysis?

Thanks again! And looking forward to your reply!


I don't know what is optimal for survival analysis. We basically have our transformations and the motivation for them (see the workflow for a detailed discussion), but it's up to you which one you use for which application.

I do _not_ recommend a z-score after the VST. The whole point of the VST is to stabilize the features so that they are all on a comparable scale and you haven't inflated the noise in the data. Dividing by the SD undoes that.
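A base-R toy example (synthetic numbers, not real VST output) illustrates why: a per-gene z-score forces every gene to SD 1, so a nearly constant gene ends up on the same scale as a genuinely variable one, and its small measurement noise is stretched accordingly.

```r
# Two toy "transformed" genes across 6 samples (hypothetical values)
stable_gene <- c(5.01, 4.99, 5.02, 4.98, 5.00, 5.00)  # nearly constant signal
noisy_gene  <- c(2.0, 8.0, 3.5, 7.0, 4.0, 6.5)        # genuinely variable signal
mat <- rbind(stable_gene, noisy_gene)

apply(mat, 1, sd)      # very different SDs, as the transformation left them

z <- t(scale(t(mat)))  # per-gene z-score: centre each row and divide by its SD
apply(z, 1, sd)        # both rows now have SD exactly 1
```

After z-scoring, the tiny noise of the stable gene has been inflated to the same magnitude as the real variation of the noisy gene, which is exactly what the VST was designed to avoid.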


Thanks for your reply and your excellent package! Could you please tell me the difference between the vst-transformed data and the "ntd" data produced by the following code? Sorry, I'm new to bioinformatics analysis.

(1) dds <- estimateSizeFactors(dds); ntd <- normTransform(dds); expr.normalised <- as.data.frame(assay(ntd))

(2) vsd <- vst(dds, blind = FALSE); expr.normalised <- as.data.frame(assay(vsd))
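For context on the difference between the two: normTransform(dds) by default simply returns log2(normalized count + 1), whereas vst() additionally fits and flattens the mean-variance trend, mainly changing the values of low-count genes. A base-R sketch of what normTransform computes for one sample (the count values and the size factor here are hypothetical, just for illustration):

```r
# Toy raw counts for four genes in one sample (hypothetical numbers)
counts <- c(0, 3, 50, 1200)
size_factor <- 1.1  # per-sample size factor, as returned by estimateSizeFactors()

# normTransform() default: divide by the size factor, add a pseudocount of 1, log2
ntd_values <- log2(counts / size_factor + 1)
round(ntd_values, 2)
```

The vst() output cannot be reproduced in a one-liner like this, since it depends on a dispersion trend fitted across all genes; the practical consequence is that for high counts the two transforms nearly agree, while for low counts vst() shrinks the spread.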

Furthermore, can I perform differential expression analysis on these two kinds of data using wilcox.test? If not, are there other methods that could be used? Or does it make no sense to call DEGs from these two kinds of data?

Besides, do these data need quantile normalization afterwards? Here are two boxplots based on the two kinds of expr.normalised data.

Thanks so much! Looking forward to your reply.

[Boxplots of the two expr.normalised data sets]

