Question

Some questions about using the rlog function of DESeq2

137737756 • 0

@137737756-16412

I am trying to use the TCGA HTseq_counts data to do some analysis.

I plan to follow the process below.

First, use the RNA-seq dataset containing both tumor and normal tissue to find differentially expressed genes with DESeq2::DESeq.

Second, because I only want to normalize the tumor data, use DESeq2::rlog to transform a dataset containing only the tumor samples.

Third, use the rlog-transformed tumor data for survival analysis and unsupervised clustering.

Fourth, after clustering the rlog-transformed tumor data, the tumors will fall into several groups. At that point I may want to compare the tumor groups with each other, and also compare each group to the normal tissue. So I want to go back to the original data containing both tumor and normal tissue, assign the tumor samples to their groups, and use DESeq2::DESeq for the differential analysis.

Here are my questions:

1. Are there any mistakes in my process?

2. I worry that including the normal tissue samples will distort the transformation of the tumor data, so I apply the rlog transform to a dataset containing only tumor samples, on the assumption that the survival and clustering analyses do not depend on the normal tissue data. Does the rlog transform give different results for a dataset A than for a dataset B that is a subset of A?

3. When I analyze the differentially expressed genes between the tumor groups, is it correct to go back to the original, untransformed counts and run DESeq2::DESeq on them?

Thank you, I hope you can give me advice.

deseq2 rna-seq • 847 views

updated 6.4 years ago by Michael Love • written 6.4 years ago by 137737756

score 1 · Answer 1 · 2018-07-10

Michael Love 43k

@mikelove

United States

I would recommend the VST, especially for transforming large datasets. In any case, if you run rlog() on a large dataset, it will give a warning to that effect. The vst() function is much faster than rlog(), and it turns out to be more robust. If you have large differences between groups, I'd recommend vst(dds, blind=FALSE), as discussed in the vignette. You can include the normal samples; that shouldn't be a problem.
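As a rough illustration of the vst(dds, blind=FALSE) suggestion, here is a minimal sketch. It assumes simulated counts from makeExampleDESeqDataSet() in place of the TCGA data, and it treats the simulated condition "B" as a stand-in for the tumor samples; both are assumptions made only for the example.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for the TCGA counts: 2000 genes, 12 samples,
# with a 'condition' factor (A/B) and design ~ condition.
dds <- makeExampleDESeqDataSet(n = 2000, m = 12)

# Transform all samples together, letting the design inform the
# dispersion trend (blind = FALSE).
vsd <- vst(dds, blind = FALSE)

# Assumption for illustration: condition "B" plays the role of tumor.
# Subset AFTER transforming, so normals were included in the VST.
tumor_vsd <- vsd[, vsd$condition == "B"]
tumor_mat <- assay(tumor_vsd)  # genes x tumor samples, log2-like scale
dim(tumor_mat)
```

The transformed tumor matrix can then be passed to clustering or survival code, while the untransformed counts stay in dds.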

In general, yes, we recommend variance-stabilized data for, e.g., calculating sample distances or PCA, and the original counts for differential analysis.
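One way this split could look in practice is sketched below: cluster on the variance-stabilized values, then return to the original counts for testing. The simulated data, the use of hclust() with two clusters, and the column name group are assumptions for illustration, not part of the recommendation itself.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in data; condition "B" plays the role of tumor.
dds <- makeExampleDESeqDataSet(n = 2000, m = 12)
vsd <- vst(dds, blind = FALSE)
is_tumor <- vsd$condition == "B"

# Transformed data: sample distances and clustering of tumors only.
d <- dist(t(assay(vsd)[, is_tumor]))
cluster <- cutree(hclust(d), k = 2)  # k = 2 is arbitrary here

# Original counts: differential analysis with the new grouping.
group <- rep("normal", ncol(dds))
group[is_tumor] <- paste0("tumor", cluster)
dds$group <- factor(group, levels = c("normal", "tumor1", "tumor2"))
design(dds) <- ~ group
dds <- DESeq(dds)

# Compare one tumor cluster against the normal samples.
res <- results(dds, contrast = c("group", "tumor1", "normal"))
head(res)
```

Other comparisons, such as tumor1 against tumor2, would use the same results() call with a different contrast.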

Thank you for your constructive advice. I will try the VST transformation.

6.4 years ago • 137737756