Question: Using the rlog function of DESeq2
137737756 wrote (9 days ago):

I am trying to use the TCGA HTSeq counts data to do some analysis.

I plan to follow the process below.

First, use the RNA-seq dataset containing both tumor and normal tissue to find differentially expressed genes with DESeq2::DESeq.

Second, because I only want to normalize the tumor data, use DESeq2::rlog to transform a dataset containing only the tumor samples.

Third, use the rlog-transformed tumor data for survival analysis and unsupervised clustering.

Fourth, once clustering has split the tumors into groups, I may want to compare the tumor groups with each other and compare each group with the normal tissue. For this I want to go back to the original data containing both tumor and normal tissue, assign the tumors to their groups, and use DESeq2::DESeq for differential expression analysis.

Here is my question:

1. Are there any mistakes in my process?

2. I worry that including the normal tissue samples will distort the transformation of the tumor data, so I apply rlog to the tumor samples only, since the survival and clustering analyses do not involve the normal tissue. Also, does the rlog transform give different results for a dataset A than for a dataset B that is a subset of A?

3. When I analyze the differentially expressed genes between the tumor groups, is it correct to go back to the original, untransformed data and run DESeq2::DESeq on it?

Thank you, I hope you can give me advice.

Michael Love (United States) wrote (9 days ago):

I would recommend the VST for transforming data, especially for large datasets. In fact, if you run rlog() on a large dataset, it will print a message making the same suggestion. The vst() function will be much faster than rlog(), and it turns out to be more robust. If you have large differences between groups, I'd recommend vst(dds, blind=FALSE), as discussed in the vignette. You can include the normal samples; it shouldn't be a problem.
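As a rough illustration of the vst(dds, blind=FALSE) call, here is a minimal sketch. It uses simulated counts from makeExampleDESeqDataSet() as a stand-in for the TCGA data (an assumption made only so the example runs on its own), and the group labels "A" and "B" come from that simulation, not from the original question. It transforms all samples together and subsets to one group afterwards.

```r
library(DESeq2)

# Simulated counts stand in for the TCGA HTSeq counts (assumption for illustration).
# The example object has a 'condition' factor with levels "A" and "B".
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 5000, m = 12)

# blind = FALSE lets the transformation use the design (~ condition)
# when estimating the dispersion trend.
vsd <- vst(dds, blind = FALSE)

# Transform everything together, then subset to one group afterwards
# (here "B" plays the role of the tumor samples).
vsd_B <- vsd[, vsd$condition == "B"]
dim(assay(vsd_B))
```

With real data, dds would instead be built from the count matrix and sample table, for example with DESeqDataSetFromMatrix().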

In general, yes, we recommend variance-stabilized data for, e.g., calculating sample distances or PCA, and the original counts for differential analysis.
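The split between transformed data and original counts could look roughly like the sketch below. The data are again simulated, and the column name cluster, the choice of k = 2, and the labels "C1"/"C2" are hypothetical choices for illustration, not part of the original advice.

```r
library(DESeq2)

# Simulated raw counts as a stand-in for the real dataset (assumption).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 5000, m = 12)

# Variance-stabilized values: used for sample distances and clustering.
vsd <- vst(dds, blind = FALSE)
sample_dist <- dist(t(assay(vsd)))
clusters <- cutree(hclust(sample_dist), k = 2)

# Original counts: used for differential analysis.
# The cluster labels go into the sample table of the untransformed object.
dds$cluster <- factor(paste0("C", clusters))
design(dds) <- ~ cluster
dds <- DESeq(dds)
res <- results(dds, contrast = c("cluster", "C2", "C1"))
```

Only the cluster labels are carried back to dds here; the transformed values themselves never enter DESeq().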



137737756 replied (8 days ago):

Thank you for the constructive advice. I will try the VST transformation.