Some questions about using the rlog function of DESeq2
@137737756-16412

I am trying to use the TCGA HTSeq counts data to do some analysis.

I plan to follow the process below.

First, use the RNA-seq dataset containing both tumor and normal tissue to find differentially expressed genes with DESeq2::DESeq.

Second, because I only want to normalize the tumor data, use DESeq2::rlog to transform a dataset containing only the tumor samples.

Third, use the rlog-transformed tumor data for survival analysis and unsupervised clustering.

Fourth, after clustering the rlog-transformed tumor data, the tumors will fall into several groups. At that point I may want to compare the tumor groups with each other, and also compare each group to the normal tissue. So I want to go back to the original data containing both tumor and normal tissue, assign the tumors to their groups, and use DESeq2::DESeq for differential expression analysis.

Here are my questions:

1. Are there any mistakes in this process?

2. I worry that including the normal tissue samples will distort the transformation of the tumor data, so I apply the rlog transform to the tumor samples only, since the survival and clustering analyses do not involve the normal tissue. Does rlog give different results for a dataset A and for a dataset B that is a subset of A?

3. When I analyze the differentially expressed genes between the tumor groups, is it correct to go back to the original, untransformed data and run DESeq2::DESeq?

Thank you. I hope you can give me some advice.

deseq2 rna-seq • 710 views
@mikelove

I would recommend the VST for transforming data, especially for large datasets. In any case, if you run rlog() on a large dataset, it will give a warning to this effect. The vst() function is much faster than rlog(), and it turns out to be more robust. If you have large differences between groups, I'd recommend vst(dds, blind=FALSE), as discussed in the vignette. You can include the normal samples; it shouldn't be a problem.
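As a minimal sketch of that suggestion, the following transforms tumor and normal samples together with vst(dds, blind=FALSE) and then keeps only the tumor columns for downstream use. The data here are simulated with makeExampleDESeqDataSet() as a stand-in for the TCGA counts, and the relabeling of the two simulated conditions as "normal" and "tumor" is an assumption made purely for illustration.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for the real count data (hypothetical):
# 3000 genes, 12 samples in two conditions of 6 samples each.
dds <- makeExampleDESeqDataSet(n = 3000, m = 12, betaSD = 1)
# Relabel the simulated conditions for illustration only.
levels(dds$condition) <- c("normal", "tumor")

# Variance stabilizing transformation on all samples.
# blind = FALSE lets the dispersion estimation use the design (~ condition).
vsd <- vst(dds, blind = FALSE)

# Keep only the tumor samples for survival or clustering analysis.
vsd_tumor <- vsd[, vsd$condition == "tumor"]
tumor_mat <- assay(vsd_tumor)   # genes x tumor samples matrix
dim(tumor_mat)
```

With real data, `dds` would instead be built from the count matrix and sample table, for example with DESeqDataSetFromMatrix().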

In general, yes: we recommend variance stabilized data for tasks such as calculating sample distances or PCA, and the original counts for differential analysis.
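A rough sketch of that split, under stated assumptions: the transformed values are used only to cluster the tumor samples, and the resulting cluster labels are then attached to the original count object for DESeq(). The simulated data, the use of hclust() with its default linkage, the choice of k = 2 clusters, and the `group` column name are all illustrative assumptions, not part of the original advice.

```r
library(DESeq2)

set.seed(2)
# Simulated stand-in for the real count data (hypothetical).
dds <- makeExampleDESeqDataSet(n = 3000, m = 12, betaSD = 1)
levels(dds$condition) <- c("normal", "tumor")

# Transformed data: used only for distances and clustering.
vsd <- vst(dds, blind = FALSE)
is_tumor <- dds$condition == "tumor"
d <- dist(t(assay(vsd)[, is_tumor]))   # Euclidean distances between tumor samples
cl <- cutree(hclust(d), k = 2)         # k = 2 is an arbitrary choice

# Original counts: used for differential analysis with the new grouping.
group <- rep("normal", ncol(dds))
group[is_tumor] <- paste0("cluster", cl)
dds$group <- factor(group, levels = c("normal", "cluster1", "cluster2"))
design(dds) <- ~ group
dds <- DESeq(dds)

# Compare one tumor cluster to normal, and the two clusters to each other.
res_c1_vs_normal <- results(dds, contrast = c("group", "cluster1", "normal"))
res_c2_vs_c1 <- results(dds, contrast = c("group", "cluster2", "cluster1"))
```

Note that testing for differences between clusters that were themselves derived from the same expression data tends to give optimistic p-values, so such results are best treated as descriptive.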

 


Thank you for the constructive advice. I will try the VST transform.
