Question

What is the correct way to split genex_high and genex_low groups for DE analysis?

rohanphn • 0

@c7686fef

Brazil

Hey guys. Good afternoon.

I would like to split the TCGA-STAD samples into two groups by expression of gene x: a high group (upper quartile) and a low group (lower quartile). I would then run a DESeq2 analysis between these two groups, since I hypothesize that they have different biological characteristics.

However, I am unsure which data I should use for this quartile split: the raw counts, the TPM-normalized data, or the data normalized by DESeq2.

Here is what I had in mind. Since DESeq2 requires a design as input, I would create a "fake" group variable, randomly assigning 1 or 2 to each sample, just to obtain the normalized data, since the design does not affect normalization. After obtaining the normalized data, I would compute the quartiles and identify the patients that fall in the upper and lower quartiles. I would then use that information to filter the raw data and to define the design for running DESeq2. However, this feels wrong to me.
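
As a rough illustration of the quartile step described above, here is a minimal sketch in plain Python (not R, and not part of DESeq2). It assumes the normalized expression of gene x is already available per sample. The function name `split_by_quartiles`, the sample IDs, and the values are hypothetical toy inputs, not TCGA data.

```python
from statistics import quantiles

def split_by_quartiles(expr):
    """Return (low, high) sample ID lists for one gene.

    expr: dict mapping sample ID -> normalized expression of the gene.
    Samples at or below the first quartile are "low"; samples at or
    above the third quartile are "high". The middle half is dropped.
    """
    q1, _, q3 = quantiles(expr.values(), n=4, method="inclusive")
    low = sorted(s for s, v in expr.items() if v <= q1)
    high = sorted(s for s, v in expr.items() if v >= q3)
    return low, high

# Toy values only (hypothetical sample IDs and expression levels)
toy = {"S1": 1.0, "S2": 2.0, "S3": 3.0, "S4": 4.0,
       "S5": 5.0, "S6": 6.0, "S7": 7.0, "S8": 8.0}
low, high = split_by_quartiles(toy)
```

The two returned lists would then be used to subset the raw count matrix and to build the two-level group variable for the design.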

Could anyone give me some suggestions? I couldn't find a thread about this.

Thanks in advance.

DESeq2 RNASeq • 559 views

score 0 · Answer 1 · 2023-05-01

Michael Love 41k

@mikelove

United States

We don't have any statistical procedures for detecting high or low expression. You might try taking the abundance (TPM) and performing model-based clustering on the log of abundance, across samples.
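
One possible reading of this suggestion is a two-component Gaussian mixture fitted to log-transformed TPM values of the gene across samples. The sketch below is a small EM implementation in plain Python, written only for illustration; in R, a package such as mclust would be the more usual tool. The choice of two components, the pseudocount of 1, the function names, and the toy TPM values are all assumptions, not something stated in the answer.

```python
import math

def fit_two_gaussians(x, n_iter=200):
    """EM for a 1-D, two-component Gaussian mixture.

    Returns (weights, means, variances, responsibilities).
    """
    xs = sorted(x)
    half = len(xs) // 2
    # Initialise the two means from the lower and upper halves of the data
    means = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    grand = sum(x) / len(x)
    var0 = sum((v - grand) ** 2 for v in x) / len(x)
    variances = [var0, var0]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value
        resp = []
        for v in x:
            dens = [w * math.exp(-(v - m) ** 2 / (2 * s)) / math.sqrt(2 * math.pi * s)
                    for w, m, s in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: update weights, means, and variances from responsibilities
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(x)
            means[k] = sum(r[k] * v for r, v in zip(resp, x)) / nk
            var_k = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, x)) / nk
            variances[k] = max(var_k, 1e-6)  # floor to avoid a collapsed component
    return weights, means, variances, resp

def label_high_low(tpm, pseudocount=1.0):
    """Label each sample 'high' or 'low' from a mixture fit on log2(TPM + pseudocount)."""
    logx = [math.log2(v + pseudocount) for v in tpm]
    _, means, _, resp = fit_two_gaussians(logx)
    hi = 0 if means[0] > means[1] else 1  # component with the larger mean
    return ["high" if r[hi] >= 0.5 else "low" for r in resp]

# Toy TPM values for one gene across ten hypothetical samples
toy_tpm = [1.0, 1.5, 2.0, 1.2, 0.8, 40.0, 55.0, 60.0, 48.0, 52.0]
labels = label_high_low(toy_tpm)
```

Unlike a fixed quartile cut, the mixture lets the data decide where the boundary between the groups falls, and samples with responsibilities near 0.5 can be treated as ambiguous rather than forced into a group.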
