Question

How to deal with differential analysis between 3 groups in DESeq2?


kf • 0

@kf-24098

Last seen 3.3 years ago

SA

Hi,

I have RNA-seq data for three groups, each with three replicates: MCF10A (normal), MCF7 (breast cancer) and MCF7-TamR (tamoxifen-resistant). I was wondering if there is a way to design a differential analysis that finds the differentially expressed genes between MCF7 and MCF7TR while MCF10A serves as a "background/control"? Thanks in advance!

Best,
Kun

deseq2 • 1.0k views

ADD COMMENT • link updated 3.6 years ago by swbarnes2 ★ 1.3k • written 3.6 years ago by kf • 0


I think my question could be transformed into the 'factor with three levels' case, which is discussed in the DESeq2 vignette section "Variations to the standard workflow". This thread also discusses it well: https://www.biostars.org/p/357464/

ADD REPLY • link 3.6 years ago kf • 0

score 0 · Answer 1 · 2020-09-30


swbarnes2 ★ 1.3k

@swbarnes2-14086

Last seen 12 hours ago

San Diego

I don't think you need to use anything as a background, and I don't think adding the variance of an extra comparison is going to be good for the math. If you want to compare two tissues, compare them to each other.

ADD COMMENT • link 3.6 years ago swbarnes2 ★ 1.3k


Thanks for your reply! So a multiple-group design might better fit a situation such as comparing different drug treatments?

ADD REPLY • link 3.6 years ago kf • 0


Thanks for linking my thread from Biostars (above). Are you referring to a design of the form, for example:

(MCF7 - MCF7TR) - MCF10A

?

Echoing swbarnes2's sentiment, though, I tend to keep things as simple as possible for analyses like this. In my view, multiple pairwise comparisons would provide a lot of useful information here. I find that many people want just a single p-value that can explain their entire experimental setup, whereas performing several different comparisons (and obtaining several p-values) may actually be better.

ADD REPLY • link 3.6 years ago Kevin Blighe ★ 3.9k


Hi, Kevin

Thanks for your reply! I will try both ways to see which one is better in my case. Yes, I would like to design (MCF7TR - MCF7) - MCF10A. I was wondering if there is any modification I need to make to the code you shared in the Biostars thread for my case? Thanks a lot!

Best,
Kun

ADD REPLY • link 3.6 years ago kf • 0


A contrast like

(MCF7 - MCF7TR) - MCF10A

doesn't make sense, really. It's the same thing as

MCF7 - MCF7TR - MCF10A
## or
MCF7 - (MCF7TR + MCF10A)

So you are testing for any difference between MCF7 and the sum of MCF7TR and MCF10A? I'm not sure that is an interesting thing to know. Most genes would be significant in that instance (and that would certainly include any genes that don't change at all between any of the samples, if they are expressed at any reasonable level).

You could say 'I want genes that are different, after adjusting out the normal gene levels', which would be something like

(MCF7 - MCF10A) - (MCF7TR - MCF10A)

but algebraically that's the same thing as

MCF7 - MCF7TR

so why bother?
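
The algebra above can be checked by writing each contrast as a weight vector over the three group means. This is only an illustrative sketch in plain Python, not DESeq2 code; the group order and the helper names (`contrast`, `sub`) are assumptions made for the example.

```python
# Group order for the weight vectors (an assumption for this illustration).
GROUPS = ["MCF10A", "MCF7", "MCF7TR"]

def contrast(**weights):
    """Return a weight vector over GROUPS; groups not named get weight 0."""
    return [weights.get(g, 0) for g in GROUPS]

def sub(a, b):
    """Element-wise difference of two weight vectors."""
    return [x - y for x, y in zip(a, b)]

mcf10a = contrast(MCF10A=1)
mcf7 = contrast(MCF7=1)
mcf7tr = contrast(MCF7TR=1)

# (MCF7 - MCF10A) - (MCF7TR - MCF10A): the MCF10A terms cancel.
adjusted = sub(sub(mcf7, mcf10a), sub(mcf7tr, mcf10a))

# MCF7 - MCF7TR, the plain pairwise comparison.
direct = sub(mcf7, mcf7tr)

# (MCF7 - MCF7TR) - MCF10A: the weights do not sum to zero.
nested = sub(sub(mcf7, mcf7tr), mcf10a)

print(adjusted, direct, nested, sum(nested))
```

Because the weights of the nested version sum to -1 rather than 0, a gene with the same log expression in all three groups still gives a nonzero estimate, which is why unchanged but expressed genes would come out significant.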

ADD REPLY • link 3.6 years ago James W. MacDonald 65k


Hi, James

Thanks for your reply. I was wondering if this kind of case exists: a gene that is highly expressed in both MCF7TR and MCF10A but lowly expressed in MCF7. Such a gene might contribute to the cancer but not to the tamoxifen resistance? What I would like to find is those genes that differ from both cancer and normal. I was wondering if there is a way to design the formula for this? Thanks again!

Best,
Kun

ADD REPLY • link 3.6 years ago kf • 0


That's a different question, and there are probably two ways to answer it. You could make the individual comparisons between say MCF7TR and MCF10A, as well as MCF7 and MCF10A, and then look for genes that are significant in the second but not the first. That's not inferential though. Lack of significance isn't the same as evidence for no difference.

An alternative would be the contrast (MCF7TR + MCF10A)/2 - MCF7, which tests if the average expression of the first two is different from MCF7. But that isn't the same as saying that the first two are really similar, because you are taking the average and using the within-group variability. So you could have a situation where say MCF10A is really consistent, but 3-fold higher than both MCF7TR and MCF7, in which case the average of the two would be high as well, and since the within-group variability is low, it would likely be significant.

I guess another alternative would be to make another factor, say MCF7TRorMCF10A, and test vs MCF7. In which case you will pick up genes that are consistently expressed in the first two groups, and different from MCF7. The downside to that is people could say 'Bro, those are two different groups, why are you combining them?'
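
To make the pitfall with the averaged contrast concrete, here is a small numeric sketch in plain Python. The log2 group means are made-up values chosen to match the scenario described (MCF10A 3-fold higher, the two MCF7 lines identical); nothing here is real data or DESeq2 output.

```python
import math

# Hypothetical log2 mean expression of one gene per group (invented values).
log2_mean = {
    "MCF10A": 5.0 + math.log2(3),  # 3-fold higher than the other two
    "MCF7": 5.0,
    "MCF7TR": 5.0,
}

# Weights for the averaged contrast: (resistant line + MCF10A)/2 - MCF7.
weights = {"MCF10A": 0.5, "MCF7": -1.0, "MCF7TR": 0.5}

# Estimated log2 fold change for the averaged contrast.
estimate = sum(weights[g] * log2_mean[g] for g in weights)

# Direct comparison of the resistant line against MCF7.
tr_vs_mcf7 = log2_mean["MCF7TR"] - log2_mean["MCF7"]

print(f"averaged contrast: {estimate:.3f}, TR vs MCF7: {tr_vs_mcf7:.3f}")
```

The averaged contrast comes out at half of log2(3) even though the resistant line and MCF7 do not differ at all, so with low within-group variability it could be called significant purely because of MCF10A.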

ADD REPLY • link 3.6 years ago James W. MacDonald 65k


"look for genes that are significant in the second but not the first"

=> here you can use altHypothesis="lessAbs" in results() to specify a test of equivalence. You would also need to specify an lfcThreshold to define the region of equivalence.
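
For intuition about what a test of equivalence asks, here is a generic sketch of the "two one-sided tests" idea with a Wald statistic, in plain Python. It is an assumption-laden illustration of the concept only: the function names and the example numbers are invented, and DESeq2's own computation may differ in detail.

```python
import math

def upper_tail(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def equivalence_pvalue(lfc, se, threshold):
    """Two one-sided Wald tests of |lfc| < threshold; returns the larger p-value.

    lfc and se are a log2 fold change estimate and its standard error;
    threshold plays the role of the equivalence bound (hypothetical helper).
    """
    p_above_lower = upper_tail((lfc + threshold) / se)  # H0: lfc <= -threshold
    p_below_upper = upper_tail((threshold - lfc) / se)  # H0: lfc >= +threshold
    return max(p_above_lower, p_below_upper)

# Estimate close to zero: strong evidence that |lfc| is below the bound.
p_small = equivalence_pvalue(lfc=0.1, se=0.2, threshold=1.0)

# Estimate near the bound: equivalence cannot be claimed.
p_large = equivalence_pvalue(lfc=0.9, se=0.2, threshold=1.0)

print(p_small, p_large)
```

A small p-value here is evidence that the fold change lies inside the equivalence region, which is different from merely failing to reach significance in the usual test.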

ADD REPLY • link 3.6 years ago Michael Love 41k


Hi, Michael

Thanks for the reminder!

Best,
Kun

ADD REPLY • link 3.6 years ago kf • 0


Thanks for your elaboration! It just took me some time to digest your answer. If I understand correctly, these three methods specifically find genes that are highly expressed in both MCF7TR and MCF10A but lowly expressed in MCF7. But if I would like to find differentially expressed genes between MCF7 and MCF7TR in general, it would be better to just do the DEG analysis between these two and leave MCF10A alone, right? Thanks again for your valuable time!

Best,
Kun

ADD REPLY • link 3.6 years ago kf • 0