Question

Paired samples in cell lines using DESeq2

Puks ▴ 10

@puks-12113

Last seen 3.9 years ago

Estonia

Hi, I would like to use DESeq2 to process three paired bulk RNA-seq samples, but I am trying to figure out which model is valid here. I used tximport to import Kallisto's transcript-level abundance estimates, summarized to gene level, for use with DESeq2.

In the paired samples, the treatment is overexpression of gene A. Sample information is as follows:

                    condition patient_id
           BT12CONT   Control        BT1
           BT12OE     OverExp        BT1
           BT53CONT   Control       BT53
           BT53OE     OverExp       BT53
           GBM5CONT   Control       GBM5
           GBM5OE     OverExp       GBM5

I am interested in looking at the condition effect while accounting for sample pairs so I thought a model like the following would be enough:

>   ~ condition + patient_id

The PCA for these samples shows that the samples separate by patient_id.

Is this simple model enough to look at the condition/treatment effect?

Thanks! Puks

deseq2 • 1.7k views

ADD COMMENT • link updated 4.8 years ago by Michael Love 41k • written 4.8 years ago by Puks ▴ 10

Your samples notably cluster by cell line, not by treatment, so it seems unfortunate to use them as biological replicates. From a biological standpoint this is quite normal for cell lines. During cell line establishment a lot of things change inside the cell: particular clones start growing out, and the cells may acquire all kinds of alterations that help them grow. It is therefore not unexpected to see large differences between cell lines (or even between different clones of the same cell line). I do not think this setup is a good choice for getting the information you want. You should probably have used a single cell line and performed the overexpression study with that line in a replicated manner. This would give you the power to detect significant changes within the cell line. Comparing these results with the same experiment in the other two cell lines, again replicated, would then tell you how reproducible the findings are from a biological standpoint.

ADD REPLY • link 4.8 years ago ATpoint ★ 4.0k

Thanks ATpoint! You are correct, there should have been replicates for each cell line but unfortunately the person who performed the experiment did not do it.

ADD REPLY • link 4.8 years ago Puks ▴ 10

I have to disagree with ATpoint here. It is actually a good design to use cell lines derived from multiple patients. This ensures that the list of differentially expressed genes that OP will find is not specific to one (arbitrarily chosen) patient but has some generality, and hence is likely to overlap well with the list one would find if one tried again with different patients.

The fact that the difference between patients is larger than between treatment and control indicates that the treatment has just a small effect: either a small effect on many genes, or a large one on only a few genes. If the latter is the case, including "patient_id" in the model will allow you to find these genes (because DESeq2 will look at the differences between treatment and control within each sample pair).

If, however, the treatment causes genes to change only slightly, the experiment is underpowered with just three patients and will return nothing. Performing it with many replicates from the same patient, by contrast, would produce many hits, but these may not be very useful.
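To illustrate the within-pair idea, here is a minimal base-R sketch. It is only an analogy: it fits an ordinary linear model to made-up log2 expression values for one hypothetical gene, whereas DESeq2 fits a negative binomial GLM to counts. The numbers and the object names (`log2_expr`, `pair_diffs`, and so on) are assumptions for illustration, not data from this experiment.

```r
# Sample table mirroring the design in the question
coldata <- data.frame(
  patient_id = factor(rep(c("BT1", "BT53", "GBM5"), each = 2)),
  condition  = factor(rep(c("Control", "OverExp"), times = 3),
                      levels = c("Control", "OverExp"))
)

# Made-up log2 expression for one hypothetical gene:
# large baseline differences between patients, small consistent shift with treatment
log2_expr <- c(5.0, 6.1, 9.0, 9.9, 13.0, 14.0)

# Within-pair differences (OverExp minus Control for each patient)
pair_diffs <- log2_expr[coldata$condition == "OverExp"] -
  log2_expr[coldata$condition == "Control"]

# Paired model: the patient term absorbs the baseline differences
fit_paired <- lm(log2_expr ~ patient_id + condition, data = coldata)
paired_effect <- unname(coef(fit_paired)["conditionOverExp"])
se_paired <- summary(fit_paired)$coefficients["conditionOverExp", "Std. Error"]

# Unpaired model: patient differences end up in the residual noise
fit_unpaired <- lm(log2_expr ~ condition, data = coldata)
se_unpaired <- summary(fit_unpaired)$coefficients["conditionOverExp", "Std. Error"]

print(c(effect = paired_effect, mean_pair_diff = mean(pair_diffs)))
print(c(se_paired = se_paired, se_unpaired = se_unpaired))
```

In this balanced design the condition coefficient of the paired model equals the mean within-pair difference, and its standard error is much smaller than in the unpaired model, which is why a consistent shift can be detected despite the large patient-to-patient variation.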

ADD REPLY • link 4.8 years ago Simon Anders ★ 3.7k

score 1 · Answer 1 · 2019-07-10

Michael Love 41k

@mikelove

Last seen 1 day ago

United States

Yes, that is the correct model: ~ patient_id + condition (in general it is good to put the condition of interest last; see the vignette for details).
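A minimal sketch of what this could look like in code, assuming the sample table from the question. The counts here are simulated with `makeExampleDESeqDataSet()` purely so the example runs on its own; they are not the poster's data. With real tximport output (an object conventionally called `txi`, assumed here), one would build the dataset with `DESeqDataSetFromTximport(txi, colData = coldata, design = ~ patient_id + condition)` instead.

```r
library(DESeq2)

# Sample table as given in the question; Control is set as the reference level
coldata <- data.frame(
  condition  = factor(rep(c("Control", "OverExp"), times = 3),
                      levels = c("Control", "OverExp")),
  patient_id = factor(rep(c("BT1", "BT53", "GBM5"), each = 2)),
  row.names  = c("BT12CONT", "BT12OE", "BT53CONT", "BT53OE", "GBM5CONT", "GBM5OE")
)

# Simulated counts (assumption: stand-in for the real gene-level counts)
set.seed(1)
cts <- counts(makeExampleDESeqDataSet(n = 500, m = 6))
colnames(cts) <- rownames(coldata)

# Paired design: patient first, condition of interest last
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData   = coldata,
                              design    = ~ patient_id + condition)
dds <- DESeq(dds)

# Treatment effect (OverExp vs Control), controlling for patient
res <- results(dds, contrast = c("condition", "OverExp", "Control"))
print(resultsNames(dds))
print(head(res[order(res$pvalue), ]))
```

Specifying the contrast explicitly makes the direction of the comparison unambiguous, regardless of how the factor levels happen to be ordered.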