Question: Paired samples in cell lines using DESeq2
0
3 months ago by
Puks10
Estonia
Puks10 wrote:

Hi, I would like to use DESeq2 to analyze three pairs of bulk RNA-seq samples, but I am trying to figure out which model is valid here. I used tximport to import Kallisto's transcript-level abundance estimates and summarize them to the gene level for use with DESeq2.

In the paired samples, the treatment is overexpression of gene A. The sample information is as follows:

          condition patient_id
BT12CONT    Control        BT1
BT12OE      OverExp        BT1
BT53CONT    Control       BT53
BT53OE      OverExp       BT53
GBM5CONT    Control       GBM5
GBM5OE      OverExp       GBM5


I am interested in looking at the condition effect while accounting for sample pairs so I thought a model like the following would be enough:

>   ~ condition + patient_id
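
To make the formula concrete, here is a minimal base R sketch of the design matrix it implies for the six samples listed above. The object names `coldata` and `design` are illustrative assumptions, not taken from the original analysis; the factor levels simply mirror the sample table.

```r
# Sample table rebuilt from the post; object names are hypothetical.
coldata <- data.frame(
  condition  = factor(c("Control", "OverExp", "Control", "OverExp", "Control", "OverExp"),
                      levels = c("Control", "OverExp")),  # Control is the reference level
  patient_id = factor(c("BT1", "BT1", "BT53", "BT53", "GBM5", "GBM5")),
  row.names  = c("BT12CONT", "BT12OE", "BT53CONT", "BT53OE", "GBM5CONT", "GBM5OE")
)

# One intercept, one condition column, and two patient columns (BT1 is the baseline).
design <- model.matrix(~ condition + patient_id, data = coldata)
print(design)

# The design is full rank: 4 coefficients estimated from 6 samples.
design_rank <- qr(design)$rank
```

With six samples and four coefficients, only two residual degrees of freedom remain, which is why power is limited for small effects.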


The PCA for these samples shows that the samples separate by patient_id.

Is this simple model enough to look at the condition/treatment effect?

Thanks! Puks


Your samples cluster by cell line, not by treatment, so using them as biological replicates is problematic. From a biological standpoint this is quite normal for cell lines. During cell line establishment many things change inside the cell: particular clones start growing out, and the cells may acquire all kinds of alterations that help them grow. It is therefore not unexpected to see large differences between cell lines (or even between different clones of the same cell line). I do not think this setup is a good choice to get the information you want. You should probably have used a single cell line and performed the overexpression study with that line in a replicated manner. This would give you the power to detect significant changes within the cell line. Comparing these results with the same replicated experiment in the other two cell lines would then tell you how reproducible the findings are from a biological standpoint.

Thanks ATpoint! You are correct, there should have been replicates for each cell line but unfortunately the person who performed the experiment did not do it.

I have to disagree with ATpoint here. It is actually a good design to use cell lines derived from multiple patients. This ensures that the list of differentially expressed genes that OP will find is not specific to one (arbitrarily chosen) patient but has some generality, and hence is likely to overlap well with the list one would find if one tried again with different patients.

The fact that the difference between patients is larger than the difference between treatment and control indicates that the treatment has only a small overall effect: either a small effect on many genes, or a large effect on only a few genes. If the latter is the case, including "patient_id" in the model will allow you to find these genes (because DESeq2 will look at the differences between treatment and control within each sample pair).
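
The effect of adding the pairing term can be illustrated with a toy example in base R. This is only an analogy: DESeq2 fits a negative binomial GLM on counts, not an ordinary linear model, and the log-expression values below are made up for a single hypothetical gene with large patient offsets and a consistent shift of about 1 under overexpression.

```r
# Made-up log2 expression values for ONE hypothetical gene (not real data).
samples <- data.frame(
  patient_id = factor(rep(c("BT1", "BT53", "GBM5"), each = 2)),
  condition  = factor(rep(c("Control", "OverExp"), times = 3),
                      levels = c("Control", "OverExp"))
)
log_expr <- c(5.0, 6.1, 8.0, 8.9, 11.0, 12.0)

# Ignoring the pairing: patient-to-patient spread inflates the residual variance.
fit_unpaired <- lm(log_expr ~ condition, data = samples)

# Accounting for the pairing: patient offsets are absorbed by patient_id,
# so the condition effect is judged against within-pair variation only.
fit_paired <- lm(log_expr ~ patient_id + condition, data = samples)

p_unpaired <- summary(fit_unpaired)$coefficients["conditionOverExp", "Pr(>|t|)"]
p_paired   <- summary(fit_paired)$coefficients["conditionOverExp", "Pr(>|t|)"]

print(c(unpaired = p_unpaired, paired = p_paired))
```

Both fits estimate the same condition effect, but only the paired fit has a standard error small enough to detect it, which is the situation described above for genes with a consistent within-pair change.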

If, however, the treatment causes genes to change only slightly, the experiment is underpowered with just three patients and will return nothing. Performing it with many replicates from the same patient, by contrast, would produce many hits, but these may not be very useful, since they could be specific to that one patient.

Answer: Paired samples in cell lines using DESeq2
1
3 months ago by
Michael Love25k
United States
Michael Love25k wrote:

Yes, that is the correct model: ~ patient_id + condition (in general it's good to put condition last; see the vignette for details).
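
A minimal sketch of how this design could be run, assuming DESeq2 is installed. The counts here are simulated purely so the example is self-contained; the object names (`coldata`, `counts`, `dds`, `res`) and the simulation settings are illustrative assumptions, not part of the original analysis.

```r
library(DESeq2)

set.seed(1)  # simulated counts only; replace with the real data

coldata <- data.frame(
  patient_id = factor(rep(c("BT1", "BT53", "GBM5"), each = 2)),
  condition  = factor(rep(c("Control", "OverExp"), times = 3),
                      levels = c("Control", "OverExp")),  # Control as reference
  row.names  = c("BT12CONT", "BT12OE", "BT53CONT", "BT53OE", "GBM5CONT", "GBM5OE")
)

# Simulated negative binomial counts: one mean per gene, shared across samples.
n_genes <- 1000
gene_means <- 2^runif(n_genes, min = 4, max = 12)
counts <- matrix(
  rnbinom(n_genes * nrow(coldata), mu = gene_means, size = 10),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), rownames(coldata))
)
storage.mode(counts) <- "integer"

# Pairing variable first, variable of interest last.
dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ patient_id + condition)
# With tximport output `txi`, the analogous constructor would be:
# dds <- DESeqDataSetFromTximport(txi, colData = coldata,
#                                 design = ~ patient_id + condition)

dds <- DESeq(dds)

# OverExp vs Control, adjusted for patient.
res <- results(dds, contrast = c("condition", "OverExp", "Control"))
summary(res)
```

Naming the contrast explicitly in `results()` keeps the comparison unambiguous regardless of the order of terms in the design formula.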