Paired samples in cell lines using DESeq2
Puks ▴ 10
@puks-12113

Hi, I would like to use DESeq2 to process three pairs of bulk RNA-seq samples, but I am trying to figure out which model is valid here. I used tximport to import kallisto's transcript-level abundance estimates and summarize them at the gene level for use with DESeq2.

In the paired samples, the treatment is overexpression of gene A. Sample information is as follows:

                    condition patient_id
           BT12CONT   Control        BT1
           BT12OE     OverExp        BT1
           BT53CONT   Control       BT53
           BT53OE     OverExp       BT53
           GBM5CONT   Control       GBM5
           GBM5OE     OverExp       GBM5

I am interested in looking at the condition effect while accounting for sample pairs so I thought a model like the following would be enough:

>   ~ condition + patient_id 

The PCA for these samples shows that the samples separate by patient_id.

[Figure: PCA plot of the samples]

Is this simple model enough to look at the condition/treatment effect?

Thanks! Puks


Your samples notably cluster by cell line, not by treatment. It therefore seems unfortunate to use them as biological replicates. From a biological standpoint this is quite normal for cell lines. During cell line establishment a lot of things change inside the cell: particular clones start growing out, and the cells might acquire all kinds of alterations that help them grow. It is therefore not unexpected to see large differences between cell lines (or even between different clones of the same cell line). I do not think this setup is a good choice to get the information you want. You should probably have used a single cell line and performed the overexpression study with this line in a replicated manner. This would give you the power to detect significant changes within the cell line. Comparing these results with the same experiment in the other two cell lines, again in a replicated fashion, would then give you information on how reproducible the findings are from a biological standpoint.


Thanks ATpoint! You are correct, there should have been replicates for each cell line but unfortunately the person who performed the experiment did not do it.


I have to disagree with ATpoint here. It is actually a good design to use cell lines derived from multiple patients. This ensures that the list of differentially expressed genes that OP will find is not specific to one (arbitrarily chosen) patient but has some generality, and is hence likely to overlap well with the list one would find if one tried again with different patients.

The fact that the difference between patients is larger than the difference between treatment and control indicates that the treatment has just a small effect: either a small effect on many genes, or a large one on only a few genes. If the latter is the case, including "patient_id" in the model will allow you to find these genes (because DESeq2 will look at the differences between treatment and control within each sample pair).

If, however, the treatment causes genes to change only slightly, the experiment is underpowered with just three patients and will return nothing. Performing it with many replicates from the same patient, by contrast, would produce many hits, but these might not be very useful, since they could be specific to that one patient.
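To illustrate why looking at within-pair differences helps when patients differ strongly, here is a minimal base-R sketch. It uses a paired versus an unpaired t-test on invented log-scale values for a single hypothetical gene; this is only an analogy, not what DESeq2 does internally (DESeq2 fits a negative binomial GLM on counts).

```r
# Hypothetical log2 expression of one gene in three patient-derived lines.
# All numbers are invented for illustration; they are not from the thread.
control <- c(BT1 = 5.0, BT53 = 9.0, GBM5 = 13.0)   # large patient-to-patient spread
treated <- control + c(0.9, 1.0, 1.1)              # small, consistent treatment shift

# Paired test: works on the three within-patient differences.
paired_p <- t.test(treated, control, paired = TRUE)$p.value

# Unpaired (Welch) test: the treatment shift is swamped by patient variation.
unpaired_p <- t.test(treated, control)$p.value

print(c(paired = paired_p, unpaired = unpaired_p))
```

With these made-up numbers the paired test yields a much smaller p-value than the unpaired one, because the patient baseline cancels out of each difference.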

@mikelove

Yes, that is the correct model: ~ patient_id + condition (it's good to put condition last in general; see the vignette for details).
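As a rough sketch of how this design could be set up, the snippet below rebuilds the sample table from the question and inspects the resulting model matrix in base R. The DESeq2 calls are guarded and only run if the package is installed and an object named `txi` exists; `txi` is a hypothetical name for the tximport output, assumed here to have its columns in the same order as the rows of `coldata`.

```r
# Sample table as posted in the question (values copied verbatim).
coldata <- data.frame(
  condition  = factor(rep(c("Control", "OverExp"), times = 3),
                      levels = c("Control", "OverExp")),  # Control is the reference
  patient_id = factor(rep(c("BT1", "BT53", "GBM5"), each = 2)),
  row.names  = c("BT12CONT", "BT12OE", "BT53CONT", "BT53OE",
                 "GBM5CONT", "GBM5OE")
)

# Paired design: patient as blocking factor, condition last.
design <- ~ patient_id + condition

# The model matrix has an intercept, two patient terms and one condition term,
# leaving 6 - 4 = 2 residual degrees of freedom.
mm <- model.matrix(design, data = coldata)
print(colnames(mm))

# Assumption: `txi` is the list returned by tximport(); it is not defined here.
if (requireNamespace("DESeq2", quietly = TRUE) && exists("txi")) {
  dds <- DESeq2::DESeqDataSetFromTximport(txi, colData = coldata, design = design)
  dds <- DESeq2::DESeq(dds)
  # Log2 fold change of OverExp over Control, adjusted for patient.
  res <- DESeq2::results(dds, contrast = c("condition", "OverExp", "Control"))
}
```

Setting `Control` as the first factor level makes it the reference, so the condition coefficient is reported as OverExp versus Control.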


Thanks Michael! I will change the order.
