Uncertain about PCA plot for DESeq2 analysis
pepere ▴ 40

Hi, I have an RNA-seq dataset obtained from patients treated with a specific drug (the sequencing data are of very good quality). The samples are separated into 3 groups: healthy controls, good responders and bad responders. Patient samples are further divided into pre-treatment and post-treatment samples from the same patient.

After aligning the RNA-seq reads and counting them, I had a look at the PCA plot and noticed that the groups do not separate well.

A regular DESeq2 analysis for differential expression between the groups yields no results, which is very strange given that we are also comparing healthy and sick people. What could have gone wrong?


deseq2 pca bioconductor rnaseq • 754 views

The most plausible explanation is that there isn’t the signal you expect in this dataset, and that’s a question to bring back to the team.

How did you perform the DE analysis? Please post your code. Did you control for patient baseline? There is an example of how to do this in the vignette.


Thanks for the quick reply.

My main concern was not seeing differences with the healthy controls. We did a similar experiment in the past with another drug, and the differences in both the PCA and the DE results were evident.

I performed the DE analysis using this coldata:

Sample Patient Responder Time Control
m01 15 BadResponder Basal No
m02 14 BadResponder Basal No
m03 11 BadResponder Basal No
m04 6 GoodResponder Basal No
m05 7 GoodResponder Basal No
m06 8 GoodResponder Basal No
m07 9 GoodResponder Basal No
m08 10 GoodResponder Basal No
m09 13 BadResponder Basal No
m10 12 BadResponder Basal No
m11 6 GoodResponder 1year No
m12 14 BadResponder 1year No
m13 7 GoodResponder 1year No
m14 13 BadResponder 1year No
m15 10 GoodResponder 1year No
m16 11 BadResponder 1year No
m17 8 GoodResponder 1year No
m18 15 BadResponder 1year No
m19 9 GoodResponder 1year No
m20 12 BadResponder 1year No
m21 1 Control None Yes
m22 2 Control None Yes
m23 3 Control None Yes
m24 4 Control None Yes
m25 5 Control None Yes

and then simply ran the following (the cts variable contains the count data for each gene):

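library(DESeq2)
dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ Control)
# Drop genes with at most one read in total across all samples
dds <- dds[ rowSums(counts(dds)) > 1, ]
dds <- DESeq(dds)
# With design ~ Control, this tests Control "Yes" vs "No"
res <- results(dds)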

Almost no genes were found to be DE, which is very strange.
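One thing that may be worth checking: the design `~ Control` pools all patient samples (both responder groups, both time points) into a single "No" group. A minimal sketch of an alternative is given below, assuming the coldata shown in this thread and a hypothetical combined factor named `Group` (not part of the original analysis). The DESeq2 part only runs if the package is installed and a count matrix `cts` exists, and it assumes the columns of `cts` are in the same order as the sample table. Note that this design ignores the pairing of samples from the same patient, so it is only a rough first look at baseline patients versus controls.

```r
# Sample table rebuilt from the coldata posted in this thread.
patient_id <- c(15, 14, 11, 6, 7, 8, 9, 10, 13, 12,
                6, 14, 7, 13, 10, 11, 8, 15, 9, 12,
                1, 2, 3, 4, 5)
coldata2 <- data.frame(
  Sample    = sprintf("m%02d", 1:25),
  Patient   = factor(patient_id),
  Responder = ifelse(patient_id <= 5, "Control",
                     ifelse(patient_id <= 10, "GoodResponder", "BadResponder")),
  Time      = rep(c("Basal", "1year", "None"), times = c(10, 10, 5)),
  Control   = rep(c("No", "Yes"), times = c(20, 5)),
  stringsAsFactors = FALSE
)
rownames(coldata2) <- coldata2$Sample

# Hypothetical combined factor: one level per responder/time combination,
# with the healthy controls as the reference level.
group_label <- ifelse(coldata2$Control == "Yes", "Control",
                      paste(coldata2$Responder, coldata2$Time, sep = "_"))
coldata2$Group <- relevel(factor(group_label), ref = "Control")

# Check group sizes and that the design matrix is full rank.
print(table(coldata2$Group))
design_group <- model.matrix(~ Group, data = coldata2)
stopifnot(qr(design_group)$rank == ncol(design_group))

# Only runs when DESeq2 is installed and a count matrix `cts` is available.
if (requireNamespace("DESeq2", quietly = TRUE) && exists("cts")) {
  library(DESeq2)
  dds_group <- DESeqDataSetFromMatrix(countData = cts, colData = coldata2,
                                      design = ~ Group)
  dds_group <- DESeq(dds_group)
  # Baseline good responders versus healthy controls.
  res_good_basal <- results(dds_group,
                            contrast = c("Group", "GoodResponder_Basal", "Control"))
  summary(res_good_basal)
}
```

The same `contrast` argument can be pointed at any other level of `Group`, for example `"BadResponder_Basal"`, to compare that group with the controls.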

I didn't try controlling for patient baseline. I will check it out.
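For reference, a minimal sketch of what controlling for patient baseline could look like for the treated patients, assuming the coldata shown in this thread. It uses a paired design, `~ Patient + Time`, so each patient serves as their own baseline. The object names (`patients`, `dds_paired`, `res_time`) are illustrative, and the DESeq2 part only runs if the package is installed and a count matrix `cts` with sample IDs as column names exists, which is an assumption here.

```r
# Treated patients only (m01 to m20), rebuilt from the coldata in this thread.
patient_id <- c(15, 14, 11, 6, 7, 8, 9, 10, 13, 12,
                6, 14, 7, 13, 10, 11, 8, 15, 9, 12)
patients <- data.frame(
  Sample    = sprintf("m%02d", 1:20),
  Patient   = factor(patient_id),
  Responder = factor(ifelse(patient_id <= 10, "GoodResponder", "BadResponder")),
  Time      = factor(rep(c("Basal", "1year"), each = 10),
                     levels = c("Basal", "1year")),
  stringsAsFactors = FALSE
)
rownames(patients) <- patients$Sample

# Every patient should have exactly one Basal and one 1year sample.
stopifnot(all(table(patients$Patient, patients$Time) == 1))

# The paired design must be full rank for DESeq2 to accept it.
design_paired <- model.matrix(~ Patient + Time, data = patients)
stopifnot(qr(design_paired)$rank == ncol(design_paired))

# Only runs when DESeq2 is installed and a count matrix `cts` is available.
if (requireNamespace("DESeq2", quietly = TRUE) && exists("cts")) {
  library(DESeq2)
  dds_paired <- DESeqDataSetFromMatrix(
    countData = cts[, as.character(patients$Sample)],  # assumes named columns
    colData   = patients,
    design    = ~ Patient + Time
  )
  dds_paired <- DESeq(dds_paired)
  # Treatment effect (1year vs Basal) after accounting for patient baseline.
  res_time <- results(dds_paired, contrast = c("Time", "1year", "Basal"))
  summary(res_time)
}
```

This tests the overall time effect across all treated patients. Testing whether that effect differs between good and bad responders needs the nested-individuals setup described in the DESeq2 vignette, which is not shown here.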



As Michael implied, the differences that you expect may simply not exist in this dataset. Also, you should not draw any major conclusions about your data from the PCA bi-plot alone. In most cases, major differences between control / healthy and other samples will simply not be revealed by PCA. What you can at least say, looking at your plot, is that your dataset does not contain outliers.
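To look beyond the first two components, here is a small sketch that reports how much variance each principal component explains. The helper `pca_summary` is hypothetical (written for this post, not a DESeq2 function). It selects the most variable genes and runs `prcomp`, which is similar in spirit to what `plotPCA` does. The DESeq2 part assumes a `dds` object like the one created earlier in this thread and only runs if it exists.

```r
# Hypothetical helper: PCA on a transformed expression matrix
# (genes in rows, samples in columns), using the `ntop` most variable genes.
pca_summary <- function(mat, ntop = 500) {
  gene_var <- apply(mat, 1, var)
  top <- order(gene_var, decreasing = TRUE)[seq_len(min(ntop, nrow(mat)))]
  pca <- prcomp(t(mat[top, , drop = FALSE]))
  percent_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)
  list(scores = pca$x, percent_var = percent_var)
}

# Only runs when DESeq2 is installed and a `dds` object is available.
if (requireNamespace("DESeq2", quietly = TRUE) && exists("dds")) {
  vsd <- DESeq2::vst(dds, blind = TRUE)
  out <- pca_summary(SummarizedExperiment::assay(vsd))
  # Percent of variance explained by the first five components.
  print(round(out$percent_var[1:5], 1))
  # PC3 vs PC4, coloured by control status.
  plot(out$scores[, 3], out$scores[, 4],
       col = as.integer(factor(dds$Control)), pch = 19,
       xlab = "PC3", ylab = "PC4")
}
```

If PC1 and PC2 together explain only a small share of the variance, the bi-plot shows a correspondingly small part of the data, and a group difference could still sit in later components or in a modest number of genes.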

I note that your dataset is imbalanced, though, with only 5 controls versus 20 non-controls.

