Question: Uncertain about PCA plot for DESeq2 analysis
8 months ago by
pepere40 wrote:

Hi, I have an RNA-seq dataset obtained from patients treated with a specific drug (the sequencing data is of very good quality). The samples fall into 3 groups: healthy controls, good-responder patients, and bad-responder patients. The patient samples are further divided into pre-treatment and post-treatment samples from the same patient.

After aligning the RNA-seq reads and counting them, I had a look at the PCA plot and noticed that the groups do not separate well.

Regular DESeq2 analysis for differential expression between the groups yields no results, which is very strange given that we are also comparing healthy and sick people. What could have gone wrong?


Answer: Uncertain about PCA plot for DESeq2 analysis
8 months ago by
Michael Love25k (United States) wrote:

The most plausible explanation is that there isn’t the signal you expect in this dataset, and that’s a question to bring back to the team.

How did you perform the DE analysis? Please post your code. Did you control for patient baseline? There is an example of how to do this in the vignette.


Thanks for the quick reply.

My main concern was not seeing differences with the healthy controls. We did a similar experiment in the past with another drug, and the differences in the PCA and DE were evident.

I performed the DE analysis using this coldata:

Sample  Patient  Responder      Time   Control
m01     15       BadResponder   Basal  No
m02     14       BadResponder   Basal  No
m03     11       BadResponder   Basal  No
m04     6        GoodResponder  Basal  No
m05     7        GoodResponder  Basal  No
m06     8        GoodResponder  Basal  No
m07     9        GoodResponder  Basal  No
m08     10       GoodResponder  Basal  No
m09     13       BadResponder   Basal  No
m10     12       BadResponder   Basal  No
m11     6        GoodResponder  1year  No
m12     14       BadResponder   1year  No
m13     7        GoodResponder  1year  No
m14     13       BadResponder   1year  No
m15     10       GoodResponder  1year  No
m16     11       BadResponder   1year  No
m17     8        GoodResponder  1year  No
m18     15       BadResponder   1year  No
m19     9        GoodResponder  1year  No
m20     12       BadResponder   1year  No
m21     1        Control        None   Yes
m22     2        Control        None   Yes
m23     3        Control        None   Yes
m24     4        Control        None   Yes
m25     5        Control        None   Yes

and simply (the cts variable contains the count data for each gene):

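library(DESeq2)
# Controls vs. all patient samples pooled; pairing by Patient is ignored here
dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ Control)
# Drop genes with at most one read in total across all samples
dds <- dds[ rowSums(counts(dds)) > 1, ]
dds <- DESeq(dds)
# With default (alphabetical) factor levels this reports Control Yes vs. No
res <- results(dds)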

Almost no genes were found to be DE, which is very strange.

I didn't try controlling for patient baseline; I will check it out.
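
For reference, a minimal sketch of what controlling for patient baseline could look like with the coldata above. It assumes only the 20 patient samples are used (each control has a single sample, so it cannot be paired) and that the question is the overall Basal vs. 1year change within patients. The `patients` table is retyped from the coldata in this post; `cts_sim` is simulated placeholder data, not the real counts, and the DESeq2 part only runs if the package is installed.

```r
# Patient samples m01..m20, retyped from the coldata table in this thread
patients <- data.frame(
  Patient = factor(c(15, 14, 11, 6, 7, 8, 9, 10, 13, 12,
                     6, 14, 7, 13, 10, 11, 8, 15, 9, 12)),
  Time = factor(rep(c("Basal", "1year"), each = 10),
                levels = c("Basal", "1year")),
  row.names = sprintf("m%02d", 1:20)
)

# Paired design: the Patient term absorbs each person's baseline level,
# so the Time coefficient is the within-patient change
design_paired <- model.matrix(~ Patient + Time, data = patients)

# The design is only estimable if the matrix is full rank
is_full_rank <- qr(design_paired)$rank == ncol(design_paired)
print(is_full_rank)

if (requireNamespace("DESeq2", quietly = TRUE)) {
  library(DESeq2)
  set.seed(1)
  # Simulated stand-in for the real `cts` matrix (assumption, no true signal)
  cts_sim <- counts(makeExampleDESeqDataSet(n = 500, m = 20))
  colnames(cts_sim) <- rownames(patients)

  dds_paired <- DESeqDataSetFromMatrix(countData = cts_sim,
                                       colData = patients,
                                       design = ~ Patient + Time)
  dds_paired <- DESeq(dds_paired)
  # 1year vs. Basal, adjusted for patient
  res_time <- results(dds_paired, contrast = c("Time", "1year", "Basal"))
  summary(res_time)
}
```

With the real data, `cts_sim` would be replaced by the patient columns of `cts`. Testing whether the time effect differs between good and bad responders needs a more involved design than this sketch.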


written 8 months ago by pepere40

As Michael implied, the differences that you expect may simply not exist in this dataset. Also, you should not draw any major conclusions about your data from the PCA bi-plot alone. In most cases, major differences between control / healthy and other samples will simply not be revealed by PCA. What you can at least say, looking at your plot, is that your dataset does not contain outliers.
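
One way to look past a single PC1/PC2 bi-plot is to inspect several components and the variance each one carries. The following is a rough base R sketch, not DESeq2's own plotting code: `pca_top_genes` is a hypothetical helper, and `mat_sim` is simulated noise standing in for a transformed expression matrix (with DESeq2 that would be something like `assay(vst(dds))`, which is an assumption about the workflow here).

```r
# Hypothetical helper: PCA on the most variable genes of a transformed
# matrix (genes in rows, samples in columns)
pca_top_genes <- function(mat, ntop = 500) {
  row_var <- apply(mat, 1, var)
  keep <- order(row_var, decreasing = TRUE)[seq_len(min(ntop, nrow(mat)))]
  pca <- prcomp(t(mat[keep, , drop = FALSE]))
  list(scores = pca$x,
       percent_var = pca$sdev^2 / sum(pca$sdev^2))
}

set.seed(1)
# Simulated stand-in data: 1000 genes x 25 samples of pure noise
mat_sim <- matrix(rnorm(1000 * 25), nrow = 1000,
                  dimnames = list(paste0("gene", 1:1000),
                                  sprintf("m%02d", 1:25)))
# 20 patient samples followed by 5 controls, as in the coldata above
group <- rep(c("Patient", "Control"), times = c(20, 5))

pca <- pca_top_genes(mat_sim)

# Percent of variance carried by each of the first five components
print(round(100 * pca$percent_var[1:5], 1))

# Pairwise scatterplots of the first four components, controls in blue
pairs(pca$scores[, 1:4],
      col = ifelse(group == "Control", "blue", "red"), pch = 19)
```

If the first two components carry only a small share of the variance, the lack of separation in a PC1/PC2 plot says little either way.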

I note that your dataset is imbalanced, though, with only 5 controls versus 20 non-controls.

modified 8 months ago • written 8 months ago by Kevin Blighe190