Hi, I have a RNA seq dataset obtained from patients treated with a specific drug (The sequencing data is of very good quality), they are separated into 3 groups: healthy controls, good responder patients and bad responder patients (patients samples are also further divided into pre-treatment and post-treatment for the same patient).
After aligning the RNA seq and performing read count I had a look a the PCA plot, and noticed that the groups do not separate well:
Regular DESeq2 analysis for differential expression between the groups yields no results, which is very strange given that we are also comparing healthy and sick people. What could have gone wrong?
thanks
Thanks for the quick reply.
My main concern was not seeing differences with the healthy controls. We did a similar experiment in the past with another drug and differences in the PCA and DE were evindent.
I performed the DE analysis using this coldata:
and simply (the cts variable contains the count data for each gene):
almost no gene was found to be DE, which is very strange....
I didn't try with the control for patient baseline, I will check it out
As Michael implied, the differences that you expect may simply not exist in this dataset. Also, you should not make any major conclusion about your data from just the PCA bi-plot. In most cases, major differences between control / healthy and other samples will simply not be revealed by PCA. What you can at least say, looking at your plot, is that your dataset does not contain outliers.
I note that your dataset is imbalanced, though, with only 5 controls versus 20 non-controls.