Hi,
I am puzzled that, in a recent RNAseq analysis, the PCA plot clustered the repeated measures from each human subject together rather than reflecting the actual biological condition groupings. I have used two RNAseq pipelines:
- the standard pipeline: FastQC - HISAT - HTSeq - DESeq2 (all exploratory and results plots look fine)
- the 2nd pipeline: FastQC - HISAT - bedtools (converting BAM files to FASTQs) - Salmon (selective alignment with the --seqBias and --gcBias flags on; alignment rate averages around 75%) - tximport - DESeq2 (this is where the problematic PCA plot was observed). The PCA plot was generated from the variance-stabilised data, i.e. vsd <- vst(dds, blind=FALSE), where dds was obtained from DESeqDataSetFromTximport().
The same samples sequenced on a different sequencing platform have also been analysed with the 2nd pipeline above (alignment rate following Salmon is around 30-35%), and all plots look fine. I must have done something incorrectly in the most recent analysis, the one with the PCA in question. I have tried to troubleshoot where the problem may lie but have had no luck figuring it out. I am just wondering whether anyone has had a similar experience and could give me a pointer. Many thanks.
Guan
Thanks Kevin. You are right that the sample-specific effects may be expected. I removed the subjects as batches using limma::removeBatchEffect() and re-plotted the PCA, which now shows the actual biological effects (PC1 44% vs PC2 10%).
Also thanks for introducing PCAtools; it is now in my toolkit.
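
For anyone landing here with the same problem, here is a minimal sketch of why removing the subject effect changes the picture. It is a language-neutral illustration in plain Python, not the limma implementation: the helper `center_within_subject` and the toy numbers are made up for this example. It only subtracts each subject's own mean, which is roughly what regressing out subject amounts to in a balanced paired design; limma::removeBatchEffect() fits a linear model and is more general.

```python
from statistics import mean


def center_within_subject(values, subjects):
    """Subtract each subject's own mean from that subject's samples.

    Hypothetical helper for illustration only; it is not part of limma.
    """
    subject_means = {
        s: mean(v for v, subj in zip(values, subjects) if subj == s)
        for s in set(subjects)
    }
    return [v - subject_means[s] for v, s in zip(values, subjects)]


# Toy data (made up): one gene, three subjects, each measured under two conditions.
subjects = ["S1", "S1", "S2", "S2", "S3", "S3"]
conditions = ["ctrl", "trt", "ctrl", "trt", "ctrl", "trt"]
expression = [5.0, 6.0, 9.0, 10.0, 13.0, 14.0]

# Before centering, the spread between subjects (5 vs 9 vs 13) dwarfs the
# treatment difference of 1.0, so samples group by subject.
centered = center_within_subject(expression, subjects)

# After centering, only the within-subject condition difference remains.
for cond, value in zip(conditions, centered):
    print(cond, value)
```

In this toy case the between-subject baseline differences vanish and every sample separates by condition alone, which matches the PCA grouping by condition once the subject term is removed.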