Question

RNAseq PCA resembles subject-specific effect in a paired study design

0

guanwang179 ▴ 10

@guanwang179-22258

Hi,

In a recent RNAseq analysis, I was puzzled to find that the PCA plot clustered the repeated measures for each human subject together rather than reflecting the actual biological condition groupings. Two RNAseq pipelines have been used:

the standard pipeline: FastQC - HISAT - HTSeq - DESeq2 (all exploratory and results plots look fine)
the 2nd pipeline: FastQC - HISAT - bedtools (converting BAM files to FASTQs) - Salmon (using selective alignment with the --seqBias and --gcBias flags on; the alignment rate is around 75% on average) - tximport - DESeq2 (this is where the problematic PCA plot was observed). The PCA plot was generated from the vsd data, i.e. vsd <- vst(dds, blind=FALSE), where dds was obtained from DESeqDataSetFromTximport().

The same samples, sequenced on a different sequencing platform, have also been analysed using the 2nd pipeline above (the alignment rate from Salmon is around 30-35%), and all plots look fine. I must have done something incorrectly in the most recent analysis, the one with the PCA in question. I have tried to troubleshoot where the problem may lie, but have had no luck figuring it out. I am wondering whether anyone has had a similar experience and could give me some pointers. Many thanks.

Guan

salmon DESeq2 • 1.5k views

updated 5.9 years ago by Kevin Blighe ★ 4.0k • written 5.9 years ago by guanwang179 ▴ 10

score 0 · Answer 1 · 2019-11-03

0

Kevin Blighe ★ 4.0k

@kevin

The Cave, 181 Longwood Avenue, Boston, …

You are claiming that there is a problem based solely on the PCA bi-plots that you have generated? There is not necessarily any problem: sample-specific effects can often be greater than your biological condition of interest. You should take a look at the percent explained variation on your PCs, primarily PC1 and PC2, in order to elucidate further what might be happening. Also, look at other PC bi-plot comparisons to check whether or not your condition of interest is segregated on a PC 'of lesser importance', such as PC8, PC10, or some other PC. You can check this via, for example, a pairsplot or eigencorplot from PCAtools (my own package).
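A minimal sketch of the checks described above, run on simulated data rather than the poster's. It assumes PCAtools is installed; the matrix `mat`, the `subject` and `condition` factors, and all effect sizes are made up for illustration (in practice `mat` would be something like `assay(vsd)`, with the metadata taken from `colData(vsd)`). It is not the code originally posted with this answer.

```r
library(PCAtools)

set.seed(1)
# Simulated stand-in for assay(vsd): 500 genes x 12 samples
# (6 subjects, each measured under "ctrl" and "trt"). All values are made up.
subject   <- factor(rep(paste0("S", 1:6), each = 2))
condition <- factor(rep(c("ctrl", "trt"), times = 6))
n_genes   <- 500
noise     <- matrix(rnorm(n_genes * 12, sd = 0.3), nrow = n_genes)
subj_eff  <- matrix(rnorm(n_genes * 6), nrow = n_genes)[, as.integer(subject)]
mat       <- noise + subj_eff                 # strong subject effect on every gene
is_trt    <- condition == "trt"
mat[1:50, is_trt] <- mat[1:50, is_trt] + 1    # weaker condition effect on 50 genes
dimnames(mat) <- list(paste0("gene", seq_len(n_genes)),
                      paste0(subject, "_", condition))

# Row names of the metadata must match the column names of the matrix
metadata <- data.frame(subject, condition, row.names = colnames(mat))

p <- pca(mat, metadata = metadata)

# Percent of variance explained by the leading PCs
print(round(p$variance[1:8], 1))

# Which of the first 8 PCs tracks the two-level condition most closely?
assoc <- abs(cor(p$rotated[, 1:8], as.numeric(condition)))
print(rownames(assoc)[which.max(assoc)])

# Visual checks: scree plot and pairwise PC plots coloured by condition
screeplot(p)
pairsplot(p, components = getComponents(p, 1:5), colby = "condition")
```

In this toy setup the subject effect is deliberately larger than the condition effect, so the condition should show up on a later PC rather than on PC1 or PC2. `eigencorplot()` gives a similar overview as a heatmap of PC-to-metadata correlations, but it coerces factors to numbers, which is only meaningful for two-level or ordered variables.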

Kevin

written 5.9 years ago by Kevin Blighe ★ 4.0k

1

Thanks Kevin. You are right that the sample-specific effects may be expected. I removed the subject effects by treating the subjects as batches in limma::removeBatchEffect() and re-plotted the PCA, which now shows the actual biological effects (PC1 44% vs PC2 10%).

Also, thanks for introducing PCAtools; it is now in my toolkit.
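For anyone reading later, the removeBatchEffect() step can be sketched roughly as below. This is an illustration on simulated data, not the exact code used in this analysis: the matrix `mat`, the factors `subject` and `condition`, the effect sizes, and the helper `pc_var()` are all hypothetical, and it assumes limma is installed. Passing the condition through `design` asks removeBatchEffect() to preserve that effect while the subject effect is removed.

```r
library(limma)

set.seed(1)
# Simulated paired data: 500 genes x 12 samples (6 subjects x 2 conditions).
# All values are made up; in practice mat would be e.g. assay(vsd).
subject   <- factor(rep(paste0("S", 1:6), each = 2))
condition <- factor(rep(c("ctrl", "trt"), times = 6))
n_genes   <- 500
mat <- matrix(rnorm(n_genes * 12, sd = 0.3), nrow = n_genes) +
  matrix(rnorm(n_genes * 6), nrow = n_genes)[, as.integer(subject)]
is_trt <- condition == "trt"
mat[1:50, is_trt] <- mat[1:50, is_trt] + 1

# Remove the subject effect while protecting the condition effect
design  <- model.matrix(~ condition)
mat_adj <- removeBatchEffect(mat, batch = subject, design = design)

# Hypothetical helper: percent variance explained per PC (samples as rows)
pc_var <- function(m) {
  pc <- prcomp(t(m))
  round(100 * pc$sdev^2 / sum(pc$sdev^2), 1)
}
print(pc_var(mat)[1:2])      # before adjustment
print(pc_var(mat_adj)[1:2])  # after adjustment

# After adjustment, PC1 scores should line up with the condition
pc_adj <- prcomp(t(mat_adj))
print(abs(cor(pc_adj$x[, 1], as.numeric(condition))))
```

The adjusted matrix is meant for visualisation only. For the differential expression test itself, the usual approach is to keep the original counts and account for the pairing in the design, e.g. ~ subject + condition.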

written 5.9 years ago by guanwang179 ▴ 10
