Question: Looking for Batch effects in PCA
8 months ago by
United States
K50 wrote:


I need some advice. I'm looking at PCA plots of RNA-seq data and am trying to understand whether my data has batch effects. I performed alignment using STAR, and then obtained gene counts and gene TPM values.

I used `prcomp` to find the principal components and plotted PC1 vs PC2 and PC2 vs PC3 for:

- (a) Raw counts
- (b) Log2(counts + 1)
- (c) TPM values
- (d) Log2(TPM + 1)
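For anyone reproducing this step, here is a minimal sketch of how variant (b) might be set up in base R. The matrix `counts` and its dimensions are hypothetical and simulated here; they are not the data from this question. The one detail that is easy to get wrong is that `prcomp` expects samples in rows, so a genes x samples matrix has to be transposed first.

```r
# Simulated stand-in for a genes x samples count matrix (hypothetical data).
set.seed(1)
n_genes <- 500
n_samples <- 8
counts <- matrix(rnbinom(n_genes * n_samples, mu = 100, size = 5),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("sample", seq_len(n_samples))))

# Log-transform to damp the influence of a few very highly expressed genes.
log_counts <- log2(counts + 1)

# Drop genes with zero variance; they carry no information for PCA.
keep <- apply(log_counts, 1, var) > 0
log_counts <- log_counts[keep, , drop = FALSE]

# prcomp() expects observations in rows, so transpose to samples x genes.
pca <- prcomp(t(log_counts), center = TRUE, scale. = FALSE)

# Percentage of variance explained by each principal component.
pct_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)
print(round(pct_var, 1))
```

The sample scores are in `pca$x`, so `pca$x[, 1:2]` and `pca$x[, 2:3]` are what would be plotted for PC1 vs PC2 and PC2 vs PC3. Putting the `pct_var` values on the axis labels makes it easier to judge how much any separation actually matters.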

I am showing the PCA plots below (these are links to images on Google Drive):

(a) PCA on raw counts

(b) PCA on log2(counts + 1)

(c) PCA on TPM values

(d) PCA on log2(TPM + 1)

It seems that there could be a batch effect, but I'm not 100% sure, since I'm doing this for the first time.

- Can anyone advise on whether this is really a batch effect?

- If there is a batch effect, could it be mitigated by using ComBat or SVA, or by adjusting for it in a linear model?

Please advise.

Thank you !




Batch effects arise when samples are prepared or sequenced in separate batches. If that is your case, you can correct for it with the `ComBat` function from the sva package. If the samples were sequenced together, you may want to check for differences between lanes. Also verify the earlier steps of your pipeline: check mapping rates and coverage, and discard multi-mapped reads. There are many details to take into account; check the Bioconductor vignettes for the edgeR package.
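To illustrate what "correcting" for batch means, here is a rough base-R sketch of the simpler alternative raised in the question: adjusting for batch in a linear model. This is not ComBat itself; it only fits a batch term per gene and subtracts it. The `batch` and `condition` labels and the `expr` matrix are hypothetical and simulated.

```r
set.seed(2)
n_genes <- 200
batch <- factor(rep(c("A", "B"), each = 4))              # hypothetical batch labels
condition <- factor(rep(c("ctrl", "treat"), times = 4))  # hypothetical condition labels

# Simulated genes x samples log-expression with a shift added to batch B.
expr <- matrix(rnorm(n_genes * 8), nrow = n_genes)
expr <- sweep(expr, 2, ifelse(batch == "B", 2, 0), "+")

# Design that keeps condition and treats batch as a nuisance covariate.
design <- model.matrix(~ condition + batch)

# Fit all genes at once; lm.fit() takes samples in rows.
fit <- lm.fit(design, t(expr))

# Subtract only the fitted batch term, leaving the condition effect in place.
batch_col <- grep("^batch", colnames(design))
batch_effect <- design[, batch_col, drop = FALSE] %*%
  fit$coefficients[batch_col, , drop = FALSE]
expr_adj <- expr - t(batch_effect)
```

Adjusted values like `expr_adj` are mainly useful for visualisation, for example re-running the PCA. For the differential expression test itself, the usual approach in edgeR is to leave the counts alone and put batch in the design matrix, e.g. `model.matrix(~ batch + condition)`. Either way this only works if batch is not completely confounded with the condition of interest.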

written 8 months ago by eegonzalezk0

Thank you for the feedback. I did in fact use edgeR for differential expression analysis of this data, and we got an unusually large number of differentially expressed genes.

That is why I'm going back to the data, and the PCA plot, to see if there is a batch effect. I need help with interpretation of the PCA plots, to understand if the separation I see is large enough that it could be a batch effect. 
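One way to make the interpretation less of a judgment call, assuming the batch and condition of each sample are known, is to ask how much of each PC is explained by each factor. The sketch below uses simulated data and hypothetical labels (`batch`, `condition`); the R² from regressing the PC scores on a factor is used as a simple measure of association.

```r
set.seed(3)
batch <- factor(rep(c("A", "B"), each = 6))              # hypothetical labels
condition <- factor(rep(c("ctrl", "treat"), times = 6))  # hypothetical labels

# Simulated samples x genes log-expression with a batch shift in the first 50 genes.
x <- matrix(rnorm(12 * 300), nrow = 12)
x[batch == "B", 1:50] <- x[batch == "B", 1:50] + 3

pca <- prcomp(x, center = TRUE, scale. = FALSE)
pct_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)

# R^2 from regressing PC scores on a factor: share of that PC explained by the factor.
r2 <- function(scores, f) summary(lm(scores ~ f))$r.squared

assoc <- data.frame(
  pc = paste0("PC", 1:3),
  pct_var = round(pct_var[1:3], 1),
  r2_batch = sapply(1:3, function(i) r2(pca$x[, i], batch)),
  r2_condition = sapply(1:3, function(i) r2(pca$x[, i], condition))
)
print(assoc)
```

If a PC that carries a large share of the variance is explained mostly by batch rather than by condition, that points towards a batch effect. If the leading PCs track the condition instead, the large number of differentially expressed genes may simply reflect real biology.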

written 8 months ago by K50

This may be a bit late, but if you are still tackling the problem you could try guided PCA (see the package vignette) for identifying batch effects. However, you must know which batch each sample is from to make it work.
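To give an idea of what guided PCA does, here is a simplified base-R sketch of the idea. It is my own re-implementation for illustration, not the package's API; the function name `gpca_delta`, the data and the batch labels are all hypothetical. The statistic compares the variance along the first "guided" direction (from the SVD of the batch indicator matrix times the data) with the variance along the ordinary first PC, and a permutation of the batch labels gives a null distribution.

```r
set.seed(4)
batch <- factor(rep(c("A", "B", "C"), each = 4))    # hypothetical, known batch labels
x <- matrix(rnorm(12 * 200), nrow = 12)             # simulated samples x genes
x[batch == "B", 1:50] <- x[batch == "B", 1:50] + 3  # inject a batch shift

gpca_delta <- function(x, batch) {
  xc <- scale(x, center = TRUE, scale = FALSE)      # column-centre the data
  y <- model.matrix(~ 0 + batch)                    # samples x batches indicator matrix
  v_guided <- svd(t(y) %*% xc)$v[, 1]               # first loading of Y'X (guided)
  v_unguided <- svd(xc)$v[, 1]                      # first loading of X (ordinary PCA)
  as.numeric(var(xc %*% v_guided) / var(xc %*% v_unguided))
}

delta <- gpca_delta(x, batch)

# Permutation test: shuffle batch labels to get a null distribution for delta.
n_perm <- 200
delta_null <- replicate(n_perm, gpca_delta(x, sample(batch)))
p_value <- mean(delta_null >= delta)

print(c(delta = delta, p_value = p_value))
```

`delta` lies between 0 and 1; values close to 1 mean the batch-guided direction captures almost as much variance as the first ordinary PC. A small `p_value` suggests the batch labels explain more variance than random groupings of the same sizes would.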


written 6 months ago by kentfung0