Looking for Batch effects in PCA
K ▴ 50


I need some advice. I'm looking at PCA plots of RNA-seq data and am trying to understand whether my data has batch effects or not. I performed alignment using STAR, and then obtained gene counts and gene TPM values.

I used `prcomp` to find the principal components and plotted PC1 vs PC2 and PC2 vs PC3 for:

  - (a) Raw counts
  - (b) log2(counts + 1)
  - (c) TPM values
  - (d) log2(TPM + 1)
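
For readers following along, here is a minimal sketch of the kind of PCA described above, for case (b). It assumes a genes x samples count matrix; the simulated `counts` matrix and the `batch` labels are made up for illustration and are not the poster's data.

```r
# Simulated genes x samples count matrix (illustrative only).
set.seed(1)
counts <- matrix(rnbinom(200 * 8, mu = 100, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:8)))
batch <- factor(rep(c("A", "B"), each = 4))  # hypothetical batch labels

# Log-transform, then drop genes with zero variance across samples.
log_counts <- log2(counts + 1)
keep <- apply(log_counts, 1, var) > 0

# prcomp() expects samples in rows, so transpose the genes x samples matrix.
pca <- prcomp(t(log_counts[keep, ]), center = TRUE, scale. = FALSE)

# Proportion of variance explained by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# PC1 vs PC2, coloured by batch.
plot(pca$x[, 1], pca$x[, 2], col = as.integer(batch), pch = 19,
     xlab = sprintf("PC1 (%.1f%%)", 100 * var_explained[1]),
     ylab = sprintf("PC2 (%.1f%%)", 100 * var_explained[2]))
legend("topright", legend = levels(batch), col = seq_along(levels(batch)), pch = 19)
```

Colouring the points by the known batch (and separately by the biological condition) is usually what makes such a plot interpretable.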

The PCA plots are shown below (these are links to images on Google Drive):

(a) PCA on Raw counts 

(b) PCA on log2(counts + 1)

(c) PCA on TPM values

(d) PCA on log2(TPM + 1)

It seems that there could be a batch effect, but I'm not 100% sure, since I'm doing this for the first time.

- Can anyone advise on whether this is really a batch effect?

- If there is a batch effect, could it be mitigated with ComBat or SVA, or by adjusting for batch in the linear model?
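
To make the "adjusting in the linear model" option concrete, here is a rough sketch that puts batch into the design matrix alongside the condition of interest. The data are simulated, and the `batch` and `condition` labels are hypothetical. The edgeR part only runs if edgeR is installed, and it is one possible workflow rather than the one used for the poster's data.

```r
# Simulated counts with hypothetical batch and condition labels.
set.seed(10)
counts <- matrix(rnbinom(500 * 8, mu = 50, size = 10), nrow = 500,
                 dimnames = list(paste0("gene", 1:500), paste0("s", 1:8)))
batch     <- factor(rep(c("A", "B"), each = 4))
condition <- factor(rep(c("control", "treated"), times = 4))

# Batch enters the model as a covariate next to the condition of interest.
design <- model.matrix(~ batch + condition)

# If batch and condition are completely confounded the design is not full rank,
# and the two effects cannot be separated by any model.
full_rank <- qr(design)$rank == ncol(design)

if (full_rank && requireNamespace("edgeR", quietly = TRUE)) {
  y   <- edgeR::DGEList(counts = counts)
  y   <- edgeR::calcNormFactors(y)
  y   <- edgeR::estimateDisp(y, design)
  fit <- edgeR::glmQLFit(y, design)
  # Test the condition effect while adjusting for batch.
  res <- edgeR::glmQLFTest(fit, coef = "conditiontreated")
  print(edgeR::topTags(res, n = 5))
}
```

The rank check matters in practice: if every sample of one condition was processed in a single batch, no correction method can distinguish the two effects.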

Please advise.

Thank you !




A batch effect is caused by technical differences between samples that were sequenced separately. If that is your case, you can correct for it with the ComBat function from the sva package. If the samples were sequenced together, you may want to check for possible differences caused by different lanes. Also, verify the earlier steps of your pipeline: check the mapping rate and coverage, and discard multi-mapped reads. There are a number of details to take into account; see the Bioconductor vignettes for the edgeR package.
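
A minimal sketch of the ComBat call mentioned above, on simulated log-scale expression values with an artificial additive shift in one batch. The matrix `expr` and the `batch` and `condition` labels are invented for illustration. The correction only runs if the sva package is installed, and passing `condition` through `mod` is an assumption about what one would want to protect from the adjustment.

```r
# Simulated log-scale expression, genes x samples (illustrative only).
set.seed(2)
expr <- matrix(rnorm(100 * 8, mean = 8), nrow = 100,
               dimnames = list(paste0("gene", 1:100), paste0("s", 1:8)))
batch     <- factor(rep(c("A", "B"), each = 4))
condition <- factor(rep(c("control", "treated"), times = 4))

# Add an artificial batch shift to every gene in batch B.
expr[, batch == "B"] <- expr[, batch == "B"] + 2

# Covariates to preserve during the adjustment (here: the condition).
mod <- model.matrix(~ condition)

shift_before <- mean(expr[, batch == "B"]) - mean(expr[, batch == "A"])

if (requireNamespace("sva", quietly = TRUE)) {
  corrected <- sva::ComBat(dat = expr, batch = batch, mod = mod)
  shift_after <- mean(corrected[, batch == "B"]) - mean(corrected[, batch == "A"])
  cat("Mean batch difference before:", round(shift_before, 2),
      "after:", round(shift_after, 2), "\n")
}
```

Re-running the PCA on the corrected matrix is a quick way to see whether the batch separation has gone away.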


Thank you for the feedback. I did in fact use edgeR for differential expression analysis of this data, and we got an unusually large number of differentially expressed genes.

That is why I'm going back to the data and the PCA plots, to see whether there is a batch effect. I need help interpreting the PCA plots, to understand whether the separation I see is large enough to indicate a batch effect.
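
One way to put a number on "is the separation large enough" is to regress each principal component on the known batch labels and look at the R-squared. This is only a sketch on simulated data; `log_expr` and `batch` are hypothetical, and a high R-squared says that a component tracks batch, not that batch is the only explanation.

```r
# Simulated log-expression (genes x samples) with a batch shift in 60 genes.
set.seed(3)
log_expr <- matrix(rnorm(300 * 8), nrow = 300)
batch <- factor(rep(c("A", "B"), each = 4))  # hypothetical batch labels
log_expr[1:60, batch == "B"] <- log_expr[1:60, batch == "B"] + 3

pca <- prcomp(t(log_expr), center = TRUE, scale. = FALSE)

# Fraction of each PC's variance that is explained by batch membership.
r2 <- sapply(1:3, function(i) summary(lm(pca$x[, i] ~ batch))$r.squared)
names(r2) <- paste0("PC", 1:3)
print(round(r2, 3))
```

The same regression can be repeated with the biological condition in place of batch, to see which of the two each component follows.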


This may be a bit late, but if you are still tackling the problem, you could try guided PCA (a link to the vignette: https://cran.r-project.org/web/packages/gPCA/vignettes/gPCA.pdf) to identify a batch effect. However, you must know which batch each sample comes from for it to work.
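
For intuition, here is a hand-rolled base R sketch of the idea behind guided PCA as I understand it. It compares the variance along the first batch-guided component with the variance along the first ordinary PC, then permutes the batch labels to judge significance. This is not the gPCA package's own code; `gpca_delta` and the simulated data are hypothetical, and the vignette linked above is the authoritative description.

```r
# Ratio of variance along the first guided PC to variance along the first
# unguided PC. x has samples in rows and features in columns.
gpca_delta <- function(x, batch) {
  x <- scale(x, center = TRUE, scale = FALSE)
  y <- model.matrix(~ 0 + factor(batch))   # samples x batches indicator matrix
  v_guided   <- svd(t(y) %*% x)$v[, 1]     # first loading guided by batch
  v_unguided <- svd(x)$v[, 1]              # first ordinary PCA loading
  var(as.vector(x %*% v_guided)) / var(as.vector(x %*% v_unguided))
}

# Simulated data: 12 samples x 300 features, batch shift in 60 features.
set.seed(4)
x <- matrix(rnorm(12 * 300), nrow = 12)
batch <- rep(c("A", "B"), each = 6)        # hypothetical batch labels
x[batch == "B", 1:60] <- x[batch == "B", 1:60] + 2

delta_obs <- gpca_delta(x, batch)

# Permutation test: how often does a random labelling give a delta this large?
delta_perm <- replicate(200, gpca_delta(x, sample(batch)))
p_val <- mean(delta_perm >= delta_obs)
cat("delta =", round(delta_obs, 3), " permutation p =", p_val, "\n")
```

A delta close to 1 means the direction that best separates the batches is nearly the direction of largest overall variance, which is what a strong batch effect looks like.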


