Question: Looking for Batch effects in PCA
K50 (United States) wrote, 3 months ago:

Hello,

I need some advice. I'm looking at PCA plots of RNA-seq data and trying to understand whether my data has batch effects. I performed alignment with STAR and then obtained gene counts and gene TPM values.

I used prcomp to compute the principal components and plotted PC1 vs PC2 and PC2 vs PC3 for:

- (a) Raw counts
- (b) log2(counts + 1)
- (c) TPM values
- (d) log2(TPM + 1)
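
For reference, the four inputs differ only in normalisation and scale. TPM divides each gene's count by its length and then rescales each sample to sum to one million; log2(x + 1) compresses the range so that a handful of very highly expressed genes no longer dominate the variance (which is why PCA on untransformed counts or TPM mostly reflects those few genes). A minimal Python sketch of both transforms for one sample, assuming gene lengths in base pairs are available; the function names and the toy numbers are hypothetical.

```python
import math

def tpm(counts, lengths_bp):
    """Convert one sample's raw counts to TPM, given gene lengths in base pairs."""
    # Reads per kilobase: normalise each gene's count by its length in kb.
    rpk = [c / (l / 1000.0) for c, l in zip(counts, lengths_bp)]
    # Per-million scaling factor so that the sample's TPMs sum to 1e6.
    scale = sum(rpk) / 1e6
    return [r / scale for r in rpk]

def log2p1(values):
    """log2(x + 1): the pseudocount transform used for plots (b) and (d)."""
    return [math.log2(v + 1) for v in values]

# Hypothetical sample with three genes.
counts = [100, 0, 400]
lengths = [1000, 2000, 4000]
sample_tpm = tpm(counts, lengths)   # about [500000, 0, 500000]
log_counts = log2p1(counts)         # zeros stay at 0 thanks to the +1
```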

The PCA plots are below (the images are hosted on Google Drive):

(a) PCA on raw counts

(b) PCA on log2(counts + 1)

(c) PCA on TPM values

(d) PCA on log2(TPM + 1)

It seems that there could be a batch effect, but I'm not 100% sure, since this is my first time doing this.

- Can anyone advise on whether this is really a batch effect?

- If there is a batch effect, could it be mitigated using ComBat, SVA, or by adjusting for batch in the linear model?

Thank you!

K


Batch effects are caused by technical differences that arise when groups of samples are sequenced separately. If that is your case, you can correct for them with the ComBat function from the sva package. If the samples were sequenced together, you may want to check for effects caused by different lanes. Also verify the earlier steps of your analysis (mapping rate and coverage), and discard multi-mapped reads. There are many details to take into account; see the Bioconductor vignettes for the edgeR package.
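
On correcting versus modelling: for differential expression with edgeR, the usual approach is to include batch as a covariate in the design (e.g. `model.matrix(~ batch + condition)`) rather than to analyse pre-corrected values; corrected values are mainly useful for visualisation, such as re-drawing the PCA. Either way, if batch is completely confounded with condition (all samples of one condition in one batch), no method can separate the two. The Python sketch below shows only the simplest location-only correction on log-scale values: shift each batch so its per-gene mean matches the gene's overall mean. It is not ComBat, which also adjusts per-batch variance and shrinks the per-gene estimates with empirical Bayes; the function name and toy data are hypothetical.

```python
def center_batches(log_expr, batches):
    """Location-only batch adjustment of a genes x samples matrix (log scale).

    Hypothetical illustration, not ComBat: for each gene, every batch is
    shifted so that its mean equals the gene's mean across all samples.
    """
    levels = set(batches)
    adjusted = []
    for row in log_expr:
        overall = sum(row) / len(row)
        batch_mean = {}
        for b in levels:
            vals = [x for x, lab in zip(row, batches) if lab == b]
            batch_mean[b] = sum(vals) / len(vals)
        # Remove the batch's offset from the overall mean, keep within-batch spread.
        adjusted.append([x - batch_mean[lab] + overall
                         for x, lab in zip(row, batches)])
    return adjusted

# Toy data: gene 1 has a +4 shift in batch B; gene 2 is flat.
expr = [[1.0, 2.0, 5.0, 6.0],
        [3.0, 3.0, 3.0, 3.0]]
fixed = center_batches(expr, ["A", "A", "B", "B"])
# fixed[0] == [3.0, 4.0, 3.0, 4.0]; fixed[1] is unchanged
```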

Thank you for the feedback. I did in fact use edgeR for the differential expression analysis of this data, and we got an unusually large number of differentially expressed genes.

That is why I'm going back to the data and the PCA plots to see whether there is a batch effect. I need help interpreting the PCA plots, to understand whether the separation I see is large enough to be a batch effect.
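
One way to put a number on "is the separation large enough" is to ask how much of a PC's variance the batch labels explain (a one-way ANOVA R², often called eta squared), and how often randomly shuffled labels explain as much. A minimal Python sketch, assuming you already have one PC's score per sample (in R, a column of `prcomp(t(log_counts))$x`; prcomp expects samples in rows, hence the `t()` on a genes × samples matrix) and a batch label per sample. `log_counts` and the function names are hypothetical. With only a few samples per batch there are few distinct shuffles, so the p-value cannot get very small.

```python
import random

def eta_squared(scores, batches):
    """Fraction of the variance in one PC's scores explained by batch."""
    n = len(scores)
    grand = sum(scores) / n
    ss_total = sum((s - grand) ** 2 for s in scores)
    ss_between = 0.0
    for b in set(batches):
        group = [s for s, lab in zip(scores, batches) if lab == b]
        mean_b = sum(group) / len(group)
        ss_between += len(group) * (mean_b - grand) ** 2
    return ss_between / ss_total

def batch_permutation_p(scores, batches, n_perm=1000, seed=0):
    """Observed eta squared and a permutation p-value from shuffled labels."""
    rng = random.Random(seed)
    observed = eta_squared(scores, batches)
    labels = list(batches)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if eta_squared(scores, labels) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical PC1 scores for six samples in two batches.
pc1 = [1.0, 1.2, -1.0, -1.2, 0.9, -0.8]
labels = ["A", "A", "B", "B", "A", "B"]
eta, p = batch_permutation_p(pc1, labels, n_perm=200)
```

Running this for each of the first few PCs shows which axes track batch; a high value on PC1 or PC2 with a small p-value is the pattern that suggests a batch effect.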

This may be a bit late, but if you are still tackling the problem, you could try guided PCA (gPCA; vignette: https://cran.r-project.org/web/packages/gPCA/vignettes/gPCA.pdf) to identify batch effects. Note that it requires knowing which batch each sample comes from.
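
The statistic gPCA reports, δ, compares the variance along the first "guided" principal component (the leading right singular vector of YᵀX, where Y is the sample-by-batch indicator matrix and X the samples × genes data) with the variance along the ordinary first principal component. Values near 1 mean batch lines up with the main axis of variation, and significance comes from permuting the batch labels. Below is a bare-bones pure-Python re-implementation of that idea for illustration only, not the gPCA package's code: it centres each gene and uses power iteration instead of a full SVD, and all names and toy data are hypothetical. In practice you would use the package itself (its main function is `gPCA.batchdetect()` in R).

```python
import random

def center_columns(X):
    """Centre each gene (column) of a samples x genes matrix."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]

def matvec(A, v):
    """A v for a list-of-rows matrix A."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def rmatvec(A, u):
    """A^T u without building the transpose."""
    return [sum(A[i][j] * u[i] for i in range(len(A))) for j in range(len(A[0]))]

def top_right_singular_vector(A, rng, iters=200):
    """Leading right singular vector of A by power iteration on A^T A."""
    v = [rng.gauss(0.0, 1.0) for _ in range(len(A[0]))]
    norm = sum(x * x for x in v) ** 0.5
    v = [x / norm for x in v]
    for _ in range(iters):
        w = rmatvec(A, matvec(A, v))
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # A is zero along v; keep the current unit vector
            break
        v = [x / norm for x in w]
    return v

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def gpca_delta(X, batches, rng):
    """delta = var(X v_guided) / var(X v_unguided) for a samples x genes X."""
    Xc = center_columns(X)
    p = len(Xc[0])
    # Y^T X: one row per batch, holding the column sums of that batch's samples.
    YtX = [[sum(row[j] for row, lab in zip(Xc, batches) if lab == b) for j in range(p)]
           for b in sorted(set(batches))]
    v_guided = top_right_singular_vector(YtX, rng)
    v_unguided = top_right_singular_vector(Xc, rng)
    return variance(matvec(Xc, v_guided)) / variance(matvec(Xc, v_unguided))

def gpca_test(X, batches, n_perm=100, seed=0):
    """Observed delta and a permutation p-value from shuffled batch labels."""
    rng = random.Random(seed)
    observed = gpca_delta(X, batches, rng)
    labels = list(batches)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if gpca_delta(X, labels, rng) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy data: 6 samples x 4 genes; batch A is shifted up in the first two genes.
X = [[10, 10, 1, 2], [11, 9, 2, 1], [10, 11, 1, 1],
     [0, 1, 2, 2], [1, 0, 1, 2], [0, 0, 2, 1]]
delta, p = gpca_test(X, ["A", "A", "A", "B", "B", "B"], n_perm=50)
```

Because the ordinary PC1 already maximises variance, δ lies between 0 and 1 (up to power-iteration error); here it comes out close to 1 because the batch shift is the dominant signal.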