Looking for Batch effects in PCA
KB (@k-8495)

Hello,

I need some advice. I'm looking at PCA plots of RNA-seq data and am trying to understand whether my data has batch effects or not. I performed alignment using STAR, and then obtained gene counts and gene TPM values.

I used `prcomp` to find the principal components and plotted PC1 vs PC2 and PC2 vs PC3 for:

- (a) Raw counts
- (b) log2(counts + 1)
- (c) TPM values
- (d) log2(TPM + 1)

I am showing the PCA plots below (these are links to images on Google Drive):

(a) PCA on raw counts

(b) PCA on log2(counts + 1)

(c) PCA on TPM values

(d) PCA on log2(TPM + 1)
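
For anyone wanting to reproduce something like plot (b), here is a minimal sketch of PCA on log2(counts + 1). It is written in Python with NumPy rather than with R's `prcomp`, and it runs on simulated counts: the matrix, the injected group shift, and the function name `pca_scores` are all made up for illustration. It assumes genes are in rows and samples in columns, so the matrix is transposed before the decomposition (`prcomp` likewise treats rows as the observations).

```python
import numpy as np

def pca_scores(expr, n_components=3):
    """PCA of a genes x samples matrix.

    Returns (scores, explained_variance_ratio), with one row of scores per sample.
    """
    x = np.asarray(expr, dtype=float).T       # samples as rows, genes as columns
    x = x - x.mean(axis=0)                    # center each gene across samples
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    k = min(n_components, len(s))
    scores = u[:, :k] * s[:k]                 # sample coordinates on PC1..PCk
    var_ratio = (s ** 2) / np.sum(s ** 2)     # share of total variance per PC
    return scores, var_ratio[:k]

# Hypothetical data: 200 genes, 6 samples, with 20 genes inflated in samples 4-6
rng = np.random.default_rng(0)
counts = rng.poisson(lam=50, size=(200, 6))
counts[:20, 3:] *= 8

log_counts = np.log2(counts + 1)
scores, var_ratio = pca_scores(log_counts)
print(np.round(var_ratio, 3))
print(np.round(scores[:, 0], 2))              # PC1 splits samples 1-3 from 4-6
```

Colouring the score plot by every known sample annotation (condition, library prep date, lane) is usually the quickest way to see what a separation lines up with.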

It seems that there could be a batch effect, but I'm not 100% sure, since I'm doing this for the first time.

- Can anyone advise on whether this is really a batch effect?

- If there is a batch effect, could it be mitigated using ComBat or SVA, or by adjusting for batch in the linear model?
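
On the last option (adjusting in the linear model), a rough sketch of the idea follows, assuming log-scale expression and known batch and condition labels. All names and numbers are hypothetical. It fits condition + batch per gene by least squares and subtracts only the fitted batch term. This is plain regression, not ComBat or SVA, and it is mainly useful for visualisation, such as re-running the PCA.

```python
import numpy as np

def remove_batch_means(log_expr, batch, condition):
    """Fit expr ~ condition + batch per gene and subtract only the batch term.

    log_expr: genes x samples matrix on a log scale.
    Assumes at least two batches and at least two conditions.
    """
    y = np.asarray(log_expr, dtype=float).T               # samples x genes
    batch = np.asarray(batch)
    condition = np.asarray(condition)

    # Sum-to-zero coding for batch, so no single batch acts as the reference
    levels = np.unique(batch)
    b = np.stack([(batch == lev).astype(float) for lev in levels[:-1]], axis=1)
    b[batch == levels[-1]] = -1.0

    # Treatment coding for condition (first level is the baseline)
    cond_levels = np.unique(condition)
    c = np.stack([(condition == lev).astype(float) for lev in cond_levels[1:]], axis=1)

    design = np.hstack([np.ones((len(batch), 1)), c, b])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    batch_coef = coef[1 + c.shape[1]:]                     # rows for the batch columns
    return (y - b @ batch_coef).T                          # back to genes x samples

# Hypothetical example: gene 1 has a treatment effect (+2) and a batch shift (+3),
# gene 2 has only a batch shift (+1)
log_expr = np.array([[5.0, 7.0, 8.0, 10.0],
                     [3.0, 3.0, 4.0, 4.0]])
batch = ["A", "A", "B", "B"]
condition = ["ctl", "trt", "ctl", "trt"]
corrected = remove_batch_means(log_expr, batch, condition)
print(corrected)
```

For the differential expression test itself, the usual advice is to keep the original counts and include batch as a term in the design rather than testing on corrected values.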

Please advise.

Thank you !

K

 

Reply:

This may be a bit late, but if you are still tackling the problem you could try guided PCA (vignette: https://cran.r-project.org/web/packages/gPCA/vignettes/gPCA.pdf) to identify batch effects. However, you must know which batch each sample came from for it to work.
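
To give a feel for what guided PCA measures, here is a small sketch of the idea as I understand it. It is an independent NumPy illustration on simulated data, not the gPCA package's code. The statistic is the variance of the data projected onto the first batch-guided component, divided by the variance along the first ordinary component, with a permutation of the batch labels to judge significance. The function names, the simulated matrix, and the number of permutations are assumptions.

```python
import numpy as np

def gpca_delta(x, batch):
    """Ratio of variance along the first guided PC to the first unguided PC.

    x: samples x features matrix; batch: one label per sample.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean(axis=0)                                 # center each feature
    batch = np.asarray(batch)
    # Batch indicator matrix: one column per batch
    y = np.stack([(batch == lev).astype(float) for lev in np.unique(batch)], axis=1)
    _, _, vt_u = np.linalg.svd(x, full_matrices=False)          # unguided PCA
    _, _, vt_g = np.linalg.svd(y.T @ x, full_matrices=False)    # guided by batch
    var_u = np.var(x @ vt_u[0], ddof=1)
    var_g = np.var(x @ vt_g[0], ddof=1)
    return var_g / var_u

def gpca_permutation_test(x, batch, n_perm=200, seed=0):
    """Compare the observed ratio with ratios from shuffled batch labels."""
    rng = np.random.default_rng(seed)
    batch = np.asarray(batch)
    observed = gpca_delta(x, batch)
    perm = np.array([gpca_delta(x, rng.permutation(batch)) for _ in range(n_perm)])
    p_value = (1 + np.sum(perm >= observed)) / (1 + n_perm)
    return observed, p_value

# Hypothetical data: 12 samples, 50 features, batch B shifted on 20 features
rng = np.random.default_rng(1)
x = rng.normal(size=(12, 50))
x[6:, :20] += 4
batch = ["A"] * 6 + ["B"] * 6
delta, p_value = gpca_permutation_test(x, batch)
print(round(delta, 3), round(p_value, 3))
```

A ratio close to 1 with a small permutation p-value suggests that the main axis of variation lines up with batch. A ratio well below 1 suggests that something else dominates.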

 

Reply:

Late response, but thank you so much. The guided PCA package looks very interesting! I will check it out.

I ended up strongly filtering out features with low gene counts before running edgeR, and this helped avoid the issue.
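
For anyone reading later, the kind of filter meant here can be sketched as follows: keep a gene only if it reaches a minimum counts-per-million (CPM) in a minimum number of samples. The thresholds and the toy matrix below are made-up illustrations, not the values used in this analysis. In R, edgeR provides `filterByExpr` for this purpose.

```python
import numpy as np

def filter_low_counts(counts, min_cpm=1.0, min_samples=3):
    """Keep genes with CPM >= min_cpm in at least min_samples samples.

    counts: genes x samples matrix of raw counts.
    Returns (filtered_counts, keep_mask).
    """
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)                 # total counts per sample
    cpm = counts / lib_sizes * 1e6                 # counts per million
    keep = (cpm >= min_cpm).sum(axis=1) >= min_samples
    return counts[keep], keep

# Hypothetical 4 genes x 4 samples, each library summing to 1,000,000 reads
counts = np.array([
    [500000, 500000, 500000, 500000],
    [499990, 499995, 499999, 500000],
    [10, 5, 1, 0],                                 # CPM >= 2 in only two samples
    [0, 0, 0, 0],
])
filtered, keep = filter_low_counts(counts, min_cpm=2.0, min_samples=3)
print(keep)                                        # the last two genes are dropped
```

Low-count genes are noisy on the log scale, so removing them before PCA or testing often tightens the sample clusters.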

Reply:

Batch effects arise when samples are processed or sequenced separately. If that is your case, you can correct for it with the ComBat function from the sva package. If the samples were sequenced together, you may want to check for differences between lanes. Also verify the earlier steps: check the mapping rate and coverage, and discard multi-mapped reads. There are a number of details to take into account; see the Bioconductor vignettes for the edgeR package.

Reply:

Thank you for the feedback. I did in fact use edgeR for differential expression analysis of this data, and we got an unusually large number of differentially expressed genes.

That is why I'm going back to the data, and the PCA plot, to see if there is a batch effect. I need help with interpretation of the PCA plots, to understand if the separation I see is large enough that it could be a batch effect. 
