How to draw ellipses around only a subset of the data in a PCA biplot?
mschmidt ▴ 10
@mschmidt-18923
Poznan

I am exploring expression data from multiple tissues grouped into three sets (colored red, green, and blue). How can I draw ellipses for only one, two, or some other subset of tissues based on the metadata (columns "set" and "tissue")? I have managed to draw an ellipse around a whole set, e.g. set1 or set2, but I need to keep the set colors while drawing ellipses around only the selected tissue(s). The selected tissues may belong to the same set or to different sets.

library(PCAtools)

pcaData <- pca(txcount, metadata = designexp)
biplot(pcaData,
       colby = "set", 
       colkey = c("set1" = "blue","set2" = "green","set3" = "red"),
       # ellipse config
       ellipse = TRUE,
       ellipseConf = 0.95,
       ellipseFill = TRUE,
       ellipseAlpha = 1/4,
       ellipseLineSize = 0,
       ellipseFillKey = c("tissue1"="#FEE5D9","tissue5"="#99000D"),
       xlim = c(-60,20), ylim = c(-30,40),
       hline = 0, vline = 0,
       legendPosition = 'right')


sessionInfo( )
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS  10.16
biplot PCA RNASeq PCAtools • 699 views
@kevin
Republic of Ireland

Hi, I developed this package. There is no way to colour by one group (colby) and then generate an ellipse for another. The ellipse functionality is intrinsically tied to colby.

May I suggest that you use colby for tissue, and then use shape to give each set a different shape?
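
A minimal sketch of this suggestion, assuming metadata columns named "tissue" and "set" as in the question. The toy matrix, the sample and gene names, and the object names mat, meta, p, and plt are hypothetical stand-ins for the asker's txcount and designexp. The confidence level of the ellipses is left at its default, and the plotting step only runs if PCAtools is installed.

```r
set.seed(1)

# Hypothetical toy metadata: 6 tissues (2 per set), 6 replicates each.
tissues <- paste0("tissue", 1:6)
meta <- data.frame(
  tissue = rep(tissues, each = 6),
  set    = rep(c("set1", "set2", "set3"), each = 12),
  stringsAsFactors = FALSE
)
rownames(meta) <- paste0("sample", seq_len(nrow(meta)))

# Hypothetical expression matrix (200 genes x 36 samples); each tissue gets
# its own mean shift so that the groups separate on the plot.
shift <- rep(seq_along(tissues), each = 6)
mat <- sapply(shift, function(s) rnorm(200, mean = s))
rownames(mat) <- paste0("gene", 1:200)
colnames(mat) <- rownames(meta)  # pca() expects these to match the metadata

if (requireNamespace("PCAtools", quietly = TRUE)) {
  p <- PCAtools::pca(mat, metadata = meta)

  # Colour (and therefore ellipses) by tissue, point shape by set.
  plt <- PCAtools::biplot(p,
                          colby = "tissue",
                          shape = "set",
                          lab = NULL,
                          ellipse = TRUE,
                          ellipseFill = TRUE,
                          ellipseAlpha = 1/4,
                          legendPosition = "right")
  print(plt)
}
```

With this layout the set is still readable from the point shape, while each tissue gets its own colour and ellipse.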

Kevin


OK, I understand. That is why all my attempts at this task failed. I believe this visualization will solve my scientific problem, so I am pretty much committed to it. I am not fluent in R, so could you give me a hint for two other scenarios?

(1) The data contains biological replicates. To reduce the number of dots on the biplot, I could average the biological replicates of the same tissue into one. At what stage should I do this: on the raw expression data, the normalized expression data, or the principal components? And how?

(2) Generate two separate biplots and overlay them: the original one with all the dots, and another with a subset of the data (e.g. only the ellipses visible). How do I overlay two plots, and how do I extract a subset from the PCA object?

Could you possibly help me with that?

Best regards, Marcin


(1) The data contains biological replicates. To reduce the number of dots on the biplot, I could average the biological replicates of the same tissue into one. At what stage should I do this?

I have not seen other users doing this. I would only average across technical replicates, not biological replicates.

(2) Generate two separate biplots and overlay them: the original one with all the dots, and another with a subset of the data (e.g. only the ellipses visible). How do I overlay two plots, and how do I extract a subset from the PCA object?

Why not generate 2 plots side-by-side?


I have already learned how to make two plots side by side:

library(gridExtra)
plot1 <- biplot(pcaData1)
plot2 <- biplot(pcaData2)
# arrange the two plots in a single row
grid.arrange(plot1, plot2, ncol = 2)

But I am still trying to subset the PCA data. I tried:

biplot(subset(pcaData, pcaData[["metadata"]][,7] %in% c("tissue1", "tissue5")))

The tissue column is the 7th in the metadata, but this returns an error:

Error in round(pcaobj$variance[x], digits = 2) : 
  non-numeric argument to mathematical function

If you would like to hide one tissue's ellipse in the plot, it may be better to set its colour to NULL, I think, or NA. Have you tried that?


I tried a similar approach: I used white with maximum transparency (alpha = 0) as the colour and set it for the ellipses I do not want drawn. But it needs some adjustment, as it dims the data points.


It seems that you just need to select NA as the colour mapping:

biplot(p,
       colby = 'ER', colkey = c('ER+' = 'forestgreen', 'ER-' = 'purple'),
       # ellipse config
       ellipse = TRUE,
       ellipseConf = 0.95,
       ellipseFill = TRUE,
       ellipseAlpha = 1/4,
       ellipseLineSize = 0,
       ellipseFillKey = c('ER+' = 'yellow', 'ER-' = NA),
       xlim = c(-125,125), ylim = c(-50, 80),
       hline = 0, vline = c(-25, 0, 25),
       legendPosition = 'top', legendLabSize = 16, legendIconSize = 8.0)
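
Applied to the original question, the same NA trick might be combined with colby = "tissue" roughly as follows. This is only a sketch and has not been run against the asker's data. It assumes every tissue belongs to exactly one set, and the small designexp table and the names setColours, tissueSet, tissueColKey, and ellipseKey are hypothetical. Each tissue is given the colour of its set, so the points keep their set colours, and every tissue except the selected ones gets NA as its ellipse fill.

```r
# Hypothetical stand-in for the asker's designexp (one row per sample).
designexp <- data.frame(
  set    = c("set1", "set1", "set2", "set2", "set3", "set3"),
  tissue = c("tissue1", "tissue2", "tissue3", "tissue4", "tissue5", "tissue6"),
  stringsAsFactors = FALSE
)

# Set colours taken from the question.
setColours <- c(set1 = "blue", set2 = "green", set3 = "red")

# One row per tissue, so that each tissue can inherit the colour of its set.
tissueSet <- unique(designexp[, c("tissue", "set")])

# colkey: every tissue is drawn in the colour of its set.
tissueColKey <- setNames(unname(setColours[tissueSet$set]), tissueSet$tissue)

# ellipseFillKey: NA (no fill) everywhere except the selected tissues.
ellipseKey <- setNames(rep(NA_character_, nrow(tissueSet)), tissueSet$tissue)
ellipseKey[c("tissue1", "tissue5")] <- c("#FEE5D9", "#99000D")

print(tissueColKey)
print(ellipseKey)

# With the asker's pcaData object, the call would then look something like:
# biplot(pcaData,
#        colby = "tissue", colkey = tissueColKey,
#        ellipse = TRUE, ellipseFill = TRUE, ellipseAlpha = 1/4,
#        ellipseLineSize = 0, ellipseFillKey = ellipseKey,
#        legendPosition = "right")
```

One trade-off of this approach is that the legend then lists tissues rather than sets.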

kk


NA works in my case as well. Thanks!


NULL produces an error.


Hi Kevin, I wanted to try the ggbiplot library to draw ellipses for a subset of the data; however, it gives an error:

> ggbiplot(pcaData)
Error in ggbiplot(pcaData) : 
  Expected a object of class prcomp, princomp, PCA, or lda

I found in the PCAtools documentation that pca() creates an object of class 'pca', so it should work (?!). How can pcaData <- pca(rloggedtxcounts) be transformed into data that ggbiplot can use, or be subset? Can you help? I want to stay with the pcaData generated with your package, as it transforms the data in the way that best suits my hypothesis (compared with the DESeq2 PCA).


Hi, ggbiplot is not part of Bioconductor.


???...


This forum is for Bioconductor packages.


Is biostars.org general?


Yes, Biostars is more general (I am also a moderator there), as is Bioinformatics Stack Exchange.
