Interpreting ellipses in PCAtools for bulk RNAseq
@97dfc144

I had a quick question about using ellipses in PCAtools on bulk RNA-seq data. On my data, I can make a plot using the code below:

biplot(p,
    lab = NULL,
    legendPosition = 'right',
    colby = 'genotype',
    colLegendTitle = 'Genotype',
    # ellipse config
    ellipse = TRUE,
    ellipseLevel = 0.95,
    ellipseFill = TRUE,
    ellipseAlpha = 1/4,
    ellipseLineSize = 0)


As you can see, the WT group is circled, but the CRE group has 2 lines drawn around it rather than another ellipse, and I'm not sure how to interpret that. Does this mean that, at the 95% confidence level, the CRE group isn't really one group?

For example, when I change ellipseLevel = 0.75, here's the output:

Now that all the points in the CRE group fall within the same ellipse, can I interpret that as being 75% confident that all the CRE samples are statistically similar, or something like that? I just want to make sure I'm interpreting this correctly. Thank you!

@kevin

Hey Elizabeth,

Which version of PCAtools are you using? A new version was recently released along with the new version of Bioconductor.

The ellipse functionality is taken from stat_ellipse(): https://ggplot2.tidyverse.org/reference/stat_ellipse.html

The relevant parameters in the new version of PCAtools are listed here: https://github.com/kevinblighe/PCAtools/blob/master/R/biplot.R#L75-L98

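So, the default is the 't' distribution (multivariate t), but you can change this to 'norm' (multivariate normal) or 'euclid' via ellipseType. For ellipseType 't' and 'norm', the ellipseLevel parameter sets the confidence level at which the ellipse is drawn, so 0.95 is an ellipse drawn at 95% confidence. For 'euclid', ellipseLevel is instead the radius of the circle to be drawn.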
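To make the effect of ellipseLevel more concrete, here is a minimal base-R sketch of how the ellipse size grows with the level for a small group. It assumes, as far as I understand stat_ellipse() for the 't' and 'norm' types, that the ellipse is scaled by sqrt(2 * qf(level, 2, n - 1)), where n is the number of samples in the group. The helper name ellipse_radius is hypothetical and is not part of PCAtools or ggplot2.

```r
# Hypothetical helper: scaling factor applied to a group's covariance
# ellipse, ASSUMING the F-based scaling sqrt(2 * qf(level, 2, n - 1)).
ellipse_radius <- function(level, n) {
  sqrt(2 * stats::qf(level, df1 = 2, df2 = n - 1))
}

n_samples <- 4  # e.g. a group of four samples

r75 <- ellipse_radius(0.75, n_samples)
r95 <- ellipse_radius(0.95, n_samples)

# How many times larger (per axis) the 0.95 ellipse is than the 0.75 one
r95 / r75
```

With only a handful of samples per group, the 0.95 ellipse comes out noticeably larger than the 0.75 one, so it is more likely to extend beyond the default plotting area.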

So, in your plots, I would be more comfortable with the first plot, but, to visualise it better, you need to extend the x and y axes via the xlim and ylim parameters in order to make the ellipse render properly. Apologies about this minor issue.

One can already make the interpretation that, at the 95% confidence level, both your groups' samples are grouping together as expected (they are each comprised within a single ellipse).

Kevin


Hi Kevin,

Thanks for the information about the parameters. I am using the development version of PCAtools (the one from your "kevinblighe/PCAtools"). And thanks for the tip about extending the graph! Here's what it looks like with ellipseLevel set at 0.75 and the limits extended:

I'm a little confused about what you meant by "both your groups' samples are grouping together as expected (they are each comprised within a single ellipse)" at the 95% confidence level, because in the first graph in my original post (the one where ellipseLevel = 0.95), the dots in the CRE group are not within a single ellipse. Would it be more correct to say that they're within a single ellipse at the 75% confidence level? (Obviously that's not ideal, I'd rather they were all within an ellipse at 95%!) Sorry if I'm misunderstanding something, this is all pretty new to me. Thank you for your patience!


Ah, if you generate the same plot [with extended axes] but using ellipseLevel = 0.95, do the 4 CRE samples fall within their own pink/red ellipse?

In the first plot in your original question, there is a rendering issue, but there is just a single pink/red ellipse there, one that will [I assume] easily comprise all CRE samples when the axes are extended.

Or, maybe you want a single ellipse but for all samples (CRE+WT)? - to do this, you will have to choose a metadata variable for colby that is the same across all samples, i.e., there is no direct way to generate an ellipse for all samples.
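A minimal sketch of that workaround, using simulated data rather than anything from this thread. The column name all_samples and the toy matrix are made up for illustration, and the plotting step is guarded so that it only runs if PCAtools is installed.

```r
set.seed(1)

# Toy expression matrix: 200 features x 8 samples (simulated)
mat <- matrix(rnorm(200 * 8), nrow = 200,
    dimnames = list(paste0('gene', 1:200), paste0('sample', 1:8)))

# Metadata rows must match the matrix columns
metadata <- data.frame(
    genotype = rep(c('WT', 'CRE'), each = 4),
    row.names = colnames(mat))

# Hypothetical constant column: the same value for every sample
metadata$all_samples <- 'all'

if (requireNamespace('PCAtools', quietly = TRUE)) {
    p <- PCAtools::pca(mat, metadata = metadata)
    # Colouring by the constant column puts all samples in one group
    plt <- PCAtools::biplot(p,
        lab = NULL,
        colby = 'all_samples',
        ellipse = TRUE,
        ellipseLevel = 0.95)
}
```

The original genotype column stays in the metadata, so the usual per-genotype plot is still available via colby = 'genotype'.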


Ah, sorry about that. It looks correct to me, so I'm not sure how to show you the real graph. I've attached it again below; hopefully it works. In case it doesn't: when I set ellipseLevel = 0.95, there is a single ellipse that covers the WT samples as expected, but for the CRE samples, it looks like there are 2 parallel lines (which I'm assuming are just very skinny ellipses) flanking both sides of the dots. So, none of the CRE dots actually fall within an ellipse, since they're all in between 2 separate ones (which I'm assuming isn't good). Ideally, I would expect 2 clusters (one per genotype, with all the samples falling in the cluster of their proper genotype). In the graph below, I'm curious how to interpret the samples falling in between 2 ellipses (but not actually inside them). Is the only thing to really say that they don't cluster well (at least when assessing at a 95% confidence level)? Thanks again!


Hi, no, there should be just one ellipse. I think that you need to do, for example:

biplot(p,
    lab = NULL,
    legendPosition = 'right',
    colby = 'genotype',
    colLegendTitle = 'Genotype',
    xlim = c(-100, 100),
    ylim = c(-100, 100),
    # ellipse config
    ellipse = TRUE,
    ellipseLevel = 0.95,
    ellipseFill = TRUE,
    ellipseAlpha = 1/4,
    ellipseLineSize = 0)


Thank you so much for your help. When I set xlim and ylim to larger ranges, all the CRE samples fell within one ellipse at ellipseLevel = 0.95. I think that with the smaller/default limits, the plot was too zoomed in, so the ellipse was drawn as if they didn't? Or something like that. Anyway, thanks!


haha, yes, sorry about that. It is difficult to automatically code the limits for xlim and ylim so that the ellipses render properly.
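
One possible way to pick limits by hand is to compute the ellipse coordinates first and take their range. The sketch below does this with ggplot2's stat_ellipse() on simulated scores. The data, the pad() helper and the 5% padding are all assumptions for illustration, and this is not how PCAtools sets its limits. With a real pca object, the scores would come from p$rotated and the grouping from p$metadata.

```r
library(ggplot2)

set.seed(42)

# Simulated stand-in for PC scores, six samples per group
scores <- data.frame(
    PC1 = c(rnorm(6, -20, 5), rnorm(6, 20, 5)),
    PC2 = c(rnorm(6, 0, 15), rnorm(6, 0, 3)),
    genotype = rep(c('WT', 'CRE'), each = 6))

# Draw only the ellipses, then pull out their computed x/y coordinates
ell <- ggplot(scores, aes(PC1, PC2, group = genotype)) +
    stat_ellipse(type = 't', level = 0.95)
ell_xy <- layer_data(ell, 1)

# Hypothetical helper: widen a range by a fraction on each side
pad <- function(r, frac = 0.05) r + c(-1, 1) * frac * diff(r)

# Limits that cover both the ellipses and the points
xlim <- pad(range(c(ell_xy$x, scores$PC1)))
ylim <- pad(range(c(ell_xy$y, scores$PC2)))
```

The resulting xlim and ylim could then be passed to biplot() via its xlim and ylim parameters.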