Feature selection across batches in simpleSingleCell "Correcting batch effects" vignette
@angelos-armen-21507
United Kingdom

In the "Feature selection across batches" section of the simpleSingleCell "Correcting batch effects" vignette, genes with a positive average (across batches) biological variance are selected. What is the reasoning behind that? Why aren't genes with a positive biological variance in any batch selected instead?

Aaron Lun ★ 28k
@alun
The city by the bay

Consider genes for which the null hypothesis is true, i.e., there is no biological variability, so the total variance equals the technical component given by the mean-variance trend. However, the estimated variance will fluctuate around this true value, which means that such a gene will have a positive estimated biological component (total minus technical) ~50% of the time.
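A quick simulation makes the ~50% figure concrete. This is a minimal Python sketch, not the scran/simpleSingleCell implementation: it assumes normally distributed expression values and a technical variance that is known exactly (the hypothetical `tech_var`), whereas a real analysis works with log-expression values and a fitted trend. The fraction should come out close to 0.5, slightly below it because the sample variance has a right-skewed distribution.

```python
import random

def simulate_null_bio(n_genes=2000, n_cells=200, tech_var=1.0, seed=1):
    """Return the fraction of simulated null genes with a positive
    estimated biological component (hypothetical toy setup)."""
    rng = random.Random(seed)
    sd = tech_var ** 0.5
    n_positive = 0
    for _ in range(n_genes):
        # Null gene: the true total variance equals the technical variance.
        values = [rng.gauss(0.0, sd) for _ in range(n_cells)]
        mean = sum(values) / n_cells
        # Unbiased sample variance: a noisy estimate of the total variance.
        total_var = sum((v - mean) ** 2 for v in values) / (n_cells - 1)
        # Biological component = estimated total minus technical ("trend").
        if total_var - tech_var > 0:
            n_positive += 1
    return n_positive / n_genes

print(simulate_null_bio())
```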

For the analysis of a single batch, that's fine: retaining some of these uninteresting genes is part of the cost we have to pay for retaining as much biological signal as possible. With multiple batches, however, this cost compounds prohibitively. For example, with 3 batches, a null gene would get a positive biological component in at least one batch ~90% of the time (1 - 0.5^3 = 0.875, assuming the batches are independent). Eventually, with enough batches, every gene would get a positive biological component in some batch just by chance and be retained.
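The arithmetic behind the ~90% figure is the complement rule. Assuming the sign of a null gene's estimate is independent across batches and positive with probability 0.5 in each, the chance of at least one positive batch among B batches is 1 - 0.5^B. The helper below (a hypothetical name, for illustration only) shows how quickly this approaches 1.

```python
def prob_null_retained_any(n_batches, p_positive=0.5):
    """Probability that a null gene has a positive biological component
    in at least one of n_batches independent batches."""
    # Complement of "non-positive in every batch".
    return 1.0 - (1.0 - p_positive) ** n_batches

for b in (1, 2, 3, 5, 10):
    print(b, prob_null_retained_any(b))
```

For three batches this gives exactly 0.875; by ten batches nearly every null gene would be retained under the "any batch" rule.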

Taking the average biological component aims to mitigate this effect. If a gene is genuinely highly variable in at least one batch, it will have a large biological component in that batch, so the chances are good that the average biological component will be positive and the gene will be retained. However, if a gene is null in all batches, its average component is still only positive ~50% of the time, no matter how many batches there are, so no harm is done.
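The two selection rules can be compared on simulated per-batch estimates. This sketch skips the variance modelling entirely and assumes, hypothetically, that each batch's estimated biological component is the true value plus independent zero-mean normal noise (`noise_sd`), with genuine HVGs carrying a positive signal (`hvg_signal`) in the first batch only. It is an illustration of the reasoning, not the vignette's code.

```python
import random

def compare_rules(n_batches=5, n_null=5000, n_hvg=500, noise_sd=0.1,
                  hvg_signal=1.0, seed=2):
    """Return retention rates (any-batch rule, average rule) for null genes
    and for HVGs under a hypothetical noise model."""
    rng = random.Random(seed)

    def estimates(signal_in_first_batch):
        # Per-batch estimated biological components: zero-mean noise, plus
        # a genuine positive signal in the first batch only (for HVGs).
        est = [rng.gauss(0.0, noise_sd) for _ in range(n_batches)]
        est[0] += signal_in_first_batch
        return est

    def rates(genes):
        # "Any batch" rule: keep if the component is positive in any batch.
        any_rule = sum(max(e) > 0 for e in genes) / len(genes)
        # Average rule: keep if the mean component across batches is positive.
        mean_rule = sum(sum(e) / len(e) > 0 for e in genes) / len(genes)
        return any_rule, mean_rule

    nulls = [estimates(0.0) for _ in range(n_null)]
    hvgs = [estimates(hvg_signal) for _ in range(n_hvg)]
    return {"null": rates(nulls), "hvg": rates(hvgs)}

print(compare_rules())
```

With these defaults, the "any batch" rule should retain nearly all null genes (about 1 - 0.5^5, or roughly 97%) while the average rule keeps about half, and both rules keep the HVGs. The averaging rule is not free, though: a batch-specific signal is divided by the number of batches, so a weak HVG present in only one of many batches can be diluted below the noise.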
