Question: Most Efficient Application of Voom to Regularly Increasing Study

15 months ago by Dario Strbenac (Australia) wrote:

What is the most efficient usage of voom in a scenario where more batches will arrive at some time in the future? If voom is applied to each batch individually then, because each count matrix is filtered on CPM separately, the resulting normalised value tables contain different numbers of rows, and the combined matrix has NA entries. This is problematic because batch correction methods need a complete matrix. For example:

            Batch 1                       Batch 2
        Sample 1 Sample 2 Sample 3   Sample 4 Sample 5 Sample 6
Gene 1      8.75     9.09     8.31       7.89     7.99     7.90
Gene 2      8.55     9.01     8.77       7.99     7.98     8.00
Gene 3        NA       NA       NA       1.00     9.00    10.00

Alternatively, if I provide the combined matrix of counts as the input to voom, Gene 3 (previously filtered due to low CPM in Batch 1) will no longer be all NA for Batch 1. However, when Batch 3 arrives in the future, the values in the entire matrix will change, which may cause unease among the biologists. Is there a third approach, which I haven't listed as an option, that is better?
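For reference, the second option (one combined matrix, filtered and normalised once) might be sketched as below. The object names `counts1` and `counts2` and the batch labels are hypothetical; the design shown is a minimal one that only adjusts for batch.

```r
# Sketch of the combined-matrix approach, assuming two per-batch count
# matrices with the same genes as rows ('counts1', 'counts2' are
# hypothetical names).
library(edgeR)
library(limma)

counts <- cbind(counts1, counts2)
batch <- factor(rep(c("B1", "B2"), times = c(ncol(counts1), ncol(counts2))))

dge <- DGEList(counts = counts)
keep <- filterByExpr(dge, group = batch)  # filter once, on the full matrix
dge <- calcNormFactors(dge[keep, , keep.lib.sizes = FALSE])

design <- model.matrix(~ batch)  # include batch in the design
v <- voom(dge, design)           # one complete matrix, no NA entries
```

Because filtering is done once on the combined matrix, every retained gene has a value for every sample, so downstream batch correction sees no NAs.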

modified 15 months ago by Aaron Lun • written 15 months ago by Dario Strbenac
15 months ago by Aaron Lun (Cambridge, United Kingdom) wrote:

The downstream results of a DE analysis would change anyway, due to changes in the normalization factors, precision weights, average expression values, variance/coefficient estimates, etc. when you incorporate new data into the model. I don't think that your collaborators would expect the results to stay exactly the same when you put new data into the analysis; otherwise, what's the point of collecting more data?
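As one concrete illustration of the point about normalization factors: TMM factors are recomputed over all samples, so adding a batch shifts the normalised values for the existing samples too. The object names `counts12` and `counts3` below are hypothetical stand-ins for the existing and newly arrived count matrices.

```r
# Why values change when new data arrive: calcNormFactors() recomputes
# TMM factors across all samples, so the factors (and hence the
# normalised expression values) for the original samples move as well.
library(edgeR)

old <- calcNormFactors(DGEList(counts = counts12))
new <- calcNormFactors(DGEList(counts = cbind(counts12, counts3)))

# Compare the factors for the original samples between the two runs:
old$samples$norm.factors
new$samples$norm.factors[seq_len(ncol(counts12))]
```

The same recomputation applies to the voom precision weights and the variance estimates, which is why identical numbers across analyses cannot be expected.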

In my experience, they'll be happy if their qualitative conclusions are unaffected. On the flip side, if their favourite gene disappears from the DE list upon adding more data, you can expect to be grilled about it.