
Hi all, hi Michael,

we want to build receiver operating characteristic (ROC) curves for patients from two separate cohorts (using gene expression from sequencing as the predictor), and I was wondering how to compare them in this context. I ran vst() (`varianceStabilizingTransformation()`) on both of them and tried to use one dataset as the training set ("`pred_data`") and the other as the test set ("`test_data`") when generating the ROC, like so (count is the variance-stabilized count):

```
# pred_data / test_data: data.frames with the VST count of the gene of
# interest ("count") and the outcome ("event", 0/1) per patient
fit_glm <- glm(event ~ count, data = pred_data, family = binomial(link = "logit"))
glm_link_scores     <- predict(fit_glm, newdata = test_data, type = "link")
glm_response_scores <- predict(fit_glm, newdata = test_data, type = "response")
```
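
The response scores then go into the ROC, roughly like this (just to show the downstream step; I happen to use pROC here, the package choice is not the point):

```
# assuming the pROC package; test_data$event is the true outcome
library(pROC)
roc_obj <- roc(test_data$event, glm_response_scores)
plot(roc_obj)
auc(roc_obj)
```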

I chose the most highly DE gene in both datasets as a first example. However, the vst() output of dataset 1 has slightly higher stabilized counts than dataset 2 for this particular gene (median around 20 vs. median around 16). Thus, the linear model fitted on dataset 1 does not seem to be applicable to dataset 2, and as a result the ROC curve is nonsense.

This is a scatter plot of the stabilized counts of the example gene in both datasets. Each dot represents an individual patient; the color indicates event/no event. I imagine that for a valid comparison, I would have to bring these two distributions to approximately the same mean and sd (roughly what the sketch below tries to do). Adding pseudocounts to the log-transform does not really affect these highly expressed genes. I also think that a per-gene approach would be more appropriate than a correction of the whole set, but that might be totally wrong.
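
To make concrete what I mean by "the same mean and sd", this is roughly what I had in mind: shift/scale the test-set VST values of this one gene onto the training set's distribution. This is only a sketch of the idea (working directly on `pred_data$count` / `test_data$count` from the example above), and I have no idea whether it is statistically defensible, hence the question:

```
# Per-gene sketch: rescale the test cohort's VST values so their mean and sd
# match the training cohort's, then refit/predict as above.
train_counts <- pred_data$count   # VST counts of the gene in cohort 1
test_counts  <- test_data$count   # VST counts of the same gene in cohort 2

test_rescaled <- (test_counts - mean(test_counts)) / sd(test_counts) *
  sd(train_counts) + mean(train_counts)

test_data$count <- test_rescaled
```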

Now, I believe that using TPM values for comparisons between samples is not a good approach; correct me if I'm wrong. The question is: can I normalize or scale the vst() counts so that the two datasets end up with a usable distribution around the same mean or median, and how would I do that robustly? (I.e., "pull up" the stabilized counts of set 2 to the levels of set 1.)

Thanks in advance!

Sebastian