Edit: it would help to know whether this is your entire dataset (n=7); sometimes people post only a small example. As per James's answer, `tech2` is nested in `Treatment`. For all intents and purposes, assuming n=7, you could simply drop `tech2` and proceed with just `tech1`'s samples. There is probably some other approach whereby you could find 'anchor' genes whose expression is constant across `tech1`'s samples, and then use these to adjust the data for `tech2`, but this is difficult given sample sizes of 5 and 2 (for `tech1` and `tech2`, respectively).
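To see what nesting does to the model, here is a minimal base-R sketch. The sample sheet is hypothetical (I have not seen yours): it assumes `tech2` processed only one treatment group, labelled `C` here, so the `lab_tech` and `treatment` columns of the design matrix carry the same information.

```r
# Hypothetical sample sheet (an assumption, not the poster's data):
# tech1 handled 5 samples, tech2 handled 2, all from a single treatment.
coldata <- data.frame(
  lab_tech  = factor(c("tech1", "tech1", "tech1", "tech1", "tech1", "tech2", "tech2")),
  treatment = factor(c("A", "A", "A", "B", "B", "C", "C"))
)

# Cross-tabulate: a technician that appears under only one treatment is nested.
print(table(coldata$lab_tech, coldata$treatment))

# Build the design matrix and compare its rank with its number of columns.
design <- model.matrix(~ lab_tech + treatment, data = coldata)
design_rank <- qr(design)$rank
is_full_rank <- design_rank == ncol(design)

# FALSE means the technician effect cannot be separated from the treatment effect.
print(is_full_rank)
```

If the rank is lower than the number of columns, no formula can disentangle the two effects, which is why dropping `tech2` is the pragmatic option.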

## ------------

Hey tim.meese,

There have been many questions on this topic across various web-forums over the years. It would have helped to have seen your PCA bi-plot and to understand the amount of variation contributed by the PC(s) (principal component(s)) along which you are observing a batch difference.

Nevertheless, generally, for dealing with batch effects, the broadly accepted procedure is to *not* directly modify your input raw counts in order to adjust for batch differences. Instead, one can include `batch` in the design formula, which means that one is simply modeling and adjusting for the effect of `batch`. In your case, you would need:

```
~ lab_tech + treatment
```

Then, when you derive test statistics for `treatment`, these (the statistical inferences) will be adjusted for the effects of `lab_tech`.
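As a rough sketch of how that formula is used in DESeq2, the following runs on simulated counts. The sample sheet, the level names (`control`, `treated`), and the balanced 8-sample layout are all assumptions for illustration; with your own data you would pass your count matrix and metadata instead.

```r
# Sketch only: simulated counts and a hypothetical, balanced sample sheet.
library(DESeq2)

set.seed(1)
n_genes <- 2000

coldata <- data.frame(
  lab_tech  = factor(rep(c("tech1", "tech2"), each = 4)),
  treatment = factor(rep(c("control", "treated"), times = 4)),
  row.names = paste0("sample", 1:8)
)

# Negative-binomial counts; gene_means is recycled so that row i uses gene_means[i].
gene_means <- 2^runif(n_genes, min = 4, max = 10)
counts <- matrix(
  rnbinom(n_genes * nrow(coldata), mu = gene_means, size = 10),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), rownames(coldata))
)

# Batch goes first in the formula; the variable of interest goes last.
dds <- DESeqDataSetFromMatrix(
  countData = counts,
  colData   = coldata,
  design    = ~ lab_tech + treatment
)
dds <- DESeq(dds)

# Treatment effect, adjusted for lab_tech.
res <- results(dds, contrast = c("treatment", "treated", "control"))
summary(res)
```

Note that the raw counts are never altered here; `lab_tech` is only a term in the model.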

## --------

If, later, you then wish to directly modify your transformed (rlog, vst, or logCPM) expression levels for downstream analyses like clustering, PCA, 'machine learning' stuff, *et cetera*, you could use `limma::removeBatchEffect()` to model and directly eliminate the batch effect in these. See 'Why after VST are there still batches in the PCA plot?'.
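A minimal sketch of that step, again on simulated data with a hypothetical balanced sample sheet and an artificial 1.5-fold shift for `tech2` (all of these are assumptions for illustration). Passing the `treatment` design to `removeBatchEffect()` asks it to preserve the treatment signal while removing the technician effect.

```r
# Sketch only: simulated counts with an artificial batch shift for tech2.
library(DESeq2)

set.seed(1)
n_genes <- 2000

coldata <- data.frame(
  lab_tech  = factor(rep(c("tech1", "tech2"), each = 4)),
  treatment = factor(rep(c("control", "treated"), times = 4)),
  row.names = paste0("sample", 1:8)
)

# Per-gene means, multiplied by 1.5 in tech2's samples to mimic a batch effect.
gene_means <- 2^runif(n_genes, min = 4, max = 10)
mu <- outer(gene_means, ifelse(coldata$lab_tech == "tech2", 1.5, 1))
counts <- matrix(
  rnbinom(length(mu), mu = mu, size = 10),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), rownames(coldata))
)

dds <- DESeqDataSetFromMatrix(counts, coldata, design = ~ lab_tech + treatment)

# Transform first; the batch is removed from the transformed values, not the counts.
vsd <- vst(dds, blind = FALSE)
mat <- assay(vsd)

# Protect the treatment effect while removing the technician effect.
design_keep <- model.matrix(~ treatment, data = coldata)
corrected <- limma::removeBatchEffect(mat, batch = coldata$lab_tech, design = design_keep)

# Put the adjusted values back for plotting, clustering, and so on.
assay(vsd) <- corrected
plotPCA(vsd, intgroup = c("lab_tech", "treatment"))
```

The adjusted matrix is for visualisation and exploratory analyses only; for differential expression, keep the raw counts and the design formula above.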


Kevin

It's one of those rare times where the fix for a complication is simpler than you first think it will be.