Just divide the size factors for the affected cells by 2.
This is most obviously valid when no library quantification is performed, i.e., you did not force each cell to contribute equal amounts of cDNA prior to multiplexing. In this case, twice as much spike-in RNA should result in twice as much spike-in coverage in the affected cells, and thus size factors that are twice as large. Dividing these size factors by two will then bring everything back to the same scale.
If you did do library quantification, then the reasoning becomes more complicated, as twice as much spike-in RNA will not lead to size factors that are twice as large (due to composition effects). Nonetheless, division is still valid here as the composition effects affect both the spike-in RNA and the endogenous genes. This means that they cancel out upon normalization; the ultimate effect of having twice as much spike-in RNA would be to halve the normalized expression of the endogenous genes. You can again fix this by dividing the size factors by 2 before normalization.
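For example (a minimal sketch, assuming your data are in a SingleCellExperiment; `sce` and the logical vector `affected` marking the 2x cells are hypothetical names):

```r
library(scater)

## 'sce' is a hypothetical SingleCellExperiment with spike-in-based size
## factors already computed; 'affected' marks the cells with 2x spike-in.
sizeFactors(sce)[affected] <- sizeFactors(sce)[affected] / 2

## Recompute the log-normalized values with the corrected size factors.
sce <- normalize(sce)
```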
That being said, seeing different amounts of spike-ins in a dataset is usually a red flag for me, as it is symptomatic of other experimental factors differing between these cells and the rest (that your collaborators have not told you about). In such cases, it is likely that you would have to do batch correction anyway, e.g., with removeBatchEffect() or, even better, mnnCorrect().
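Something along these lines, assuming a hypothetical genes-by-cells matrix `logexp` of log-normalized values and a `batch` factor recording the spike-in amount for each cell:

```r
library(limma)
library(scran)

## Regress out the batch effect directly on the log-expression values.
corrected <- removeBatchEffect(logexp, batch = batch)

## Or use MNN correction instead, supplying one matrix per batch.
## (mnnCorrect() lived in scran at the time; it has since moved to batchelor.)
out <- mnnCorrect(logexp[, batch == "1x"], logexp[, batch == "2x"])
corrected.mnn <- do.call(cbind, out$corrected)
```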
Also, if you are planning to use trendVar() on the log-normalized values, I would strongly advise you to run it separately on the cells with different amounts of spike-in, and then combine the results at the end with combineVar(). This is because the technical mean-variance trend will fundamentally differ between the cells with 1x and 2x spike-in (the latter will have the trend shifted to the right), making it impossible to estimate a sensible trend from a dataset where they are combined.
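A sketch of what I mean, assuming you have split the data into hypothetical subsets `sce1x` and `sce2x` (each already normalized, with spike-ins flagged) and using decomposeVar() as the intermediate step:

```r
library(scran)

## Fit a separate technical trend to the spike-ins within each subset.
fit1 <- trendVar(sce1x)
fit2 <- trendVar(sce2x)

## Decompose the per-gene variances against the matching trend.
dec1 <- decomposeVar(sce1x, fit1)
dec2 <- decomposeVar(sce2x, fit2)

## Combine the per-batch statistics into one set of results.
combined <- combineVar(dec1, dec2)
```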
That's good, but for future reference, post replies with "Add comment" rather than "Add answer".