A similar issue has already been discussed in "Single cells batch effects", but I want to make sure I understood it correctly, because I also came across Ding et al., who look into how bulk RNA-seq normalization methods, including RUVr, work for single-cell data. Ding et al. conclude that, unless spike-ins are available, RUVr is the best choice among the bulk RNA-seq methods they compared.
At the same time, as Aaron Lun pointed out, if the purpose of the study is subpopulation identification (clustering), then both the "unwanted factors" and the factors that define the clusters are latent (unobserved). RUVr has no way to tell them apart, so it may remove any of them. In that case, there is a gaping hole in the Ding et al. paper, because they should not have considered RUVr at all.
The strangest part is that RUVr worked well in their study: "Qualitatively, RUVr normalization alleviates difference between scRNA-seq protocols and the clustering results are closer to the ground truth than the other methods, i.e. the samples are clustered based on the source (HBR) and the RNA amount (bulk, 100pg or 10pg). However, the UHR samples normalized with RUVr were still clustered according to protocols rather than RNA amount. The other four methods showed worse clustering results than RUVr because ..."
I think they were lucky: the variance explained by the nuisance factor (protocol) was so much higher than the variance explained by the factor of interest (RNA amount) that the latent factor RUVr estimated and removed captured only the former. However, had the factor of interest been the more influential one, RUVr would have backfired and removed it instead. Please let me know if I got this right.
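To make my reasoning concrete, here is a minimal Python sketch. It is not RUVr itself and is not taken from Ding et al.; it only mimics the step I am worried about, under my own assumptions: with no known covariates, the "residuals" are just the centred data, and the single unwanted factor is taken to be the leading singular vector across samples. The data are simulated, and all names (`factor_alignment`, `leading_factor`, the two strength parameters) are hypothetical.

```python
import math
import random


def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def leading_factor(Y, n_iter=200, seed=0):
    """Leading left singular vector of the column-centred matrix Y
    (samples x genes), found by power iteration on the Gram matrix."""
    n, p = len(Y), len(Y[0])
    means = [sum(row[g] for row in Y) / n for g in range(p)]
    Yc = [[row[g] - means[g] for g in range(p)] for row in Y]
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in Yc] for ri in Yc]
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(n)]  # random start vector
    for _ in range(n_iter):
        w = [sum(m * v for m, v in zip(row, w)) for row in gram]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return w


def factor_alignment(protocol_strength, amount_strength,
                     n_samples=20, n_genes=100, seed=1):
    """Simulate two orthogonal latent sample factors and report how strongly
    the estimated 'unwanted' factor correlates with each of them."""
    rng = random.Random(seed)
    # Assumed design: protocol alternates between samples, RNA amount splits
    # the samples into two halves, so the two factors are orthogonal.
    protocol = [1.0 if i % 2 == 0 else -1.0 for i in range(n_samples)]
    amount = [1.0 if i < n_samples // 2 else -1.0 for i in range(n_samples)]
    protocol_loading = [rng.gauss(0, 1) for _ in range(n_genes)]
    amount_loading = [rng.gauss(0, 1) for _ in range(n_genes)]
    Y = [[protocol_strength * protocol[i] * protocol_loading[g]
          + amount_strength * amount[i] * amount_loading[g]
          + rng.gauss(0, 1)
          for g in range(n_genes)]
         for i in range(n_samples)]
    w = leading_factor(Y)
    return abs(correlation(w, protocol)), abs(correlation(w, amount))


if __name__ == "__main__":
    for label, strengths in [("protocol dominates", (3.0, 1.0)),
                             ("RNA amount dominates", (1.0, 3.0))]:
        with_protocol, with_amount = factor_alignment(*strengths)
        print(f"{label}: |corr| with protocol = {with_protocol:.2f}, "
              f"with RNA amount = {with_amount:.2f}")
```

In this toy setting, the estimated factor follows whichever latent factor explains more variance, so regressing it out would remove the protocol effect in the first scenario and the RNA amount effect in the second. Whether the same thing happens inside RUVr on real data is exactly what I am asking about.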