I'm using the rlog function in the DESeq2 package and I've noticed a quirk in the transformed data that I don't know what to make of. For genes that are expressed in only a small proportion of samples (say gene X has non-zero raw counts in 10 of 300 samples), the transformed dataset has no zero values at all; instead, the majority of samples have some other value, which is either negative or positive. A negative count doesn't make sense, so I suppose I could deal with that by zeroing all values less than 1 in the transformed dataset. But I don't know what to do about the cases where most samples have a positive value, say 3.5, and a small proportion have higher values: it's as if the zero level for that gene has been shifted to a small positive number. I see this only for genes expressed in a small proportion of samples, and the size and direction of the shift vary across genes. I notice the same thing with the variance-stabilizing transformation, and it happens whether or not I set blind=FALSE.
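To make the pattern concrete, here is a minimal sketch of the kind of check I mean. It is not my real data: it assumes a simulated dataset from makeExampleDESeqDataSet, and the sparse gene in row 1 is constructed by hand purely for illustration (the counts 40 and 55 are arbitrary).

```r
library(DESeq2)

# Simulated stand-in for a real dataset (assumption, for illustration only)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 20)

# Hand-build one sparse gene: non-zero raw counts in only 2 of 20 samples
cts <- counts(dds)
cts[1, ] <- 0L
cts[1, 1:2] <- c(40L, 55L)
counts(dds) <- cts

# Regularized log transformation, as in the question
rld <- rlog(dds, blind = FALSE)

raw <- counts(dds)[1, ]          # raw counts for the sparse gene
transformed <- assay(rld)[1, ]   # rlog values for the same gene

# How many samples have a raw count of zero, and what do they look like
# after the transformation?
table(raw == 0)
summary(transformed[raw == 0])
summary(transformed[raw > 0])
```

Comparing the two summaries shows where the zero-count samples end up relative to the expressing samples for that gene.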
Have others noticed this with their datasets? If so, how did you deal with it? I don't know how much of an impact this would have on the results of clustering-type exploratory analyses, but I'm also not comfortable seeing a gene that should not be expressed at all in most samples show positive counts in all of them.