I would be surprised if the difference were due to library size alone. Normalization by library size (which the size factors from DESeq should do, amongst other things) should take care of library size differences between samples. In particular, the more sophisticated normalization methods (e.g., size factors in DESeq, TMM in edgeR) should only fail if the majority of genes are DE in your data set, or if you still have a large proportion of zeroes in your count matrix after abundance-based filtering. Neither is the case in most analyses.
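To make the size-factor point concrete, here is a minimal Python sketch of median-of-ratios scaling, the idea behind the DESeq size factors. It is not the DESeq code itself: the function name `size_factors` and the toy counts are made up for illustration, and the second sample is simply the first one sequenced three times deeper.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors (illustrative helper, not the DESeq API).

    counts[g][s] is the read count for gene g in sample s.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        # A zero count makes the geometric mean zero, so skip such genes.
        if any(c <= 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for s, c in enumerate(row):
            log_ratios[s].append(math.log(c) - log_geo_mean)
    # One factor per sample: the median ratio to the per-gene geometric mean.
    return [math.exp(median(r)) for r in log_ratios]

# Hypothetical toy data: sample 2 is sample 1 at 3x the sequencing depth.
counts = [
    [10, 30],
    [200, 600],
    [55, 165],
    [0, 0],
    [1000, 3000],
]

sf = size_factors(counts)
normalized = [[c / f for c, f in zip(row, sf)] for row in counts]

for row in normalized:
    print([round(x, 2) for x in row])
```

After dividing by the size factors, both samples have the same value for every gene, which is why a pure library size difference should not survive normalization.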
A more common explanation for non-biological differences between samples is a batch effect due to separate processing. This means that your affected sample has read counts that are systematically different from the other samples. However, these differences are not consistent across genes, so they cannot be removed by scaling normalization with a global scaling factor. As a result, your sample will cluster separately from the others, even after normalization. Such batch effects are frequently introduced in genomics experiments, e.g., if you send two identical samples to different sequencing centers, or to the same sequencing center at different times, or for processing by different operators, or at different phases of the moon, etc. They will be present regardless of whether you downsample the library.
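The difference between a depth effect and a batch effect can be sketched with a toy calculation. Everything here is hypothetical: the counts, the gene-specific distortion in `gene_bias`, and the helper `spread_after_scaling`, which applies one global scaling factor (the median ratio to a reference sample) and reports how far apart the per-gene ratios still are.

```python
from statistics import median

def spread_after_scaling(sample, reference):
    """Scale `sample` by a single global factor, then return max/min of the
    remaining per-gene ratios to `reference` (1.0 means a perfect match)."""
    ratios = [s / r for s, r in zip(sample, reference)]
    scale = median(ratios)  # one factor shared by all genes
    residual = [s / scale / r for s, r in zip(sample, reference)]
    return max(residual) / min(residual)

# Hypothetical reference sample processed with the rest of the experiment.
reference = [100, 200, 400, 800]

# Case 1: same sample, only sequenced 3x deeper.
depth_only = [r * 3 for r in reference]

# Case 2: 3x deeper AND a made-up gene-specific distortion from the batch.
gene_bias = [0.5, 2.0, 1.0, 4.0]
batch_affected = [r * b * 3 for r, b in zip(reference, gene_bias)]

depth_spread = spread_after_scaling(depth_only, reference)
batch_spread = spread_after_scaling(batch_affected, reference)

print("depth only:", round(depth_spread, 3))
print("batch effect:", round(batch_spread, 3))
```

In the depth-only case the spread collapses to about 1, while in the batch case it stays at about 8. The max/min ratio does not change under multiplication by a constant, so no choice of global factor can bring it down, and downsampling (which only changes depth) cannot either.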
b.nota's suggestion of blocking on the batch is only relevant if you have multiple samples in each batch. If you only have one sample in a batch, then all information in that sample will be used to estimate the blocking term for the batch effect. This means that the sample will effectively be useless for estimating the variance, detecting DE, etc. However, you can't treat it as a direct replicate either, because the batch effect will inflate the estimated variance and distort all of your downstream statistics. You're stuck between a rock and a hard place here; it's a fundamental flaw in the experimental design that cannot be fixed at the analysis stage.
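One way to see why a single-sample batch is a dead end is to count residual degrees of freedom (number of samples minus the rank of the design matrix). The sketch below assumes a hypothetical design of 3 controls and 3 treated samples, with the last treated sample alone in its own batch; the `rank` and `residual_df` helpers are written out here for illustration and are not taken from any package.

```python
from fractions import Fraction

def rank(matrix):
    """Matrix rank by Gauss-Jordan elimination with exact arithmetic."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def residual_df(design):
    """Residual degrees of freedom: samples minus estimable coefficients."""
    return len(design) - rank(design)

# Hypothetical experiment: 3 controls, 3 treated; last sample is its own batch.
group = [0, 0, 0, 1, 1, 1]
batch = [0, 0, 0, 0, 0, 1]

no_batch = [[1, g] for g in group]                       # intercept + treatment
with_batch = [[1, g, b] for g, b in zip(group, batch)]   # plus blocking term
dropped = no_batch[:-1]                                  # outlier removed

df_no_batch = residual_df(no_batch)
df_with_batch = residual_df(with_batch)
df_dropped = residual_df(dropped)

print(df_no_batch, df_with_batch, df_dropped)  # prints: 4 3 3
```

Blocking on the singleton batch leaves the same residual degrees of freedom as discarding the sample, which matches the point that the sample contributes nothing to variance estimation once its batch term is fitted.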
Is your outlier sample a biological replicate of other samples in your experiment? I don't know exactly what your experimental design is, but adding a "batch effect" term to your model would probably be useful.