I'd like to use the DESeq2 rlog transformation so that I can use the resulting normalized (rlog-transformed) matrix for PCA and heatmaps.
I understand that I should use raw counts as input, but I'd like to understand how the transformation accounts for differences in library size between samples.
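For context, DESeq2's default `estimateSizeFactors` uses the median-of-ratios method, and `rlog` uses those size factors (estimating them from the supplied matrix when none have been set). So library size is inferred from the rows of the matrix you give it, not from column totals. Below is a minimal Python re-implementation of the idea, written for illustration only (it is not DESeq2's code, and the toy counts are made up):

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """Median-of-ratios size factors, one per sample (column).

    counts: genes x samples array of raw, non-negative counts.
    """
    counts = np.asarray(counts, dtype=float)
    # Only genes with a non-zero count in every sample contribute,
    # because the geometric mean of a row containing a zero is zero.
    keep = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[keep])
    # Log geometric mean of each gene across samples: a "pseudo-reference" sample.
    log_geo_means = log_counts.mean(axis=1)
    # Per sample, the median log-ratio to the reference is the log size factor.
    log_ratios = log_counts - log_geo_means[:, None]
    return np.exp(np.median(log_ratios, axis=0))

# Toy matrix: sample 3 is sequenced 4x deeper than sample 1.
counts = np.array([
    [10, 20, 40],
    [5, 10, 20],
    [100, 200, 400],
    [0, 3, 7],  # excluded from the estimate because of the zero
])
print(median_of_ratios_size_factors(counts))
```

Because the estimate is a median over genes, it depends only on the genes present in the matrix, which is why subsetting and duplicating rows matters here.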
In particular, I'd like to apply it to KEGG orthologs (KOs). This matrix is only a subset of my raw count matrix, containing just the genes I could assign to a KO, so the real library size of each sample is much larger. Moreover, since each KO can belong to several metabolisms or pathways, rows in my files are repeated: the same KO appears once for each of the n pathways it belongs to, always with the same counts. As a result, the column sums of my matrix are not the library sizes.
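To make the duplication issue concrete, here is a small sketch with a hypothetical KO-by-pathway table (the KO and pathway IDs and counts are invented). Collapsing to one row per KO before building the DESeq2 input would remove the inflation of the column sums; duplicated rows would presumably also be counted several times in the median when size factors are estimated.

```python
# Hypothetical KO-by-pathway table: the same KO appears once per pathway,
# always with identical counts (three samples per row).
rows = [
    ("K00001", "map00010", [12, 30, 7]),
    ("K00001", "map00071", [12, 30, 7]),
    ("K00002", "map00010", [4, 9, 2]),
    ("K00003", "map00620", [50, 41, 60]),
    ("K00003", "map00010", [50, 41, 60]),
]

def collapse_by_ko(rows):
    """Keep one row per KO; raise if a KO's repeated rows disagree."""
    unique = {}
    for ko, _pathway, counts in rows:
        if ko in unique and unique[ko] != counts:
            raise ValueError(f"{ko} has inconsistent counts across pathways")
        unique[ko] = counts
    return unique

def column_sums(count_rows):
    """Sum each sample (column) across the given rows."""
    return [sum(col) for col in zip(*count_rows)]

ko_counts = collapse_by_ko(rows)
print(column_sums([c for _, _, c in rows]))  # [128, 151, 136], inflated by duplicates
print(column_sums(ko_counts.values()))       # [66, 80, 69], one row per KO
```

The pathway membership can still be kept in a separate KO-to-pathway mapping for later annotation of the rlog output.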
Is it correct to use this kind of matrix for rlog? Is it possible to specify the "true" library sizes?
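On the second question: in R, DESeq2 lets you set size factors yourself with `sizeFactors(dds) <- sf` before calling `rlog(dds)`, and `rlog` only estimates them when none are set. A sketch of one possible way to derive `sf` from the true library sizes follows. Scaling the factors to a geometric mean of 1 is an assumed convention here, and the library sizes and counts are hypothetical:

```python
import numpy as np

def library_size_factors(library_sizes):
    """Size factors proportional to total library size, divided by their
    geometric mean so the factors are centred on 1 (an assumed convention)."""
    sizes = np.asarray(library_sizes, dtype=float)
    return sizes / np.exp(np.mean(np.log(sizes)))

# Hypothetical total reads per sample, taken from the full count matrix
# (before KO assignment), not from the KO subset.
true_library_sizes = [2_000_000, 4_000_000, 8_000_000]
sf = library_size_factors(true_library_sizes)
print(sf)

# Normalized counts divide each sample (column) by its size factor.
ko_counts = np.array([[10, 20, 40], [3, 8, 12]])
normalized = ko_counts / sf
print(normalized)
```

Whether total-library-size factors or median-of-ratios factors computed on the KO subset are more appropriate depends on the data; the sketch only shows the arithmetic.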