I have Salmon counts for WT and KO samples that I imported with tximport. I want to normalize these counts using ERCC spike-in data. My strategy is to estimate size factors from the spike-in data only (function: DESeq2::estimateSizeFactors) and then use those size factors to normalize my counts (Mus musculus transcripts).
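To make the strategy concrete, here is a minimal base-R sketch of a median-of-ratios calculation restricted to the spike-in rows, which is my understanding of what estimateSizeFactors does by default. The counts, sample names, placeholder gene names (geneA etc.) and the helper spike_size_factors are all made up for illustration; this is not my real data and not DESeq2's actual code.

```r
# Made-up counts: 3 placeholder mouse genes + 3 ERCC spike-ins, 4 samples
counts <- matrix(
  c( 100,  210,   48,   95,   # geneA
     500,  990,  260,  510,   # geneB
      30,   70,   12,   33,   # geneC
      10,   20,    5,   10,   # ERCC-00002
     100,  200,   50,  100,   # ERCC-00003
    1000, 2000,  500, 1000),  # ERCC-00004
  ncol = 4, byrow = TRUE,
  dimnames = list(
    c("geneA", "geneB", "geneC", "ERCC-00002", "ERCC-00003", "ERCC-00004"),
    c("WT_1", "WT_2", "KO_1", "KO_2")
  )
)

# Hypothetical helper: median-of-ratios size factors from spike-in rows only
spike_size_factors <- function(counts, is_spike) {
  spike <- counts[is_spike, , drop = FALSE]
  # Keep spike-ins detected in every sample so the geometric mean is defined
  spike <- spike[rowSums(spike == 0) == 0, , drop = FALSE]
  log_geo_means <- rowMeans(log(spike))
  # Per sample: median over spike-ins of the log ratio to the geometric mean
  apply(spike, 2, function(col) exp(median(log(col) - log_geo_means)))
}

is_spike <- grepl("^ERCC-", rownames(counts))
sf <- spike_size_factors(counts, is_spike)
print(sf)  # by construction roughly WT_1 = 1, WT_2 = 2, KO_1 = 0.5, KO_2 = 1

# Divide each sample's mouse counts by that sample's spike-in size factor
normalized <- sweep(counts[!is_spike, , drop = FALSE], 2, sf, "/")
print(normalized)
```

The toy spike-in rows are exact multiples of each other across samples, so the size factors are easy to check by eye; real spike-in counts would be noisier.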
What I have done so far:
Step 1: I used Salmon to quantify Mus musculus and ERCC RNA simultaneously (using a concatenated cDNA file), and imported these counts into R with the tximport package.
Step 2: I made two ddsTxi objects, one with the ERCC spike-in tx2gene file and the other with the mouse Ensembl tx2gene file.
Step 3: I then ran DESeq2::estimateSizeFactors on the Spike-in_ddsTxi object to get size factors. However, I am unable to get sample-wise size factors:
sizeFactors(Spike-in_ddsTxi) returns NULL
I am aware that normalizationFactors(Spike-in_ddsTxi) gives me a matrix, but I am not sure how to use it for normalization.
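For reference, this is how I currently picture the difference between the two, as a base-R sketch with made-up numbers: size factors are one value per sample and divide whole columns, while a normalization factor matrix has one value per gene and sample and divides the counts element-wise (which, as far as I understand, is what counts(dds, normalized = TRUE) does when normalization factors are present). The matrix below is simply the size factors repeated for every gene, an assumption chosen so that both routes can be compared directly; the real matrix from tximport data would also vary by gene.

```r
# Made-up counts for two placeholder genes in three samples
counts <- matrix(
  c(10, 20, 30,
    40, 50, 60),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), c("WT_1", "KO_1", "KO_2"))
)

# Sample-wise size factors (made-up values): one number per column
sf <- c(WT_1 = 1, KO_1 = 2, KO_2 = 0.5)
by_size_factor <- sweep(counts, 2, sf, "/")

# Gene-by-sample normalization factors: same shape as the count matrix.
# Here every row just repeats the size factors (illustrative assumption).
nf <- matrix(sf, nrow = nrow(counts), ncol = ncol(counts), byrow = TRUE,
             dimnames = dimnames(counts))
by_norm_factor <- counts / nf

# With a matrix built this way, both routes give the same normalized counts
print(isTRUE(all.equal(by_size_factor, by_norm_factor)))
```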
Can I please get advice on the following:
Question 1: Is my method above the correct way of going about normalization with spike-in data?
Question 2: If the answer to Question 1 is no, what method should I use?
Question 3: What is the difference between using estimateSizeFactors with Salmon data imported using tximport and with other count data, e.g. from featureCounts?
Question 4: Lastly, what is the controlGenes parameter of the estimateSizeFactors function? Is that what I am supposed to use?
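Regarding Question 4, this is roughly how I imagine controlGenes being used, in case it helps to confirm or correct my understanding. It is only a sketch under assumptions: it builds a single combined object from a made-up count matrix with DESeqDataSetFromMatrix instead of my real tximport data, and the gene names (geneA etc.) and sample names are placeholders. I have not confirmed that this is the recommended approach.

```r
library(DESeq2)

# Made-up counts: 3 placeholder mouse genes + 3 ERCC spike-ins, 4 samples
counts <- matrix(
  as.integer(c( 100,  210,   48,   95,
                500,  990,  260,  510,
                 30,   70,   12,   33,
                 10,   20,    5,   10,
                100,  200,   50,  100,
               1000, 2000,  500, 1000)),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    c("geneA", "geneB", "geneC", "ERCC-00002", "ERCC-00003", "ERCC-00004"),
    c("WT_1", "WT_2", "KO_1", "KO_2")
  )
)
coldata <- data.frame(
  condition = factor(c("WT", "WT", "KO", "KO"), levels = c("WT", "KO")),
  row.names = colnames(counts)
)

# One combined object holding both mouse genes and spike-ins
dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ condition)

# Estimate size factors from the spike-in rows only
is_spike <- grepl("^ERCC-", rownames(dds))
dds <- estimateSizeFactors(dds, controlGenes = is_spike)
print(sizeFactors(dds))

# Drop the spike-ins; the size factors stay attached to the samples,
# so a later DESeq() call on real data would reuse them
dds_mouse <- dds[!is_spike, ]
print(sizeFactors(dds_mouse))
```

With an object made by DESeqDataSetFromTximport, I believe the same call would fill normalizationFactors rather than sizeFactors because of the average transcript length information, which may be why I get NULL above.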