I am currently trying to conduct a differential expression analysis using the recount data. I want to compare the GTEx and TCGA data against a specific set of separate runs I have picked out.
My issue is with the targetSize argument in the scale_counts function. By default it is set to 4e7 for all runs in the RSE object, but should it not be specific to each run, since it represents the number of single-end mapped reads? I would expect the scaling to be:
(mappedreadcount / (2 if paired_end == TRUE else 1)) /auc
where mappedreadcount is the column in colData(rse).
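To make the comparison concrete, here is a minimal sketch of the two scaling factors side by side. It rests on an assumption: that scale_counts, when scaling by AUC, multiplies each run's counts by targetSize / auc. The `Run` class and the numbers in it are hypothetical stand-ins for rows of colData(rse), not real recount data.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical stand-in for one row of colData(rse)."""
    auc: float            # total base-level coverage of the run
    mappedreadcount: int  # number of mapped reads
    paired_end: bool      # True for paired-end runs


def constant_target_factor(run: Run, target_size: float = 4e7) -> float:
    # Assumed default behaviour: the same target library size for every run.
    return target_size / run.auc


def per_run_factor(run: Run) -> float:
    # The per-run alternative proposed above:
    # (mappedreadcount / (2 if paired_end else 1)) / auc
    divisor = 2 if run.paired_end else 1
    return (run.mappedreadcount / divisor) / run.auc


# Made-up example runs, one paired-end and one single-end.
runs = [
    Run(auc=8e9, mappedreadcount=80_000_000, paired_end=True),
    Run(auc=2e9, mappedreadcount=20_000_000, paired_end=False),
]

for run in runs:
    print(constant_target_factor(run), per_run_factor(run))
```

If that assumption holds, the constant factor brings every run to the same nominal library size, whereas the per-run factor leaves each run's total proportional to its own read count.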
My question is: why is the targetSize argument constant across samples? Should it not be specific to each sample?
P.S. I intend to use TMM normalization for differential expression analysis on the scaled counts afterwards, if that's relevant.