Hi, I have two questions regarding the proper usage of ComBat-seq:
When using ComBat-seq, should the variance-stabilizing transformation (VST) be applied before or after batch correction? My understanding is that the VST flattens the mean-variance relationship, which would violate ComBat-seq's assumption that the input counts follow a negative binomial distribution. So I think the proper order of operations is to run ComBat-seq on the raw counts first, then apply the VST to the batch-corrected counts. Is this reasoning correct?
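To make that reasoning concrete, here is a rough sketch in plain Python of what I mean by "flattening". It is not DESeq2's actual VST: I use log2(x + 1) as a stand-in transform, a delta-method approximation for its variance, and a made-up dispersion of 0.1. Under a negative binomial model the raw variance is mean + dispersion * mean^2, while the approximate variance after the transform levels off at high means.

```python
import math

def nb_variance(mean, dispersion):
    """Variance of a negative binomial count: mean + dispersion * mean^2."""
    return mean + dispersion * mean ** 2

def log2p1_variance(mean, dispersion):
    """Delta-method approximation of Var[log2(X + 1)] for NB-distributed X."""
    # Derivative of log2(x + 1), evaluated at the mean
    slope = 1.0 / ((mean + 1.0) * math.log(2))
    return slope ** 2 * nb_variance(mean, dispersion)

dispersion = 0.1  # hypothetical value, chosen only for illustration
for mean in [10, 100, 1000, 10000]:
    raw = nb_variance(mean, dispersion)
    transformed = log2p1_variance(mean, dispersion)
    print(f"mean={mean:>6}  raw var={raw:>12.1f}  approx var after log2(x+1)={transformed:.4f}")
```

If this picture is right, the transformed values no longer carry the mean-dependent variance that a negative binomial model expects, which is why I would run ComBat-seq on raw counts before transforming.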
How can ComBat-seq be used to correct for batch effects between stranded and unstranded RNA-seq data? DepMap has done this for the CCLE transcriptomics data, but they perform the correction on log2(TPM+1) values rather than raw counts. Is it valid to use log2(TPM+1) data instead of raw counts? Also, in their data there is no sample overlap between the stranded and unstranded sets, so the batch effect seems to be confounded with potential biological differences. Is it possible to use ComBat-seq at all in that case?
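To spell out what I mean by "confounded", here is a minimal sketch in plain Python of the check I have in mind. The batch and group labels are hypothetical (not DepMap's actual metadata), and the function names are my own. The idea is that if no biological group appears in more than one batch, a model with both batch and group terms cannot separate the two.

```python
from collections import defaultdict

def batches_per_group(batches, groups):
    """Map each biological group to the set of batches it appears in."""
    seen = defaultdict(set)
    for batch, group in zip(batches, groups):
        seen[group].add(batch)
    return dict(seen)

def is_fully_confounded(batches, groups):
    """True if there are at least two batches and no group spans more than one."""
    if len(set(batches)) < 2:
        return False  # a single batch leaves nothing to correct
    return all(len(b) == 1 for b in batches_per_group(batches, groups).values())

# Hypothetical design 1: each lineage was sequenced with only one protocol
batches_a = ["stranded", "stranded", "unstranded", "unstranded"]
groups_a = ["lung", "breast", "skin", "colon"]

# Hypothetical design 2: "lung" samples exist in both protocols
batches_b = ["stranded", "stranded", "unstranded", "unstranded"]
groups_b = ["lung", "breast", "lung", "colon"]

print(is_fully_confounded(batches_a, groups_a))
print(is_fully_confounded(batches_b, groups_b))
```

In the first design, any difference between the stranded and unstranded sets could be protocol or lineage, and I don't see how the model could tell them apart.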
Thanks!
In general, is it ever acceptable to use ComBat-seq without providing biological covariates? Wouldn't that prevent the model from distinguishing biological variation from batch variation?