I have an RNA-seq dataset obtained by ribosomal pulldown of mRNA (TRAP) from a mutant and a control cell line. I also have an input sample for each of these. Because of the nature of TRAP, the pulldown samples are strongly enriched in mRNA, whereas the inputs resemble bulk RNA-seq and have more complex libraries. Each sample was spiked with sequins (http://sequins.xyz) from either mixture A or mixture B, which I want to use to normalise my data. There are additional potential sources of batch effects, including lane and differentiation day. How should I decide which of these to include in my final DESeq GLM?
What is the best way to check which factor contributes the largest batch effect in this kind of data? So far I have not been able to obtain a good correlation between the observed and expected log fold changes (LFC) for the sequins. I also tried including only the TRAP samples in the DESeq object, in case the TRAP and input samples are too different to analyse together, but this did not fix the problem either.
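To make the sequin check concrete, here is a minimal sketch of the comparison I mean, written in plain Python rather than R so that it stands alone. The function names `expected_lfc` and `pearson` are hypothetical helpers, and the sketch assumes that the expected LFC of a sequin is the log2 ratio of its known concentrations in mixture B and mixture A, and that the observed LFCs have been exported from the DESeq results in the same order. It is an illustration of the check, not the exact code I ran.

```python
import math


def expected_lfc(conc_a, conc_b):
    """Expected log2 fold change (B vs A) from known spike-in concentrations.

    Assumption: both concentrations are positive and in the same units.
    """
    return math.log2(conc_b / conc_a)


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def sequin_agreement(conc_a, conc_b, observed):
    """Correlate observed LFCs with those expected from the mixture design.

    conc_a, conc_b: per-sequin concentrations in mixtures A and B.
    observed: per-sequin observed log2 fold changes, in the same order.
    """
    expected = [expected_lfc(a, b) for a, b in zip(conc_a, conc_b)]
    return pearson(expected, observed)
```

A correlation close to 1 would mean that the sequins recover the designed fold changes. Running the same check separately per lane or per differentiation day is one way to see whether a particular batch variable is associated with the disagreement.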
Many thanks, Chaitra