Our lab has accumulated many RNA-Seq experiments on the same cell line over the years. We want to use the control samples from all these experiments to estimate the expression level of each gene in this cell line. The FASTQ files from all 8 experiments were processed with Hisat2 and featureCounts to obtain gene counts. Each experiment has 1 to 3 replicates. To complicate matters, half of the experiments used ribodepletion and half used polyA enrichment.
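Combining the experiments starts with merging the per-experiment featureCounts tables into one genes × samples matrix. A minimal sketch, assuming each featureCounts output uses the default tab-separated layout (an optional `#` comment line, then a `Geneid Chr Start End Strand Length` header followed by one column per BAM file) and that all runs used the same annotation. The helper names `read_featurecounts` and `merge_experiments` are hypothetical.

```python
import io


def read_featurecounts(handle):
    """Parse one featureCounts output table from a file-like object.

    Returns (sample_names, lengths, counts): lengths maps gene -> Length
    column value, counts maps gene -> list of ints (one per sample column).
    """
    # Skip the '#' comment line(s) that featureCounts writes at the top
    lines = [ln.rstrip("\n") for ln in handle if not ln.startswith("#")]
    header = lines[0].split("\t")
    samples = header[6:]  # columns after Geneid..Length hold per-BAM counts
    lengths, counts = {}, {}
    for ln in lines[1:]:
        if not ln:
            continue
        fields = ln.split("\t")
        gene = fields[0]
        lengths[gene] = int(fields[5])
        counts[gene] = [int(x) for x in fields[6:]]
    return samples, lengths, counts


def merge_experiments(tables):
    """Merge parsed tables, keeping only genes present in every experiment."""
    shared = set.intersection(*(set(t[2]) for t in tables))
    samples = [s for t in tables for s in t[0]]
    # Gene lengths are taken from the first table (same annotation assumed)
    lengths = {g: tables[0][1][g] for g in sorted(shared)}
    merged = {g: [c for t in tables for c in t[2][g]] for g in sorted(shared)}
    return samples, lengths, merged


# Usage with two tiny in-memory tables standing in for real files
exp_a = io.StringIO(
    "# Program:featureCounts\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\ta1.bam\ta2.bam\n"
    "g1\tchr1\t1\t100\t+\t100\t5\t7\n"
    "g2\tchr1\t200\t399\t-\t200\t0\t3\n"
)
exp_b = io.StringIO(
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tb1.bam\n"
    "g1\tchr1\t1\t100\t+\t100\t9\n"
    "g3\tchr2\t1\t50\t+\t50\t4\n"
)
samples, lengths, merged = merge_experiments(
    [read_featurecounts(exp_a), read_featurecounts(exp_b)]
)
print(samples, merged)  # ['a1.bam', 'a2.bam', 'b1.bam'] {'g1': [5, 7, 9]}
```

Keeping only shared genes is a safeguard; if every run used the same GTF, the gene sets should already be identical.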
Almost all the documentation online is about differential expression. In our case, we simply want to estimate the mean expression level of every gene (and perhaps also the standard deviation, to see how much it varies).
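Because the number of replicates differs between experiments (1 to 3), one option is to average replicates within each experiment first, so that an experiment with three replicates does not weigh more than one with a single replicate, and then take the mean and SD across experiments. A minimal sketch, assuming a genes × samples count matrix and using log2(CPM + 1) as a simple stand-in normalization (this is not DESeq2's vst). `per_gene_summary` is a hypothetical helper.

```python
import numpy as np


def log_cpm(counts):
    """log2(counts-per-million + 1) for a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)  # total assigned reads per sample
    return np.log2(counts / lib_sizes * 1e6 + 1)


def per_gene_summary(counts, experiment_ids):
    """Mean and SD per gene across experiments, after averaging replicates."""
    logexpr = log_cpm(counts)
    ids = np.asarray(experiment_ids)
    experiments = sorted(set(experiment_ids))
    # Collapse replicates: one column per experiment
    exp_means = np.column_stack(
        [logexpr[:, ids == e].mean(axis=1) for e in experiments]
    )
    mean = exp_means.mean(axis=1)
    # Sample SD across experiments (needs at least two experiments)
    sd = exp_means.std(axis=1, ddof=1) if len(experiments) > 1 else np.zeros(len(mean))
    return mean, sd


# Usage: 2 genes, 3 samples; exp1 has two replicates, exp2 has one
counts = np.array([
    [500_000, 500_000, 250_000],
    [500_000, 500_000, 750_000],
])
experiment_ids = ["exp1", "exp1", "exp2"]
mean, sd = per_gene_summary(counts, experiment_ids)
```

The SD here reflects between-experiment variation on the log scale, which would also absorb any polyA vs. ribodepletion differences if both protocols are pooled.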
- Can we use polyA and ribodepletion experiments together? The literature suggests this should be avoided. If not, which one is better, polyA or ribodepletion?
- What would be the best way to normalize this data? I imported the raw counts into DESeq2 and used the vst (or rlog) function. Should I use these normalized counts to compute the FPKM for each gene?
- When should I remove the batch effect from this data: before or after loading it into DESeq2?
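For reference, DESeq2's default size factors come from the median-of-ratios method, while FPKM is conventionally computed from counts rather than from vst/rlog output, which is on an approximately log2 scale. A minimal NumPy sketch of both formulas, assuming a genes × samples raw count matrix and gene lengths in bp (for example the `Length` column from featureCounts). It illustrates the arithmetic only and is not DESeq2's implementation; DESeq2 also provides its own `fpm()` and `fpkm()` functions, whose defaults may differ from the plain formula here.

```python
import numpy as np


def median_of_ratios_size_factors(counts):
    """Median-of-ratios size factors for a genes x samples raw count matrix.

    Genes with a zero count in any sample are excluded from the reference.
    """
    counts = np.asarray(counts, dtype=float)
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    log_geo_means = log_counts.mean(axis=1)           # per-gene log geometric mean
    log_ratios = log_counts - log_geo_means[:, None]  # each sample vs. pseudo-reference
    return np.exp(np.median(log_ratios, axis=0))      # one factor per sample


def fpkm(counts, lengths_bp):
    """Classic FPKM: count * 1e9 / (gene length in bp * total counts in sample)."""
    counts = np.asarray(counts, dtype=float)
    lengths = np.asarray(lengths_bp, dtype=float)[:, None]
    totals = counts.sum(axis=0)[None, :]
    return counts * 1e9 / (lengths * totals)


# Usage: sample 2 is sequenced exactly twice as deeply as sample 1
counts = np.array([[10, 20], [20, 40], [30, 60]])
sf = median_of_ratios_size_factors(counts)
normalized = counts / sf  # divides each column by its size factor
f = fpkm(counts, [1000, 2000, 500])
```

Because FPKM divides by each sample's total counts, it already performs its own library-size scaling; feeding it size-factor-normalized counts would largely cancel that normalization rather than add to it.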
Thanks!