Our lab has tried to adopt the Pertea et al. protocol (https://www.nature.com/articles/nprot.2016.095) to evaluate differential gene expression in mouse (a single tissue). The setup has 3 or 4 replicates per treatment, comparing treated vs control.
We have observed a few issues that we are unable to resolve or explain easily. I am wondering if others have seen something similar.
1) Ballgown - libadjust option. This is supposed to be left on by default to adjust for library size. For one set of samples, we find abnormal fold change (FC) values when we set libadjust=TRUE, with FCs of several thousand-fold reported in some cases, even though read depth varies little between samples (40-50M reads per sample on average). These extreme FCs disappear when we set libadjust=FALSE. We do not see this with a separate data set with similar sample numbers and read counts, which makes us wonder what is going on.
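One check we could run on the problematic data set is whether the library-size covariate happens to track treatment, since with only 3 or 4 replicates that could distort the fitted FC. The following is a minimal Python sketch, not Ballgown code. It assumes, from our reading of Ballgown and not verified here, that the adjustment covariate is the per-sample sum of log2(x + 1) expression values below that sample's 75th percentile. The function names are hypothetical.

```python
import numpy as np

def library_size_covariate(expr):
    """Per-sample covariate: sum of log2(x + 1) values below the sample's
    75th percentile. ASSUMPTION: this mirrors Ballgown's libadjust covariate.
    expr is a features x samples matrix of expression values (e.g. FPKM)."""
    log_expr = np.log2(np.asarray(expr, dtype=float) + 1.0)
    q3 = np.quantile(log_expr, 0.75, axis=0)   # one threshold per sample (column)
    below = log_expr < q3                      # threshold broadcasts across rows
    return np.where(below, log_expr, 0.0).sum(axis=0)

def covariate_group_correlation(expr, group):
    """Pearson correlation between the covariate and a 0/1 treatment vector."""
    cov = library_size_covariate(expr)
    return float(np.corrcoef(cov, np.asarray(group, dtype=float))[0, 1])
```

A correlation near +1 or -1 would mean the covariate and the treatment term are close to collinear in the model, which is one possible (unconfirmed) explanation for unstable FC estimates in one data set but not the other.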
2) Gene level vs transcript level. Having fallen back to libadjust=FALSE for both data sets given the above observations, we see huge differences in the number of differential expression results at gene level vs transcript level. In our experimental setup, we do not expect large transcript-level perturbations in treated vs control.
We are unable to assign confidence to the transcript-level results. On visual examination of the BAM (bigWig) files at some of the loci showing differential transcript expression (DTE) with the largest FC magnitudes, we do not see any difference in the reads aligning at the various splice junctions. An example is below.