I have been using the prepDE.py script to convert StringTie quantifications into counts for input to DESeq2, exactly as described here: [http://ccb.jhu.edu/software/stringtie/index.shtml?t=manual#deseq]
We typically run StringTie in the simple reference-based mode (using the -e and -B options), which bypasses the assembly and merging steps and simply quantifies the genes found in the reference annotation. We then pass the resulting abundance output to prepDE.py to obtain counts.
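For reference, the steps above can be sketched as command lines. This minimal Python sketch only builds the argument lists and does not run anything. The file names (`ref.gtf`, `sample_lst.txt`, the `ballgown/` layout) and helper names are hypothetical. The flags are the ones documented in the StringTie manual: `-e`/`-B`/`-G`/`-o` for stringtie, and `-i`/`-l`/`-g`/`-t` for prepDE.py, where `-l` is the average read length (75 unless set otherwise).

```python
from typing import Dict, List


def stringtie_cmd(bam: str, ref_gtf: str, sample_id: str, outdir: str = "ballgown") -> List[str]:
    """Reference-only quantification: -e estimates abundance only for transcripts in -G,
    -B also writes Ballgown tables, -o sets the per-sample GTF that prepDE.py reads."""
    out_gtf = f"{outdir}/{sample_id}/{sample_id}.gtf"
    return ["stringtie", "-e", "-B", "-G", ref_gtf, "-o", out_gtf, bam]


def sample_list_text(gtfs: Dict[str, str]) -> str:
    """One 'sample_id path/to/sample.gtf' line per sample, a list format prepDE.py -i accepts."""
    return "".join(f"{sid} {path}\n" for sid, path in gtfs.items())


def prepde_cmd(sample_list: str, read_len: int) -> List[str]:
    """prepDE.py call; read_len should be the real average read length of the library."""
    return [
        "prepDE.py",
        "-i", sample_list,
        "-l", str(read_len),
        "-g", "gene_count_matrix.csv",
        "-t", "transcript_count_matrix.csv",
    ]
```

Each list could be passed to `subprocess.run(..., check=True)`. The one value that is easy to leave at its default without noticing is `-l`.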
Across a series of analyses on different data sets and experiments, I have consistently found very few, if any, differentially expressed genes, and those I do identify have small fold changes (no greater than 2 to 2.5-fold).
Typical experimental setups comprise a minimum of 3 replicates of bulk-tissue RNA-seq, in cell lines or animal models (human/mouse/rat). Inter-replicate variation (looking at individual gene FPKMs or counts) is noticeable in some cases, whereas in other cases the replicates are tighter.
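One way to put a number on inter-replicate variation is a per-gene coefficient of variation (standard deviation divided by mean) across the replicates of one group. A minimal sketch, assuming the `gene_id,sample1,sample2,...` CSV layout of prepDE.py's gene count matrix; the helper names and the `min_mean` cutoff are hypothetical. It works on raw counts, so library-size differences also inflate the CV; dividing each sample by a size factor first would be closer to what DESeq2 models.

```python
import csv
from statistics import mean, stdev
from typing import Dict, Iterable, List


def read_count_matrix(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Parse 'gene_id,sample1,sample2,...' CSV lines into {gene_id: {sample: count}}."""
    matrix: Dict[str, Dict[str, int]] = {}
    for row in csv.DictReader(lines):
        gene = row.pop("gene_id")
        matrix[gene] = {sample: int(value) for sample, value in row.items()}
    return matrix


def replicate_cv(matrix: Dict[str, Dict[str, int]], replicates: List[str],
                 min_mean: float = 10.0) -> Dict[str, float]:
    """Per-gene CV across the given replicate columns (needs at least 2 replicates).
    Genes with a mean below min_mean are skipped, since the CV is unstable at low counts."""
    cvs: Dict[str, float] = {}
    for gene, counts in matrix.items():
        values = [counts[s] for s in replicates]
        m = mean(values)
        if m >= min_mean:
            cvs[gene] = stdev(values) / m
    return cvs
```

With a file on disk, `read_count_matrix(open("gene_count_matrix.csv"))` would work the same way, since `csv.DictReader` accepts any iterable of lines.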
While this is possible biologically (most of these are discovery experiments, so we don't have a "positive control"), I am concerned that the analysis pipeline may be responsible for what we see. I am not entirely clear on how prepDE.py computes counts from coverage: could it be over-normalizing, or introducing some other artifact?
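As described in the StringTie manual, prepDE.py derives each transcript's count from its coverage with a simple formula, reads = coverage * transcript length / read length, rather than applying a normalization. The sketch below applies that formula, sums transcripts into genes (the per-gene summing and the integer rounding are assumptions; prepDE.py's exact rounding may differ), and then computes DESeq2-style median-of-ratios size factors. It illustrates that a read length set wrongly for every sample scales all counts uniformly, which normalization absorbs. The sample data and helper names are made up.

```python
import math
from statistics import median
from typing import Dict, List, Tuple


def coverage_to_count(coverage: float, transcript_len: int, read_len: int = 75) -> float:
    """Estimated reads for one transcript: coverage * transcript_len / read_len."""
    return coverage * transcript_len / read_len


def gene_counts(transcripts: List[Tuple[str, float, int]], read_len: int = 75) -> Dict[str, int]:
    """Sum per-transcript estimates by gene, then round (rounding mode is an assumption)."""
    totals: Dict[str, float] = {}
    for gene_id, cov, length in transcripts:
        totals[gene_id] = totals.get(gene_id, 0.0) + coverage_to_count(cov, length, read_len)
    return {g: round(v) for g, v in totals.items()}


def size_factors(counts: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """Median-of-ratios size factors; genes with a zero count in any sample are skipped."""
    samples = list(counts)
    shared = set.intersection(*(set(c) for c in counts.values()))
    log_geo_means: Dict[str, float] = {}
    for g in shared:
        vals = [counts[s][g] for s in samples]
        if all(v > 0 for v in vals):
            log_geo_means[g] = sum(math.log(v) for v in vals) / len(vals)
    return {
        s: math.exp(median([math.log(counts[s][g]) - lg for g, lg in log_geo_means.items()]))
        for s in samples
    }


# Same coverage estimates, converted with the right (75) and a wrong (150) read length.
tx = [("geneA", 10.0, 1500), ("geneA", 5.0, 900), ("geneB", 20.0, 3000)]
s1 = gene_counts(tx, read_len=75)   # {'geneA': 260, 'geneB': 800}
s2 = gene_counts(tx, read_len=150)  # {'geneA': 130, 'geneB': 400}
sf = size_factors({"s1": s1, "s2": s2})
print(s1["geneA"] / sf["s1"], s2["geneA"] / sf["s2"])  # both about 183.85
```

In this toy case the second sample's counts are exactly half the first's and its size factor is half as large, so the normalized counts match. A uniform scaling like this would not by itself remove DE genes, although it does change the absolute count scale that DESeq2's variance model sees, so setting `-l` to the true read length is still worthwhile.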
I am curious whether others use this approach routinely, and whether there are other things I should pay attention to that I may not have considered.
Thanks for any input.