I have RNA-seq data for six different treatments (A, B, C, D, E, F) of a model organism, with four biological (NOT technical) replicates per treatment.
FastQC revealed no abnormalities in the RNA-seq data, and after normalization (rlog transformation) with DESeq2 I generated a PCA plot (using the 500 most variable genes).
Based on the PCA plot (see link: http://imgur.com/NVcWv5j) and a hierarchical clustering (HC) analysis (not shown), I think that the dots marked with a rectangle (1, 2, 3) could be considered outliers and might be left out of the downstream differential expression analysis (between treatments).
However, this is based only on visual inspection of the PCA/HC results. Is there an objective metric for deciding whether an RNA-seq sample is an outlier (rather than relying on visual inspection of a PCA plot, as most papers do)?
In a recent paper, Conesa et al. (2016) (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0881-8) state the following:
"Reproducibility among technical replicates should be generally high (Spearman R2 > 0.9) , but no clear standard exists for biological replicates, as this depends on the heterogeneity of the experimental system."
So, following Conesa et al. (2016), one might include all replicates (including the apparent outliers), but then one might end up with fewer differentially expressed genes between treatments...
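
To make the question concrete, here is a minimal sketch of the kind of metric I have in mind: for each sample, the mean Spearman correlation with the other replicates of the same treatment. It is written in plain Python purely for illustration; the function names, the sample names (A1 to A4) and the toy values are all hypothetical, and it assumes the rlog-transformed values have been exported from DESeq2 as one list of numbers per sample. It does not answer what cutoff, if any, would justify calling a sample an outlier.

```python
from itertools import combinations
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def mean_replicate_correlation(samples):
    """For each sample, average its Spearman correlation with all other
    replicates of the same treatment (samples: name -> expression values)."""
    per_sample = {name: [] for name in samples}
    for a, b in combinations(samples, 2):
        rho = spearman(samples[a], samples[b])
        per_sample[a].append(rho)
        per_sample[b].append(rho)
    return {name: mean(rhos) for name, rhos in per_sample.items()}


# Hypothetical toy data: four replicates of one treatment, six genes each.
treatment_a = {
    "A1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "A2": [1.1, 2.2, 2.9, 4.3, 5.1, 6.4],
    "A3": [0.9, 1.8, 3.2, 3.9, 5.5, 5.8],
    "A4": [6.0, 1.0, 5.0, 2.0, 4.0, 3.0],  # deliberately discordant
}

scores = mean_replicate_correlation(treatment_a)
for name in sorted(scores, key=scores.get):
    print(name, round(scores[name], 3))
```

In this toy example the deliberately discordant replicate A4 gets the lowest score, but how low a score has to be before a biological replicate counts as an outlier is exactly what I am unsure about.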
Any advice or help on this topic would be much appreciated.