When clear clusters aren't formed with PCA, should I switch to t-SNE? How many data points (rows × columns) are needed, at a minimum, for t-SNE to work? Is t-SNE better suited for single-cell RNA-seq, and PCA better suited for whole-tissue/bulk RNA-seq? In GWAS we usually use PCA, but I guess genotypes have a more linear distribution than gene expression. Might it therefore be better to always use t-SNE with RNA-seq?
I think there are some clear use cases for t-SNE, for example as part of a clustering algorithm, but from my own testing and that of others, I think it can lead you astray, so I recommend a PCA plot for general-purpose EDA (exploratory data analysis) of bulk RNA-seq. I'm also interested in the methods being developed for factor analysis of scRNA-seq, particularly ZINB-WaVE (Bioconductor).
A little more on why I prefer PCA for bulk RNA-seq EDA: in some simulations, I have seen t-SNE generate artificial structure (though this may have been due to a since-fixed bug in one of the R packages), and t-SNE can also "snap" groups farther apart than the data-generating mechanism warrants (which I know because I simulated the data). I've been told by t-SNE experts that for both issues the parameters can be tuned so that the artifacts or snapping are minimized, or that PCA should be applied first and the right number of top dimensions passed to t-SNE. I've also been told that the "snapping" I observe is a known consequence of the method. Critically, I don't think that biologists or other investigators (with whom we like to share our dimension-reduction plots) are aware of the caveats needed to interpret t-SNE plots, namely that the large-scale structure and the distances between clusters should not be interpreted as representing anything meaningful in the data. The nice thing about PCA (or MDS) is its simplicity: whether the groups overlap or are separated is clear from the plot. With PCA, you can compare the inter-cluster distances with the intra-cluster distances and get a sense of their ratio. If the groups are in a particular arrangement, you can look at the loadings to understand why. The first link is really informative about these properties of t-SNE; see sections 2 and 3.
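To make the inter-/intra-cluster comparison and the loadings lookup concrete, here is a minimal numpy sketch (not from anyone in the thread). It assumes a samples × genes matrix that is already normalized and log-transformed; the simulated data, the helper names `pca` and `inter_intra_ratio`, and the centroid-based definition of the ratio are my own hypothetical choices for illustration.

```python
import numpy as np

def pca(X, k=2):
    """PCA via SVD of the column-centred matrix X (samples x genes).
    Returns sample scores (n x k) and gene loadings (genes x k)."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k], Vt[:k].T

def inter_intra_ratio(scores, labels):
    """Hypothetical summary for two groups: distance between the group
    centroids divided by the mean distance of samples to their own centroid."""
    g0, g1 = np.unique(labels)
    c0 = scores[labels == g0].mean(axis=0)
    c1 = scores[labels == g1].mean(axis=0)
    inter = np.linalg.norm(c0 - c1)
    intra = np.mean([np.linalg.norm(scores[labels == g0] - c0, axis=1).mean(),
                     np.linalg.norm(scores[labels == g1] - c1, axis=1).mean()])
    return inter / intra

# Simulated (hypothetical) data: 20 samples x 200 genes on a log scale,
# two groups that differ only in the first 10 genes.
rng = np.random.default_rng(0)
n, p = 20, 200
labels = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[labels == 1, :10] += 5.0

scores, loadings = pca(X, k=2)
ratio = inter_intra_ratio(scores, labels)
# Genes with the largest absolute PC1 loading explain the separation along PC1.
top_genes = np.argsort(-np.abs(loadings[:, 0]))[:10]
print(f"inter/intra ratio on PC1-PC2: {ratio:.2f}")
print("genes with largest |PC1 loading|:", sorted(top_genes.tolist()))
```

Because PCA is a linear projection, the ratio and the loadings refer directly to the input space, which is what makes this kind of reading possible; the same numbers computed on a t-SNE embedding would not carry that interpretation.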
Mike and Davide have already covered the issue really well. I would just like to stress a couple of points as a follow-up. I have found t-SNE to be extremely sensitive to the dimensionality of the reduction, the perplexity, and the number of iterations run. Mike already posted this, where the authors (Wattenberg et al.) present a very nice explanation of the issues with t-SNE and how best to control for them. I have seen many people use the default settings of a t-SNE package and report the results, but I would recommend trying multiple values for the input arguments.
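For context on what perplexity actually controls: t-SNE turns distances into neighbour probabilities using a Gaussian kernel centred on each sample, and the kernel width is tuned by binary search until the perplexity of that distribution (the exponential of its entropy) matches the value you set. The sketch below reimplements only that calibration step in numpy, on hypothetical random data, to show that a larger perplexity means a wider kernel, i.e. more neighbours taken into account. It is not code from any particular t-SNE package, and the function names are hypothetical.

```python
import numpy as np

def conditional_p(dist2_row, beta):
    """Gaussian neighbour probabilities p_{j|i} for precision beta = 1/(2 sigma^2).
    Subtracting the minimum only rescales before normalisation (numerical safety)."""
    p = np.exp(-(dist2_row - dist2_row.min()) * beta)
    return p / p.sum()

def sigma_for_perplexity(dist2_row, perplexity, tol=1e-5, max_iter=100):
    """Binary search on beta until exp(entropy) of p_{.|i} matches the target
    perplexity. Returns (sigma, achieved perplexity)."""
    target = np.log(perplexity)          # target entropy in nats
    lo, hi, beta = 0.0, np.inf, 1.0
    for _ in range(max_iter):
        p = conditional_p(dist2_row, beta)
        H = -np.sum(p * np.log(p + 1e-300))
        if abs(H - target) < tol:
            break
        if H > target:                   # too flat: increase precision
            lo = beta
            beta = beta * 2 if hi == np.inf else (beta + hi) / 2
        else:                            # too peaked: decrease precision
            hi = beta
            beta = (beta + lo) / 2
    return np.sqrt(1.0 / (2 * beta)), np.exp(H)

# Hypothetical input: 100 samples x 50 dimensions (e.g. top PCs).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 50))
d2 = np.sum((X[0] - X[1:]) ** 2, axis=1)   # squared distances from sample 0

results = {}
for perp in (5, 15, 30, 50):
    results[perp] = sigma_for_perplexity(d2, perp)
    sigma, achieved = results[perp]
    print(f"perplexity {perp:>2}: sigma = {sigma:.2f}, achieved = {achieved:.2f}")
```

Real packages run this for every sample internally; the point here is only to see why the embedding can change substantially as perplexity is varied, which is one reason to try several settings rather than the default.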
Even with this adjustment, my takeaway from running different types of data simulations is that the first two PCs/MDS dimensions capture the global structure of the data better than t-SNE (in 2 dimensions), while local patterns are better preserved by the latter. This is fairly intuitive if one thinks about how the two algorithms work: PCA projects onto the directions of largest overall variance, whereas t-SNE's objective mainly rewards keeping each point's near neighbours close together. Also, the clustering from PCA is more believable to me purely because of the statistical understanding of the eigenspaces behind it.
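One rough way to check the global-structure point on your own data is to correlate all pairwise distances in the original space with those in a 2D embedding. The sketch below (numpy only; the simulated data and helper names are hypothetical) computes PCA from the eigendecomposition of the sample covariance matrix, reports the variance explained by the first two eigenvalues, and scores how well PC1-PC2 preserve the pairwise distances. This distance-correlation metric is my own illustrative choice, not something proposed in the thread.

```python
import numpy as np

def pairwise_dist(Y):
    """Euclidean distance matrix between the rows of Y."""
    sq = np.sum(Y ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * Y @ Y.T
    return np.sqrt(np.clip(d2, 0, None))

def distance_preservation(X, Y):
    """Pearson correlation between all pairwise distances in X and in the
    embedding Y (upper triangle only). Near 1 = large-scale geometry kept."""
    iu = np.triu_indices(X.shape[0], k=1)
    return np.corrcoef(pairwise_dist(X)[iu], pairwise_dist(Y)[iu])[0, 1]

# Hypothetical data: three groups of 15 samples; group 0 is close to group 1,
# group 2 is far from both.
rng = np.random.default_rng(2)
centres = np.zeros((3, 100))
centres[1, :5] = 3.0
centres[2, :5] = 12.0
labels = np.repeat([0, 1, 2], 15)
X = centres[labels] + rng.normal(size=(45, 100))

# PCA from the eigendecomposition of the covariance matrix, sorted by eigenvalue.
Xc = X - X.mean(axis=0)
evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]
Y = Xc @ evecs[:, :2]                      # sample scores on PC1 and PC2

explained = evals[:2] / evals.sum()
corr = distance_preservation(X, Y)
print("variance explained by PC1, PC2:", np.round(explained, 3))
print("distance correlation, original vs PC1-PC2:", round(corr, 3))
```

The same `distance_preservation` function could be applied to a 2D t-SNE embedding from whichever package you use, which gives one (admittedly crude) number for comparing how much of the between-group arrangement each method keeps.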
Lastly, from personal experience: for a paper of mine, I ran t-SNE on bulk RNA-seq data from GTEx tissues (first figure here), and my first reaction was awe at how well t-SNE captured the clusters. But on closer inspection I did see a few discrepancies; for instance, the Liver samples formed two groups, and in follow-up analysis we did not find this grouping to be meaningful.