I'm new to the field, and I'm trying to run a differential gene expression analysis on the GTEx dataset. My aim is to identify sets of genes that (with some confidence) distinguish each of the 50-odd tissue types in the dataset. The dataset is bulk RNA-seq with ~50k genes and ~12k samples. The resources I have at hand are ~50 CPUs, each with 12 cores, and plenty of RAM.
So far I have:
1) Browsed through the DESeq2 vignettes; I feel it may be a good fit.
2) Removed housekeeping genes, in the hope that it makes the software's task a little easier.
3) Started the code running.
I was wondering if:
1) my choice of algorithm is advisable, and
2) anyone has an estimate of how long the code may take to run.
I'd be glad to give more details if you need them.
Thanks for reading through. :-)