This is a question about RNA-seq expression analysis and the need to extract biologically relevant results at the level of a single patient. For context, we have a large number of brain tumours, but because of the delicate structure of the brain and limited funding, we do not have the capacity for biological replicates. So far this has been fine for comparisons at the cohort level, where we compare the expression of multiple brain tumours of a given classification against normal tissue and identify significant results.
However, to make informed decisions for single patients, we want to integrate gene expression data alongside the DNA and histopathology data. The questions are:
Would it be worthwhile to run DESeq2 with 3 groups (A. the single sample, B. all the remaining tumours together (130-170 samples), C. normals) and extract results from the contrast A vs C? Would the dispersion estimated from the other groups be good enough to trust the resulting p-values for the single sample? The 170 tumours are quite heterogeneous and do not cluster well in PCA, so I am sceptical about the sensitivity we would get from using them all together.
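To illustrate the concern about heterogeneity, here is a minimal sketch in plain Python. It is not what DESeq2 does internally (DESeq2 uses shrinkage estimation); it only uses the negative binomial relationship var = mu + alpha * mu^2 to compute a crude method-of-moments dispersion for one gene. The function name and the counts are made up for illustration.

```python
from statistics import mean, variance


def moment_dispersion(counts):
    """Crude method-of-moments dispersion for one gene.

    Solves var = mu + alpha * mu^2 for alpha and clamps at zero.
    Assumes the counts are already normalised for library size.
    """
    mu = mean(counts)
    if mu == 0:
        return 0.0
    var = variance(counts)  # sample variance (n - 1 denominator)
    return max((var - mu) / mu ** 2, 0.0)


# Hypothetical normalised counts for a single gene (invented numbers).
similar_tumours = [100, 130, 80, 115, 75]
heterogeneous_tumours = [20, 400, 90, 5, 250]

print(moment_dispersion(similar_tumours))
print(moment_dispersion(heterogeneous_tumours))
```

With these invented numbers the heterogeneous group gives a much larger dispersion than the similar group, which is why I worry that pooling all tumours would widen the variance estimate and cost sensitivity for the single sample.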
Alternatively, run DESeq2 with 2 groups: A. combined tumours vs B. normals. Take the PCA plot and use nearest neighbours to identify the 3-4 samples closest to the single sample of interest, then re-run DESeq2 on those 3-4 samples plus the single sample with design = ~1 to get dispersion estimates. Finally, feed the newly calculated dispersion estimates into a single sample vs 4 normals comparison in order to get p-values. The idea is essentially to mimic biological replicates based on the similarity of the tumour samples' expression profiles.
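The neighbour-selection step could look roughly like the sketch below. It assumes the PCA coordinates have already been exported (for example, the first two PCs per sample) into a dictionary; the sample names, coordinates and the helper `nearest_neighbours` are hypothetical. Using only two PCs ignores the remaining variance, so more components could be included in each tuple.

```python
import math


def nearest_neighbours(pc_coords, target, k=4):
    """Return the k sample names closest to `target` in PC space.

    pc_coords maps sample name -> tuple of principal component scores.
    Distance is plain Euclidean distance over the supplied PCs.
    """
    origin = pc_coords[target]
    distances = [
        (math.dist(origin, coords), name)
        for name, coords in pc_coords.items()
        if name != target
    ]
    return [name for _, name in sorted(distances)[:k]]


# Hypothetical PC1/PC2 scores (invented for illustration).
pc_coords = {
    "patient_X": (1.0, 2.0),
    "tumour_01": (1.2, 2.1),
    "tumour_02": (8.0, -3.0),
    "tumour_03": (0.5, 1.5),
    "tumour_04": (1.1, 2.6),
    "tumour_05": (-6.0, 4.0),
    "tumour_06": (2.0, 2.0),
}

pseudo_replicates = nearest_neighbours(pc_coords, "patient_X", k=3)
print(pseudo_replicates)
```

The returned samples would then be the ones passed, together with the single sample, to the design = ~1 run for dispersion estimation.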
Any other suggestions/discussions are very much welcome.
Thank you for your time and effort in this.