Hello,
This is a question about RNA-seq expression and the need to extract biologically relevant results at the single-patient level. For context, we have a large number of brain tumours, but because of the delicate structure of the brain and limited funding, we cannot collect biological replicates. So far this has been fine for cohort-level analyses, where we compare the expression of multiple brain tumours sharing a given classification with that of normal tissue and identify significant results.
However, to support informed decisions for individual patients, we want to integrate gene expression data alongside DNA and histopathology data. The questions are:
First, would it be worthwhile to run DESeq2 with three groups (A: the single sample; B: all the remaining tumours pooled together, 130-170 samples; C: normals) and extract results from the A vs C contrast? Would the dispersion estimated from the other groups be good enough to trust the resulting p-values for the single sample? The 170 tumours are quite heterogeneous and do not cluster well in PCA, so I am sceptical about the sensitivity of pooling them all together.
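To make the concern about pooling concrete, here is a toy sketch in Python rather than R. It is not DESeq2's estimator (DESeq2 uses Cox-Reid adjusted maximum likelihood with shrinkage toward a fitted mean-dispersion trend); it uses a simple method-of-moments estimate of the negative binomial dispersion, derived from Var = mu + alpha * mu^2. The counts are made up and assumed to be already normalized for library size. Lumping two distinct subtypes into one group absorbs the between-subtype difference into the dispersion, which would make any single-sample test against that dispersion more conservative.

```python
from statistics import mean, variance


def mom_dispersion(counts):
    """Method-of-moments NB dispersion for one gene.

    From Var = mu + alpha * mu**2 it follows that alpha = (s**2 - mu) / mu**2.
    The result is floored at 0, since a variance below the mean would imply
    under-dispersion. Counts are assumed to be normalized already.
    """
    mu = mean(counts)
    s2 = variance(counts)  # sample variance, n - 1 denominator
    return max((s2 - mu) / mu ** 2, 0.0)


# Hypothetical normalized counts for one gene in two tumour subtypes.
subtype_x = [80, 120, 90, 110]
subtype_y = [240, 360, 270, 330]

within_x = mom_dispersion(subtype_x)
within_y = mom_dispersion(subtype_y)
pooled = mom_dispersion(subtype_x + subtype_y)  # both subtypes treated as one group

# The pooled estimate is roughly ten times either within-subtype estimate.
print(f"within X: {within_x:.3f}, within Y: {within_y:.3f}, pooled: {pooled:.3f}")
```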
Second, as an alternative: run DESeq2 with two groups (A: all tumours combined; B: normals), draw the PCA plot, and use nearest neighbours to identify the 3-4 samples closest to the single sample of interest. Then re-run DESeq2 on those 3-4 samples plus the single sample with design = ~1 to obtain a dispersion estimate. Finally, feed this newly calculated dispersion estimate into a comparison of the single sample against the 4 normals to get p-values. The idea is to mimic biological replicates using tumour samples with similar expression profiles.
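Here is a rough Python sketch of the last step of this option, testing one tumour against the normals with a dispersion borrowed from elsewhere. It is not DESeq2's GLM Wald test: it is a simplified delta-method approximation (Var(log Y) ≈ 1/mu + alpha for an NB count with mean mu), it assumes size-factor-normalized counts and a single shared alpha, and the function name and example values are hypothetical. Its main purpose is to show how strongly the p-value depends on the borrowed alpha.

```python
import math
from statistics import mean


def single_vs_group_wald(tumour_count, normal_counts, alpha):
    """Approximate two-sided Wald test of one tumour sample against normals.

    Hypothetical helper. Assumes normalized counts and one NB dispersion
    `alpha` shared by both groups (e.g. borrowed from pseudo-replicates).
    Delta method: Var(log Y) ~= 1/mu + alpha, and for the mean of n
    normals Var(log mean) ~= (1/mu + alpha) / n.
    """
    mu_t = max(tumour_count, 0.5)         # crude floor to avoid log(0) and 1/0
    mu_n = max(mean(normal_counts), 0.5)
    n = len(normal_counts)
    log_fc = math.log(mu_t / mu_n)        # natural-log fold change
    se = math.sqrt((1 / mu_t + alpha) + (1 / mu_n + alpha) / n)
    z = log_fc / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard-normal p-value
    return log_fc / math.log(2), p        # report log2 fold change


normals = [100, 120, 90, 110]  # hypothetical normalized counts, 4 normals
tumour = 400                   # hypothetical normalized count, single tumour
for alpha in (0.05, 0.3):      # tight vs loose pseudo-replicate dispersion
    lfc, p = single_vs_group_wald(tumour, normals, alpha)
    print(f"alpha={alpha}: log2FC={lfc:.2f}, p={p:.2g}")
```

With identical counts, the tighter dispersion gives a p-value several orders of magnitude smaller than the looser one, so the choice of pseudo-replicates largely decides what comes out as significant.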
Any other suggestions or discussion are very welcome.
Thank you for your time and effort in this.
Thank you for your answer, Michael Love!
I am naively assuming that technical variation is not that strong, since the samples and analysis all come from the same study. I thought that, even without biological replication, I could use the cohort we have built to identify replicate-like tumours based on similarity across the most variable genes, restricted by classification (e.g., only considering closest neighbours that share the same tumour grade or mutational profile and have adequate purity). These could approximate replicates, letting us generate some p-values while balancing sensitivity for this single-sample case.
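The neighbour selection described above could be sketched roughly as follows, in Python and with hypothetical names: rank genes by the variance of log2(normalized count + 1) across the cohort, keep the top k, and return the closest tumours by Euclidean distance among those sharing the query's grade and passing a purity cutoff. The metadata keys `grade` and `purity`, the 0.6 cutoff, and k = 500 (chosen to mirror the default `ntop` of DESeq2's `plotPCA`) are assumptions; a mutational-profile match could be added as one more filter. Distances are computed on the genes directly: a PCA of those centred genes that keeps all components is just a rotation and preserves these distances, whereas the first two PCs on a plot only approximate them.

```python
import math
from statistics import pvariance


def top_variable_genes(log_expr, k):
    """Indices of the k genes with the highest variance across all samples.

    log_expr: dict sample_id -> list of log2(normalized count + 1) values,
    with every list in the same gene order.
    """
    samples = list(log_expr.values())
    n_genes = len(samples[0])
    variances = [pvariance([s[g] for s in samples]) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: variances[g], reverse=True)[:k]


def nearest_tumours(query, log_expr, meta, k_genes=500, n_neighbours=4, min_purity=0.6):
    """Hypothetical helper: closest tumours sharing the query's grade, above a purity cutoff."""
    genes = top_variable_genes(log_expr, k_genes)
    q_meta = meta[query]
    candidates = [
        s for s in log_expr
        if s != query
        and meta[s]["grade"] == q_meta["grade"]
        and meta[s]["purity"] >= min_purity
    ]

    def dist(s):
        # Euclidean distance to the query over the selected genes only
        return math.sqrt(sum((log_expr[s][g] - log_expr[query][g]) ** 2 for g in genes))

    return sorted(candidates, key=dist)[:n_neighbours]


# Toy data: 5 tumours x 4 genes (made-up values).
log_expr = {
    "T1": [5.0, 2.0, 8.0, 1.0],
    "T2": [5.1, 2.2, 7.9, 1.0],
    "T3": [9.0, 6.0, 2.0, 1.0],
    "T4": [5.2, 2.1, 8.1, 1.0],
    "T5": [5.0, 2.0, 8.0, 1.0],
}
meta = {
    "T1": {"grade": 4, "purity": 0.8},
    "T2": {"grade": 4, "purity": 0.7},
    "T3": {"grade": 4, "purity": 0.9},
    "T4": {"grade": 4, "purity": 0.4},  # fails the purity cutoff
    "T5": {"grade": 2, "purity": 0.9},  # different grade
}
print(nearest_tumours("T1", log_expr, meta, k_genes=3, n_neighbours=2))
```

Note that T4 and T5, though nearly identical to T1 in expression, are excluded by the metadata filters, so the second "neighbour" returned (T3) is actually far away; with strict filters, a small cohort may simply not contain good pseudo-replicates.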
It is assumption stacked on assumption stacked on assumption, all in an effort to produce more informative results for individual patients, and those results could be well off reality, so at this stage we are just experimenting. Maybe we need to revisit our design.