Question

DESeq2: Comparison of single clinical sample to 4 normals using tumour cohort to infer dispersion of single sample

0

Manthos • 0

@e3bc7671

Last seen 4 months ago

United Kingdom

Hello,

This question concerns RNA-seq expression analysis and the need to extract biologically relevant results at the level of a single patient. For context, we have a large number of brain tumours, but because of the delicate structure of the brain and limited funding we cannot collect biological replicates. So far this has been fine for cohort-level comparisons, where we compare the expression of multiple brain tumours sharing a given classification against normal tissue and identify significant results.

However, to make informed decisions for single patients, we want to integrate gene expression data alongside the DNA and histopathology data. The questions are:

1. Would it be of merit to run DESeq2 with 3 groups (A. single sample; B. all the remaining tumours together, 130-170 samples; C. normals) and extract results from the contrast A vs C? Would the dispersion estimated from the other groups be good enough to trust the resulting p-values for the single sample? The 170 tumours are quite heterogeneous and do not cluster well in PCA, so I am sceptical about the sensitivity of using all those together.
2. Alternatively, run DESeq2 with 2 groups (A. combined tumours vs B. normals), get the PCA plot and use nearest neighbours to identify the 3-4 samples closest to the single sample of interest, then re-run DESeq2 on those 3-4 samples plus the single sample with design = ~1 to get dispersion estimates. Finally, feed the newly calculated dispersion estimates into a single sample vs 4 normals comparison to get p-values. The idea is to mimic biological replicates based on the similarity of the tumour samples' expression profiles.
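
To make the neighbour-selection step in the second option concrete, here is a minimal Python sketch. It is an illustration only and not part of DESeq2: instead of reading neighbours off a PCA plot, it ranks samples by Euclidean distance on log2(count + 1) values over the most variable genes. The function names, the sample names and the toy counts are all hypothetical, and the counts are assumed to be already normalised for library size.

```python
import math
from statistics import pvariance


def log_transform(counts):
    """Return log2(count + 1) for one sample's per-gene counts."""
    return [math.log2(c + 1) for c in counts]


def top_variable_genes(logged, n_genes):
    """Indices of the n_genes with the highest variance across all samples."""
    n_total = len(next(iter(logged.values())))
    variances = [
        pvariance([values[g] for values in logged.values()])
        for g in range(n_total)
    ]
    ranked = sorted(range(n_total), key=lambda g: variances[g], reverse=True)
    return ranked[:n_genes]


def nearest_neighbours(counts_by_sample, target, k, n_genes):
    """Names of the k samples closest to `target` on the most variable genes."""
    logged = {name: log_transform(c) for name, c in counts_by_sample.items()}
    genes = top_variable_genes(logged, n_genes)

    def distance(a, b):
        return math.sqrt(sum((logged[a][g] - logged[b][g]) ** 2 for g in genes))

    others = [name for name in logged if name != target]
    return sorted(others, key=lambda name: distance(target, name))[:k]


# Hypothetical toy data: 4 genes, assumed already normalised for library size.
toy_counts = {
    "patient": [100, 5, 50, 10],
    "tumour_1": [110, 6, 48, 10],
    "tumour_2": [95, 4, 55, 10],
    "tumour_3": [10, 200, 50, 10],
    "tumour_4": [12, 180, 45, 10],
}

neighbours = nearest_neighbours(toy_counts, "patient", k=2, n_genes=2)
print(neighbours)
```

In practice the candidate list could be restricted first (for example to tumours with the same grade) before ranking by distance. Note that neighbours picked for their similarity will, by construction, show less spread than true biological replicates, so dispersion estimated from them is likely to be too small.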

Any other suggestions/discussions are very much welcome.

Thank you for your time and effort in this.

GeneExpression DESeq2 RNASeqData RNASeq StatisticalMethod • 394 views

Posted 5 months ago by Manthos

score 2 · Accepted Answer · 2023-11-21

2

Michael Love 41k

@mikelove

Last seen 1 hour ago

United States

DESeq2 method details:

DESeq2 would estimate dispersion using the groups of samples with replication.

"I am sceptical about the sensitivity of using all those together" Yes, this is also what we note in the FAQ of the vignette.

I'm skeptical about what you would get with null hypothesis testing of one sample vs a group, and about the confounding of technical and biological variation when comparing these tumor samples to normals that were likely assayed in a separate study.
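
As a rough illustration of why dispersion can only come from groups with replication, here is a small Python sketch. It is not DESeq2's estimator (DESeq2 fits a negative binomial GLM and shrinks gene-wise estimates towards a fitted trend); it uses a simple method-of-moments estimate based on the negative binomial variance, Var = mu + alpha * mu^2, pooled across groups by residual degrees of freedom. The function names and counts are hypothetical.

```python
from statistics import mean, variance


def moment_dispersion(counts):
    """Method-of-moments dispersion alpha from Var = mu + alpha * mu^2."""
    mu = mean(counts)
    if mu == 0:
        return 0.0
    # Clamp at zero: sample variance below the mean gives a negative estimate.
    return max((variance(counts) - mu) / mu ** 2, 0.0)


def pooled_dispersion(groups):
    """Average per-group dispersion, weighted by residual degrees of freedom.

    A group with a single sample has zero residual degrees of freedom,
    so it carries no information about dispersion and is skipped.
    """
    weighted_sum = 0.0
    total_df = 0
    for counts in groups.values():
        df = len(counts) - 1
        if df < 1:
            continue
        weighted_sum += df * moment_dispersion(counts)
        total_df += df
    if total_df == 0:
        raise ValueError("no group has replicates, dispersion cannot be estimated")
    return weighted_sum / total_df


# Hypothetical counts for one gene in a three-group design.
gene_counts = {
    "single": [250],
    "tumours": [80, 120, 100, 140, 60],
    "normals": [20, 30, 15, 35],
}

with_single = pooled_dispersion(gene_counts)
without_single = pooled_dispersion(
    {name: c for name, c in gene_counts.items() if name != "single"}
)
print(with_single, without_single)
```

The two printed values are identical: the single sample shifts its own group mean but adds nothing to the dispersion estimate, so any p-value for it rests entirely on the assumption that its variability matches that of the replicated groups.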

Answered 5 months ago by Michael Love

0

Thank you for your answer Michael Love!

I am naively assuming that the technical variation is not that strong, as the samples and analysis are all from the same study. My thinking was that, without biological replication, I could still use the cohort we have built to identify replicate-like tumours based on similarity across the most variable genes and on their classification (e.g., only considering nearest neighbours that share the same tumour grade or mutational profile and have adequate purity). These could approximate replicates, allowing us to generate some p-values while also balancing sensitivity for the single-sample case.

It is assumption upon assumption upon assumption, all in an effort to create more informative results for individual patients, and the results could be well off reality, so at this stage we are just experimenting. Maybe we need to revisit our design.

Reply posted 5 months ago by Manthos