Hi folks, I know that "running edgeR without replication" is a popular question, but I'd like to describe my specific situation.
I'd like to compare DGE between paired tumor/normal samples collected from multiple patients. Unusually, I'm less interested in generalizing a pattern across the patient population than in focusing on each patient. The DGE results for each patient are meant to validate the MS/MS data obtained from the same tissues.
Therefore I don't have typical replicates. However, I do have data from multiple patients. I wonder if I can take full advantage of the sample size to estimate a universal tagwise dispersion that I could export and use for comparisons between individual samples, and so hopefully obtain a p-value.
So my questions would be:
Is it statistically meaningful to do so? If yes, should only the normal samples be used to calculate the global tagwise dispersion, or should I include the tumor samples as well?
Code-wise, how am I supposed to export and import the dispersion estimates?
Thank you very much. Field
Thank you so much, Professor, for the program and for the effort to help.
Hi Professor Smyth,
Thank you for pointing me to your previous publication. I read through the text and found that you did analyze DGE for each individual patient. That's exactly what I'd like to do.
I noticed that example section 4.1 of the manual closely resembles the paper. Section 4.1 is concise and easy to follow. However, the per-patient DGE analysis is not demonstrated there. Unfortunately, I understand neither the statistics nor the algorithm deeply enough to figure it out myself.
I wonder if you could kindly draft an updated section for that application. I'm sure it would be helpful to many other researchers as well.
Thank you anyway. Best, Field
You need to start by doing an ordinary paired analysis.
Hi Professor Smyth,
I assume that's the analysis demonstrated in the user guide, example 4.1. If so, I have performed it.
Taking the published data as example, I followed the instructions and got the following results
That's identical to the result shown in the demonstration. I believe it reflects the overall DGE trend rather than patient-specific effects.
For example, if you look at the gene TPCN2, it is overexpressed in the patient 8 tumor (16.56-fold, read counts 3130 vs 189), and slightly under-expressed in the patient 33 tumor (0.84-fold, read counts 256 vs 304) and the patient 51 tumor (0.62-fold, read counts 496 vs 800).
So in the ordinary test, the gene TPCN2 would appear non-significant. However, for patient 8, it may be relevant to his or her personalized treatment.
I wonder how I can perform an individualized test with edgeR but still use the genewise BCV I've estimated from the multiple normal tissues I have.
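The fold changes quoted above can be reproduced from the read counts with a few lines of base R. This is only a rough sketch: it takes raw tumor/normal ratios and applies no library-size normalization, so the values will not match model-based log-fold-changes from edgeR exactly. The vector names are made up for illustration.

```r
# Read counts for TPCN2 as quoted in the post (tumor vs normal, per patient)
tumor  <- c(P8 = 3130, P33 = 256, P51 = 496)
normal <- c(P8 = 189,  P33 = 304, P51 = 800)

# Raw tumor/normal ratio and its log2; no normalization is applied here
fold_change <- tumor / normal
log2_fc <- log2(fold_change)

print(round(fold_change, 2))
print(round(log2_fc, 2))
```

The sign of the log2 ratio differs between patient 8 and the other two patients, which is why a single shared tumor effect fits this gene poorly.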
Thank you. Field
You simply continue on from the same analysis. You will have already estimated the dispersions from the paired analysis:
Now make a new design matrix that has patient-specific tissue effects:
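The code originally posted with this step is not preserved here, so the following is only a sketch of what such a design matrix could look like in base R. It assumes the sample layout of the user guide example (patients 8, 33 and 51, each with a normal and a tumor sample); the object names `Patient`, `Tissue`, `design` and `design2` are illustrative.

```r
# Sample annotation (assumed layout): three patients, each with a
# normal (N) and a tumor (T) sample
Patient <- factor(c(8, 8, 33, 33, 51, 51), levels = c(8, 33, 51))
Tissue  <- factor(c("N", "T", "N", "T", "N", "T"), levels = c("N", "T"))

# Additive design of the ordinary paired analysis: one shared tumor effect
design <- model.matrix(~ Patient + Tissue)

# Patient-specific tissue effects: one tumor-vs-normal coefficient per patient
design2 <- model.matrix(~ Patient + Patient:Tissue)
print(colnames(design2))
```

With this formula the three interaction columns each represent the tumor-vs-normal log-fold-change within a single patient, instead of one effect averaged over patients.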
Fit glms using the same dispersions as before:
Now you can conduct tests for each patient:
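As the original code for these steps is not preserved, here is a minimal end-to-end sketch of the workflow described above, run on simulated counts. Several things are assumptions rather than the code from the thread: the data are random, the dispersions are estimated with `estimateDisp` (the original answer may have used a different dispersion function), and the object names are illustrative.

```r
library(edgeR)

set.seed(1)
# Simulated counts (hypothetical data): 500 genes, 3 patients x 2 tissues
Patient <- factor(c(8, 8, 33, 33, 51, 51), levels = c(8, 33, 51))
Tissue  <- factor(c("N", "T", "N", "T", "N", "T"), levels = c("N", "T"))
counts <- matrix(rnbinom(500 * 6, mu = 200, size = 10), nrow = 500)
rownames(counts) <- paste0("gene", 1:500)
colnames(counts) <- paste0(Patient, Tissue)

y <- DGEList(counts = counts)
y <- calcNormFactors(y)

# Step 1: dispersions from the ordinary paired (additive) analysis
design <- model.matrix(~ Patient + Tissue)
y <- estimateDisp(y, design)

# Step 2: design with one tumor-vs-normal coefficient per patient
design2 <- model.matrix(~ Patient + Patient:Tissue)

# Step 3: refit; glmFit uses the dispersions already stored in y
fit2 <- glmFit(y, design2)

# Step 4: one likelihood ratio test per patient-specific coefficient
patient_coefs <- grep(":TissueT$", colnames(design2), value = TRUE)
results <- lapply(patient_coefs, function(cf) glmLRT(fit2, coef = cf))
names(results) <- patient_coefs

print(topTags(results[[1]], n = 5))
```

Note that the second design has as many coefficients as samples, so it leaves no residual degrees of freedom. That is why the dispersions have to come from the additive paired fit in step 1 rather than being re-estimated under the new design.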
Hi Professor Smyth,
Thank you very much for the code. That has solved my problem. All the best. Field