Question

edgeR, tumor/normal pairs, no replicates but multiple patients: focusing on individual patients rather than the combined group

Field -Ye • 0

@b4aec241

Last seen 4.0 years ago

Hong Kong

Hi folks, I know that "running edgeR without replication" is a popular question, but I'd like to describe the specific situation I'm in.

I'd like to compare DGE from paired tumor/normal samples collected from multiple patients. Unusually, I'm less interested in generalizing a pattern across the patient population than in focusing on each patient. The DGE results for each patient are meant to validate the MS/MS data obtained from the same tissues.

Therefore I don't have typical replicates. However, I do have data from multiple patients. I wonder whether I can take full advantage of the sample size to estimate a universal tagwise dispersion that I could export and use for comparisons between individual samples, and hopefully obtain a p-value.

So my questions would be:

Is it statistically meaningful to do so? If yes, should only the normal samples be used to calculate the global tagwise dispersion, or should I include the tumor samples as well?
Code-wise, how am I supposed to export and import the dispersion data?

Thank you very much. Field

edgeR StatisticalMethod • 2.1k views

Field -Ye • 4.2 years ago • updated 4.1 years ago

score 2 · Accepted Answer · 2021-09-15

Gordon Smyth 53k

@gordon-smyth

Last seen 5 hours ago

WEHI, Melbourne, Australia

We actually did this in one of the early edgeR papers: see the Oral squamous cell carcinoma analysis in https://doi.org/10.1093/nar/gks042

Gordon Smyth • 4.2 years ago

Thank you so much, Professor, both for the program and for taking the time to help.

Field -Ye • 4.2 years ago

Hi Professor Smyth,

Thank you for pointing me to your earlier publication. I read through the text and found that you did pursue DGE for each individual patient. That's exactly what I'd like to analyze.

I noticed that example section 4.1 of the user guide closely resembles the paper. Section 4.1 is concise and easy to follow. However, the per-patient DGE analysis is not demonstrated there. Unfortunately, I don't understand the statistics or the algorithm deeply enough to work it out myself.

I wonder if you could kindly draft an updated section covering that application. I'm sure it would be helpful to many other researchers as well.

Thank you anyway. Best, Field

Field -Ye • 4.1 years ago

You need to start by doing an ordinary paired analysis.

Gordon Smyth • 4.1 years ago

Hi Professor Smyth,

I assume that's the analysis demonstrated in the user guide, example 4.1. If so, I have already performed it.

Taking the published data as an example, I followed the instructions and got the following results:

      RefseqID      Symbol  Exon  logFC         logCPM       LR           PValue       FDR
5024  NM_001039585  PTGFR   4     -5.175298384  4.751601536  100.3059967  1.31E-23     1.38E-19
3086  NM_198966     PTHLH   4     3.901903335   5.761509802  84.55385089  3.74E-20     1.97E-16
4653  NM_007168     ABCA8   38    -3.976252234  4.947732053  77.44843748  1.36E-18     3.96E-15
......
2567  NM_139075     TPCN2   25    0.787834685   6.465138982  0.640374565  0.423575194  0.729821907
......

That's identical to the result shown in the user guide example. I believe it reflects the overall DGE trend across patients rather than a personalized result.

For example, the gene TPCN2 is overexpressed in the tumor of patient 8 (16.56-fold, read counts 3130 vs 189), but slightly under-expressed in the tumors of patient 33 (0.84-fold, read counts 256 vs 304) and patient 51 (0.62-fold, read counts 496 vs 800).
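The fold changes quoted above can be reproduced from the read counts with a few lines of base R. This is only an arithmetic check: the ratios are raw tumor/normal count ratios with no library-size normalization, and the vector names are hypothetical labels chosen for illustration.

```r
# Read counts for TPCN2 as quoted in the post (tumor vs normal)
tumor  <- c(patient8 = 3130, patient33 = 256, patient51 = 496)
normal <- c(patient8 = 189,  patient33 = 304, patient51 = 800)

# Raw tumor/normal ratio per patient (no normalization applied)
fold.change <- tumor / normal
round(fold.change, 2)

# The same ratios on the log2 scale that edgeR reports as logFC
round(log2(fold.change), 2)
```

The per-patient ratios point in opposite directions, which is why a single combined tissue effect for this gene comes out small and non-significant.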

So in the ordinary test, TPCN2 appears non-significant. For patient 8, however, it may matter for their personalized medicine.

I wonder how I can perform an individualized test with edgeR while still using the genewise BCV I've obtained from the multiple normal tissues I have.

Thank you. Field

Field -Ye • 4.1 years ago

You simply continue on from the same analysis. You have already estimated the dispersions from the paired analysis:

design <- model.matrix(~Patient+Tissue)
y <- estimateDisp(y, design, robust=TRUE)

Now make a new design matrix that has patient-specific tissue effects:

design.patientspecific <- model.matrix(~Patient+Patient:Tissue)
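To see which coefficient belongs to which patient, the column names of the new design matrix can be printed. The following is a minimal sketch in base R, assuming the three-patient layout of the user guide example (patients 8, 33 and 51, each with one normal and one tumor sample); the factor definitions are assumptions and are not given in this thread.

```r
# Assumed sample layout: three patients, one normal (N) and one tumor (T) each
Patient <- factor(c(8, 8, 33, 33, 51, 51))
Tissue  <- factor(c("N", "T", "N", "T", "N", "T"))

# Patient baselines plus a separate tumor-vs-normal effect for every patient
design.patientspecific <- model.matrix(~Patient + Patient:Tissue)

# Columns 1-3 are the intercept and patient baselines;
# the remaining columns are the patient-specific tissue effects
colnames(design.patientspecific)
```

With this layout the last three columns are the patient-specific tumor-versus-normal effects, so it is worth checking the column order on your own data before choosing values for coef.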

Fit glms using the same dispersions as before:

fit.patientspecific <- glmFit(y, design.patientspecific)

Now you can conduct tests for each patient:

lrt.patient1 <- glmLRT(fit.patientspecific, coef=4)
lrt.patient2 <- glmLRT(fit.patientspecific, coef=5)
lrt.patient3 <- glmLRT(fit.patientspecific, coef=6)
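Put together, the steps above might look like the sketch below. It assumes the edgeR package is installed and uses simulated counts in place of real data; the count matrix, gene names, `tissue.coefs` and `lrt.list` are hypothetical and only serve to make the example self-contained. Note that the patient-specific design has as many coefficients as samples, so it leaves no residual degrees of freedom for estimating dispersions, which is why they are taken from the additive paired model.

```r
library(edgeR)

set.seed(1)
# Simulated counts (hypothetical data): 200 genes, 3 patients x 2 tissues
Patient <- factor(c(8, 8, 33, 33, 51, 51))
Tissue  <- factor(c("N", "T", "N", "T", "N", "T"))
counts  <- matrix(rnbinom(200 * 6, mu = 100, size = 10), nrow = 200)
rownames(counts) <- paste0("gene", seq_len(200))

y <- DGEList(counts = counts)
y <- calcNormFactors(y)

# Dispersions are estimated under the additive paired model
design <- model.matrix(~Patient + Tissue)
y <- estimateDisp(y, design, robust = TRUE)

# Fit the patient-specific model; glmFit uses the dispersions stored in y
design.patientspecific <- model.matrix(~Patient + Patient:Tissue)
fit.patientspecific <- glmFit(y, design.patientspecific)

# One likelihood ratio test per patient-specific tissue coefficient
tissue.coefs <- grep(":Tissue", colnames(design.patientspecific))
lrt.list <- lapply(tissue.coefs,
                   function(k) glmLRT(fit.patientspecific, coef = k))
names(lrt.list) <- colnames(design.patientspecific)[tissue.coefs]

# Top genes for the first patient-specific comparison
topTags(lrt.list[[1]], n = 3)
```

Selecting the coefficients by name rather than by position is a small design choice that keeps the loop correct if the number of patients changes.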

Gordon Smyth • 4.1 years ago

Hi Professor Smyth,

Thank you very much for the code. That has solved my problem. All the best. Field

Field -Ye • 4.1 years ago