Compare two groups in edgeR: coefficient and logFC explanation
JG13

I have a dataset in which I want to compare disease vs control (disease/control). When I run the analysis I get Coefficient: -1*control 1*disease. Is this correct for comparing disease vs control? The total DE results are:

> summary(deg1)
       -1*control 1*disease
Down                   5342
NotSig                58145
Up                     5462


Could you please explain what is going on with the logFC?

Coefficient:  -1*control 1*parkinson
                      logFC   logCPM        LR       PValue          FDR
ENST00000309758.6  5.596663 5.290097 104.58809 1.503581e-24 1.036704e-19
ENST00000375650.5  2.499489 9.443004  73.07818 1.246143e-17 4.296015e-13
ENST00000375651.7  2.814114 9.399579  71.41025 2.901603e-17 6.668753e-13
ENST00000525876.1  3.499724 2.089409  70.70220 4.154286e-17 7.160846e-13
ENST00000674129.1 -1.615221    1.126    12.388   0.00043209   0.00730928


If it is positive, does that mean overexpression in disease? And if it is negative, does it mean underexpression?

Yunshun Chen

It looks correct, provided your design matrix is constructed so that the disease column represents the samples in the disease group and the control column represents all the control samples.

Under this testing contrast, a positive logFC means up-regulation in parkinson (disease) compared to control, and a negative logFC means down-regulation.
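A minimal base-R sketch of the arithmetic behind the contrast, assuming hypothetical log-scale group coefficients for a single gene and the column order (control, disease) implied by the label "-1*control 1*disease". edgeR does this weighting internally; the numbers here are made up and only show why the sign of logFC gives the direction.

```r
# Hypothetical fitted coefficients for one gene (log scale), one per design column.
# Column order assumed to be (control, disease), matching "-1*control 1*disease".
beta <- c(control = 3.0, disease = 5.5)

# The contrast is a vector of weights, one per coefficient.
contrast <- c(control = -1, disease = 1)

# The tested logFC is the weighted sum of the coefficients, i.e. disease - control.
logFC <- sum(contrast * beta)
logFC  # 2.5

# Positive => higher in disease than in control; negative => lower in disease.
direction <- if (logFC > 0) "up in disease" else "down in disease"
direction
```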

So if I have Coefficient: -1*disease 1*control, does a positive logFC mean underexpression in disease and a negative logFC overexpression? So it is the opposite of before?

You can interpret this as simple algebra (which it is): -1*disease 1*control is identical to control - disease, which shows the direction of the comparison. This is not how I would normally set up a contrast, because a negative logFC then indicates upregulation in disease, which is not IMO how people normally think about such things. I usually put the 'least affected' group in the denominator (these are log-scale coefficients, so log(disease) - log(control) == log(disease/control)), so control is always subtracted from treatment, disease or whatever. In that case the contrast should be 1*disease -1*control.
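The algebra can be checked with a small worked example in base R. The expression values below are hypothetical; the point is that log2(disease) - log2(control) equals log2(disease/control), and that swapping the contrast only flips the sign.

```r
# Hypothetical average expression (e.g. normalized counts) for one gene.
control <- 100
disease <- 400

# Contrast 1*disease -1*control on the log2 scale ...
lfc_dc <- log2(disease) - log2(control)
# ... equals the log2 of the ratio disease/control.
stopifnot(isTRUE(all.equal(lfc_dc, log2(disease / control))))
lfc_dc  # disease is 4-fold higher than control, so this is 2

# Swapping the contrast (1*control -1*disease) gives the same size with the opposite sign.
lfc_cd <- log2(control) - log2(disease)
lfc_cd  # -2
```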

Yes, I completely agree, but I used -1*disease 1*control for all my data, following the edgeR manual: lrt12 <- glmLRT(fit, contrast=c(-1,1)). That's why I am confused about how to choose the contrast properly.

My design is:

$design
  disease control
1       0       1
2       0       1
3       0       1
4       1       0
5       1       0
6       1       0
6   1        0


So -1*disease 1*control is the opposite (control vs disease).
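Because the columns of this design are ordered (disease, control), a positional contrast like c(-1,1) is easy to misread. A minimal base-R sketch, assuming the design shown above, builds the contrast by column name instead (limma's makeContrasts(disease - control, levels = design) is another way to do this). It also checks that writing the two terms in either order gives the same contrast.

```r
# Rebuild the design shown above: samples 1-3 are control, 4-6 are disease,
# and the columns are ordered (disease, control).
design <- cbind(disease = c(0, 0, 0, 1, 1, 1),
                control = c(1, 1, 1, 0, 0, 0))
rownames(design) <- 1:6

# Build the contrast by column name rather than by position,
# so it always means disease - control whatever the column order.
contrast <- setNames(numeric(ncol(design)), colnames(design))
contrast["disease"] <- 1
contrast["control"] <- -1
contrast  # for this column order the weights are c(1, -1), not c(-1, 1)

# Writing the terms in the other order assigns the same weights to the same
# columns, so "-1*control 1*disease" and "1*disease -1*control" are identical.
alt <- c(control = -1, disease = 1)[colnames(design)]
stopifnot(identical(alt, contrast))
```

The resulting vector can be passed as the contrast argument of glmQLFTest() or glmLRT().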

Are -1*control 1*disease and 1*disease -1*control the same?

Well, you shouldn't be using glmLRT these days. You want the quasi-likelihood pipeline.

I'll answer your question with a question. If contrast = c(-1,1) doesn't give you what you want (you want the opposite), how do you think one would change that to get what you do want?

I tried contrast=c(1,-1), which gives the opposite. Maybe I should try the quasi-likelihood pipeline! Thank you!
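A minimal sketch of the quasi-likelihood pipeline with the contrast c(1,-1) for the design above. The count matrix is simulated (so no genes are truly DE) and the design reproduces the one in this thread; the edgeR steps are wrapped in a requireNamespace() check so the script still runs where edgeR is not installed.

```r
# Simulated counts: 1000 hypothetical genes x 6 samples (random, illustration only).
set.seed(1)
counts <- matrix(rnbinom(6000, mu = 50, size = 10), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6)))

# Same layout as the design above: samples 1-3 control, 4-6 disease,
# columns ordered (disease, control).
design <- cbind(disease = rep(c(0, 1), each = 3),
                control = rep(c(1, 0), each = 3))

if (requireNamespace("edgeR", quietly = TRUE)) {
  library(edgeR)
  y <- DGEList(counts = counts)
  y <- calcNormFactors(y)          # TMM normalization factors
  y <- estimateDisp(y, design)     # negative binomial dispersion estimates
  fit <- glmQLFit(y, design)       # quasi-likelihood GLM fit
  # For this column order, c(1, -1) means 1*disease -1*control,
  # so a positive logFC means higher in disease than in control.
  qlf <- glmQLFTest(fit, contrast = c(1, -1))
  print(topTags(qlf))
  print(summary(decideTests(qlf)))
}
```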