This is a head of my dataset, with expected counts from RSEM, with 3 biological replicates of 2 samples (control and treatment), as can be seen in the group variable below. > head(rsem, 10) X4_7 X7_9 X8_13 X7_10 X4_27 X5_25 comp0_c0_seq1 0 0 0 0 0 0 comp100001_c0_seq1 0 1 3 2 1 1 comp100006_c0_seq1 2 0 0 0 0 2 comp100009_c0_seq1 0 1 0 0 0 0 comp100010_c0_seq1 0 0 0 0 0 0 comp100012_c0_seq1 0 3 0 3 1 0 comp100013_c0_seq1 0 0 0 3 3 6 comp100017_c0_seq1 0 0 0 0 0 0 comp100018_c0_seq1 2 0 2 3 5 4 comp100019_c0_seq1 0 0 0 3 0 0
> group [1] CS CS CS TS TS TS Levels: CS TS
Contrasts:
Contrasts Levels treatment CS -1 TS 1
After filtering and running edgeR I obtain the results table:
> head(glm$glm_lrt$table, 10) logFC logCPM LR PValue comp0_c0_seq1 0.000000 -4.010505 0.000000 1.0000000000 comp100010_c0_seq1 0.000000 -4.010505 0.000000 1.0000000000 comp100013_c0_seq1 4.891693 -3.025308 12.954572 0.0003191411 comp100017_c0_seq1 0.000000 -4.010505 0.000000 1.0000000000 comp100025_c0_seq1 0.000000 -4.010505 0.000000 1.0000000000 comp100035_c0_seq1 0.000000 -4.010505 0.000000 1.0000000000 comp100037_c0_seq1 0.000000 -4.010505 0.000000 1.0000000000 comp100038_c0_seq1 0.000000 -4.010505 0.000000 1.0000000000 comp100039_c0_seq1 0.000000 -4.010505 0.000000 1.0000000000 comp10003_c0_seq1 1.186745 -1.847143 2.686438 0.1012058293
These are the respective coefficients (I guess):
> head(glm$glm_lrt$coefficients, 10) CS TS comp0_c0_seq1 -19.36797 -19.36797 comp100010_c0_seq1 -19.36797 -19.36797 comp100013_c0_seq1 -19.36797 -15.97731 comp100017_c0_seq1 -19.36797 -19.36797 comp100025_c0_seq1 -19.36797 -19.36797 comp100035_c0_seq1 -19.36797 -19.36797 comp100037_c0_seq1 -19.36797 -19.36797 comp100038_c0_seq1 -19.36797 -19.36797 comp100039_c0_seq1 -19.36797 -19.36797 comp10003_c0_seq1 -15.84720 -15.02462
> hist(glm$glm_lrt$coefficients)
Is it normal to have only negative values?
I guess these are the b(g) coefficients from:
"Here xi is a vector of covariates that specifies the treatment conditions applied to RNA sample i, and βg is a vector of regression coefficients by which the covariate effects are mediated for gene g." from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3378882/
In the edgeR code (glmfit.R):
logFC <- drop((glmfit$coefficients %*% contrast)/log(2))
Where I guess the glmfit$coefficients are the beta(g) above and the "contrast" variable is the x(i) vector.
Is this the same formula? If not, what is the difference? If so, why is log(mu(gi)) = logFC?
From http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3378882/ again:
"We denote the total number of mapped reads in library i by Ni and the number that map to the g-th gene by ygi."
Therefore, I would understand that
logFC = log(mu(gi) - mu(gj)) ~ x(i)*beta(i)+x(j)*beta(j)
for samples i, j. Is that correct?
Thank you in advance
I can probably guess, but how did you parametrize your design matrix? Also, what's your call to
glmLRT
?design <- model.matrix(~0+group)
colnames(design) <- levels(glm$samples$group)
my.constrasts <- makeContrasts(treatment = TS-CS, levels=design)
glm$glm_fit <- function(glm, design){
glm <- estimateGLMCommonDisp(glm, design)
glm <- estimateGLMTrendedDisp(glm, design)
glm <- estimateGLMTagwiseDisp(glm, design)
return(glm)
}
glm$glm_lrt <- glmLRT(glm$glm_fit, contrast=my.constrasts[,"treatment"])