Problems with edgeR for differential expression.
0
0
Entering edit mode
Nick Schurch ▴ 60
@nick-schurch-4014
Last seen 9.6 years ago
Dear Bioconductors, I'm trying to use edgeR to compute the differential expression from some RNA-Seq data, but it seems to be generating some odd results. I'm running edgeR on data on about 6000 genes, across 5 experimental conditions. When I compute the differential expression for any two of these conditions edgeR it is returning a 'nan' fold-change for about 1500 out of the 6000 genes being tested. Amazingly, it is also returning p-values for these fold-changes, and the p-values cover a range of values from total insignificant (>0.5) to very significant (<1E-4)! At first I thought if might be because there is sometimes no signal for a gene in a given condition, but 1) other cases of this nature produce '-inf' or 'inf' fold-changes, not 'nan' fold-hanges, and 2) in some cases edgeR is calculating a 'nan' fold-change for something that has signal in every replicate of both conditions! I've checked everything I can think of... I don't get any errors or warnings, the Norm Factors are sensible, the raw signal is sensible, all the objects look well-formed (i.e. like they contain all the bits they should contain).... its very confusing and frustrating. So, am I doing something wrong? Or is there a deeper problem with this package? I'm using R v 2.13.1, edgeR 2.2.5 and the commands I'm using are: > # create the design matrix > model.groups<-groups > model.factors<-as.factor(model.groups) > model<-model.matrix(~model.factors) > > # build DGElist and calculate normalization factors > x=as.data.frame(data) > rownames(x)=genenames > y=DGEList(x,group=groups) > y=calcNormFactors(y) > str(y) Formal class 'DGEList' [package "edgeR"] with 1 slots ..@ .Data:List of 3 .. ..$ :'data.frame': 17 obs. of 3 variables: .. .. ..$ group : Factor w/ 5 levels "cond00","cond06",..: 3 3 3 4 4 4 3 3 1 1 ... .. .. ..$ lib.size : num [1:17] 1866963 2364994 1712838 2782920 2780054 ... .. .. ..$ norm.factors: num [1:17] 0.992 0.978 1.002 1.036 0.972 ... .. ..$ : num [1:6351, 1:17] 7 187 44 0 98 ... .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. ..$ : chr [1:6351] "38536" "38537" "38538" "38539" ... .. .. .. ..$ : chr [1:17] "cond07.rep0020" "cond07.rep0021" "cond07.rep0022" "cond08.rep0023" ... .. ..$ : Named logi [1:6351] FALSE FALSE FALSE FALSE FALSE FALSE ... .. .. ..- attr(*, "names")= chr [1:6351] "38536" "38537" "38538" "38539" ... > > # estimate dispersion and fit models > z=estimateGLMCommonDisp(y, design)) > str(z) Formal class 'DGEList' [package "edgeR"] with 1 slots ..@ .Data:List of 4 .. ..$ :'data.frame': 17 obs. of 3 variables: .. .. ..$ group : Factor w/ 5 levels "cond00","cond06",..: 3 3 3 4 4 4 3 3 1 1 ... .. .. ..$ lib.size : num [1:17] 1866963 2364994 1712838 2782920 2780054 ... .. .. ..$ norm.factors: num [1:17] 0.992 0.978 1.002 1.036 0.972 ... .. ..$ : num [1:6351, 1:17] 7 187 44 0 98 ... .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. ..$ : chr [1:6351] "38536" "38537" "38538" "38539" ... .. .. .. ..$ : chr [1:17] "cond07.rep0020" "cond07.rep0021" "cond07.rep0022" "cond08.rep0023" ... .. ..$ : Named logi [1:6351] FALSE FALSE FALSE FALSE FALSE FALSE ... .. .. ..- attr(*, "names")= chr [1:6351] "38536" "38537" "38538" "38539" ... .. ..$ : num 0.101 > > fit<-glmFit(z,model,dispersion=z$common.dispersion) > str(fit) Formal class 'DGEGLM' [package "edgeR"] with 1 slots ..@ .Data:List of 12 .. ..$ : num [1:6351, 1:5] -12.89 -9.22 -10.72 -11.54 -9.56 ... .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. ..$ : chr [1:6351] "38536" "38537" "38538" "38539" ... .. .. .. ..$ : chr [1:5] "(Intercept)" "model.factorscond06" "model.factorscond07" "model.factorscond08" ... .. ..$ : int [1:6351] 12 12 12 12 12 12 12 12 12 12 ... .. ..$ : Named num [1:6351] 23.3 3.45 30.57 25.89 3.07 ... .. .. ..- attr(*, "names")= chr [1:6351] "38536" "38537" "38538" "38539" ... .. ..$ : num [1:17, 1:5] 1 1 1 1 1 1 1 1 1 1 ... .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. ..$ : chr [1:17] "1" "2" "3" "4" ... .. .. .. ..$ : chr [1:5] "(Intercept)" "model.factorscond06" "model.factorscond07" "model.factorscond08" ... .. .. ..- attr(*, "assign")= int [1:5] 0 1 1 1 1 .. .. ..- attr(*, "contrasts")=List of 1 .. .. .. ..$ model.factors: chr "contr.treatment" .. ..$ : num [1:6351, 1:17] 14.4 14.4 14.4 14.4 14.4 ... .. ..$ :'data.frame': 17 obs. of 3 variables: .. .. ..$ group : Factor w/ 5 levels "cond00","cond06",..: 3 3 3 4 4 4 3 3 1 1 ... .. .. ..$ lib.size : num [1:17] 1866963 2364994 1712838 2782920 2780054 ... .. .. ..$ norm.factors: num [1:17] 0.992 0.978 1.002 1.036 0.972 ... .. ..$ : NULL .. ..$ : num 0.101 .. ..$ : num [1:17] 1851365 2313681 1716797 2882487 2703330 ... .. ..$ : NULL .. ..$ : num [1:6351, 1:17] 6.32 160.53 59.13 12.7 105.68 ... .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. ..$ : chr [1:6351] "38536" "38537" "38538" "38539" ... .. .. .. ..$ : chr [1:17] "cond07.rep0020" "cond07.rep0021" "cond07.rep0022" "cond08.rep0023" ... .. ..$ : Named num [1:6351] -12.67 -9.33 -10.36 -11.66 -9.64 ... .. .. ..- attr(*, "names")= chr [1:6351] "38536" "38537" "38538" "38539" ... > > # liklihood ratio statistics > results=glmLRT(z, fit, coef = 4) > str(results) Formal class 'DGELRT' [package "edgeR"] with 1 slots ..@ .Data:List of 10 .. ..$ :'data.frame': 17 obs. of 3 variables: .. .. ..$ group : Factor w/ 5 levels "cond00","cond06",..: 3 3 3 4 4 4 3 3 1 1 ... .. .. ..$ lib.size : num [1:17] 1866963 2364994 1712838 2782920 2780054 ... .. .. ..$ norm.factors: num [1:17] 0.992 0.978 1.002 1.036 0.972 ... .. ..$ : Named logi [1:6351] FALSE FALSE FALSE FALSE FALSE FALSE ... .. .. ..- attr(*, "names")= chr [1:6351] "38536" "38537" "38538" "38539" ... .. ..$ : num 0.101 .. ..$ :'data.frame': 6351 obs. of 4 variables: .. .. ..$ logConc : num [1:6351] -12.67 -9.33 -10.36 -11.66 -9.64 ... .. .. ..$ logFC : num [1:6351] 0.7057 -0.0157 0.5583 -0.3311 -0.1913 ... .. .. ..$ LR.statistic: num [1:6351] 1.38671 0.00165 1.83369 0.46732 0.24042 ... .. .. ..$ p.value : num [1:6351] 0.239 0.968 0.176 0.494 0.624 ... .. ..$ : num [1:6351, 1:5] -12.89 -9.22 -10.72 -11.54 -9.56 ... .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. ..$ : chr [1:6351] "38536" "38537" "38538" "38539" ... .. .. .. ..$ : chr [1:5] "(Intercept)" "model.factorscond06" "model.factorscond07" "model.factorscond08" ... .. ..$ : num [1:6351, 1:4] -12.63 -9.22 -10.52 -11.63 -9.62 ... .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. ..$ : chr [1:6351] "38536" "38537" "38538" "38539" ... .. .. .. ..$ : chr [1:4] "(Intercept)" "model.factorscond07" "model.factorscond08" "model.factorscond10" .. ..$ : num [1:17, 1:5] 1 1 1 1 1 1 1 1 1 1 ... .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. ..$ : chr [1:17] "1" "2" "3" "4" ... .. .. .. ..$ : chr [1:5] "(Intercept)" "model.factorscond06" "model.factorscond07" "model.factorscond08" ... .. .. ..- attr(*, "assign")= int [1:5] 0 1 1 1 1 .. .. ..- attr(*, "contrasts")=List of 1 .. .. .. ..$ model.factors: chr "contr.treatment" .. ..$ : num [1:17, 1:4] 1 1 1 1 1 1 1 1 1 1 ... .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. ..$ : chr [1:17] "1" "2" "3" "4" ... .. .. .. ..$ : chr [1:4] "(Intercept)" "model.factorscond07" "model.factorscond08" "model.factorscond10" .. ..$ : num 0.101 .. ..$ : chr "model.factorscond06" These are some examples of the results: -------------------- edgeR out: LR.statistic: 0.975075487675 logConc: -13.3931239073 p.value: 0.323417620577 logFC: NaN geneID: 38956 Raw data: Condition 1: replicate 1: 2,372,532 good reads, female: 8.0 replicate 2: 3,966,968 good reads, female: no signal replicate 3: 1,389,571 good reads, male: no signal Condition 2: replicate 1: 3,102,608 good reads, male: 5.0 replicate 2: 3,451,983 good reads, male: no signal replicate 3: 2,892,192 good reads, male: no signal replicate 4: 3,620,124 good reads, male: 5.0 replicate 5: 2,640,968 good reads, female: no signal -------------------- -------------------- edgeR out: LR.statistic: 3.57045101322 logConc: -13.6684814046 p.value: 0.0588163345523 logFC: NaN geneID: 38959 Raw data: Condition 1: replicate 1: 2,372,532 good reads, female: 5.0 replicate 2: 3,966,968 good reads, female: 5.0 replicate 3: 1,389,571 good reads, male: no signal Condition 2: replicate 1: 3,102,608 good reads, male: no signal replicate 2: 3,451,983 good reads, male: no signal replicate 3: 2,892,192 good reads, male: 7.0 replicate 4: 3,620,124 good reads, male: no signal replicate 5: 2,640,968 good reads, female: no signal -------------------- -------------------- edgeR out: LR.statistic: 1.81602638091 logConc: -11.265720531 p.value: 0.177786996458 logFC: NaN geneID: 38965 Raw data: Condition 1: replicate 1: 2,372,532 good reads, female: 22.0 replicate 2: 3,966,968 good reads, female: 27.0 replicate 3: 1,389,571 good reads, male: 5.0 Condition 2: replicate 1: 3,102,608 good reads, male: 26.0 replicate 2: 3,451,983 good reads, male: 26.0 replicate 3: 2,892,192 good reads, male: 9.0 replicate 4: 3,620,124 good reads, male: 36.0 replicate 5: 2,640,968 good reads, female: 55.0 -------------------- -- Cheers, Nick Schurch Data Analysis Group (The Barton Group), School of Life Sciences, University of Dundee, Dow St, Dundee, DD1 5EH, Scotland, UK Tel: +44 1382 388707 Fax: +44 1382 345 893 [[alternative HTML version deleted]]
Normalization edgeR Normalization edgeR • 961 views
ADD COMMENT

Login before adding your answer.

Traffic: 739 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6