Problem with analysis of paired data using Illumina arrays and the Lumi and limma packages

0

Entering edit mode

Groeneweg M PATH ▴ 10

@groeneweg-m-path-3054

Last seen 9.6 years ago

Dear all, We are currently analyzing Illumina data using the lumi and affy packages. The results we are getting are suspicious: we get adjusted p-values in the 10^-40 to 10^-98 range. Obviously, we are doing something wrong. The experiment consists of 20 patients of whom we have positive and negative samples, in other words, we have paired observations in each patient. We did both a paired and an unpaired analysis, but this did not matter much. We also tried SAM analysis using siggenes, but again with the same results. As a test, we also used the non-normalized data, but this did not change the results. Has anybody an idea what is going wrong? We used the following code (abbreviated where appropriate): library(lumi); library(limma); library(Matrix); library(GOstats); library(siggenes); fileName <- 'Group Probe Profile TAB paired.csv'; wouter.lumi <- lumiR(fileName, lib='lumiHumanAll.db'); wouter.lumi; summary(wouter.lumi, 'QC'); wouter.lumi.N.Q <- lumiExpresso(wouter.lumi, QC.evaluation=TRUE); summary(wouter.lumi.N.Q); summary(wouter.lumi.N.Q, 'QC'); ## Differential expression over all data sets and gene indentification ## wouter.lumi.N <- lumiN(wouter.lumi.N.Q); dataMatrix <- exprs(wouter.lumi.N.Q); presentCount <- detectionCall(wouter.lumi.N.Q); selDataMatrix <- dataMatrix[presentCount > 0,]; selProbe <- rownames(selDataMatrix); ## Paired Comparission ## specify the groups and conditions PlaqueType <- c('A', 'B', 'A', 'B', 'A', 'B', ... 'A', 'B', 'A', 'B', 'A') ; PatientType <- c("p01", "p01", "p02", "p02", "p03", "p03", ... "p21", "p21", "p22", "p22", "p16", "p18"); ## top 10 genes if(require(limma)) { factor.PlaqueType <- factor(PlaqueType, levels=c("A","B")); factor.PatientType <- factor(PatientType); design <- model.matrix(~ 0+factor.PatientType+factor.PlaqueType); fit <- lmFit(selDataMatrix, design); ## contrast.matrix <-makeContrasts(A-B, levels=design); ## fit2<-contrasts.fit(fit,contrast.matrix) fit <-eBayes(fit); if (require(lumiHumanAll.db) & require(annotate)) { if (nrow(fit$genes) == 0) fit$genes <- data.frame(ID=rownames(selDataMatrix)) geneSymbol <- getSYMBOL(fit$genes$ID, 'lumiHumanAll.db') fit$genes <- data.frame(fit$genes, geneSymbol=geneSymbol) } ## Print the top 10 genes print(topTable(fit, number=10)) ## get significant gene list with FDR adjusted p values less than 0.01 p.adj <- p.adjust(fit$p.value[,2]) sigGene.adj <- selProbe[ p.adj < 0.01] ## write.table(sigGene.adj, file="significant_genes.csv", sep=";"); ## without FDR adjustment sigGene <- selProbe[ fit$p.value[,2] < 0.01] } ID geneSymbol factor.PatientTypep01 factor.PatientTypep02 1820 ZdpxBOa0B_.YqnniiE C22orf33 7.324658 7.314660 7974 oeBgiYqwKqi5.p7lTc MLLT4 7.338124 7.305449 1678 HYFYQOpk4Y6IkqA6Hk C1orf183 7.310553 7.350842 27 Nonl3upr2ojFAogAsU ABCA5 7.301086 7.296429 12509 BkA6hCdSI1BR4pUQio TAS2R3 7.311262 7.324070 11087 oTZyp7gBIx7iV4FdSg RXRG 7.324615 7.318605 2657 oBKSggJ.exxFIq_qe4 CDY1B 7.394242 7.332206 10176 fSIikgsewouigigijo PSMC3IP 7.372082 7.354313 12106 Wqi4jiw9KgiCoIiCLc SPINLW1 7.339931 7.297919 2429 6lWX99edJ1RohUS3dc CCR10 7.393743 7.383870 factor.PatientTypep03 factor.PatientTypep04 factor.PatientTypep05 1820 7.335805 7.307950 7.314493 7974 7.344440 7.327994 7.343561 1678 7.314656 7.342585 7.320585 27 7.275228 7.344058 7.287226 12509 7.328430 7.317047 7.325980 11087 7.362075 7.319144 7.307189 2657 7.392088 7.380562 7.326872 10176 7.418103 7.346580 7.331416 12106 7.355262 7.334333 7.312377 2429 7.402435 7.382996 7.398166 etc... factor.PlaqueTypeB AveExpr F P.Value adj.P.Val 1820 5.812695e-03 7.334350 79437.93 1.524058e-46 6.165472e-43 7974 9.565944e-03 7.351467 76510.25 2.253042e-46 6.165472e-43 1678 7.535491e-03 7.351725 76065.06 2.394156e-46 6.165472e-43 27 5.886626e-05 7.312479 73115.55 3.613654e-46 6.165472e-43 12509 -2.588097e-05 7.331387 72769.81 3.796430e-46 6.165472e-43 11087 -5.466541e-04 7.341570 71478.61 4.574186e-46 6.165472e-43 2657 6.219771e-03 7.359755 69690.49 5.954555e-46 6.165472e-43 10176 1.360613e-03 7.367441 69121.32 6.485202e-46 6.165472e-43 12106 -3.237106e-03 7.327304 68823.54 6.783319e-46 6.165472e-43 2429 -1.786489e-02 7.380408 68733.39 6.876512e-46 6.165472e-43 Thanks you very much in advance, Mathijs [[alternative HTML version deleted]]

probe siggenes lumi probe siggenes lumi • 987 views

ADD COMMENT • link updated 15.6 years ago by James W. MacDonald 65k • written 15.6 years ago by Groeneweg M PATH ▴ 10

0

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 7 hours ago

United States

Hi Mathijs, Groeneweg M (PATH) wrote: > Dear all, > > We are currently analyzing Illumina data using the lumi and affy packages. > The results we are getting are suspicious: we get adjusted p-values in the 10^-40 to 10^-98 range. > Obviously, we are doing something wrong. The experiment consists of 20 patients of whom we have positive and negative samples, in other words, we have paired observations in each patient. We did both a paired and an unpaired analysis, but this did not matter much. We also tried SAM analysis using siggenes, but again with the same results. As a test, we also used the non-normalized data, but this did not change the results. Has anybody an idea what is going wrong? > > We used the following code (abbreviated where appropriate): > > library(lumi); > library(limma); > library(Matrix); > library(GOstats); > library(siggenes); > fileName <- 'Group Probe Profile TAB paired.csv'; > wouter.lumi <- lumiR(fileName, lib='lumiHumanAll.db'); > wouter.lumi; > summary(wouter.lumi, 'QC'); > > wouter.lumi.N.Q <- lumiExpresso(wouter.lumi, QC.evaluation=TRUE); > summary(wouter.lumi.N.Q); > summary(wouter.lumi.N.Q, 'QC'); > ## Differential expression over all data sets and gene indentification > ## wouter.lumi.N <- lumiN(wouter.lumi.N.Q); > dataMatrix <- exprs(wouter.lumi.N.Q); > presentCount <- detectionCall(wouter.lumi.N.Q); > selDataMatrix <- dataMatrix[presentCount > 0,]; > selProbe <- rownames(selDataMatrix); > ## Paired Comparission > ## specify the groups and conditions > PlaqueType <- c('A', 'B', 'A', 'B', 'A', 'B', ... 'A', 'B', 'A', 'B', 'A') ; > PatientType <- c("p01", "p01", "p02", "p02", "p03", "p03", ... "p21", "p21", "p22", "p22", "p16", "p18"); > ## top 10 genes > if(require(limma)) > { > factor.PlaqueType <- factor(PlaqueType, levels=c("A","B")); > factor.PatientType <- factor(PatientType); > design <- model.matrix(~ 0+factor.PatientType+factor.PlaqueType); > fit <- lmFit(selDataMatrix, design); > ## contrast.matrix <-makeContrasts(A-B, levels=design); > ## fit2<-contrasts.fit(fit,contrast.matrix) > fit <-eBayes(fit); > if (require(lumiHumanAll.db) & require(annotate)) > { > if (nrow(fit$genes) == 0) fit$genes <- data.frame(ID=rownames(selDataMatrix)) > geneSymbol <- getSYMBOL(fit$genes$ID, 'lumiHumanAll.db') > fit$genes <- data.frame(fit$genes, geneSymbol=geneSymbol) > } > ## Print the top 10 genes > print(topTable(fit, number=10)) > > ## get significant gene list with FDR adjusted p values less than 0.01 > p.adj <- p.adjust(fit$p.value[,2]) > sigGene.adj <- selProbe[ p.adj < 0.01] > ## write.table(sigGene.adj, file="significant_genes.csv", sep=";"); > ## without FDR adjustment > sigGene <- selProbe[ fit$p.value[,2] < 0.01] You have commented out the part where you fit the contrast, instead running eBayes() on your original model fit. Since you aren't computing a contrast, you are simply testing to see if the coefficients are different from zero (in this case you are testing to see if 'B' is different from zero). For the vast majority of your data this will be unequivocally true, so you will get really small p-values. You also seem to want to do unnecessarily tricky things. Rather than subsetting the data objects by hand, why don't you simply use the wrappers that exist in limma to do these things for you? For instance, you can get the top genes based on p-value and/or fold change using topTable(). Best, Jim > } > ID geneSymbol factor.PatientTypep01 factor.PatientTypep02 > 1820 ZdpxBOa0B_.YqnniiE C22orf33 7.324658 7.314660 > 7974 oeBgiYqwKqi5.p7lTc MLLT4 7.338124 7.305449 > 1678 HYFYQOpk4Y6IkqA6Hk C1orf183 7.310553 7.350842 > 27 Nonl3upr2ojFAogAsU ABCA5 7.301086 7.296429 > 12509 BkA6hCdSI1BR4pUQio TAS2R3 7.311262 7.324070 > 11087 oTZyp7gBIx7iV4FdSg RXRG 7.324615 7.318605 > 2657 oBKSggJ.exxFIq_qe4 CDY1B 7.394242 7.332206 > 10176 fSIikgsewouigigijo PSMC3IP 7.372082 7.354313 > 12106 Wqi4jiw9KgiCoIiCLc SPINLW1 7.339931 7.297919 > 2429 6lWX99edJ1RohUS3dc CCR10 7.393743 7.383870 > factor.PatientTypep03 factor.PatientTypep04 factor.PatientTypep05 > 1820 7.335805 7.307950 7.314493 > 7974 7.344440 7.327994 7.343561 > 1678 7.314656 7.342585 7.320585 > 27 7.275228 7.344058 7.287226 > 12509 7.328430 7.317047 7.325980 > 11087 7.362075 7.319144 7.307189 > 2657 7.392088 7.380562 7.326872 > 10176 7.418103 7.346580 7.331416 > 12106 7.355262 7.334333 7.312377 > 2429 7.402435 7.382996 7.398166 > > etc... > > factor.PlaqueTypeB AveExpr F P.Value adj.P.Val > 1820 5.812695e-03 7.334350 79437.93 1.524058e-46 6.165472e-43 > 7974 9.565944e-03 7.351467 76510.25 2.253042e-46 6.165472e-43 > 1678 7.535491e-03 7.351725 76065.06 2.394156e-46 6.165472e-43 > 27 5.886626e-05 7.312479 73115.55 3.613654e-46 6.165472e-43 > 12509 -2.588097e-05 7.331387 72769.81 3.796430e-46 6.165472e-43 > 11087 -5.466541e-04 7.341570 71478.61 4.574186e-46 6.165472e-43 > 2657 6.219771e-03 7.359755 69690.49 5.954555e-46 6.165472e-43 > 10176 1.360613e-03 7.367441 69121.32 6.485202e-46 6.165472e-43 > 12106 -3.237106e-03 7.327304 68823.54 6.783319e-46 6.165472e-43 > 2429 -1.786489e-02 7.380408 68733.39 6.876512e-46 6.165472e-43 > > Thanks you very much in advance, > > Mathijs > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician Hildebrandt Lab 8220D MSRB III 1150 W. Medical Center Drive Ann Arbor MI 48109-0646 734-936-8662

ADD COMMENT • link 15.6 years ago James W. MacDonald 65k

Login before adding your answer.