single color using Limma : very low p-value "1.40456052121752e-60"

0

Entering edit mode

bigoun ▴ 40

@bigoun-4795

Last seen 9.7 years ago

Dear all, I'm analysing data single color data (only intensities at F532) using limma. I have 14 arrays in one group and 5 in the other group. I want to compare them so my targets file is as follows FileName Mal Sain M2.gpr 1 0 M3.gpr 1 0 M4.gpr 1 0 M5.gpr 1 0 M6.gpr 1 0 M7.gpr 1 0 M8.gpr 1 0 M9.gpr 1 0 M10.gpr 1 0 M11.gpr 1 0 M12.gpr 1 0 M13.gpr 1 0 M14.gpr 1 0 M15.gpr 1 0 S1.gpr 0 1 S2.gpr 0 1 S3.gpr 0 1 S4.gpr 0 1 S5.gpr 0 1 I use some correction and normalisation methods as recommanded in limma user guide. Afterthat, I make my design and as I have 2 technical replicate (2 spot for the same miRNA) for each array, I wanted to compute duplicatecorrelation coefficient .GAL file as follows : ATF 1 52 5 Type=GenePix ArrayList V1.0 BlockCount=48 BlockType=0 URL=http://genome-www4.stanford.edu/cgi-bin/SGD/locus.pl?locus=[ID] "Block1=2645, 2295, 100, 12, 300, 8, 300" "Block2=7145, 2295, 100, 12, 300, 8, 300" "Block3=11555, 2235, 100, 12, 300, 8, 300" "Block4=16055, 2235, 100, 12, 300, 8, 300" The corfit$consensus is not good (0,36). Do I still use this parameters for lmfit??? Anyway, the more strange is with or without this parameters I have p-adjust value very very low. first line of my result : Block Row Column ID Name X.Intercept. factor.pData.population.sain AveExpr F P.Value adj.P.Val 39 2 10 42554 hsa-miR-923 13.4125293049973 -0.0517387609947947 13.3989138415776 25264.4487822867 1.20563134868457e-63 1.40456052121752e-60 Is anybody help me and what is wrong. As you can see in UNFILTERED section, I have less than 50% genes for the analysis. Could it be the explanation of these strange result?? Thanks in advance My srcipt: library(limma) myFilter = function(X) { H_threshold=2 okFLAG = X$Flags > -49; okm1 =abs(X[,"F532 Median"]-X[,"F532 Mean"]) okm2 = 0.5*(X[,"F532 Median"]+X[,"F532 Mean"]) okH = ((okm1)/okm2) <h_threshold as.numeric(okflag="" &="" okh)}="" targets="readTargets(file=" targets.txt","="" path="NULL," sep="\t" )="" e="read.maimages(targets,source=" genepix",wt.fun="myFilter,columns=list"" (e="F532 Mean" ,eb="B532 Mean" ,names="Name" ))="" #taux="" de="" filtrage="" unfiltered="apply(E$weights,MARGIN=2,FUN=mean)" round(unfiltered,2)="" #="" m2="" m3="" m4="" m5="" m6="" m7="" m8="" m9="" m10="" m11="" m12="" m13="" m14="" m15="" s1="" s2="" #0.41="" 0.43="" 0.34="" 0.29="" 0.51="" 0.35="" 0.48="" 0.35="" 0.29="" 0.56="" 0.38="" 0.56="" 0.61="" 0.52="" 0.48="" 0.34="" #="" s3="" s4="" s5="" #0.36="" 0.46="" 0.47="" #boite="" a="" moustache="" avant="" correction="" boxplot(as.data.frame(e$e),main="Mean intensities" )="" #correction="" background="" -="" normexp="" enorm="" <-="" backgroundcorrect(e,="" method="normexp" ,offset="1)" boxplot(as.data.frame(enorm$e),="" main="Mean intensities - normexp" )="" #normalisation="" quantile="" norme="" <-="" normalizebetweenarrays(enorm,method="quantile" )="" boxplot(as.data.frame(norme$e),="" main="Normalized intensities" )="" norme$e="" <-="" log2(norme$e)="" pdata="" <-="" data.frame(population="c('mal','mal'," 'mal',="" 'mal','mal','mal','mal','mal','mal','mal','mal','mal','mal','mal','sai="" n','sain','sain','sain','sain'))="" #="" create="" design="" matrix="" design="" <-="" model.matrix(~factor(pdata$population))="" #="" in="" my="" .gpr="" files="" all="" mirnas="" contain="" the="" string="" "mir"="" in="" their="" name="" #="" so="" the="" grep="" function="" can="" be="" used="" to="" extract="" all="" of="" these,="" removing="" #="" all="" control="" signals="" and="" printing="" buffers="" etc.="" #="" you="" need="" to="" check="" your="" .gpr="" files="" to="" find="" which="" signals="" you="" should="" extract.="" mirs="" <-="" c(grep("-mir-",="" norme$genes$name),="" grep("-let-",="" norme$genes$name))="" norme.final="" <-="" norme[mirs,="" ]="" norme.final="" <-norme.final[order(norme.final$genes$name),="" ]="" norme.final$genes$name="" #="" calculate="" duplicate="" correlation="" between="" the="" 2="" replicates="" on="" each="" array="" corfit="" <-="" duplicatecorrelation(norme.final,="" design,="" ndups="2)" corfit$consensus="" #="" show="" a="" boxplot="" of="" the="" correlations="" boxplot(tanh(corfit$atanh.correlations))="" #="" fit="" the="" linear="" model,="" including="" info="" on="" duplicates="" and="" correlation="" fit="" <-="" lmfit(norme.final,="" design,="" ndups="2," correlation="corfit$consensus)" #fit="" <-="" lmfit(norme.final,="" design)="" #="" calculate="" values="" using="" ebayes="" ebayes="" <-="" ebayes(fit)="" #="" output="" a="" list="" of="" top="" differnetially="" expressed="" genes...="" result="topTable(ebayes,number=4608," p="0.05,adjust" =="" "bh")="" genes="result$Status==" genes""="" write.table(result,"gene.txt",quote="FALSE,sep=" \t",row.names="FALSE,"" col.names="TRUE)" mikel="" [[alternative="" html="" version="" deleted]]="" <="" div="">

• 1.2k views

ADD COMMENT • link updated 12.7 years ago by Sean Davis 21k • written 12.7 years ago by bigoun ▴ 40

0

Entering edit mode

Sean Davis 21k

@sean-davis-490

Last seen 3 months ago

United States

I didn't go through the code in great detail, but since these are 1-color data, make sure that you are not testing that the intercept is different than 0 (and it looks like you probably are). Sean On Wed, Aug 10, 2011 at 5:11 AM, bigoun <bigounn at="" yahoo.fr=""> wrote: > Dear all, > > I'm analysing data single color data (only intensities at F532) using limma. > > I have 14 arrays in one group and 5 in the other group. I want to compare them > so my targets file is as follows > FileName Mal Sain > M2.gpr 1 0 > M3.gpr 1 0 > M4.gpr 1 0 > M5.gpr 1 0 > M6.gpr 1 0 > M7.gpr 1 0 > M8.gpr 1 0 > M9.gpr 1 0 > M10.gpr 1 0 > M11.gpr 1 0 > M12.gpr 1 0 > M13.gpr 1 0 > M14.gpr 1 0 > M15.gpr 1 0 > S1.gpr 0 1 > S2.gpr 0 1 > S3.gpr 0 1 > S4.gpr 0 1 > S5.gpr 0 1 > > I use some correction and normalisation methods as recommanded in limma user > guide. Afterthat, I make my design and as I have 2 technical replicate (2 spot > for the same miRNA) for each array, I wanted to compute duplicatecorrelation > coefficient > > > .GAL file as follows : > ATF 1 > 52 5 > Type=GenePix ArrayList V1.0 > BlockCount=48 > BlockType=0 > URL=http://genome-www4.stanford.edu/cgi-bin/SGD/locus.pl?locus=[ID] > "Block1=2645, 2295, 100, 12, 300, 8, 300" > "Block2=7145, 2295, 100, 12, 300, 8, 300" > "Block3=11555, 2235, 100, 12, 300, 8, 300" > "Block4=16055, 2235, 100, 12, 300, 8, 300" > > > The ?corfit$consensus is not good (0,36). Do I still use this parameters for > lmfit??? > > Anyway, the more strange is with or without this parameters I have p-adjust > value very very low. > > first line of my result : > Block Row Column ID Name X.Intercept. factor.pData.population.sain AveExpr F P.Value adj.P.Val > > 39 2 10 42554 hsa-miR-923 13.4125293049973 -0.0517387609947947 13.3989138415776 25264.4487822867 1.20563134868457e-63 1.40456052121752e-60 > > > Is anybody help me and what is wrong. > > As you can see in UNFILTERED section, I have less than 50% genes for the > analysis. Could it be the explanation of these strange result?? > > Thanks in advance > > My srcipt: > library(limma) > myFilter = function(X) { > H_threshold=2 > okFLAG = X$Flags > -49; > okm1 =abs(X[,"F532 Median"]-X[,"F532 Mean"]) > okm2 = 0.5*(X[,"F532 Median"]+X[,"F532 Mean"]) > okH = ((okm1)/okm2) <h_threshold> as.numeric(okFLAG & okH)} > > targets=readTargets(file="targets.txt", path=NULL, sep="\t") > > E =read.maimages(targets,source="genepix",wt.fun=myFilter,columns=li st(E="F532 > Mean",Eb="B532 Mean",Names="Name")) > > #Taux de filtrage > unFiltered = apply(E$weights,MARGIN=2,FUN=mean) > round(unFiltered,2) > > # ?M2 ? M3 ? M4 ? M5 ? M6 ? M7 ? M8 ? M9 ?M10 ?M11 ?M12 ?M13 ?M14 ?M15 ? S1 ? S2 > > #0.41 0.43 0.34 0.29 0.51 0.35 0.48 0.35 0.29 0.56 0.38 0.56 0.61 0.52 0.48 0.34 > > # ?S3 ? S4 ? S5 > #0.36 0.46 0.47 > > #Boite a moustache avant correction > boxplot(as.data.frame(E$E),main="Mean intensities") > > #Correction background - normexp > Enorm <- backgroundCorrect(E, method="normexp",offset=1) > boxplot(as.data.frame(Enorm$E), main="Mean intensities - normexp") > #Normalisation quantile > NormE <- normalizeBetweenArrays(Enorm,method="quantile") > boxplot(as.data.frame(NormE$E), main="Normalized intensities") > > NormE$E <- log2(NormE$E) > > pData <- data.frame(population = c('mal','mal', 'mal', > 'mal','mal','mal','mal','mal','mal','mal','mal','mal','mal','mal','s ain','sain','sain','sain','sain')) > > > # create design matrix > design <- model.matrix(~factor(pData$population)) > > # In my .gpr files all miRNAs contain the string "miR" in their name > # so the grep function can be used to extract all of these, removing > # all control signals and printing buffers etc. > # You need to check your .gpr files to find which signals you should extract. > miRs <- c(grep("-miR-", NormE$genes$Name), grep("-let-", NormE$genes$Name)) > NormE.final <- NormE[miRs, ] > NormE.final <-NormE.final[order(NormE.final$genes$Name), ] > NormE.final$genes$Name > > # calculate duplicate correlation between the 2 replicates on each array > corfit <- duplicateCorrelation(NormE.final, design, ndups=2) > corfit$consensus > # show a boxplot of the correlations > boxplot(tanh(corfit$atanh.correlations)) > > # fit the linear model, including info on duplicates and correlation > fit <- lmFit(NormE.final, design, ndups=2, ?correlation=corfit$consensus) > #fit <- lmFit(NormE.final, design) > # calculate values using ebayes > ebayes <- eBayes(fit) > > # output a list of top differnetially expressed genes... > result=topTable(ebayes,number=4608, p=0.05,adjust = "BH") > genes=result$Status=="genes" > write.table(result,"gene.txt",quote=FALSE,sep="\t",row.names=FALSE, > col.names=TRUE) > > > mikel > ? ? ? ?[[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >

ADD COMMENT • link 12.7 years ago Sean Davis 21k

0

Entering edit mode

Agreed. Specifying the second coefficient (which excludes the intercept term) in topTable should do what you want: result=topTable(ebayes,coef=2,number=4608, p=0.05,adjust = "BH") Alex On Wed, 10 Aug 2011 13:21:00 -0400, Sean Davis wrote: > I didn't go through the code in great detail, but since these are > 1-color data, make sure that you are not testing that the intercept > is > different than 0 (and it looks like you probably are). > > Sean > > > On Wed, Aug 10, 2011 at 5:11 AM, bigoun <bigounn at="" yahoo.fr=""> wrote: >> Dear all, >> >> I'm analysing data single color data (only intensities at F532) >> using limma. >> >> I have 14 arrays in one group and 5 in the other group. I want to >> compare them >> so my targets file is as follows >> FileName Mal Sain >> M2.gpr 1 0 >> M3.gpr 1 0 >> M4.gpr 1 0 >> M5.gpr 1 0 >> M6.gpr 1 0 >> M7.gpr 1 0 >> M8.gpr 1 0 >> M9.gpr 1 0 >> M10.gpr 1 0 >> M11.gpr 1 0 >> M12.gpr 1 0 >> M13.gpr 1 0 >> M14.gpr 1 0 >> M15.gpr 1 0 >> S1.gpr 0 1 >> S2.gpr 0 1 >> S3.gpr 0 1 >> S4.gpr 0 1 >> S5.gpr 0 1 >> >> I use some correction and normalisation methods as recommanded in >> limma user >> guide. Afterthat, I make my design and as I have 2 technical >> replicate (2 spot >> for the same miRNA) for each array, I wanted to compute >> duplicatecorrelation >> coefficient >> >> >> .GAL file as follows : >> ATF 1 >> 52 5 >> Type=GenePix ArrayList V1.0 >> BlockCount=48 >> BlockType=0 >> URL=http://genome-www4.stanford.edu/cgi-bin/SGD/locus.pl?locus=[ID] >> "Block1=2645, 2295, 100, 12, 300, 8, 300" >> "Block2=7145, 2295, 100, 12, 300, 8, 300" >> "Block3=11555, 2235, 100, 12, 300, 8, 300" >> "Block4=16055, 2235, 100, 12, 300, 8, 300" >> >> >> The ?corfit$consensus is not good (0,36). Do I still use this >> parameters for >> lmfit??? >> >> Anyway, the more strange is with or without this parameters I have >> p-adjust >> value very very low. >> >> first line of my result : >> Block Row Column ID Name X.Intercept. factor.pData.population.sain >> AveExpr F P.Value adj.P.Val >> >> 39 2 10 42554 hsa-miR-923 13.4125293049973 -0.0517387609947947 >> 13.3989138415776 25264.4487822867 1.20563134868457e-63 >> 1.40456052121752e-60 >> >> >> Is anybody help me and what is wrong. >> >> As you can see in UNFILTERED section, I have less than 50% genes for >> the >> analysis. Could it be the explanation of these strange result?? >> >> Thanks in advance >> >> My srcipt: >> library(limma) >> myFilter = function(X) { >> H_threshold=2 >> okFLAG = X$Flags > -49; >> okm1 =abs(X[,"F532 Median"]-X[,"F532 Mean"]) >> okm2 = 0.5*(X[,"F532 Median"]+X[,"F532 Mean"]) >> okH = ((okm1)/okm2) <h_threshold>> as.numeric(okFLAG & okH)} >> >> targets=readTargets(file="targets.txt", path=NULL, sep="\t") >> >> E >> =read.maimages(targets,source="genepix",wt.fun=myFilter,columns=lis t(E="F532 >> Mean",Eb="B532 Mean",Names="Name")) >> >> #Taux de filtrage >> unFiltered = apply(E$weights,MARGIN=2,FUN=mean) >> round(unFiltered,2) >> >> # ?M2 ? M3 ? M4 ? M5 ? M6 ? M7 ? M8 ? M9 ?M10 ?M11 ?M12 ?M13 ?M14 >> ?M15 ? S1 ? S2 >> >> #0.41 0.43 0.34 0.29 0.51 0.35 0.48 0.35 0.29 0.56 0.38 0.56 0.61 >> 0.52 0.48 0.34 >> >> # ?S3 ? S4 ? S5 >> #0.36 0.46 0.47 >> >> #Boite a moustache avant correction >> boxplot(as.data.frame(E$E),main="Mean intensities") >> >> #Correction background - normexp >> Enorm <- backgroundCorrect(E, method="normexp",offset=1) >> boxplot(as.data.frame(Enorm$E), main="Mean intensities - normexp") >> #Normalisation quantile >> NormE <- normalizeBetweenArrays(Enorm,method="quantile") >> boxplot(as.data.frame(NormE$E), main="Normalized intensities") >> >> NormE$E <- log2(NormE$E) >> >> pData <- data.frame(population = c('mal','mal', 'mal', >> >> 'mal','mal','mal','mal','mal','mal','mal','mal','mal','mal','mal',' sain','sain','sain','sain','sain')) >> >> >> # create design matrix >> design <- model.matrix(~factor(pData$population)) >> >> # In my .gpr files all miRNAs contain the string "miR" in their name >> # so the grep function can be used to extract all of these, removing >> # all control signals and printing buffers etc. >> # You need to check your .gpr files to find which signals you should >> extract. >> miRs <- c(grep("-miR-", NormE$genes$Name), grep("-let-", >> NormE$genes$Name)) >> NormE.final <- NormE[miRs, ] >> NormE.final <-NormE.final[order(NormE.final$genes$Name), ] >> NormE.final$genes$Name >> >> # calculate duplicate correlation between the 2 replicates on each >> array >> corfit <- duplicateCorrelation(NormE.final, design, ndups=2) >> corfit$consensus >> # show a boxplot of the correlations >> boxplot(tanh(corfit$atanh.correlations)) >> >> # fit the linear model, including info on duplicates and correlation >> fit <- lmFit(NormE.final, design, ndups=2, >> ?correlation=corfit$consensus) >> #fit <- lmFit(NormE.final, design) >> # calculate values using ebayes >> ebayes <- eBayes(fit) >> >> # output a list of top differnetially expressed genes... >> result=topTable(ebayes,number=4608, p=0.05,adjust = "BH") >> genes=result$Status=="genes" >> write.table(result,"gene.txt",quote=FALSE,sep="\t",row.names=FALSE, >> col.names=TRUE) >> >> >> mikel >> ? ? ? ?[[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor -- Alex Gutteridge

ADD REPLY • link 12.7 years ago Alex Gutteridge ▴ 650

Login before adding your answer.