Question: limma and minimum fold change
1
gravatar for Eva Benito Garagorri
8.9 years ago by
Eva Benito Garagorri70 wrote:
Dear list, I have been working with some microarray data where I make simple comparisons between a control and a treatment group. I used vsn to normalize and limma to find differentially expressed genes. I used "topTable" with number=Inf and p.value=0.05 to get all significant genes for a given condition, regardless of their fold change. I seem to be getting significant genes with a fold change systematically bigger than 1.3 (linear scale), both up and downregulated. I was wondering whether this can happen and why or whether I am missing something or making a mistake in the analysis. I guess I expected that some genes would be significant even with a very small fold change. I tried the same analysis with the ALL dataset and I found that genes were significant with a fold change above ~1.1. Below is the code I used for the analysis of the ALL dataset, which is essentially the same I used for my own analysis. Thanks in advance! Eva library(ALL) data(ALL) library(limma) f = ALL$mol.biol mat = model.matrix(~f, ALL) lm = lmFit(ALL, mat) eb = eBayes(lm) sign = topTable(eb, coef=2,number=Inf,p.value=0.05) dim(sign) sort(2^sign$logFC) > sessionInfo() R version 2.11.1 (2010-05-31) x86_64-apple-darwin9.8.0 locale: [1] es_ES.UTF-8/es_ES.UTF-8/C/C/es_ES.UTF-8/es_ES.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] limma_3.4.4 ALL_1.4.7 Biobase_2.8.0 loaded via a namespace (and not attached): [1] tools_2.11.1 ---------- Eva Benito Garagorri PhD program in Neurosciences Institute for Neurosciences in Alicante UMH-CSIC San Juan de Alicante 03550 Spain ebenito@umh.es (34) 965 91 92 33 [[alternative HTML version deleted]]
microarray vsn limma • 1.1k views
ADD COMMENTlink modified 8.9 years ago by Sean Davis21k • written 8.9 years ago by Eva Benito Garagorri70
Answer: limma and minimum fold change
1
gravatar for Sean Davis
8.9 years ago by
Sean Davis21k
United States
Sean Davis21k wrote:
On Sat, Nov 13, 2010 at 1:56 PM, Eva Benito Garagorri <ebenito@umh.es>wrote: > Dear list, > > I have been working with some microarray data where I make simple > comparisons between a control and a treatment group. I used vsn to normalize > and limma to find differentially expressed genes. I used "topTable" with > number=Inf and p.value=0.05 to get all significant genes for a given > condition, regardless of their fold change. I seem to be getting significant > genes with a fold change systematically bigger than 1.3 (linear scale), both > up and downregulated. I was wondering whether this can happen and why or > whether I am missing something or making a mistake in the analysis. I guess > I expected that some genes would be significant even with a very small fold > change. I tried the same analysis with the ALL dataset and I found that > genes were significant with a fold change above ~1.1. Below is the code I > used for the analysis of the ALL dataset, which is essentially the same I > used for my own analysis. Thanks in advance! > > Hi, Eva. I think what you are saying is that you are seeing genes that change by 30% and that you are concerned that you should see genes with a smaller fold change being significant? The fold change that is significant will be dependent on a number of factors, but sample size and biologic factors such as the amount of heterogeneity in the two conditions are strong contributors. I wouldn't say that there is anything wrong with your results, at least without knowing more detail. Sean > Eva > > > library(ALL) > data(ALL) > library(limma) > > > f = ALL$mol.biol > > mat = model.matrix(~f, ALL) > lm = lmFit(ALL, mat) > eb = eBayes(lm) > > sign = topTable(eb, coef=2,number=Inf,p.value=0.05) > dim(sign) > > sort(2^sign$logFC) > > > > > > > sessionInfo() > R version 2.11.1 (2010-05-31) > x86_64-apple-darwin9.8.0 > > locale: > [1] es_ES.UTF-8/es_ES.UTF-8/C/C/es_ES.UTF-8/es_ES.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] limma_3.4.4 ALL_1.4.7 Biobase_2.8.0 > > loaded via a namespace (and not attached): > [1] tools_2.11.1 > > > > > > > > > ---------- > Eva Benito Garagorri > PhD program in Neurosciences > Institute for Neurosciences in Alicante > UMH-CSIC > San Juan de Alicante > 03550 > Spain > ebenito@umh.es > (34) 965 91 92 33 > > > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENTlink written 8.9 years ago by Sean Davis21k
Hi Eva in the ordinary t-test, the t-statistic is proportional to the ratio of fold change 'fc' and estimated within-group standard deviation 's'; and the p-value depends on that through a monotonous transformation, ie. setting a p-value cutoff is equivalent to setting a cutoff on |t|. s_gene is simply, for each gene separately, the empirical standard deviation. With limma/eBayes, you replace s by a cleverly calibrated compromise s_limma between s_gene and an overall pooled s_tot (across all genes). With a small number of replicates, that compromise is close to s_tot; with many replicates, close to s_gene; and somewhere inbetween in intermediate cases. Now, if (eBayes decides that) you have few replicates, then s_gene ~ s_tot is always the same for all genes, and your p-value cutoff is directly equivalent to a cutoff on fc. This, it appears, is what you see. Have a look, in your 'eb' object, at the slots 'df.prior' and 'df.residual'. The manual page of eBayes says: "s2.post is the weighted average of s2.prior and sigma^2 with weights proportional to df.prior and df.residual respectively." Best wishes Wolfgang Il Nov/13/10 10:44 PM, Sean Davis ha scritto: > On Sat, Nov 13, 2010 at 1:56 PM, Eva Benito Garagorri<ebenito at="" umh.es="">wrote: > >> Dear list, >> >> I have been working with some microarray data where I make simple >> comparisons between a control and a treatment group. I used vsn to normalize >> and limma to find differentially expressed genes. I used "topTable" with >> number=Inf and p.value=0.05 to get all significant genes for a given >> condition, regardless of their fold change. I seem to be getting significant >> genes with a fold change systematically bigger than 1.3 (linear scale), both >> up and downregulated. I was wondering whether this can happen and why or >> whether I am missing something or making a mistake in the analysis. I guess >> I expected that some genes would be significant even with a very small fold >> change. I tried the same analysis with the ALL dataset and I found that >> genes were significant with a fold change above ~1.1. Below is the code I >> used for the analysis of the ALL dataset, which is essentially the same I >> used for my own analysis. Thanks in advance! >> >> > Hi, Eva. I think what you are saying is that you are seeing genes that > change by 30% and that you are concerned that you should see genes with a > smaller fold change being significant? The fold change that is significant > will be dependent on a number of factors, but sample size and biologic > factors such as the amount of heterogeneity in the two conditions are strong > contributors. I wouldn't say that there is anything wrong with your > results, at least without knowing more detail. > > Sean > > > >> Eva >> >> >> library(ALL) >> data(ALL) >> library(limma) >> >> >> f = ALL$mol.biol >> >> mat = model.matrix(~f, ALL) >> lm = lmFit(ALL, mat) >> eb = eBayes(lm) >> >> sign = topTable(eb, coef=2,number=Inf,p.value=0.05) >> dim(sign) >> >> sort(2^sign$logFC) >> >> >> >> >> >>> sessionInfo() >> R version 2.11.1 (2010-05-31) >> x86_64-apple-darwin9.8.0 >> >> locale: >> [1] es_ES.UTF-8/es_ES.UTF-8/C/C/es_ES.UTF-8/es_ES.UTF-8 >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] limma_3.4.4 ALL_1.4.7 Biobase_2.8.0 >> >> loaded via a namespace (and not attached): >> [1] tools_2.11.1 >> >> >> >> >> >> >> >> >> ---------- >> Eva Benito Garagorri >> PhD program in Neurosciences >> Institute for Neurosciences in Alicante >> UMH-CSIC >> San Juan de Alicante >> 03550 >> Spain >> ebenito at umh.es >> (34) 965 91 92 33 >> >> >> >> >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLYlink written 8.9 years ago by Wolfgang Huber13k
Hi Eva, Volcano plots are very useful for showing the relationship between logFC and pvalue. # logFC vs uncorrected p-values plot(sign$logFC, -log10(sign$p.value)) # logFC vs corrected p-values plot(sign$logFC, -log10(sign$adj.p.val)) I suspect you'll see a distinct V shape, with large dispersion on either side of the V cheers, Mark On 15/11/2010, at 4:43 AM, Wolfgang Huber wrote: > > Hi Eva > > in the ordinary t-test, the t-statistic is proportional to the ratio of fold change 'fc' and estimated within-group standard deviation 's'; and the p-value depends on that through a monotonous transformation, ie. setting a p-value cutoff is equivalent to setting a cutoff on |t|. s_gene is simply, for each gene separately, the empirical standard deviation. > > With limma/eBayes, you replace s by a cleverly calibrated compromise s_limma between s_gene and an overall pooled s_tot (across all genes). With a small number of replicates, that compromise is close to s_tot; with many replicates, close to s_gene; and somewhere inbetween in intermediate cases. Now, if (eBayes decides that) you have few replicates, then s_gene ~ s_tot is always the same for all genes, and your p-value cutoff is directly equivalent to a cutoff on fc. This, it appears, is what you see. > > Have a look, in your 'eb' object, at the slots 'df.prior' and 'df.residual'. The manual page of eBayes says: "s2.post is the weighted average of s2.prior and sigma^2 with weights proportional to df.prior and df.residual respectively." > > Best wishes > Wolfgang > > > Il Nov/13/10 10:44 PM, Sean Davis ha scritto: >> On Sat, Nov 13, 2010 at 1:56 PM, Eva Benito Garagorri<ebenito at="" umh.es="">wrote: >> >>> Dear list, >>> >>> I have been working with some microarray data where I make simple >>> comparisons between a control and a treatment group. I used vsn to normalize >>> and limma to find differentially expressed genes. I used "topTable" with >>> number=Inf and p.value=0.05 to get all significant genes for a given >>> condition, regardless of their fold change. I seem to be getting significant >>> genes with a fold change systematically bigger than 1.3 (linear scale), both >>> up and downregulated. I was wondering whether this can happen and why or >>> whether I am missing something or making a mistake in the analysis. I guess >>> I expected that some genes would be significant even with a very small fold >>> change. I tried the same analysis with the ALL dataset and I found that >>> genes were significant with a fold change above ~1.1. Below is the code I >>> used for the analysis of the ALL dataset, which is essentially the same I >>> used for my own analysis. Thanks in advance! >>> >>> >> Hi, Eva. I think what you are saying is that you are seeing genes that >> change by 30% and that you are concerned that you should see genes with a >> smaller fold change being significant? The fold change that is significant >> will be dependent on a number of factors, but sample size and biologic >> factors such as the amount of heterogeneity in the two conditions are strong >> contributors. I wouldn't say that there is anything wrong with your >> results, at least without knowing more detail. >> >> Sean >> >> >> >>> Eva >>> >>> >>> library(ALL) >>> data(ALL) >>> library(limma) >>> >>> >>> f = ALL$mol.biol >>> >>> mat = model.matrix(~f, ALL) >>> lm = lmFit(ALL, mat) >>> eb = eBayes(lm) >>> >>> sign = topTable(eb, coef=2,number=Inf,p.value=0.05) >>> dim(sign) >>> >>> sort(2^sign$logFC) >>> >>> >>> >>> >>> >>>> sessionInfo() >>> R version 2.11.1 (2010-05-31) >>> x86_64-apple-darwin9.8.0 >>> >>> locale: >>> [1] es_ES.UTF-8/es_ES.UTF-8/C/C/es_ES.UTF-8/es_ES.UTF-8 >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] limma_3.4.4 ALL_1.4.7 Biobase_2.8.0 >>> >>> loaded via a namespace (and not attached): >>> [1] tools_2.11.1 >>> >>> >>> >>> >>> >>> >>> >>> >>> ---------- >>> Eva Benito Garagorri >>> PhD program in Neurosciences >>> Institute for Neurosciences in Alicante >>> UMH-CSIC >>> San Juan de Alicante >>> 03550 >>> Spain >>> ebenito at umh.es >>> (34) 965 91 92 33 >>> >>> >>> >>> >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLYlink written 8.9 years ago by Mark Cowley910
Hi, First of all, I want to thank everyone for their kind answers. Thanks! I tried the conventional t-test after non-specific filtering on my data using the command "rowttests" from the "genefilter" package and that gives significant genes (after BH-adjustment) with a smaller fold change close to 1.1, so I guess what I see is a consequence of using limma and a small (5 controls and 3 treatment) number of replicates... I also tried the volcano plot representation and that showed a clear dispersion on the sides of the 'V', as Mark suggested. I was just trying to understand why it is that I'm observing this particular effect on the fc on my data, even though I may end up filtering the list of significant genes to remove those with a very small change... but I was curious to know where and how that was happening. Thanks a lot for your help everyone! Best regards, Eva On 14/11/2010, at 23:51, Mark Cowley wrote: > Hi Eva, > Volcano plots are very useful for showing the relationship between logFC and pvalue. > > # logFC vs uncorrected p-values > plot(sign$logFC, -log10(sign$p.value)) > # logFC vs corrected p-values > plot(sign$logFC, -log10(sign$adj.p.val)) > > I suspect you'll see a distinct V shape, with large dispersion on either side of the V > > cheers, > Mark > On 15/11/2010, at 4:43 AM, Wolfgang Huber wrote: > >> >> Hi Eva >> >> in the ordinary t-test, the t-statistic is proportional to the ratio of fold change 'fc' and estimated within-group standard deviation 's'; and the p-value depends on that through a monotonous transformation, ie. setting a p-value cutoff is equivalent to setting a cutoff on |t|. s_gene is simply, for each gene separately, the empirical standard deviation. >> >> With limma/eBayes, you replace s by a cleverly calibrated compromise s_limma between s_gene and an overall pooled s_tot (across all genes). With a small number of replicates, that compromise is close to s_tot; with many replicates, close to s_gene; and somewhere inbetween in intermediate cases. Now, if (eBayes decides that) you have few replicates, then s_gene ~ s_tot is always the same for all genes, and your p-value cutoff is directly equivalent to a cutoff on fc. This, it appears, is what you see. >> >> Have a look, in your 'eb' object, at the slots 'df.prior' and 'df.residual'. The manual page of eBayes says: "s2.post is the weighted average of s2.prior and sigma^2 with weights proportional to df.prior and df.residual respectively." >> >> Best wishes >> Wolfgang >> >> >> Il Nov/13/10 10:44 PM, Sean Davis ha scritto: >>> On Sat, Nov 13, 2010 at 1:56 PM, Eva Benito Garagorri<ebenito@umh.es>wrote: >>> >>>> Dear list, >>>> >>>> I have been working with some microarray data where I make simple >>>> comparisons between a control and a treatment group. I used vsn to normalize >>>> and limma to find differentially expressed genes. I used "topTable" with >>>> number=Inf and p.value=0.05 to get all significant genes for a given >>>> condition, regardless of their fold change. I seem to be getting significant >>>> genes with a fold change systematically bigger than 1.3 (linear scale), both >>>> up and downregulated. I was wondering whether this can happen and why or >>>> whether I am missing something or making a mistake in the analysis. I guess >>>> I expected that some genes would be significant even with a very small fold >>>> change. I tried the same analysis with the ALL dataset and I found that >>>> genes were significant with a fold change above ~1.1. Below is the code I >>>> used for the analysis of the ALL dataset, which is essentially the same I >>>> used for my own analysis. Thanks in advance! >>>> >>>> >>> Hi, Eva. I think what you are saying is that you are seeing genes that >>> change by 30% and that you are concerned that you should see genes with a >>> smaller fold change being significant? The fold change that is significant >>> will be dependent on a number of factors, but sample size and biologic >>> factors such as the amount of heterogeneity in the two conditions are strong >>> contributors. I wouldn't say that there is anything wrong with your >>> results, at least without knowing more detail. >>> >>> Sean >>> >>> >>> >>>> Eva >>>> >>>> >>>> library(ALL) >>>> data(ALL) >>>> library(limma) >>>> >>>> >>>> f = ALL$mol.biol >>>> >>>> mat = model.matrix(~f, ALL) >>>> lm = lmFit(ALL, mat) >>>> eb = eBayes(lm) >>>> >>>> sign = topTable(eb, coef=2,number=Inf,p.value=0.05) >>>> dim(sign) >>>> >>>> sort(2^sign$logFC) >>>> >>>> >>>> >>>> >>>> >>>>> sessionInfo() >>>> R version 2.11.1 (2010-05-31) >>>> x86_64-apple-darwin9.8.0 >>>> >>>> locale: >>>> [1] es_ES.UTF-8/es_ES.UTF-8/C/C/es_ES.UTF-8/es_ES.UTF-8 >>>> >>>> attached base packages: >>>> [1] stats graphics grDevices utils datasets methods base >>>> >>>> other attached packages: >>>> [1] limma_3.4.4 ALL_1.4.7 Biobase_2.8.0 >>>> >>>> loaded via a namespace (and not attached): >>>> [1] tools_2.11.1 >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> ---------- >>>> Eva Benito Garagorri >>>> PhD program in Neurosciences >>>> Institute for Neurosciences in Alicante >>>> UMH-CSIC >>>> San Juan de Alicante >>>> 03550 >>>> Spain >>>> ebenito@umh.es >>>> (34) 965 91 92 33 >>>> >>>> >>>> >>>> >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor@stat.math.ethz.ch >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor@stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > ---------- Eva Benito Garagorri PhD program in Neurosciences Institute for Neurosciences in Alicante UMH-CSIC San Juan de Alicante 03550 Spain ebenito@umh.es (34) 965 91 92 33 [[alternative HTML version deleted]]
ADD REPLYlink written 8.9 years ago by Eva Benito Garagorri70
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 419 users visited in the last hour