limma and minimum fold change

1

Entering edit mode

Eva Benito Garagorri ▴ 70

@eva-benito-garagorri-4263

Last seen 9.7 years ago

Dear list, I have been working with some microarray data where I make simple comparisons between a control and a treatment group. I used vsn to normalize and limma to find differentially expressed genes. I used "topTable" with number=Inf and p.value=0.05 to get all significant genes for a given condition, regardless of their fold change. I seem to be getting significant genes with a fold change systematically bigger than 1.3 (linear scale), both up and downregulated. I was wondering whether this can happen and why or whether I am missing something or making a mistake in the analysis. I guess I expected that some genes would be significant even with a very small fold change. I tried the same analysis with the ALL dataset and I found that genes were significant with a fold change above ~1.1. Below is the code I used for the analysis of the ALL dataset, which is essentially the same I used for my own analysis. Thanks in advance! Eva library(ALL) data(ALL) library(limma) f = ALL$mol.biol mat = model.matrix(~f, ALL) lm = lmFit(ALL, mat) eb = eBayes(lm) sign = topTable(eb, coef=2,number=Inf,p.value=0.05) dim(sign) sort(2^sign$logFC) > sessionInfo() R version 2.11.1 (2010-05-31) x86_64-apple-darwin9.8.0 locale: [1] es_ES.UTF-8/es_ES.UTF-8/C/C/es_ES.UTF-8/es_ES.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] limma_3.4.4 ALL_1.4.7 Biobase_2.8.0 loaded via a namespace (and not attached): [1] tools_2.11.1 ---------- Eva Benito Garagorri PhD program in Neurosciences Institute for Neurosciences in Alicante UMH-CSIC San Juan de Alicante 03550 Spain ebenito@umh.es (34) 965 91 92 33 [[alternative HTML version deleted]]

Microarray vsn limma Microarray vsn limma • 2.2k views

ADD COMMENT • link updated 13.5 years ago by Sean Davis 21k • written 13.5 years ago by Eva Benito Garagorri ▴ 70

1

Entering edit mode

Sean Davis 21k

@sean-davis-490

Last seen 3 months ago

United States

On Sat, Nov 13, 2010 at 1:56 PM, Eva Benito Garagorri <ebenito@umh.es>wrote: > Dear list, > > I have been working with some microarray data where I make simple > comparisons between a control and a treatment group. I used vsn to normalize > and limma to find differentially expressed genes. I used "topTable" with > number=Inf and p.value=0.05 to get all significant genes for a given > condition, regardless of their fold change. I seem to be getting significant > genes with a fold change systematically bigger than 1.3 (linear scale), both > up and downregulated. I was wondering whether this can happen and why or > whether I am missing something or making a mistake in the analysis. I guess > I expected that some genes would be significant even with a very small fold > change. I tried the same analysis with the ALL dataset and I found that > genes were significant with a fold change above ~1.1. Below is the code I > used for the analysis of the ALL dataset, which is essentially the same I > used for my own analysis. Thanks in advance! > > Hi, Eva. I think what you are saying is that you are seeing genes that change by 30% and that you are concerned that you should see genes with a smaller fold change being significant? The fold change that is significant will be dependent on a number of factors, but sample size and biologic factors such as the amount of heterogeneity in the two conditions are strong contributors. I wouldn't say that there is anything wrong with your results, at least without knowing more detail. Sean > Eva > > > library(ALL) > data(ALL) > library(limma) > > > f = ALL$mol.biol > > mat = model.matrix(~f, ALL) > lm = lmFit(ALL, mat) > eb = eBayes(lm) > > sign = topTable(eb, coef=2,number=Inf,p.value=0.05) > dim(sign) > > sort(2^sign$logFC) > > > > > > > sessionInfo() > R version 2.11.1 (2010-05-31) > x86_64-apple-darwin9.8.0 > > locale: > [1] es_ES.UTF-8/es_ES.UTF-8/C/C/es_ES.UTF-8/es_ES.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] limma_3.4.4 ALL_1.4.7 Biobase_2.8.0 > > loaded via a namespace (and not attached): > [1] tools_2.11.1 > > > > > > > > > ---------- > Eva Benito Garagorri > PhD program in Neurosciences > Institute for Neurosciences in Alicante > UMH-CSIC > San Juan de Alicante > 03550 > Spain > ebenito@umh.es > (34) 965 91 92 33 > > > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

ADD COMMENT • link 13.5 years ago Sean Davis 21k

0

Entering edit mode

Hi Eva in the ordinary t-test, the t-statistic is proportional to the ratio of fold change 'fc' and estimated within-group standard deviation 's'; and the p-value depends on that through a monotonous transformation, ie. setting a p-value cutoff is equivalent to setting a cutoff on |t|. s_gene is simply, for each gene separately, the empirical standard deviation. With limma/eBayes, you replace s by a cleverly calibrated compromise s_limma between s_gene and an overall pooled s_tot (across all genes). With a small number of replicates, that compromise is close to s_tot; with many replicates, close to s_gene; and somewhere inbetween in intermediate cases. Now, if (eBayes decides that) you have few replicates, then s_gene ~ s_tot is always the same for all genes, and your p-value cutoff is directly equivalent to a cutoff on fc. This, it appears, is what you see. Have a look, in your 'eb' object, at the slots 'df.prior' and 'df.residual'. The manual page of eBayes says: "s2.post is the weighted average of s2.prior and sigma^2 with weights proportional to df.prior and df.residual respectively." Best wishes Wolfgang Il Nov/13/10 10:44 PM, Sean Davis ha scritto: > On Sat, Nov 13, 2010 at 1:56 PM, Eva Benito Garagorri<ebenito at="" umh.es="">wrote: > >> Dear list, >> >> I have been working with some microarray data where I make simple >> comparisons between a control and a treatment group. I used vsn to normalize >> and limma to find differentially expressed genes. I used "topTable" with >> number=Inf and p.value=0.05 to get all significant genes for a given >> condition, regardless of their fold change. I seem to be getting significant >> genes with a fold change systematically bigger than 1.3 (linear scale), both >> up and downregulated. I was wondering whether this can happen and why or >> whether I am missing something or making a mistake in the analysis. I guess >> I expected that some genes would be significant even with a very small fold >> change. I tried the same analysis with the ALL dataset and I found that >> genes were significant with a fold change above ~1.1. Below is the code I >> used for the analysis of the ALL dataset, which is essentially the same I >> used for my own analysis. Thanks in advance! >> >> > Hi, Eva. I think what you are saying is that you are seeing genes that > change by 30% and that you are concerned that you should see genes with a > smaller fold change being significant? The fold change that is significant > will be dependent on a number of factors, but sample size and biologic > factors such as the amount of heterogeneity in the two conditions are strong > contributors. I wouldn't say that there is anything wrong with your > results, at least without knowing more detail. > > Sean > > > >> Eva >> >> >> library(ALL) >> data(ALL) >> library(limma) >> >> >> f = ALL$mol.biol >> >> mat = model.matrix(~f, ALL) >> lm = lmFit(ALL, mat) >> eb = eBayes(lm) >> >> sign = topTable(eb, coef=2,number=Inf,p.value=0.05) >> dim(sign) >> >> sort(2^sign$logFC) >> >> >> >> >> >>> sessionInfo() >> R version 2.11.1 (2010-05-31) >> x86_64-apple-darwin9.8.0 >> >> locale: >> [1] es_ES.UTF-8/es_ES.UTF-8/C/C/es_ES.UTF-8/es_ES.UTF-8 >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] limma_3.4.4 ALL_1.4.7 Biobase_2.8.0 >> >> loaded via a namespace (and not attached): >> [1] tools_2.11.1 >> >> >> >> >> >> >> >> >> ---------- >> Eva Benito Garagorri >> PhD program in Neurosciences >> Institute for Neurosciences in Alicante >> UMH-CSIC >> San Juan de Alicante >> 03550 >> Spain >> ebenito at umh.es >> (34) 965 91 92 33 >> >> >> >> >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 13.5 years ago Wolfgang Huber ★ 13k

0

Entering edit mode

Hi Eva, Volcano plots are very useful for showing the relationship between logFC and pvalue. # logFC vs uncorrected p-values plot(sign$logFC, -log10(sign$p.value)) # logFC vs corrected p-values plot(sign$logFC, -log10(sign$adj.p.val)) I suspect you'll see a distinct V shape, with large dispersion on either side of the V cheers, Mark On 15/11/2010, at 4:43 AM, Wolfgang Huber wrote: > > Hi Eva > > in the ordinary t-test, the t-statistic is proportional to the ratio of fold change 'fc' and estimated within-group standard deviation 's'; and the p-value depends on that through a monotonous transformation, ie. setting a p-value cutoff is equivalent to setting a cutoff on |t|. s_gene is simply, for each gene separately, the empirical standard deviation. > > With limma/eBayes, you replace s by a cleverly calibrated compromise s_limma between s_gene and an overall pooled s_tot (across all genes). With a small number of replicates, that compromise is close to s_tot; with many replicates, close to s_gene; and somewhere inbetween in intermediate cases. Now, if (eBayes decides that) you have few replicates, then s_gene ~ s_tot is always the same for all genes, and your p-value cutoff is directly equivalent to a cutoff on fc. This, it appears, is what you see. > > Have a look, in your 'eb' object, at the slots 'df.prior' and 'df.residual'. The manual page of eBayes says: "s2.post is the weighted average of s2.prior and sigma^2 with weights proportional to df.prior and df.residual respectively." > > Best wishes > Wolfgang > > > Il Nov/13/10 10:44 PM, Sean Davis ha scritto: >> On Sat, Nov 13, 2010 at 1:56 PM, Eva Benito Garagorri<ebenito at="" umh.es="">wrote: >> >>> Dear list, >>> >>> I have been working with some microarray data where I make simple >>> comparisons between a control and a treatment group. I used vsn to normalize >>> and limma to find differentially expressed genes. I used "topTable" with >>> number=Inf and p.value=0.05 to get all significant genes for a given >>> condition, regardless of their fold change. I seem to be getting significant >>> genes with a fold change systematically bigger than 1.3 (linear scale), both >>> up and downregulated. I was wondering whether this can happen and why or >>> whether I am missing something or making a mistake in the analysis. I guess >>> I expected that some genes would be significant even with a very small fold >>> change. I tried the same analysis with the ALL dataset and I found that >>> genes were significant with a fold change above ~1.1. Below is the code I >>> used for the analysis of the ALL dataset, which is essentially the same I >>> used for my own analysis. Thanks in advance! >>> >>> >> Hi, Eva. I think what you are saying is that you are seeing genes that >> change by 30% and that you are concerned that you should see genes with a >> smaller fold change being significant? The fold change that is significant >> will be dependent on a number of factors, but sample size and biologic >> factors such as the amount of heterogeneity in the two conditions are strong >> contributors. I wouldn't say that there is anything wrong with your >> results, at least without knowing more detail. >> >> Sean >> >> >> >>> Eva >>> >>> >>> library(ALL) >>> data(ALL) >>> library(limma) >>> >>> >>> f = ALL$mol.biol >>> >>> mat = model.matrix(~f, ALL) >>> lm = lmFit(ALL, mat) >>> eb = eBayes(lm) >>> >>> sign = topTable(eb, coef=2,number=Inf,p.value=0.05) >>> dim(sign) >>> >>> sort(2^sign$logFC) >>> >>> >>> >>> >>> >>>> sessionInfo() >>> R version 2.11.1 (2010-05-31) >>> x86_64-apple-darwin9.8.0 >>> >>> locale: >>> [1] es_ES.UTF-8/es_ES.UTF-8/C/C/es_ES.UTF-8/es_ES.UTF-8 >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] limma_3.4.4 ALL_1.4.7 Biobase_2.8.0 >>> >>> loaded via a namespace (and not attached): >>> [1] tools_2.11.1 >>> >>> >>> >>> >>> >>> >>> >>> >>> ---------- >>> Eva Benito Garagorri >>> PhD program in Neurosciences >>> Institute for Neurosciences in Alicante >>> UMH-CSIC >>> San Juan de Alicante >>> 03550 >>> Spain >>> ebenito at umh.es >>> (34) 965 91 92 33 >>> >>> >>> >>> >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 13.5 years ago Mark Cowley ▴ 910

0

Entering edit mode

Hi, First of all, I want to thank everyone for their kind answers. Thanks! I tried the conventional t-test after non-specific filtering on my data using the command "rowttests" from the "genefilter" package and that gives significant genes (after BH-adjustment) with a smaller fold change close to 1.1, so I guess what I see is a consequence of using limma and a small (5 controls and 3 treatment) number of replicates... I also tried the volcano plot representation and that showed a clear dispersion on the sides of the 'V', as Mark suggested. I was just trying to understand why it is that I'm observing this particular effect on the fc on my data, even though I may end up filtering the list of significant genes to remove those with a very small change... but I was curious to know where and how that was happening. Thanks a lot for your help everyone! Best regards, Eva On 14/11/2010, at 23:51, Mark Cowley wrote: > Hi Eva, > Volcano plots are very useful for showing the relationship between logFC and pvalue. > > # logFC vs uncorrected p-values > plot(sign$logFC, -log10(sign$p.value)) > # logFC vs corrected p-values > plot(sign$logFC, -log10(sign$adj.p.val)) > > I suspect you'll see a distinct V shape, with large dispersion on either side of the V > > cheers, > Mark > On 15/11/2010, at 4:43 AM, Wolfgang Huber wrote: > >> >> Hi Eva >> >> in the ordinary t-test, the t-statistic is proportional to the ratio of fold change 'fc' and estimated within-group standard deviation 's'; and the p-value depends on that through a monotonous transformation, ie. setting a p-value cutoff is equivalent to setting a cutoff on |t|. s_gene is simply, for each gene separately, the empirical standard deviation. >> >> With limma/eBayes, you replace s by a cleverly calibrated compromise s_limma between s_gene and an overall pooled s_tot (across all genes). With a small number of replicates, that compromise is close to s_tot; with many replicates, close to s_gene; and somewhere inbetween in intermediate cases. Now, if (eBayes decides that) you have few replicates, then s_gene ~ s_tot is always the same for all genes, and your p-value cutoff is directly equivalent to a cutoff on fc. This, it appears, is what you see. >> >> Have a look, in your 'eb' object, at the slots 'df.prior' and 'df.residual'. The manual page of eBayes says: "s2.post is the weighted average of s2.prior and sigma^2 with weights proportional to df.prior and df.residual respectively." >> >> Best wishes >> Wolfgang >> >> >> Il Nov/13/10 10:44 PM, Sean Davis ha scritto: >>> On Sat, Nov 13, 2010 at 1:56 PM, Eva Benito Garagorri<ebenito@umh.es>wrote: >>> >>>> Dear list, >>>> >>>> I have been working with some microarray data where I make simple >>>> comparisons between a control and a treatment group. I used vsn to normalize >>>> and limma to find differentially expressed genes. I used "topTable" with >>>> number=Inf and p.value=0.05 to get all significant genes for a given >>>> condition, regardless of their fold change. I seem to be getting significant >>>> genes with a fold change systematically bigger than 1.3 (linear scale), both >>>> up and downregulated. I was wondering whether this can happen and why or >>>> whether I am missing something or making a mistake in the analysis. I guess >>>> I expected that some genes would be significant even with a very small fold >>>> change. I tried the same analysis with the ALL dataset and I found that >>>> genes were significant with a fold change above ~1.1. Below is the code I >>>> used for the analysis of the ALL dataset, which is essentially the same I >>>> used for my own analysis. Thanks in advance! >>>> >>>> >>> Hi, Eva. I think what you are saying is that you are seeing genes that >>> change by 30% and that you are concerned that you should see genes with a >>> smaller fold change being significant? The fold change that is significant >>> will be dependent on a number of factors, but sample size and biologic >>> factors such as the amount of heterogeneity in the two conditions are strong >>> contributors. I wouldn't say that there is anything wrong with your >>> results, at least without knowing more detail. >>> >>> Sean >>> >>> >>> >>>> Eva >>>> >>>> >>>> library(ALL) >>>> data(ALL) >>>> library(limma) >>>> >>>> >>>> f = ALL$mol.biol >>>> >>>> mat = model.matrix(~f, ALL) >>>> lm = lmFit(ALL, mat) >>>> eb = eBayes(lm) >>>> >>>> sign = topTable(eb, coef=2,number=Inf,p.value=0.05) >>>> dim(sign) >>>> >>>> sort(2^sign$logFC) >>>> >>>> >>>> >>>> >>>> >>>>> sessionInfo() >>>> R version 2.11.1 (2010-05-31) >>>> x86_64-apple-darwin9.8.0 >>>> >>>> locale: >>>> [1] es_ES.UTF-8/es_ES.UTF-8/C/C/es_ES.UTF-8/es_ES.UTF-8 >>>> >>>> attached base packages: >>>> [1] stats graphics grDevices utils datasets methods base >>>> >>>> other attached packages: >>>> [1] limma_3.4.4 ALL_1.4.7 Biobase_2.8.0 >>>> >>>> loaded via a namespace (and not attached): >>>> [1] tools_2.11.1 >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> ---------- >>>> Eva Benito Garagorri >>>> PhD program in Neurosciences >>>> Institute for Neurosciences in Alicante >>>> UMH-CSIC >>>> San Juan de Alicante >>>> 03550 >>>> Spain >>>> ebenito@umh.es >>>> (34) 965 91 92 33 >>>> >>>> >>>> >>>> >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor@stat.math.ethz.ch >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor@stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > ---------- Eva Benito Garagorri PhD program in Neurosciences Institute for Neurosciences in Alicante UMH-CSIC San Juan de Alicante 03550 Spain ebenito@umh.es (34) 965 91 92 33 [[alternative HTML version deleted]]

ADD REPLY • link 13.5 years ago Eva Benito Garagorri ▴ 70

Login before adding your answer.