Which genes to choose as truly differentially expressed from a long list
Guest User ★ 13k
@guest-user-4897
Last seen 9.6 years ago
Dear All,

I used LIMMA on a dataset from the Human plus 2.0 array to get fold change values for differentially expressed genes. I have a long list of 500-some genes with fold changes > 2 from the topTable function. How can I select the genes that are most differentially expressed from this list?

Thank you,
EJ

-- output of sessionInfo():

R version 3.0.1 (2013-05-16)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_India.1252 LC_CTYPE=English_India.1252
[3] LC_MONETARY=English_India.1252 LC_NUMERIC=C
[5] LC_TIME=English_India.1252

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] limma_3.16.7 affy_1.38.1 Biobase_2.20.1 BiocGenerics_0.6.0

loaded via a namespace (and not attached):
[1] affyio_1.28.0 BiocInstaller_1.10.3 preprocessCore_1.22.0
[4] zlibbioc_1.6.0

--
Sent via the guest posting facility at bioconductor.org.
limma limma • 736 views
@james-w-macdonald-5106
Last seen 4 hours ago
United States
Hi EJ,

On Tuesday, December 03, 2013 10:04:01 AM, EJ [guest] wrote:
> I have a long list of 500 some genes with fold changes > 2 from the topTable function. How can I select genes which are most differentially expressed from this list?

You will first have to define what you mean by 'most differentially expressed'. If you mean 'largest fold change', then please see ?topTable, specifically the sort.by argument.

Best,
Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
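As a rough illustration of the sort.by argument mentioned above, here is a minimal sketch on simulated data. The expression matrix, probe names, group labels, and cutoff values are all made up for the example and are not from EJ's dataset. The p.value and lfc arguments shown at the end are also documented in ?topTable; whether those cutoffs are sensible depends on the experiment.

```r
library(limma)

set.seed(1)
# Simulated log2 expression: 1000 hypothetical probes, 3 control + 3 treated arrays
expr <- matrix(rnorm(1000 * 6, mean = 8, sd = 0.3), nrow = 1000,
               dimnames = list(paste0("probe", 1:1000), paste0("s", 1:6)))
group <- factor(rep(c("control", "treated"), each = 3))

# Add a true shift of 1.5 log2 units to the first 50 probes in the treated arrays
expr[1:50, group == "treated"] <- expr[1:50, group == "treated"] + 1.5

design <- model.matrix(~ group)   # columns: (Intercept), grouptreated
fit <- eBayes(lmFit(expr, design))

# All probes, ranked by absolute log2 fold change
by_lfc <- topTable(fit, coef = "grouptreated", number = Inf, sort.by = "logFC")

# All probes, ranked by p-value from the moderated t-test
by_p <- topTable(fit, coef = "grouptreated", number = Inf, sort.by = "P")

# Only probes passing both an adjusted p-value cutoff and a minimum
# absolute log2 fold change (illustrative cutoffs)
sig <- topTable(fit, coef = "grouptreated", number = Inf,
                p.value = 0.05, lfc = 1)

head(by_lfc)
```

Note that logFC in the topTable output is on the log2 scale, so a two-fold change corresponds to an absolute logFC of 1.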
Thank you James.

What I actually meant to ask is this: I have a list of 500 genes, all with fold change values in the range of 2-3. If I condense the list using a cutoff of, say, 2.8-fold change, I still have a list of 200-some genes. How could I eliminate further genes by including more parameters in the threshold that defines them as showing differential expression? I am not sure whether there is any such thing, but I suppose there might be some definition that helps to identify genes as ideal candidates for being differentially expressed.

Appreciate your help,
EJ
Hi EJ,

This is the time to revisit the original hypothesis you are testing or, if you are trying to generate hypotheses, to decide which hypotheses you would find interesting.

In general, a set of differentially expressed genes isn't particularly compelling, especially when you start to think about biological relevance and measurement error. For example, it's pretty easy to come up with a single gene that appears to be differentially expressed but is in fact a false positive. And what does a single differentially expressed gene (if a true positive) mean in the greater biological context of your experiment?

So univariate statistics aren't that useful IMO in this context, and trying to reduce your set of genes to a tractable number that you can eyeballometrically test for 'interestingness' is likely not the way to go. In other words, taking a list of genes and scanning it for particular genes that you already know are implicated in the process you are examining is not likely to be a useful exercise.

Instead, you might reconsider the rationale for doing this experiment and see if there is something you can do to further that cause. As an example, perhaps you are trying to find pathways that are perturbed by some treatment. There are any number of ways to tease out that sort of information: you can do GO hypergeometric tests (or KEGG tests if you prefer). Or you could look for interesting gene sets at the Broad and do GSEA against those. Or maybe you want to mine the data for interesting relationships using something like WGCNA (a Google search will get you there). Note that WGCNA is particularly useful if you have other (preferably continuous) phenotypes.

Given that you apparently have a big signal here, I would recommend trying to use that signal rather than chopping away at your data simply to reduce the dimensionality.

Best,
Jim
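To make the hypergeometric test mentioned above concrete, here is a minimal base-R sketch for a single gene set. All four counts are hypothetical and chosen only for illustration. In practice the universe should be the genes actually tested on the array, and the test would be repeated over many GO terms or pathways with a multiple-testing correction.

```r
# Hypothetical counts, for illustration only
universe_size <- 20000  # genes measured on the array
de_size       <- 500    # genes called differentially expressed
set_size      <- 150    # genes annotated to one GO term or pathway
overlap       <- 12     # differentially expressed genes that fall in the gene set

# Probability of seeing an overlap of at least 12 when 500 genes are
# drawn at random (without replacement) from the universe
p_hyper <- phyper(overlap - 1, m = set_size, n = universe_size - set_size,
                  k = de_size, lower.tail = FALSE)

# The same test expressed as a one-sided Fisher's exact test on the 2x2 table
tab <- matrix(c(overlap,            de_size - overlap,
                set_size - overlap, universe_size - de_size - set_size + overlap),
              nrow = 2,
              dimnames = list(c("DE", "not DE"), c("in set", "not in set")))
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

c(hypergeometric = p_hyper, fisher = p_fisher)
```

The two p-values agree because a one-sided Fisher's exact test on a 2x2 table is the hypergeometric tail probability; with these made-up counts the expected overlap under random sampling is 500 * 150 / 20000 = 3.75 genes.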
Thanks very much for that detailed information, James. I did do some pathway analysis using Fisher's exact test, but I now understand that I can only filter down the original list once I subject it to further, more advanced analysis. It may not work in my favor if I were to discard data at the very first stage. Your reply helps a great deal, and I highly appreciate it.

Best regards,
Ekta