Please comment on the way I'm thinking about finding differentially expressed genes

Sean Davis

United States

On Fri, Jul 25, 2014 at 2:32 PM, Kaj Chokeshaiusaha <kaj.chk@gmail.com> wrote:

> Dear all,
>
> Thank you very much for your comments. I now feel confident sticking
> with the usual approach.
>
> There is one thing that has stuck in my mind all along; this is
> probably due to my lack of basic knowledge. I'm wondering about people
> who generate sets of data from their original data using methods like
> leave-one-out, then apply a test (like limma) to each set, and finally
> check which top genes are repeated most often in the differentially
> expressed gene lists produced from all the sets (for example, 4 out of 6).
> Is this kind of approach better than sticking to the list of
> differentially expressed genes produced from the original data?

In general, you will want to use all your data when you have only 3 samples per condition. Your power will be maximized this way.

To answer your question, ad hoc approaches can be useful, but you really have to think about whether or not you can quantify how "good" your gene list is after applying such an approach (what is the p-value or false discovery rate?). Since you may have trouble doing that for your specific example, I doubt that you gain anything from even attempting it.

Sean

> Thank you very much in advance for your patience with me.
>
> With Respects,
> Kaj
>
> 2557-07-25 22:53 GMT+07:00, Sean Davis <sdavis2@mail.nih.gov>:
> > Hi, Kaj.
> >
> > You may be overthinking things a bit. Differential gene expression
> > analysis has a lot of history and has developed around the constraints
> > imposed by small sample sizes, so most modern tools for doing differential
> > expression analysis will handle your data in a rational and statistically
> > sound way. I would consider starting with limma; the user guide is
> > excellent and the package is very highly utilized for experiments
> > presumably just like yours. I don't want to discourage experimentation,
> > but it is often best to start with a known analysis, if only for
> > comparison if you do try something more exotic.
> >
> > Sean
> >
> > On Fri, Jul 25, 2014 at 11:20 AM, Kaj Chokeshaiusaha [guest] <
> > guest@bioconductor.org> wrote:
> >
> > > Dear R helpers,
> > >
> > > I'm a beginner in gene expression analysis, and I must apologize to
> > > everyone in advance if I'm posting something irritating. I am just
> > > trying to figure out an alternative way to find differentially
> > > expressed genes in datasets with few replicates.
> > >
> > > In my case, I have very few replicates per group (2-3 per group).
> > > I'm wondering whether I can generate several datasets from my original
> > > data (using methods like the bootstrap), then perform the test to find
> > > the list of differentially expressed genes in each generated dataset.
> > > After that, I would count how often each gene is repeated across all
> > > the lists and pick the top ones as differentially expressed genes.
> > >
> > > Please comment on the idea; I don't want to slip too far into the
> > > wrong approach.
> > >
> > > With Respects,
> > > Kaj
> > >
> > > -- output of sessionInfo():
> > >
> > > R version 3.1.0 (2014-04-10)
> > > Platform: x86_64-pc-linux-gnu (64-bit)
> > >
> > > locale:
> > > [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
> > > [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
> > > [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
> > > [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
> > > [9] LC_ADDRESS=C LC_TELEPHONE=C
> > > [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
> > >
> > > attached base packages:
> > > [1] parallel stats graphics grDevices utils datasets methods
> > > [8] base
> > >
> > > other attached packages:
> > > [1] CMA_1.22.0 Biobase_2.24.0 BiocGenerics_0.10.0
> > > [4] e1071_1.6-3
> > >
> > > loaded via a namespace (and not attached):
> > > [1] class_7.3-10 tools_3.1.0

updated 11.5 years ago by Kaj Chokeshaiusaha • written 11.5 years ago by Sean Davis
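A minimal sketch of the leave-one-out "vote counting" procedure asked about in this thread (keep genes that make the top list in at least 4 of 6 subsets), in Python with only the standard library. It is not limma: genes are ranked by an ordinary Welch t statistic, and the simulated data, the `k = 20` cut-off, and all function names are hypothetical. What it illustrates is Sean's point: each gene ends up with a count out of 6, and nothing in the procedure attaches a p-value or false discovery rate to that count.

```python
import random
from statistics import mean, variance

def welch_t(a, b):
    """Welch t statistic for two small groups (sample variances, n - 1)."""
    se2 = variance(a) / len(a) + variance(b) / len(b)
    return (mean(a) - mean(b)) / se2 ** 0.5

def top_genes(data, ctrl_cols, trt_cols, k):
    """Indices of the k genes with the largest |t| for the given columns."""
    scores = []
    for g, row in enumerate(data):
        t = welch_t([row[c] for c in trt_cols], [row[c] for c in ctrl_cols])
        scores.append((abs(t), g))
    return {g for _, g in sorted(scores, reverse=True)[:k]}

def loo_vote_counts(data, ctrl_cols, trt_cols, k):
    """Drop each sample once, re-rank genes, and count top-k appearances."""
    counts = [0] * len(data)
    for dropped in ctrl_cols + trt_cols:
        ctrl = [s for s in ctrl_cols if s != dropped]
        trt = [s for s in trt_cols if s != dropped]
        for g in top_genes(data, ctrl, trt, k):
            counts[g] += 1
    return counts

# Hypothetical data: 200 genes x 6 samples (columns 0-2 control,
# 3-5 treated); only the first 10 genes carry a true shift of +2.
random.seed(1)
n_genes, shift = 200, 2.0
data = [[random.gauss(0, 1) + (shift if g < 10 and s >= 3 else 0.0)
         for s in range(6)] for g in range(n_genes)]

counts = loo_vote_counts(data, ctrl_cols=[0, 1, 2], trt_cols=[3, 4, 5], k=20)
stable = [g for g, n in enumerate(counts) if n >= 4]  # the "4 out of 6" rule
print(sorted(stable))
```

Note also that the six leave-one-out subsets overlap heavily: any two of them share 4 of their 5 samples, so agreement between them is largely built in rather than independent evidence.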

Kaj Chokeshaiusaha

Thailand

Dear Prof. Davis,

Thank you very much for your patience. Your point about having only three samples per condition really clarifies everything. I will follow the usual approach.

Thank you very much again for your patience and kindness,
Kaj

written 11.5 years ago by Kaj Chokeshaiusaha
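In the usual approach, "how good is the gene list" is answered with false discovery rates computed on the full data set. limma's `topTable()` reports Benjamini-Hochberg adjusted p-values by default (`adjust.method = "BH"`). The sketch below re-implements only that adjustment, in Python, so the arithmetic is visible; the raw p-values are made-up numbers, and it is no substitute for limma's moderated statistics.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling by m / rank and taking
    # a running minimum so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for five genes from a test on the full data.
raw = [0.01, 0.04, 0.03, 0.005, 0.20]
adj = bh_adjust(raw)
print([round(a, 4) for a in adj])  # [0.025, 0.05, 0.05, 0.025, 0.2]
```

Genes with an adjusted value below the chosen cut-off (for example 0.05) form a list whose expected false discovery proportion is controlled at that level, which is exactly the kind of statement the vote-counting approach cannot make.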
