How do you keep track of your analyses?

Daniel Brewer ★ 1.9k

@daniel-brewer-1791

Last seen 9.6 years ago

Hello,

I am doing an increasing number of Bioconductor analyses for various people, and I am finding it difficult to keep track of what I have done previously. A common question six months after the initial analysis is something like "Can you do the same as x but change y?" Does anyone have ideas on the best way to handle this?

The essential components to keep track of are:
* input files
* R code used
* output files
* a description of what the analysis aims to do

The two possibilities that I can think of are:

1) Some structured directories, e.g.
   ProjectName_Person
   /Description.txt
   /Analysis1_date
   /InputFiles
   /Rcode
   /Output/Outputfiles

2) Some sort of personal wiki, like TiddlyWiki

It has got to be searchable in some form too.

Any experiences in this realm?

Many thanks

--
Daniel Brewer, Ph.D.
Institute of Cancer Research
Molecular Carcinogenesis
Email: daniel.brewer at icr.ac.uk
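As an illustration of option 1, the layout could be created and searched with a small helper script. This is only a minimal sketch in Python (chosen for illustration; R's `dir.create()` would serve equally well). The function names `scaffold_analysis` and `search_descriptions` are hypothetical, and nesting `InputFiles`, `Rcode` and `Output` under each dated analysis folder is an assumption about the intended layout. The search is a plain case-insensitive substring match over every `Description.txt`.

```python
from datetime import date
from pathlib import Path


def scaffold_analysis(root, project, person, analysis, description, when=None):
    """Create ProjectName_Person/AnalysisN_date and its subfolders.

    Assumption: InputFiles, Rcode and Output live inside the dated
    analysis folder; Description.txt sits at the project level.
    """
    when = when or date.today()
    project_dir = Path(root) / f"{project}_{person}"
    analysis_dir = project_dir / f"{analysis}_{when.isoformat()}"
    for sub in ("InputFiles", "Rcode", "Output"):
        (analysis_dir / sub).mkdir(parents=True, exist_ok=True)
    # Append rather than overwrite, so each new analysis adds one line
    # to the project-level description.
    with (project_dir / "Description.txt").open("a", encoding="utf-8") as fh:
        fh.write(f"{analysis_dir.name}: {description}\n")
    return analysis_dir


def search_descriptions(root, term):
    """Return (file, line) pairs where the line contains term, ignoring case."""
    hits = []
    for desc in sorted(Path(root).rglob("Description.txt")):
        for line in desc.read_text(encoding="utf-8").splitlines():
            if term.lower() in line.lower():
                hits.append((desc, line))
    return hits
```

Calling `scaffold_analysis` once per new request and `search_descriptions` when someone asks to "do the same as x" would cover the description and searchability requirements; input files, R code and output still have to be placed in their folders by hand.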

Sean Davis 21k

@sean-davis-490

Last seen 3 months ago

United States

On Tue, Sep 23, 2008 at 11:51 AM, Daniel Brewer <daniel.brewer at icr.ac.uk> wrote:

> Hello,
>
> I am doing an increasing number of bioconductor analyses for various
> people and I am starting to find it difficult to keep track of what I
> have done previously. A common question six months after the initial
> analysis is something like "Can you do the same as x but change y". Has
> anyone got any idea on the best way to do this.
>
> The essential components to keep track of are:
> * input files
> * R code used
> * output files
> * Description of what you aim to do.
>
> The two possibilities that I can think of is:
> 1) Some structured directories e.g.
> ProjectName_Person
> /Description.txt
> /Analysis1_date
> /InputFiles
> /Rcode
> /Output/Outputfiles

I store all the raw data and sample information in a directory called "Results", and the R code in a directory called "R" with subdirectories for figures and (textual) output. I use ESS/emacs for everything, so it is easy to maintain a "script" file that contains every R command I use. If I need to rerun an analysis, I can do so by simply running the entire script. With ESS/emacs, I can also easily submit just the pieces that I need when redoing a subset of the original analysis.

Another option is to create Sweave documents for each project. With Seth Falcon's weaver, this becomes possible for larger projects because caching can be employed. Also, if you need to run a subset of the analysis for a quick check, you can always pull out the relevant R code pretty easily.

A step that I have not taken is to version-control the R scripts for projects. I do version-control all common R code that I use, however. Using something like git or svn is VERY helpful for anyone doing any amount of coding.

> 2) Some sort of personal wiki like TiddlyWiki
>
> Its got to be searchable in some form too.

I do use a wiki at times for highly collaborative, complicated, long-term projects where communication is key. However, I find it a bit tedious for the typical gene expression study, where the task can be done quickly.

Obviously, just my $0.02 worth.

Sean

Philipp Pagel ▴ 190

@philipp-pagel-2810

Last seen 9.6 years ago

> I am doing an increasing number of bioconductor analyses for various
> people and I am starting to find it difficult to keep track of what I
> have done previously. A common question six months after the initial
> analysis is something like "Can you do the same as x but change y". Has
> anyone got any idea on the best way to do this.
>
> The essential components to keep track of are:
> * input files
> * R code used
> * output files
> * Description of what you aim to do.

I keep each analysis (e.g. a set of related arrays) in its own folder. Furthermore, I use Sweave to keep documentation, code and interpretation together. In order not to clutter the folder too much, I put all input data in one subfolder, images in another, and so on.

So far I am very happy with that approach.

cu
Philipp

--
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://mips.gsf.de/staff/pagel

Michal Blazejczyk ▴ 320

@michal-blazejczyk-2231

Last seen 9.6 years ago

Hi Ali,

If you permit a little bit of self-promotion... You may want to look at FlexArray, a Windows application we developed that is a thick wrapper around Bioconductor for statistical analysis of expression microarray results:

http://genomequebec.mcgill.ca/FlexArray/

FlexArray will not handle huge projects or very complex analyses, but it performs quite well on small- to mid-sized Affy and Illumina gene expression projects. It is also extensible through algorithm plug-ins.

One of our guiding principles has been to keep track of everything that is done during an analysis: input data, output data, parameters, R code executed... So you can always go back and review your past analysis steps, or export an analysis as a "protocol" and then apply it to new data, after tweaking some parameters if you wish.

Best regards,

Michal Blazejczyk
FlexArray Lead Developer
McGill University and Genome Quebec Innovation Centre
740, Dr Penfield Avenue
Montreal (Quebec) Canada H3A 1A4
Phone: 514-398-5187
E-mail: flexarray at genomequebec.ca
Internet: genomequebec.mcgill.ca/FlexArray

Bio Sam ▴ 60

@bio-sam-1879

Last seen 9.6 years ago

I like to keep a main working directory, then make a subdirectory for each project with a directory name containing the date, contact name, and project title. This makes it easy to see when I started a project, and keeps things sorted chronologically:

/working
/working/2008-02-29-Contact_project

Inside each project folder I keep a separate folder for each step of the analysis, labelled with the R version so that I know whether data files are old or not:

/working/2008-02-29-Contact_project/qa_2.7
/working/2008-02-29-Contact_project/filter_2.7
/working/2008-02-29-Contact_project/sam_2.7
/working/2008-02-29-Contact_project/go_2.7
/working/2008-02-29-Contact_project/pathway_2.7

I keep a script file for each step within each folder, and store an RData object with the normalized data in the project folder so that it is easily loaded from any subfolder. I also try to keep a log of all analyses in the project folder so that I can remember what I did.

I use Linux and KDE, so Kate (http://kate-editor.org/) is my editor of choice because it is fast, lightweight and supports R syntax highlighting. It also allows you to open many files at once and store "sessions", which makes it easy to automatically open many R scripts for big projects.

I back up the folder weekly using Flyback (http://code.google.com/p/flyback/), which supports incremental and automatic backups at the click of a button.

Sam
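The naming scheme described in this reply can be sketched as a pair of helper functions. This is a hedged illustration in Python, not the poster's own tooling: the function names `project_dir_name` and `new_step`, the log file name `analysis_log.txt`, and the tab-separated log format are all hypothetical. The one property taken from the post is that an ISO-style date prefix makes project folders sort chronologically, and that each step folder carries the R version as a suffix.

```python
from datetime import date, datetime
from pathlib import Path


def project_dir_name(started, contact, title):
    """Build a name like 2008-02-29-Contact_project.

    ISO dates (YYYY-MM-DD) sort chronologically when sorted as text.
    """
    return f"{started.isoformat()}-{contact}_{title}"


def new_step(working, project, step, r_version, note):
    """Create <working>/<project>/<step>_<r_version> and log the step."""
    project_path = Path(working) / project
    step_path = project_path / f"{step}_{r_version}"
    step_path.mkdir(parents=True, exist_ok=True)
    # analysis_log.txt is a hypothetical name for the project-level log.
    with (project_path / "analysis_log.txt").open("a", encoding="utf-8") as log:
        stamp = datetime.now().isoformat(timespec="seconds")
        log.write(f"{stamp}\t{step_path.name}\t{note}\n")
    return step_path
```

For example, `new_step("/working", project_dir_name(date(2008, 2, 29), "Contact", "project"), "qa", "2.7", "quality assessment")` would create the `qa_2.7` folder shown above and append one line to the project log.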
