Tentative announcement: This course will be withdrawn if
there is insufficient interest.
Three-day course on Bioconductor (intermediate level)
Instructor: Vincent Carey, Ph.D.
March 5, 6, 7, 9am - 5pm each day
Inn at Longwood, Boston, Massachusetts
342 Longwood Ave, Boston MA, 02115
Tuition: $600 academic, $1200 commercial
Registration form: http://www.biostat.harvard.edu/~carey/form08.pdf
Questions: stvjc at channing.harvard.edu -- please do not post
questions on this course to the list
A block of sleeping rooms will be available at the Inn at Longwood
at approximately $189/night; contact 617 731 4700 after
Feb 5 and mention "Bioconductor conference".
This course provides a hands-on survey of Bioconductor tools
for working with genome scale data. The material targets students
with reasonable facility with R at the command line who wish
to get acquainted with data analysis for various experimental
paradigms. We will cover, among other things:
- the MAQC experimental design and platforms
- the oligo package and new facilities for dealing with
Affymetrix chips (expression and DNA)
- Illumina expression and SNP chip data
- SQLite facilities for biological metadata and platform
annotation
- the MLInterfaces package for supervised learning
- the GGtools package for genetics of gene expression
Students who successfully complete the course will be able
- to transform raw outputs from Affymetrix and Illumina platforms
into analyzable ExpressionSets or allied containers,
- to apply various forms of statistical analysis to answer
questions about differential expression and genotype effects
in genome-scale data,
- to use various annotation resources such as GO and KEGG to
help interpret patterns in genome-scale data,
using only transparent and fully open source software.
Requirements:
* prerequisites: There will be very little background material
provided on either R or the assays to be studied. We are focusing on working
with digital artifacts of experiments (possibly retrieved from GEO, or
from a core, to which we may apply some QA, or which we accept as
valid numerical data). If you have no prior experience with R but are
interested in the course, be sure to have read Dalgaard, "Introductory
Statistics with R" (Springer) and/or the introductory material on
www.r-project.org.
* equipment: Every student must bring a reasonably modern laptop
computer with a DVD drive or a USB port to allow installation of
several GB of software and data. All software and data are supplied
for Windows machines so that all students have identical working
environments. Mac or Linux laptops may be used, but students using
these will be expected to have good mastery of their operating
system so that the majority of students, who use Windows, will not
be distracted by idiosyncratic support requests.
Format: Each major topic is addressed in a brief lecture.
A handout is provided with specific exercises and hints/partial
solutions. Students work independently or in teams to solve
exercises; the module concludes with discussion of the solution.
Tentative curriculum
Day 1:
* morning: four technologies in 'cooked' form
- transcript profiling: Affymetrix, Illumina
- ChIP-chip (yeast)
- SNP-chips + expression
- aCGH + expression
* mid-day: containers: structure, population, methods
- arrays
- gene sets
- browser tracks
* afternoon: workflow components I
- capture
- QA
- preprocessing
Day 2:
* morning: workflow components II: annotation resources
- SQLite representations of array and general metadata annotations
- web services
* mid-day: statistical analysis concepts
- categorical methods
- limma and other regularized methods
- multiple comparisons
* afternoon: exploratory tools: visualization, PCA, clustering
Day 3:
* morning: exercises: MAQC, spike-ins, genetics of gene expression
* mid-day: category and enrichment analyses; supervised learning
(MLInterfaces)
* afternoon: reports and audits; reproducible research
- Sweave/odfWeave