ChIP-seq quality control


Lucia Peixoto ▴ 330

@lucia-peixoto-4203

Last seen 9.7 years ago

Hi,

I am new to ChIP-seq. My experiment's sequencing has finished, and the read alignment is currently running. The experiment was done for histone acetylation, and I have two types of controls: input DNA and unmodified histone. I have two conditions and 6 biological replicates of each condition.

I would like some advice on how to perform basic quality control on ChIP-seq data using Bioconductor, and also some ideas about which kinds of biases people usually observe and that I should keep my eyes open for.

Any advice will be greatly appreciated!

Thanks,
Lucia

Sequencing Sequencing • 1.5k views

ADD COMMENT • link updated 12.6 years ago by Ivan Gregoretti ▴ 310 • written 12.6 years ago by Lucia Peixoto ▴ 330


Ivan Gregoretti ▴ 310

@ivan-gregoretti-3975

Last seen 9.6 years ago

Canada

Hello Lucia,

A proper response to your post would take a lecture rather than an email. I can't do that, but I can bullet the main points. I think it will help you if you are indeed a newcomer to ChIP-seq.

1) Expect 10 million reads per sample for a genome the size of human.
2) Stick to SAM/BAM formats so that you can use well-known, publicly available tools. Your best friend is called Picard.
3) Remove duplicates. Again, Picard is your best friend.
4) Create WIG files for all samples, treatments and controls, so that you can display them simultaneously on any genome browser.
5) Find peaks with a well-documented peak finder.
6) Compute enrichment for all treatments relative to their controls.

So, points 4 and 6 are your quality controls at this stage. Once you know what a good immunoprecipitation looks like compared to a bad one, you can start diving into the details.

You can invent your own quality indicators. For instance, I compute the proportion of tags inside the 1000 strongest peaks. I do that for BOTH treatment and controls.

In my workflow, Bioconductor does not get involved until I reach point 6.

Happy ChIPing.

Ivan

On Mon, Oct 3, 2011 at 5:17 PM, Lucia Peixoto wrote:
> Hi,
> I am new to Chip-seq, my experiment's sequencing has finished, and the read
> alignment is currently running
> The experiment was done for histone acetylation, and I have two types of
> controls: input DNA and unmodified histone.
> I have two conditions and 6 biological replicates of each condition
> I wanted some advice on how to perform basic quality control on Chip-seq
> data using Bioconductor
> and also some ideas of which kinds of biases people usually observe and I
> should keep my eyes open for
> any advice will be greatly appreciated!
> thanks
>
> Lucia
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 12.6 years ago Ivan Gregoretti ▴ 310
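
For readers who want to try the quality indicator Ivan describes (the proportion of tags inside the strongest peaks, computed for both treatment and control), here is a minimal Python sketch. It is not Ivan's code. The function names, the tuple layouts, and the idea of comparing treatment to control as a simple ratio of fractions with a small pseudocount are all assumptions made for illustration; peaks are assumed to be non-overlapping, half-open intervals that have already been called and scored by some peak finder.

```python
from bisect import bisect_right


def count_reads_in_peaks(read_positions, peaks):
    """Count reads whose position falls inside any peak.

    read_positions: iterable of (chrom, pos) tuples, one per read (hypothetical layout).
    peaks: iterable of (chrom, start, end) half-open intervals, assumed non-overlapping.
    """
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()
    starts = {c: [s for s, _ in iv] for c, iv in by_chrom.items()}

    inside = 0
    for chrom, pos in read_positions:
        if chrom not in by_chrom:
            continue
        # Index of the last peak that starts at or before this position.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0 and pos < by_chrom[chrom][i][1]:
            inside += 1
    return inside


def fraction_in_top_peaks(read_positions, scored_peaks, top_n=1000):
    """Fraction of reads inside the top_n peaks ranked by score.

    scored_peaks: list of (chrom, start, end, score) tuples (hypothetical layout).
    """
    reads = list(read_positions)
    if not reads:
        return 0.0
    top = sorted(scored_peaks, key=lambda p: p[3], reverse=True)[:top_n]
    peaks = [(c, s, e) for c, s, e, _ in top]
    return count_reads_in_peaks(reads, peaks) / len(reads)


def enrichment(chip_fraction, control_fraction, pseudo=1e-6):
    """Assumed definition: ratio of fractions, with a pseudocount to avoid dividing by zero."""
    return (chip_fraction + pseudo) / (control_fraction + pseudo)
```

Calling fraction_in_top_peaks once with the treatment reads and once with the control reads, against the same peak list, gives two numbers to compare. A good immunoprecipitation would be expected to put a visibly larger share of its tags inside the strongest peaks than the input control does, although what counts as "good" has to be judged against your own samples.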


On 10/04/2011 07:33 AM, Ivan Gregoretti wrote:
> Hello Lucia,
>
> A proper response to your post would take a lecture rather than an
> email. I can't do that but I can bullet the main points. I think that
> it will help you if you are indeed a newcomer to ChIP-seq.
>
> 1) Expect 10 million reads per sample for a genome the size of human.

I'd run some basic QA on your lanes, via ShortRead::qa on the fastq files (or BAM if fastq are not available); use FastqSampler if memory is tight (but in general, if memory is tight, the solution will be to find a larger computer).

See http://bioconductor.org/help/workflows/high-throughput-sequencing/ for qa and perhaps other operations common to RNA-seq / ChIP-seq workflows.

> 2) Stick to SAM/BAM formats so that you can use well known, publicly
> available tools. Your best friend is called Picard.

People can and do use R / Bioconductor for Picard-like tasks.

> 3) Remove duplicates. Again, Picard is your best friend.
> 4) Create WIG files for all samples, treatments and controls so that
> you can display them simultaneously on any genome browser.

Here, for interactive use, I would rather use basic R plotting commands, avoiding the round trip and allowing programmatic interaction.

> 5) Find peaks with a well documented peak finder.

Probably a good suggestion for a one-off or common ChIP; the chipseq vignette

http://bioconductor.org/packages/release/bioc/html/chipseq.html

provides inspiration for more flexible analysis; packages under the ChIPSeq biocViews term (Software -> AssayTechnologies -> HighThroughputSequencing -> ChIPSeq) might offer a solution tailored to your ChIP.

> 6) Compute enrichment for all treatments relative to their controls.

Again, the chipseq vignette is an alternative source.

> So, points 4 and 6 are your quality controls at this stage. Once you
> know what a good immunoprecipitation looks like compared to a bad one,
> you can start diving into the details. You can invent your own quality
> indicators. For instance, I compute the proportion of tags inside the
> 1000 strongest peaks. I do that for BOTH treatment and controls.

Especially for getting a sense of good versus bad results, the interactivity of R / Bioconductor seems essential.

Martin

--
Computational Biology
Fred Hutchinson Cancer Research Center

ADD REPLY • link 12.6 years ago Martin Morgan 25k
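
As a rough illustration of the kind of lane-level QA Martin mentions, the sketch below samples reads from a FASTQ stream without loading the whole file and then computes the mean base quality at each sequencing cycle. This is not the ShortRead API: it is a plain Python analogue of the idea behind sampling plus a quality summary. It assumes four-line FASTQ records and Phred+33 quality encoding, and all function names (and the "reads.fastq" filename in the comment) are hypothetical.

```python
import io
import random


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def reservoir_sample(records, n, seed=0):
    """Uniform random sample of up to n records in one pass (reservoir sampling)."""
    rng = random.Random(seed)
    sample = []
    for i, rec in enumerate(records):
        if i < n:
            sample.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                sample[j] = rec
    return sample


def mean_quality_per_cycle(records, offset=33):
    """Mean Phred score at each read position, assuming Phred+offset encoding."""
    totals, counts = [], []
    for _, _, qual in records:
        for cycle, ch in enumerate(qual):
            if cycle == len(totals):
                totals.append(0)
                counts.append(0)
            totals[cycle] += ord(ch) - offset
            counts[cycle] += 1
    return [t / c for t, c in zip(totals, counts)]


# Tiny in-memory example; real data would come from something like open("reads.fastq").
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n!!!!\n")
sampled = reservoir_sample(read_fastq(demo), n=1000)
profile = mean_quality_per_cycle(sampled)
```

A quality profile that drops sharply toward the end of the read, or that differs a lot between lanes, is the sort of thing worth noticing before moving on to peak calling. Within R, ShortRead's own QA report covers this and more, so the sketch is mainly useful for seeing what such a summary computes.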


Thanks very much for the suggestions. I will likely have more questions as I start the analysis.

Lucia

ADD REPLY • link 12.6 years ago Lucia Peixoto ▴ 330
