Fwd: differential binding question

0

Entering edit mode

mali salmon ▴ 370

@mali-salmon-4532

Last seen 7.2 years ago

Israel

Dear Users We have RNA-IP-seq for two conditions with two biological replicates each. So in total we have 8 samples: 2 for condition1 IP 2 for condition1 Input 2 for condition2 IP 2 for condition2 Input We would like to find differential binding between the two conditions which are not influenced from differences in gene expression (Input samples are actually regular RNA-seq). I thought of first finding peak regions (maybe by pooling all IP and all Input samples) and run ChIP-seq peak caller, count how many reads fall within these regions in each of the samples, and then run DESeq and edgeR in order to find differential binding. Is this can be done with edgeR and DESeq (again the Input is different for the two conditions, and we would like to cancel out differential gene expression)? Thanks Mali [[alternative HTML version deleted]]

edgeR DESeq edgeR DESeq • 2.6k views

ADD COMMENT • link updated 14.0 years ago by Rory Stark ▴ 100 • written 14.0 years ago by mali salmon ▴ 370

0

Entering edit mode

Heidi Dvinge ★ 2.0k

@heidi-dvinge-2195

Last seen 11.3 years ago

Dear Mali. > Dear Users > We have RNA-IP-seq for two conditions with two biological replicates each. > So in total we have 8 samples: > 2 for condition1 IP > 2 for condition1 Input > 2 for condition2 IP > 2 for condition2 Input > We would like to find differential binding between the two conditions You might want to have a look at DiffBind, one of the new packages in the latest Bioconductor release. It's designed for differential binding analysis of ChIP, but can probably also be used for RNA-IP. AFAIK, it uses some of the funcitonalities from edgeR and DESeq. HTH \Heidi > which > are not influenced from differences in gene expression (Input samples are > actually regular RNA-seq). > I thought of first finding peak regions (maybe by pooling all IP and all > Input samples) and run ChIP-seq peak caller, count how many reads fall > within these regions in each of the samples, and then run DESeq and edgeR > in order to find differential binding. > Is this can be done with edgeR and DESeq (again the Input is different for > the two conditions, and we would like to cancel out differential gene > expression)? > Thanks > Mali > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor >

ADD COMMENT • link 14.0 years ago Heidi Dvinge ★ 2.0k

0

Entering edit mode

Rory Stark ▴ 100

@rory-stark-4919

Last seen 11.3 years ago

Hi Mali- You can try this pretty easily using DiffBind. I suggest calling peaks on each IP separately (each IP and its matching RNA-Seq control) and read these four peaksets into DiffBind (you could also use two peak callers and read in all eight peaksets to identify more potential sites). DiffBind lets you derive an overall set of peaks (either a superset of all the peaks, or any that overlap in at least two [or more] peaksets), does the read counting (by default subtracting reads in the matching RNA-seq controls), runs edgeR and/or DESeq to identify differentially bound regions, and offers several plots and reports to characterize the results. A couple of caveats: With only two replicates of each condition, your power to reliably identify significant differences is limited. Also, while the IP reads will be normalized, the control reads will not be (unless you do some normalization separately prior to loading it into DiffBind). However this does seem to be a a good place to start! Cheers- Rory ---------------------------------------------------------------------- ------ Dr. Rory Stark Principal Bioinformatics Analyst Cambridge Research Institute - Cancer Research UK Robinson Way Cambridge CB2 0RE United Kingdom +44 1223 404 311 rory.stark@cancer.org.uk ---------------------------------------------------------------------- ------ On 04/01/2012 13:37, "mali salmon" <shalmom1@gmail.com<mailto:shalmom1@gmail.com>> wrote: Dear Users We have RNA-IP-seq for two conditions with two biological replicates each. So in total we have 8 samples: 2 for condition1 IP 2 for condition1 Input 2 for condition2 IP 2 for condition2 Input We would like to find differential binding between the two conditions which are not influenced from differences in gene expression (Input samples are actually regular RNA-seq). I thought of first finding peak regions (maybe by pooling all IP and all Input samples) and run ChIP-seq peak caller, count how many reads fall within these regions in each of the samples, and then run DESeq and edgeR in order to find differential binding. Is this can be done with edgeR and DESeq (again the Input is different for the two conditions, and we would like to cancel out differential gene expression)? Thanks Mali [[alternative HTML version deleted]] _______________________________________________ Bioconductor mailing list Bioconductor@r-project.org<mailto:bioconductor@r-project.org> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor NOTICE AND DISCLAIMER This e-mail (including any attachments) is intended for ...{{dropped:18}}

ADD COMMENT • link 14.0 years ago Rory Stark ▴ 100

0

Entering edit mode

Thanks Rory and Heidi for replying. Would read subtraction is enough in order to account for the difference in gene expression? Mali On Wed, Jan 4, 2012 at 2:11 PM, Rory Stark <rory.stark@cancer.org.uk> wrote: > Hi Mali- > > You can try this pretty easily using DiffBind. I suggest calling peaks on > each IP separately (each IP and its matching RNA-Seq control) and read > these four peaksets into DiffBind (you could also use two peak callers and > read in all eight peaksets to identify more potential sites). DiffBind lets > you derive an overall set of peaks (either a superset of all the peaks, or > any that overlap in at least two [or more] peaksets), does the read > counting (by default subtracting reads in the matching RNA-seq controls), > runs edgeR and/or DESeq to identify differentially bound regions, and > offers several plots and reports to characterize the results. > > A couple of caveats: With only two replicates of each condition, your > power to reliably identify significant differences is limited. Also, while > the IP reads will be normalized, the control reads will not be (unless you > do some normalization separately prior to loading it into DiffBind). > However this does seem to be a a good place to start! > > Cheers- > Rory > > > -------------------------------------------------------------------- -------- > Dr. Rory Stark > > Principal Bioinformatics Analyst > > Cambridge Research Institute - Cancer Research UK > Robinson Way > Cambridge CB2 0RE > United Kingdom > +44 1223 404 311 > > rory.stark@cancer.org.uk > > -------------------------------------------------------------------- -------- > > On 04/01/2012 13:37, "mali salmon" <shalmom1@gmail.com<mailto:> shalmom1@gmail.com>> wrote: > > Dear Users > We have RNA-IP-seq for two conditions with two biological replicates each. > So in total we have 8 samples: > 2 for condition1 IP > 2 for condition1 Input > 2 for condition2 IP > 2 for condition2 Input > We would like to find differential binding between the two conditions which > are not influenced from differences in gene expression (Input samples are > actually regular RNA-seq). > I thought of first finding peak regions (maybe by pooling all IP and all > Input samples) and run ChIP-seq peak caller, count how many reads fall > within these regions in each of the samples, and then run DESeq and edgeR > in order to find differential binding. > Is this can be done with edgeR and DESeq (again the Input is different for > the two conditions, and we would like to cancel out differential gene > expression)? > Thanks > Mali > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org<mailto:bioconductor@r-project.org> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > NOTICE AND DISCLAIMER > This e-mail (including any attachments) is intended for ...{{dropped:18}} > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

ADD REPLY • link 14.0 years ago mali salmon ▴ 370

0

Entering edit mode

That's the big question Mali! The more I think about it, the less confident I am that it will work. As I understand it you want to control for transcripts whose expression may change but whose affinity (rate at which the protein binds) stays the same. Without the control, higher expression = more transcripts = more RNA pulled down by the IP even at the same affinity. I'm not sure that subtracting the transcripts independent of binding (the RNA-Seq) will work. Besides the normalization issue relating to the RNA-Seq counts, the problem is that there should always be more transcripts in the control (as they include both bound and unbound transcripts) than in the IP (only bound transcripts). So even if the normalization was perfect, the subtraction would always result in a negative number of counts (set to a minimum of 1 count per peak in DiffBind). In this case, subtracting the control is probably too crude. Mark Robinson, an author of edgeR, is doing some interesting work on incorporating copy number information into differential ChIP-Seq analysis, and I am adding it to DiffBind. He is able to cast the problem as a normalization issue. I'm thinking that would be a better approach to your problem: the RNA-Seq gives "copy number" information (overall transcript abundance), and this is incorporated as a normalization term, leaving the differential analysis to identify changes in affinity. I'm working on this right now, so if you are interested you might be a beta tester let me know. I still think it is worth running your data in DiffBind to see how it looks as a start. Cheers- Rory From: mali salmon <shalmom1@gmail.com<mailto:shalmom1@gmail.com>> Date: Wed, 4 Jan 2012 15:09:11 +0000 To: Cancer Research UK <rory.stark@cancer.org.uk<mailto:rory.stark@cancer.org.uk>> Cc: "bioconductor@r-project.org<mailto:bioconductor@r-project.org>" <bioconductor@r-project.org<mailto:bioconductor@r-project.org>> Subject: Re: [BioC] Fwd: differential binding question Thanks Rory and Heidi for replying. Would read subtraction is enough in order to account for the difference in gene expression? Mali On Wed, Jan 4, 2012 at 2:11 PM, Rory Stark <rory.stark@cancer.org.uk<mailto:rory.stark@cancer.org.uk>> wrote: Hi Mali- You can try this pretty easily using DiffBind. I suggest calling peaks on each IP separately (each IP and its matching RNA-Seq control) and read these four peaksets into DiffBind (you could also use two peak callers and read in all eight peaksets to identify more potential sites). DiffBind lets you derive an overall set of peaks (either a superset of all the peaks, or any that overlap in at least two [or more] peaksets), does the read counting (by default subtracting reads in the matching RNA-seq controls), runs edgeR and/or DESeq to identify differentially bound regions, and offers several plots and reports to characterize the results. A couple of caveats: With only two replicates of each condition, your power to reliably identify significant differences is limited. Also, while the IP reads will be normalized, the control reads will not be (unless you do some normalization separately prior to loading it into DiffBind). However this does seem to be a a good place to start! Cheers- Rory ---------------------------------------------------------------------- ------ Dr. Rory Stark Principal Bioinformatics Analyst Cambridge Research Institute - Cancer Research UK Robinson Way Cambridge CB2 0RE United Kingdom +44 1223 404 311<tel:%2b44%201223%20404%20311> rory.stark@cancer.org.uk<mailto:rory.stark@cancer.org.uk> ---------------------------------------------------------------------- ------ On 04/01/2012 13:37, "mali salmon" <shalmom1@gmail.com<mailto:shalmom1 @gmail.com=""><mailto:shalmom1@gmail.com<mailto:shalmom1@gmail.com>>> wrote: Dear Users We have RNA-IP-seq for two conditions with two biological replicates each. So in total we have 8 samples: 2 for condition1 IP 2 for condition1 Input 2 for condition2 IP 2 for condition2 Input We would like to find differential binding between the two conditions which are not influenced from differences in gene expression (Input samples are actually regular RNA-seq). I thought of first finding peak regions (maybe by pooling all IP and all Input samples) and run ChIP-seq peak caller, count how many reads fall within these regions in each of the samples, and then run DESeq and edgeR in order to find differential binding. Is this can be done with edgeR and DESeq (again the Input is different for the two conditions, and we would like to cancel out differential gene expression)? Thanks Mali [[alternative HTML version deleted]] _______________________________________________ Bioconductor mailing list Bioconductor@r-project.org<mailto:bioconductor@r-project.org><mailto:b ioconductor@r-project.org<mailto:bioconductor@r-project.org="">> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor NOTICE AND DISCLAIMER This e-mail (including any attachments) is intended for ...{{dropped:18}} _______________________________________________ Bioconductor mailing list Bioconductor@r-project.org<mailto:bioconductor@r-project.org> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor NOTICE AND DISCLAIMER This e-mail (including any attachments) is intended for ...{{dropped:18}}

ADD REPLY • link 14.0 years ago Rory Stark ▴ 100

0

Entering edit mode

Hi, Just wanted to mention that I'm also quite interested in what comes out of the "accounting for 'copy number'" saga being discussed ... very cool stuff. Keep us posted! -steve On Wed, Jan 4, 2012 at 11:24 AM, Rory Stark <rory.stark at="" cancer.org.uk=""> wrote: > That's the big question Mali! The more I think about it, the less confident I am that it will work. > > As I understand it you want to control for transcripts whose expression may change but whose affinity (rate at which the protein binds) stays the same. Without the control, higher expression = more transcripts = more RNA pulled down by the IP even at the same affinity. I'm not sure that subtracting the transcripts independent of binding (the RNA-Seq) will work. Besides the normalization issue relating to the RNA-Seq counts, the problem ?is that there should always be more transcripts in the control (as they include both bound and unbound transcripts) than in the IP (only bound transcripts). So even if the normalization was perfect, the subtraction would always result in a negative number of counts (set to a minimum of 1 count per peak in DiffBind). > > In this case, subtracting the control is probably too crude. Mark Robinson, an author of edgeR, is doing some interesting work on incorporating copy number information into differential ChIP-Seq analysis, and I am adding it to DiffBind. He is able to cast the problem as a normalization issue. I'm thinking that would be a better approach to your problem: the RNA-Seq gives "copy number" information (overall transcript abundance), and this is incorporated as a normalization term, leaving ?the differential analysis to identify changes in affinity. ?I'm working on this right now, so if you are interested you might be a beta tester ? let me know. > > I still think it is worth running your data in DiffBind to see how it looks as a start. > > Cheers- > Rory > > From: mali salmon <shalmom1 at="" gmail.com<mailto:shalmom1="" at="" gmail.com="">> > Date: Wed, 4 Jan 2012 15:09:11 +0000 > To: Cancer Research UK <rory.stark at="" cancer.org.uk<mailto:rory.stark="" at="" cancer.org.uk="">> > Cc: "bioconductor at r-project.org<mailto:bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org<mailto:bioconductor="" at="" r-project.org="">> > Subject: Re: [BioC] Fwd: differential binding question > > Thanks Rory and Heidi for replying. > Would read subtraction is enough in order to account for the difference in gene expression? > Mali > > On Wed, Jan 4, 2012 at 2:11 PM, Rory Stark <rory.stark at="" cancer.org.uk<mailto:rory.stark="" at="" cancer.org.uk="">> wrote: > Hi Mali- > > You can try this pretty easily using DiffBind. I suggest calling peaks on each IP separately (each IP and its matching RNA-Seq control) and read these four peaksets into DiffBind (you could also use two peak callers and read in all eight peaksets to identify more potential sites). DiffBind lets you derive an overall set of peaks (either a superset of all the peaks, or any that overlap in at least two [or more] peaksets), does the read counting (by default subtracting reads in the matching RNA-seq controls), runs edgeR and/or DESeq to identify differentially bound regions, and offers several plots and reports to characterize the results. > > A couple of caveats: With only two replicates of each condition, your power to reliably identify significant differences is limited. Also, while the IP reads will be normalized, the control reads will not be (unless you do some normalization separately prior to loading it into DiffBind). However this does seem to be a a good place to start! > > Cheers- > Rory > > -------------------------------------------------------------------- -------- > Dr. Rory Stark > > Principal Bioinformatics ?Analyst > > Cambridge Research Institute - Cancer Research UK > Robinson Way > Cambridge CB2 0RE > United Kingdom > ?+44 1223 404 311<tel:%2b44%201223%20404%20311> > > rory.stark at cancer.org.uk<mailto:rory.stark at="" cancer.org.uk=""> > -------------------------------------------------------------------- -------- > > On 04/01/2012 13:37, "mali salmon" <shalmom1 at="" gmail.com<mailto:shalmom1="" at="" gmail.com=""><mailto:shalmom1 at="" gmail.com<mailto:shalmom1="" at="" gmail.com="">>> wrote: > > Dear Users > We have RNA-IP-seq for two conditions with two biological replicates each. > So in total we have 8 samples: > 2 for condition1 IP > 2 for condition1 Input > 2 for condition2 IP > 2 for condition2 Input > We would like to find differential binding between the two conditions which > are not influenced from differences in gene expression (Input samples are > actually regular RNA-seq). > I thought of first finding peak regions (maybe by pooling all IP and all > Input samples) and run ChIP-seq peak caller, count how many reads fall > within these regions in each of the samples, and then run DESeq and edgeR > in order to find differential binding. > Is this can be done with edgeR and DESeq (again the Input is different for > the two conditions, and we would like to cancel out differential gene > expression)? > Thanks > Mali > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org<mailto:bioconductor at="" r-project.org=""><mailto:bioconductor at="" r-project.org<mailto:bioconductor="" at="" r-project.org="">> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > > NOTICE AND DISCLAIMER > This e-mail (including any attachments) is intended for ...{{dropped:18}} > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org<mailto:bioconductor at="" r-project.org=""> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > > NOTICE AND DISCLAIMER > This e-mail (including any attachments) is intended for ...{{dropped:18}} > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Steve Lianoglou Graduate Student: Computational Systems Biology ?| Memorial Sloan-Kettering Cancer Center ?| Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact

ADD REPLY • link 14.0 years ago Steve Lianoglou ★ 13k

0

Entering edit mode

Hi all, Mali: Very interesting dataset. Rory: As I mentioned offline, I don't think the offset style normalization will work here, since the "inputs" are observations, and have associated variability. Offsets are meant to be fixed quantities, even though in practice they are estimated, but from a lot of data (e.g. TMM). I'm not keen on the idea of direct subtraction -- anyone for a Skellam distribution? What about just fitting a GLM ? If I understand it correctly, this seems appropriate: ------ > g <- rep( 0:1, each=4 ) > p <- rep( c("input","IP"), 4 ) > d <- model.matrix( ~ g*p ) > rownames(d) <- paste(g,p,sep=".") > d (Intercept) g pIP g:pIP 0.input 1 0 0 0 0.IP 1 0 1 0 0.input 1 0 0 0 0.IP 1 0 1 0 1.input 1 1 0 0 1.IP 1 1 1 1 1.input 1 1 0 0 1.IP 1 1 1 1 ------ First three columns account for expression levels and differences between IP/input. Column 4 specifies the difference b/w conditions and should be what you are interested in. Once you have a table of counts, edgeR and DESeq can do this. Of course, the challenge is how you do the counting. I'm not sure what sort of features are expected/possible with this assay, maybe a peak/region finder will do the trick. And, there might be some normalization challenges too. Hope that helps, Mark On 04.01.2012, at 17:24, Rory Stark wrote: > That's the big question Mali! The more I think about it, the less confident I am that it will work. > > As I understand it you want to control for transcripts whose expression may change but whose affinity (rate at which the protein binds) stays the same. Without the control, higher expression = more transcripts = more RNA pulled down by the IP even at the same affinity. I'm not sure that subtracting the transcripts independent of binding (the RNA-Seq) will work. Besides the normalization issue relating to the RNA-Seq counts, the problem is that there should always be more transcripts in the control (as they include both bound and unbound transcripts) than in the IP (only bound transcripts). So even if the normalization was perfect, the subtraction would always result in a negative number of counts (set to a minimum of 1 count per peak in DiffBind). > > In this case, subtracting the control is probably too crude. Mark Robinson, an author of edgeR, is doing some interesting work on incorporating copy number information into differential ChIP-Seq analysis, and I am adding it to DiffBind. He is able to cast the problem as a normalization issue. I'm thinking that would be a better approach to your problem: the RNA-Seq gives "copy number" information (overall transcript abundance), and this is incorporated as a normalization term, leaving the differential analysis to identify changes in affinity. I'm working on this right now, so if you are interested you might be a beta tester ? let me know. > > I still think it is worth running your data in DiffBind to see how it looks as a start. > > Cheers- > Rory > > From: mali salmon <shalmom1 at="" gmail.com<mailto:shalmom1="" at="" gmail.com="">> > Date: Wed, 4 Jan 2012 15:09:11 +0000 > To: Cancer Research UK <rory.stark at="" cancer.org.uk<mailto:rory.stark="" at="" cancer.org.uk="">> > Cc: "bioconductor at r-project.org<mailto:bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org<mailto:bioconductor="" at="" r-project.org="">> > Subject: Re: [BioC] Fwd: differential binding question > > Thanks Rory and Heidi for replying. > Would read subtraction is enough in order to account for the difference in gene expression? > Mali > > On Wed, Jan 4, 2012 at 2:11 PM, Rory Stark <rory.stark at="" cancer.org.uk<mailto:rory.stark="" at="" cancer.org.uk="">> wrote: > Hi Mali- > > You can try this pretty easily using DiffBind. I suggest calling peaks on each IP separately (each IP and its matching RNA-Seq control) and read these four peaksets into DiffBind (you could also use two peak callers and read in all eight peaksets to identify more potential sites). DiffBind lets you derive an overall set of peaks (either a superset of all the peaks, or any that overlap in at least two [or more] peaksets), does the read counting (by default subtracting reads in the matching RNA-seq controls), runs edgeR and/or DESeq to identify differentially bound regions, and offers several plots and reports to characterize the results. > > A couple of caveats: With only two replicates of each condition, your power to reliably identify significant differences is limited. Also, while the IP reads will be normalized, the control reads will not be (unless you do some normalization separately prior to loading it into DiffBind). However this does seem to be a a good place to start! > > Cheers- > Rory > > -------------------------------------------------------------------- -------- > Dr. Rory Stark > > Principal Bioinformatics Analyst > > Cambridge Research Institute - Cancer Research UK > Robinson Way > Cambridge CB2 0RE > United Kingdom > +44 1223 404 311<tel:%2b44%201223%20404%20311> > > rory.stark at cancer.org.uk<mailto:rory.stark at="" cancer.org.uk=""> > -------------------------------------------------------------------- -------- > > On 04/01/2012 13:37, "mali salmon" <shalmom1 at="" gmail.com<mailto:shalmom1="" at="" gmail.com=""><mailto:shalmom1 at="" gmail.com<mailto:shalmom1="" at="" gmail.com="">>> wrote: > > Dear Users > We have RNA-IP-seq for two conditions with two biological replicates each. > So in total we have 8 samples: > 2 for condition1 IP > 2 for condition1 Input > 2 for condition2 IP > 2 for condition2 Input > We would like to find differential binding between the two conditions which > are not influenced from differences in gene expression (Input samples are > actually regular RNA-seq). > I thought of first finding peak regions (maybe by pooling all IP and all > Input samples) and run ChIP-seq peak caller, count how many reads fall > within these regions in each of the samples, and then run DESeq and edgeR > in order to find differential binding. > Is this can be done with edgeR and DESeq (again the Input is different for > the two conditions, and we would like to cancel out differential gene > expression)? > Thanks > Mali > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org<mailto:bioconductor at="" r-project.org=""><mailto:bioconductor at="" r-project.org<mailto:bioconductor="" at="" r-project.org="">> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > > NOTICE AND DISCLAIMER > This e-mail (including any attachments) is intended for ...{{dropped:18}} > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org<mailto:bioconductor at="" r-project.org=""> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > > NOTICE AND DISCLAIMER > This e-mail (including any attachments) is intended for ...{{dropped:18}} > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 14.0 years ago Mark Robinson ▴ 880

0

Entering edit mode

I agree with Mark that fitting a GLM using a design matrix with both factors (the condition and whether an IP was done) is the most straighforward way of analyzing this dataset. Reegarding peak calling, you could use the RNA-Seq data to determine the relevant regions as in a standard RNA-Seq experiment, then count the reads in each of the samples (RNA and RIP) for each of the intervals (exons, isoforms, or genes depending on how you do the analysis), then normalize and fit the GLM. -Rory ________________________________________ From: Mark Robinson [mark.robinson@imls.uzh.ch] Sent: 04 January 2012 22:11 To: Rory Stark Cc: mali salmon; bioconductor at r-project.org Subject: Re: [BioC] differential binding question Hi all, Mali: Very interesting dataset. Rory: As I mentioned offline, I don't think the offset style normalization will work here, since the "inputs" are observations, and have associated variability. Offsets are meant to be fixed quantities, even though in practice they are estimated, but from a lot of data (e.g. TMM). I'm not keen on the idea of direct subtraction -- anyone for a Skellam distribution? What about just fitting a GLM ? If I understand it correctly, this seems appropriate: ------ > g <- rep( 0:1, each=4 ) > p <- rep( c("input","IP"), 4 ) > d <- model.matrix( ~ g*p ) > rownames(d) <- paste(g,p,sep=".") > d (Intercept) g pIP g:pIP 0.input 1 0 0 0 0.IP 1 0 1 0 0.input 1 0 0 0 0.IP 1 0 1 0 1.input 1 1 0 0 1.IP 1 1 1 1 1.input 1 1 0 0 1.IP 1 1 1 1 ------ First three columns account for expression levels and differences between IP/input. Column 4 specifies the difference b/w conditions and should be what you are interested in. Once you have a table of counts, edgeR and DESeq can do this. Of course, the challenge is how you do the counting. I'm not sure what sort of features are expected/possible with this assay, maybe a peak/region finder will do the trick. And, there might be some normalization challenges too. Hope that helps, Mark On 04.01.2012, at 17:24, Rory Stark wrote: > That's the big question Mali! The more I think about it, the less confident I am that it will work. > > As I understand it you want to control for transcripts whose expression may change but whose affinity (rate at which the protein binds) stays the same. Without the control, higher expression = more transcripts = more RNA pulled down by the IP even at the same affinity. I'm not sure that subtracting the transcripts independent of binding (the RNA-Seq) will work. Besides the normalization issue relating to the RNA-Seq counts, the problem is that there should always be more transcripts in the control (as they include both bound and unbound transcripts) than in the IP (only bound transcripts). So even if the normalization was perfect, the subtraction would always result in a negative number of counts (set to a minimum of 1 count per peak in DiffBind). > > In this case, subtracting the control is probably too crude. Mark Robinson, an author of edgeR, is doing some interesting work on incorporating copy number information into differential ChIP-Seq analysis, and I am adding it to DiffBind. He is able to cast the problem as a normalization issue. I'm thinking that would be a better approach to your problem: the RNA-Seq gives "copy number" information (overall transcript abundance), and this is incorporated as a normalization term, leaving the differential analysis to identify changes in affinity. I'm working on this right now, so if you are interested you might be a beta tester ? let me know. > > I still think it is worth running your data in DiffBind to see how it looks as a start. > > Cheers- > Rory > > From: mali salmon <shalmom1 at="" gmail.com<mailto:shalmom1="" at="" gmail.com="">> > Date: Wed, 4 Jan 2012 15:09:11 +0000 > To: Cancer Research UK <rory.stark at="" cancer.org.uk<mailto:rory.stark="" at="" cancer.org.uk="">> > Cc: "bioconductor at r-project.org<mailto:bioconductor at="" r-project.org="">" <bioconductor at="" r-project.org<mailto:bioconductor="" at="" r-project.org="">> > Subject: Re: [BioC] Fwd: differential binding question > > Thanks Rory and Heidi for replying. > Would read subtraction is enough in order to account for the difference in gene expression? > Mali > > On Wed, Jan 4, 2012 at 2:11 PM, Rory Stark <rory.stark at="" cancer.org.uk<mailto:rory.stark="" at="" cancer.org.uk="">> wrote: > Hi Mali- > > You can try this pretty easily using DiffBind. I suggest calling peaks on each IP separately (each IP and its matching RNA-Seq control) and read these four peaksets into DiffBind (you could also use two peak callers and read in all eight peaksets to identify more potential sites). DiffBind lets you derive an overall set of peaks (either a superset of all the peaks, or any that overlap in at least two [or more] peaksets), does the read counting (by default subtracting reads in the matching RNA-seq controls), runs edgeR and/or DESeq to identify differentially bound regions, and offers several plots and reports to characterize the results. > > A couple of caveats: With only two replicates of each condition, your power to reliably identify significant differences is limited. Also, while the IP reads will be normalized, the control reads will not be (unless you do some normalization separately prior to loading it into DiffBind). However this does seem to be a a good place to start! > > Cheers- > Rory > > -------------------------------------------------------------------- -------- > Dr. Rory Stark > > Principal Bioinformatics Analyst > > Cambridge Research Institute - Cancer Research UK > Robinson Way > Cambridge CB2 0RE > United Kingdom > +44 1223 404 311<tel:%2b44%201223%20404%20311> > > rory.stark at cancer.org.uk<mailto:rory.stark at="" cancer.org.uk=""> > -------------------------------------------------------------------- -------- > > On 04/01/2012 13:37, "mali salmon" <shalmom1 at="" gmail.com<mailto:shalmom1="" at="" gmail.com=""><mailto:shalmom1 at="" gmail.com<mailto:shalmom1="" at="" gmail.com="">>> wrote: > > Dear Users > We have RNA-IP-seq for two conditions with two biological replicates each. > So in total we have 8 samples: > 2 for condition1 IP > 2 for condition1 Input > 2 for condition2 IP > 2 for condition2 Input > We would like to find differential binding between the two conditions which > are not influenced from differences in gene expression (Input samples are > actually regular RNA-seq). > I thought of first finding peak regions (maybe by pooling all IP and all > Input samples) and run ChIP-seq peak caller, count how many reads fall > within these regions in each of the samples, and then run DESeq and edgeR > in order to find differential binding. > Is this can be done with edgeR and DESeq (again the Input is different for > the two conditions, and we would like to cancel out differential gene > expression)? > Thanks > Mali > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org<mailto:bioconductor at="" r-project.org=""><mailto:bioconductor at="" r-project.org<mailto:bioconductor="" at="" r-project.org="">> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > > NOTICE AND DISCLAIMER > This e-mail (including any attachments) is intended for ...{{dropped:18}} > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org<mailto:bioconductor at="" r-project.org=""> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > > NOTICE AND DISCLAIMER > This e-mail (including any attachments) is intended for ...{{dropped:18}} > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor NOTICE AND DISCLAIMER This e-mail (including any attachments) is intended for ...{{dropped:16}}

ADD REPLY • link 14.0 years ago Rory Stark ▴ 100

0

Entering edit mode

Thank you very much Mark and Rory for your kind help. I'm not a statistician, so by just looking at the design matrix I can't be sure its what I meant, so I hope you understood what I'm looking for, which is differential binding between the two IP conditions after taking into account differences between the two Input conditions. Before consulting you, I just thought of finding differential sites between IP samples of the two conditions, and separately to find differential sites between the Input of the two conditions, and then looking for those sites found to be DE between IP but not between Input. I suppose that fitting this into a single design matrix is much better. Rory - I have a questions regarding the subtraction in DiffBind, does the software account for differences in library size before subtraction? If I understood what you said, normalization is done after subtraction and only for the IP samples. I think subtraction make sense only if the IP and its corresponding Input have similar number of sequenced reads. Do I miss something? Thanks again Mali On Thu, Jan 5, 2012 at 12:11 AM, Mark Robinson <mark.robinson@imls.uzh.ch>wrote: > Hi all, > > Mali: Very interesting dataset. > > Rory: As I mentioned offline, I don't think the offset style normalization > will work here, since the "inputs" are observations, and have associated > variability. Offsets are meant to be fixed quantities, even though in > practice they are estimated, but from a lot of data (e.g. TMM). > > I'm not keen on the idea of direct subtraction -- anyone for a Skellam > distribution? > > What about just fitting a GLM ? If I understand it correctly, this seems > appropriate: > > ------ > > g <- rep( 0:1, each=4 ) > > p <- rep( c("input","IP"), 4 ) > > d <- model.matrix( ~ g*p ) > > rownames(d) <- paste(g,p,sep=".") > > d > (Intercept) g pIP g:pIP > 0.input 1 0 0 0 > 0.IP 1 0 1 0 > 0.input 1 0 0 0 > 0.IP 1 0 1 0 > 1.input 1 1 0 0 > 1.IP 1 1 1 1 > 1.input 1 1 0 0 > 1.IP 1 1 1 1 > ------ > > First three columns account for expression levels and differences between > IP/input. Column 4 specifies the difference b/w conditions and should be > what you are interested in. > Once you have a table of counts, edgeR and DESeq can do this. > > Of course, the challenge is how you do the counting. I'm not sure what > sort of features are expected/possible with this assay, maybe a peak/region > finder will do the trick. And, there might be some normalization > challenges too. > > Hope that helps, > Mark > > > On 04.01.2012, at 17:24, Rory Stark wrote: > > > That's the big question Mali! The more I think about it, the less > confident I am that it will work. > > > > As I understand it you want to control for transcripts whose expression > may change but whose affinity (rate at which the protein binds) stays the > same. Without the control, higher expression = more transcripts = more RNA > pulled down by the IP even at the same affinity. I'm not sure that > subtracting the transcripts independent of binding (the RNA-Seq) will work. > Besides the normalization issue relating to the RNA-Seq counts, the problem > is that there should always be more transcripts in the control (as they > include both bound and unbound transcripts) than in the IP (only bound > transcripts). So even if the normalization was perfect, the subtraction > would always result in a negative number of counts (set to a minimum of 1 > count per peak in DiffBind). > > > > In this case, subtracting the control is probably too crude. Mark > Robinson, an author of edgeR, is doing some interesting work on > incorporating copy number information into differential ChIP-Seq analysis, > and I am adding it to DiffBind. He is able to cast the problem as a > normalization issue. I'm thinking that would be a better approach to your > problem: the RNA-Seq gives "copy number" information (overall transcript > abundance), and this is incorporated as a normalization term, leaving the > differential analysis to identify changes in affinity. I'm working on this > right now, so if you are interested you might be a beta tester let me > know. > > > > I still think it is worth running your data in DiffBind to see how it > looks as a start. > > > > Cheers- > > Rory > > > > From: mali salmon <shalmom1@gmail.com<mailto:shalmom1@gmail.com>> > > Date: Wed, 4 Jan 2012 15:09:11 +0000 > > To: Cancer Research UK <rory.stark@cancer.org.uk<mailto:> rory.stark@cancer.org.uk>> > > Cc: "bioconductor@r-project.org<mailto:bioconductor@r-project.org>" < > bioconductor@r-project.org<mailto:bioconductor@r-project.org>> > > Subject: Re: [BioC] Fwd: differential binding question > > > > Thanks Rory and Heidi for replying. > > Would read subtraction is enough in order to account for the difference > in gene expression? > > Mali > > > > On Wed, Jan 4, 2012 at 2:11 PM, Rory Stark <rory.stark@cancer.org.uk> <mailto:rory.stark@cancer.org.uk>> wrote: > > Hi Mali- > > > > You can try this pretty easily using DiffBind. I suggest calling peaks > on each IP separately (each IP and its matching RNA-Seq control) and read > these four peaksets into DiffBind (you could also use two peak callers and > read in all eight peaksets to identify more potential sites). DiffBind lets > you derive an overall set of peaks (either a superset of all the peaks, or > any that overlap in at least two [or more] peaksets), does the read > counting (by default subtracting reads in the matching RNA-seq controls), > runs edgeR and/or DESeq to identify differentially bound regions, and > offers several plots and reports to characterize the results. > > > > A couple of caveats: With only two replicates of each condition, your > power to reliably identify significant differences is limited. Also, while > the IP reads will be normalized, the control reads will not be (unless you > do some normalization separately prior to loading it into DiffBind). > However this does seem to be a a good place to start! > > > > Cheers- > > Rory > > > > > -------------------------------------------------------------------- -------- > > Dr. Rory Stark > > > > Principal Bioinformatics Analyst > > > > Cambridge Research Institute - Cancer Research UK > > Robinson Way > > Cambridge CB2 0RE > > United Kingdom > > +44 1223 404 311<tel:%2b44%201223%20404%20311> > > > > rory.stark@cancer.org.uk<mailto:rory.stark@cancer.org.uk> > > > -------------------------------------------------------------------- -------- > > > > On 04/01/2012 13:37, "mali salmon" <shalmom1@gmail.com<mailto:> shalmom1@gmail.com><mailto:shalmom1@gmail.com<mailto:shalmom1@gmail. com="">>> > wrote: > > > > Dear Users > > We have RNA-IP-seq for two conditions with two biological replicates > each. > > So in total we have 8 samples: > > 2 for condition1 IP > > 2 for condition1 Input > > 2 for condition2 IP > > 2 for condition2 Input > > We would like to find differential binding between the two conditions > which > > are not influenced from differences in gene expression (Input samples are > > actually regular RNA-seq). > > I thought of first finding peak regions (maybe by pooling all IP and all > > Input samples) and run ChIP-seq peak caller, count how many reads fall > > within these regions in each of the samples, and then run DESeq and edgeR > > in order to find differential binding. > > Is this can be done with edgeR and DESeq (again the Input is different > for > > the two conditions, and we would like to cancel out differential gene > > expression)? > > Thanks > > Mali > > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@r-project.org<mailto:bioconductor@r-project.org><mailto:> Bioconductor@r-project.org<mailto:bioconductor@r-project.org>> > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > NOTICE AND DISCLAIMER > > This e-mail (including any attachments) is intended for ...{{dropped:18}} > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@r-project.org<mailto:bioconductor@r-project.org> > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > NOTICE AND DISCLAIMER > > This e-mail (including any attachments) is intended for ...{{dropped:18}} > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > [[alternative HTML version deleted]]

ADD REPLY • link 14.0 years ago mali salmon ▴ 370

Login before adding your answer.