Rcade analysis without ChIP-seq replicates


André Faure ▴ 30

@andre-faure-6389

Last seen 10.6 years ago

Dear all,

I'm using the bioconductor package "Rcade" to determine likely direct targets of a transcription factor. I have corresponding ChIP-seq data (one ChIP and one Input sample, no replicates unfortunately) and microarray gene expression data before/after forced expression of the TF.

I'm considering all genes in the mouse genome (mm9) and ChIP-seq bins centred on their canonical TSSs (+/-2.5kb). Independent peak-calling analysis shows the TF does indeed occupy active promoters.

However the Rcade results show very low posterior probabilities for "active" binding (maximum p.ChIP = 0.16). The B-values for the combined hypothesis (i.e. active binding and differential expression: B.ChIP.DE) are therefore also low:

    > x <- getRcade(Rcade)
    > x[1:5,]
                      geneID   logfc.DE     B.DE    id.DE symbol.DE    M.ChIP
    19937 ENSMUSG00000069008 -0.5714092 6.701309 10601390    Gm5537 -2.892738
    20175 ENSMUSG00000070141  0.6631557 4.626032 10398396    Mir494 -3.115882
    22435 ENSMUSG00000076145  0.6631557 4.626032 10398396    Mir679 -3.115882
    23445 ENSMUSG00000080411  0.6631557 4.626032 10398396   Mir1193 -3.115882
    24261 ENSMUSG00000083380 -0.5012910 3.177634 10462193    Gm3244 -2.892738
             A.ChIP log.p.ChIP    B.ChIP      p.DE    p.ChIP B.nothing B.DE.only
    19937 -23.98005  -1.819439 -1.642562 0.9987722 0.1621167 -6.878385  1.635008
    20175 -23.75691  -1.845462 -1.673543 0.9903014 0.1579523 -4.799496  1.613386
    22435 -23.75691  -1.845462 -1.673543 0.9903014 0.1579523 -4.799496  1.613386
    23445 -23.75691  -1.845462 -1.673543 0.9903014 0.1579523 -4.799496  1.613386
    24261 -23.98005  -1.819439 -1.642562 0.9599839 0.1621167 -3.361246  1.413735
          B.ChIP.only B.ChIP.DE
    19937   -8.521777 -1.644028
    20175   -6.479707 -1.685107
    22435   -6.479707 -1.685107
    23445   -6.479707 -1.685107
    24261   -5.031403 -1.691113

Am I doing something wrong / misunderstanding the output or is this simply related to the absence of replicates for the ChIP-seq data?

Any help or ideas would be very much appreciated.

Many thanks.

André Faure
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

    > sessionInfo()
    R version 2.15.3 (2013-03-01)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
     [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] Rcade_1.0.0          Rsamtools_1.10.2     Biostrings_2.26.3
    [4] baySeq_1.12.0        GenomicRanges_1.10.7 IRanges_1.16.6
    [7] BiocGenerics_0.4.0

    loaded via a namespace (and not attached):
    [1] bitops_1.0-6    parallel_2.15.3 stats4_2.15.3   tools_2.15.3
    [5] zlibbioc_1.4.0
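As a reading aid for the columns in the output above, the sketch below checks one interpretation against the posted numbers: that each B-value is a natural-log posterior odds, and that the four joint hypotheses are obtained by multiplying the DE and ChIP probabilities. This interpretation is inferred from the rows shown in this post, not taken from the Rcade documentation, so treat it as an assumption.

```r
# Values copied from the first row (gene 19937) of the output in this post.
B.DE   <- 6.701309
B.ChIP <- -1.642562

# Assumption: B-values are natural-log posterior odds, so plogis()
# (the inverse logit) converts them back to posterior probabilities.
p.DE   <- plogis(B.DE)
p.ChIP <- plogis(B.ChIP)

# Assumption: each joint hypothesis multiplies the two marginal
# probabilities; qlogis() (the logit) turns the product into a B-value.
B.joint <- c(
  B.nothing   = qlogis((1 - p.DE) * (1 - p.ChIP)),
  B.DE.only   = qlogis(p.DE * (1 - p.ChIP)),
  B.ChIP.only = qlogis((1 - p.DE) * p.ChIP),
  B.ChIP.DE   = qlogis(p.DE * p.ChIP)
)

# Reported values for the same gene, for side-by-side comparison.
reported <- c(B.nothing = -6.878385, B.DE.only = 1.635008,
              B.ChIP.only = -8.521777, B.ChIP.DE = -1.644028)

print(round(cbind(computed = B.joint, reported = reported), 4))
```

If this reading holds, B.ChIP.DE can never exceed B.ChIP for a gene, because p.DE * p.ChIP is at most p.ChIP. A low ceiling on p.ChIP therefore caps the combined score no matter how strong the expression evidence is.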

Transcription Rcade • 1.8k views

Updated 11.2 years ago by Jonathan Cairns ▴ 60 • written 11.2 years ago by André Faure ▴ 30


Jonathan Cairns ▴ 60

@jonathan-cairns-5761

Last seen 10.6 years ago

Hi André,

First of all, I note that you are using an old version of R, which means you are using an old version of Rcade (1.0.0 not 1.4.0) - I recommend you update to the latest versions.

Lack of replicates is indeed going to decrease the posterior probabilities. This is because without replicates, baySeq cannot estimate any site-specific variances, thus must fall back on its priors. You should still be able to find your TF's "most likely" targets, but I'm afraid baySeq and Rcade are going to be honest with you that you have little confidence in the truth of these targets without further validation!

That aside, it looks like the genes with the highest p.ChIP also have negative ChIP log-ratios (M.ChIP). So you'll likely want to get rid of genes with M.ChIP < 0. (I intended the exportRcade() function to be the primary means of getting output; this function does the filtering step automatically.)

Should you find that nearly all of your genes have M.ChIP < 0, then there may be a normalization issue. Let me know if this is the case and I'll work on some tweaks to the package.

[As an aside, Rcade uses baySeq's two-tailed test, whereas really we want a one-tailed test in this context - if anyone reading this has advice on one-tailed Bayesian differential count analysis then I'd be interested!]

Jonathan
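One way to check whether "nearly all" genes have M.ChIP < 0 is to look at the fraction of negative log-ratios directly. The sketch below is self-contained, so it uses a toy data frame in place of the real getRcade() output: the first five M.ChIP values are the ones posted in the question, while the `toyGene` rows and the 0.9 warning threshold are hypothetical and chosen only for illustration.

```r
# Toy stand-in for the data frame returned by getRcade(Rcade).
# First five rows: M.ChIP values from the thread. Last three: hypothetical.
x <- data.frame(
  geneID = c("ENSMUSG00000069008", "ENSMUSG00000070141",
             "ENSMUSG00000076145", "ENSMUSG00000080411",
             "ENSMUSG00000083380", "toyGene1", "toyGene2", "toyGene3"),
  M.ChIP = c(-2.892738, -3.115882, -3.115882, -3.115882, -2.892738,
             1.2, 0.4, 2.1),
  stringsAsFactors = FALSE
)

# Fraction of genes whose ChIP signal is below Input.
frac.negative <- mean(x$M.ChIP < 0)
print(frac.negative)

# Hypothetical rule of thumb: flag the run if almost everything is negative.
if (frac.negative > 0.9) {
  message("Most M.ChIP values are negative; normalization may need checking.")
}

# The full distribution is usually more informative than a single fraction.
print(summary(x$M.ChIP))
```

On real data, replacing the toy `x` with `getRcade(Rcade)` should be the only change needed, assuming the M.ChIP column is present as in the output shown in the question.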

11.2 years ago • Jonathan Cairns ▴ 60


Hi Jonathan,

Many thanks for getting back to me.

I have updated to the latest versions of R and Rcade 1.4.0 and the results are looking more sensible (schoolboy error, sorry).

Maximum p.ChIP = 0.7 and most genes have positive ChIP log-ratios (M.ChIP).

I have used the exportRcade() function to obtain the top 1000 genes most likely to have both DE and ChIP signals (cutoffMode="top"), but this list still contains genes with M.ChIP < 0. Do you recommend I filter these out?

Thanks,

André

11.2 years ago • André Faure ▴ 30


Hi André,

Apologies for the delay responding.

I'm not sure why genes with M.ChIP < 0 are not being filtered out; I'll take a look when I get a chance and then get back to you. In any event, these are probably not true TF targets, so I would remove them if I were you.

Jonathan
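The manual filtering suggested here can be sketched in base R as follows. This is an illustration rather than Rcade's own export logic: the data frame, its gene names and its values are hypothetical, and only the column names M.ChIP and B.ChIP.DE are taken from the getRcade() output shown in the question.

```r
# Hypothetical stand-in for getRcade(Rcade); all values are made up.
x <- data.frame(
  geneID    = c("geneA", "geneB", "geneC", "geneD", "geneE", "geneF"),
  M.ChIP    = c(1.5, -0.8, 0.9, 2.2, -1.1, 0.3),
  B.ChIP.DE = c(0.8, 0.9, -0.2, 0.5, 0.1, -1.0),
  stringsAsFactors = FALSE
)

top.n <- 3  # the thread uses 1000; 3 keeps the toy example readable

# Step 1: drop genes where ChIP is not enriched over Input.
enriched <- x[x$M.ChIP > 0, ]

# Step 2: rank the remaining genes by the combined ChIP + DE score.
ranked <- enriched[order(enriched$B.ChIP.DE, decreasing = TRUE), ]

# Step 3: keep the top N.
top.targets <- head(ranked, top.n)
print(top.targets)
```

Filtering before taking the top N still returns N genes where enough pass the filter. Filtering an already exported top-N list returns fewer than N whenever it contains genes with negative M.ChIP.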

11.1 years ago • Jonathan Cairns ▴ 60


Ok thanks Jonathan - I will remove these genes then.

André

11.1 years ago • André Faure ▴ 30
