Question: edgeR package : question about 'exactTest' results
0
7.6 years ago by
NEIL-BERNET Helen30 wrote:
Dear all, I use edgeR for differential analysis of ChIP-seq densities. I would like to know how to export the results from 'exactTest', including p-values and FDR. Using 'topTags', I only have the 10 first results, with p-values and FDR. But when I do 'write.table(mytest$table, )', it gives me only the complete set of p-values, but not the FDR. Thanks for your help, Best regards, helen [[alternative HTML version deleted]] edger • 939 views ADD COMMENTlink modified 7.6 years ago by Rory Stark100 • written 7.6 years ago by NEIL-BERNET Helen30 Answer: edgeR package : question about 'exactTest' results 0 7.6 years ago by Rory Stark100 Rory Stark100 wrote: Hi helen- After calling exactTest, you can call topTags with the result. topTags takes a parameter, n, which is the number of results to return -- if you set it to the total number of sites you are comparing, you will get them all back with their associated stats. I end up doing something like: > results = topTags(de$db,nrow(db$counts))$table to get everything back. If you are using edgeR to do differential binding analysis of ChIP-seq data, you may also want to check out the DiffBind package, which add functions to make this sort of thing easier. Cheers- Rory On 25/01/2012 15:14, "NEIL-BERNET Helen" <helen.neil-bernet at="" cea.fr=""> wrote: >Dear all, > >I use edgeR for differential analysis of ChIP-seq densities. >I would like to know how to export the results from 'exactTest', >including p-values and FDR. >Using 'topTags', I only have the 10 first results, with p-values and FDR. >But when I do 'write.table(mytest$table, ?)', it gives me only the >complete set of p-values, but not the FDR. > >Thanks for your help, > >Best regards, > >helen > > > > [[alternative HTML version deleted]] > NOTICE AND DISCLAIMER This e-mail (including any attachments) is intended for ...{{dropped:16}} ADD COMMENTlink written 7.6 years ago by Rory Stark100 Rory, Thanks for your answer. I've already tried DiffBind, it gave me roughly the same results as edgeR, hopefully! Indeed, it's easier for ChIP-seq, but I couldn't address 2 things: 1) I need to use quantile normalization (with limma package) prior to differential analysis, I'm not sure that's possible to use directly my own normalized counts. Moreover, I don't use peak calling, because I study specific regions, for example promoter proximal regions, so I don't have any affinity score. 2) I couldn't find a way to analyze my data without replicates, and that's a critical point for ChIP-seq, most of the times, we don't have any replicates! If you can answer these questions, I would really appreciate to work with DiffBind instead of edgeR!! Thank you very much, helen Le 25/01/12 16:33, << Rory Stark >> <rory.stark at="" cancer.org.uk=""> a ?crit : >Hi helen- > >After calling exactTest, you can call topTags with the result. topTags >takes a parameter, n, which is the number of results to return -- if you >set it to the total number of sites you are comparing, you will get them >all back with their associated stats. I end up doing something like: > >> results = topTags(de$db,nrow(db$counts))$table > > >to get everything back. > >If you are using edgeR to do differential binding analysis of ChIP- seq >data, you may also want to check out the DiffBind package, which add >functions to make this sort of thing easier. > >Cheers- >Rory > >On 25/01/2012 15:14, "NEIL-BERNET Helen" <helen.neil-bernet at="" cea.fr=""> wrote: > >>Dear all, >> >>I use edgeR for differential analysis of ChIP-seq densities. >>I would like to know how to export the results from 'exactTest', >>including p-values and FDR. >>Using 'topTags', I only have the 10 first results, with p-values and FDR. >>But when I do 'write.table(mytest$table, ?)', it gives me only the >>complete set of p-values, but not the FDR. >> >>Thanks for your help, >> >>Best regards, >> >>helen >> >> >> >> [[alternative HTML version deleted]] >> > > >NOTICE AND DISCLAIMER >This e-mail (including any attachments) is intended for the above- named >person(s). If you are not the intended recipient, notify the sender >immediately, delete this email from your system and do not disclose or >use for any purpose. > >We may monitor all incoming and outgoing emails in line with current >legislation. We have taken steps to ensure that this email and >attachments are free from any virus, but it remains your responsibility >to ensure that viruses do not adversely affect you. >Cancer Research UK >Registered in England and Wales >Company Registered Number: 4325234. >Registered Charity Number: 1089464 and Scotland SC041666 >Registered Office Address: Angel Building, 407 St John Street, London >EC1V 4AD. ADD REPLYlink written 7.6 years ago by NEIL-BERNET Helen30 Hi helen- Addressing first the use of pre-defined regions instead of peak callers, this is possible with DiffBind. The easiest way to to write out the regions in a single file (chr, start, end, 1) and specify the same file for all the samples. Alternatively, if you are creating the region file dynamically, you can create a single "dummy" peak file and specify it for each sample, e.g. chr1 1 5 1 chr1 10 15 1 Then, in the call to dba.counts, specify your dynamically computed set of regions as the "peaks" parameter, and DiffBind will use those regions. We do this all the time, e.g. get all the transcription start sites, then take windows 1000bp upstream and 500bp downstream of each TSS, and pass that in to dba.count -- no peak calling needed. The alternative normalization issue is more difficult. I assume you are quantile normalizing the read counts for all your regions prior to sending them to edgeR (by creating a DGEList). Do you do the TMM normalization after that (calcNormFactors) or somehow fake out the normalization factors? Or use another normalization method? edgeR includes an "upperquartile" normalization option, is that not sufficient? I'd really like to understand what you do, as we are working on the next version of DiffBind and providing more flexibility in normalization is an area we are looking at. Finally, replicates. While DiffBind does let you run with no replicates (it just gives you a nasty warning), you really, really, really need replicates for this kind of analysis. Really. edgeR needs replicates to compute dispersion for the negative binomial -- without replicates it just sets the dispersion to zero. Even assuming that technical (sequencing) effects are negligible, the biological variation and experimental variance in ChIP efficiency is such that you can have no confidence that any differential binding you find is meaningful without replicates. ChIP, thanks in part to its use of antibodies, has greater experimental variation than RNA expression assays, and a differential expression analysis without replicates would be laughed out of any peer-reviewed journal. We need to get away from it being acceptable to consider a comparative ChIP-seq analysis without replicates -- even "just to see if there's anything there". We need to encourage the lab scientists to price this in; here at CRI we basically refuse to do anything but the most simple analysis on non-replicated ChIPs. Cheers- Rory On 26/01/2012 10:44, "NEIL-BERNET Helen" <helen.neil-bernet at="" cea.fr=""> wrote: >Rory, > >Thanks for your answer. >I've already tried DiffBind, it gave me roughly the same results as edgeR, >hopefully! >Indeed, it's easier for ChIP-seq, but I couldn't address 2 things: > >1) I need to use quantile normalization (with limma package) prior to >differential analysis, I'm not sure that's possible to use directly my own >normalized counts. Moreover, I don't use peak calling, because I study >specific regions, for example promoter proximal regions, so I don't have >any affinity score. > >2) I couldn't find a way to analyze my data without replicates, and that's >a critical point for ChIP-seq, most of the times, we don't have any >replicates! >If you can answer these questions, I would really appreciate to work with >DiffBind instead of edgeR!! > >Thank you very much, > >helen > > > >Le 25/01/12 16:33, << Rory Stark >> <rory.stark at="" cancer.org.uk=""> a ?crit : > >>Hi helen- >> >>After calling exactTest, you can call topTags with the result. topTags >>takes a parameter, n, which is the number of results to return -- if you >>set it to the total number of sites you are comparing, you will get them >>all back with their associated stats. I end up doing something like: >> >>> results = topTags(de$db,nrow(db$counts))$table >> >> >>to get everything back. >> >>If you are using edgeR to do differential binding analysis of ChIP- seq >>data, you may also want to check out the DiffBind package, which add >>functions to make this sort of thing easier. >> >>Cheers- >>Rory >> >>On 25/01/2012 15:14, "NEIL-BERNET Helen" <helen.neil-bernet at="" cea.fr=""> >>wrote: >> >>>Dear all, >>> >>>I use edgeR for differential analysis of ChIP-seq densities. >>>I would like to know how to export the results from 'exactTest', >>>including p-values and FDR. >>>Using 'topTags', I only have the 10 first results, with p-values and >>>FDR. >>>But when I do 'write.table(mytest$table, ?)', it gives me only the >>>complete set of p-values, but not the FDR. >>> >>>Thanks for your help, >>> >>>Best regards, >>> >>>helen >>> >>> >>> >>> [[alternative HTML version deleted]] >>> >> >> >>NOTICE AND DISCLAIMER >>This e-mail (including any attachments) is intended for the above- named >>person(s). If you are not the intended recipient, notify the sender >>immediately, delete this email from your system and do not disclose or >>use for any purpose. >> >>We may monitor all incoming and outgoing emails in line with current >>legislation. We have taken steps to ensure that this email and >>attachments are free from any virus, but it remains your responsibility >>to ensure that viruses do not adversely affect you. >>Cancer Research UK >>Registered in England and Wales >>Company Registered Number: 4325234. >>Registered Charity Number: 1089464 and Scotland SC041666 >>Registered Office Address: Angel Building, 407 St John Street, London >>EC1V 4AD. > >_______________________________________________ >Bioconductor mailing list >Bioconductor at r-project.org >https://stat.ethz.ch/mailman/listinfo/bioconductor >Search the archives: >http://news.gmane.org/gmane.science.biology.informatics.conductor NOTICE AND DISCLAIMER This e-mail (including any attachments) is intended for ...{{dropped:16}} ADD REPLYlink written 7.6 years ago by Rory Stark100 Answer: edgeR package : question about 'exactTest' results 0 7.6 years ago by Mark Robinson870 Mark Robinson870 wrote: Hi Helen, You can do this manually. Using the example from the exactTest() docs: ----- # generate raw counts from NB, create list object y <- matrix(rnbinom(80,size=1/0.2,mu=10),nrow=20,ncol=4) rownames(y) <- paste("Gene",1:nrow(y),sep=".") group <- factor(c(1,1,2,2)) d <- DGEList(counts=y,group=group,lib.size=rep(1000,4)) # estimate dispersions and find differences in expression d <- estimateCommonDisp(d) d <- estimateTagwiseDisp(d) de <- exactTest(d) ----- Then, something like: outtab <- cbind(de$table, FDR=p.adjust(de$table$p.value,method="BH")) write.table(outtab, "my.txt" ) or: tt <- topTags(de, n=Inf) write.table(tt, "my.txt" ) The latter is sorted, the former isn't. Hope that helps. Regards, Mark On 25.01.2012, at 16:14, NEIL-BERNET Helen wrote: > Dear all, > > I use edgeR for differential analysis of ChIP-seq densities. > I would like to know how to export the results from 'exactTest', including p-values and FDR. > Using 'topTags', I only have the 10 first results, with p-values and FDR. > But when I do 'write.table(mytest$table, ?)', it gives me only the complete set of p-values, but not the FDR. > > Thanks for your help, > > Best regards, > > helen > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ---------- Prof. Dr. Mark Robinson Bioinformatics Institute of Molecular Life Sciences University of Zurich Winterthurerstrasse 190 8057 Zurich Switzerland v: +41 44 635 4848 f: +41 44 635 6898 e: mark.robinson at imls.uzh.ch o: Y11-J-16 w: http://tiny.cc/mrobin ADD COMMENTlink written 7.6 years ago by Mark Robinson870 Mark, topTags(de, n=Inf) is perfect! I could obtain the full table. Thanks a lot! helen Le 25/01/12 16:28, ? Mark Robinson ? <mark.robinson at="" imls.uzh.ch=""> a ?crit : >Hi Helen, > >You can do this manually. Using the example from the exactTest() docs: > >----- > # generate raw counts from NB, create list object > y <- matrix(rnbinom(80,size=1/0.2,mu=10),nrow=20,ncol=4) > rownames(y) <- paste("Gene",1:nrow(y),sep=".") > group <- factor(c(1,1,2,2)) > d <- DGEList(counts=y,group=group,lib.size=rep(1000,4)) > > # estimate dispersions and find differences in expression > d <- estimateCommonDisp(d) > d <- estimateTagwiseDisp(d) > de <- exactTest(d) >----- > >Then, something like: > >outtab <- cbind(de$table, FDR=p.adjust(de$table$p.value,method="BH")) >write.table(outtab, "my.txt" ) > >or: > >tt <- topTags(de, n=Inf) >write.table(tt, "my.txt" ) > >The latter is sorted, the former isn't. Hope that helps. > >Regards, >Mark > > >On 25.01.2012, at 16:14, NEIL-BERNET Helen wrote: > >> Dear all, >> >> I use edgeR for differential analysis of ChIP-seq densities. >> I would like to know how to export the results from 'exactTest', >>including p-values and FDR. >> Using 'topTags', I only have the 10 first results, with p-values and >>FDR. >> But when I do 'write.table(mytest\$table, ?)', it gives me only the >>complete set of p-values, but not the FDR. >> >> Thanks for your help, >> >> Best regards, >> >> helen >> >> >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >>http://news.gmane.org/gmane.science.biology.informatics.conductor > >---------- >Prof. Dr. Mark Robinson >Bioinformatics >Institute of Molecular Life Sciences >University of Zurich >Winterthurerstrasse 190 >8057 Zurich >Switzerland > >v: +41 44 635 4848 >f: +41 44 635 6898 >e: mark.robinson at imls.uzh.ch >o: Y11-J-16 >w: http://tiny.cc/mrobin >