Question: DiffBind - overlap between different peak callers
0
6.6 years ago by
Paolo Kunderfranco350 wrote:
Dear Rory, I finally analyze my data with DiffBind and it seems working well, I was wondering whether is possible or it will be possible in the next release to export in a bed file (wig file) consensus peaks overlapping between two conditions, in my example consensus peaks called by two different peak callers. Cheers, Paolo
diffbind • 1.8k views
modified 6.6 years ago by Rory Stark100 • written 6.6 years ago by Paolo Kunderfranco350
Answer: DiffBind - overlap between different peak callers
0
6.6 years ago by
Rory Stark100
Rory Stark100 wrote:
Hello Paolo- There are a couple of ways of getting the consensus peaksets. If you already have added them to your DBA object (e.g. by using dba.peakset), you can retrieve the overlapping set (and/or write it to a file) using dba.pekaset with the bRetrieve and writefile options: peaks = dba.peakset(DBA, peaksetnum, bRetrieve=T, writefile="overlappingpeaks.bed") You can also get the overlapping peaks using dba.overlap: overlaps = dba.overlap(DBA,c(peaksetnum1,peaksetnum2)) write.table(overlaps$inAll,file="overlappingpeaks.bed") Also, I've added a new feature to dba.peakset to automatically add a set of consensus peaksets all at once. For example, to add the consensus peaksets for all samples that differ only in what peakcaller was used: DBA = dba.peakset(DBA, consensus = -DBA_CALLER) Likewise to add consensus peaksets for each sample type by taking a consensus of their replicates: DBA = dba.peakset(DBA, consensus = -DBA_REPLICATE) In this case, the minus sign means to combine peaksets that have the same metadata EXCEPT a specific attributes (CALLER or REPLICATE). You can also add consensus peaksets for all sets of samples that share specific metadata, eg.: DBA = dba.peakset(DBA, consensus= = c(DBA_TISSUE, DBA_CONDITION)) You can look at just the added peaksets by using the Consensus mask: dba.show(DBA,DBA$masks$Consensus) Hope this is helpful- Rory On 14/09/2012 11:00, "bioconductor-request@r-project.org<mailto :bioconductor-request@r-project.org="">" <bioconductor- request@r-project.org<mailto:bioconductor-request@r-project.org="">> wrote: Send Bioconductor mailing list submissions to bioconductor@r-project.org<mailto:bioconductor@r-project.org> To subscribe or unsubscribe via the World Wide Web, visit https://stat.ethz.ch/mailman/listinfo/bioconductor or, via email, send a message with subject or body 'help' to bioconductor-request@r-project.org<mailto:bioconductor- request@r-project.org=""> You can reach the person managing the list at bioconductor-owner@r-project.org<mailto:bioconductor- owner@r-project.org=""> When replying, please edit your Subject line so it is more specific than "Re: Contents of Bioconductor digest..." Today's Topics: 1. Re: limma separate channel analysis of your data (Sanogo, Yibayiri O) 2. Re: finding end of file in FASTA file (Martin Morgan) 3. Set plot title in makeVennDiagram (ChIPpeakAnno) (Ant?nio Miguel de Jesus Domingues) 4. Re: Set plot title in makeVennDiagram (ChIPpeakAnno) (Zhu, Lihua (Julie)) 5. Set plot title in makeVennDiagram (ChIPpeakAnno) (Ant?nio Miguel de Jesus Domingues) 6. Re: Set plot title in makeVennDiagram (ChIPpeakAnno) (Zhu, Lihua (Julie)) 7. creating GSEA files using biomart (Juliet Hannah) 8. Re: creating GSEA files using biomart (Steffen Durinck) 9. Re: GWASTools mendelList Constructor (Samuel Younkin) 10. DiffBind - sample sheet for multiple replicates and peak callers (Ant?nio Miguel de Jesus Domingues) 11. Re: creating GSEA files using biomart (Juliet Hannah) 12. Re: topGO: full-length descriptions in graph output (James W. MacDonald) 13. How to design matrix that account to effect of transgenic event in edgeR ? (Sermsawat Tunlaya-Anukit) 14. Re: limma separate channel analysis of your data (Gordon K Smyth) 15. edgeR: confusing BCV plot (Gordon K Smyth) 16. Install devel version of edgeR (Gordon K Smyth) 17. Re: HTqPCR error (Heidi Dvinge) 18. Re: Install devel version of edgeR (Gordon K Smyth) 19. Re: DiffBind - overlap between different peak callers (Paolo Kunderfranco) 20. Re: DiffBind - sample sheet for multiple replicates and peak (Paolo Kunderfranco) 21. DiffBind - error in dba.count (Ant?nio Miguel de Jesus Domingues) ---------------------------------------------------------------------- Message: 1 Date: Thu, 13 Sep 2012 10:12:58 +0000 From: "Sanogo, Yibayiri O" <sanogo@illinois.edu<mailto:sanogo@illinois.edu>> To: Gordon K Smyth <smyth@wehi.edu.au<mailto:smyth@wehi.edu.au>> Cc: Bioconductor mailing list <bioconductor@r-project.org<mailto:bioconductor@r-project.org>>, Osee Sanogo <sanogo@life.illinois.edu<mailto:sanogo@life.illinois.edu>> Subject: Re: [BioC] limma separate channel analysis of your data Message-ID: <dd4a5aa12039fb46b3ac4eafff27533c1122fcc4@citesmbx4.ad.uillino is.edu<mailto:dd4a5aa12039fb46b3ac4eafff27533c1122fcc4@citesmbx4.ad.ui="" llinois.edu="">> Content-Type: text/plain Dear Gordon, In the code below, design <- model.matrix(~Dye+Fish+Brain+Brain:Treat) fit <- lmscFit(MA,design) fit <- eBayes(fit, trend=TRUE) shouldn't we include the intracorrelation coefficient in the single channel model? Is the following code correct? corfit <- intraspotCorrelation(MA, design) fit <- lmscFit(MA,design, correlation=corfit$consensus) Please let me know if this is the right way of doing it. Thank you. Osee ________________________________________ From: Gordon K Smyth [smyth@wehi.EDU.AU<mailto:smyth@wehi.edu.au>] Sent: Tuesday, September 11, 2012 8:13 PM To: Sanogo, Yibayiri O Cc: Osee Sanogo; Bioconductor mailing list Subject: limma separate channel analysis of your data Dear Osee, Thank you for attaching the revised design of your experiment. I found it somewhat tricky to analyse, so my recommendations are more involved than I usually give. I think that you will need to perform a separate channel analysis, otherwise it will not be possible to make baseline comparisons between the brain parts. I also think that you may need to re-normalize your data using the limma package. You say that the data has been "loess/scale normalized into an expression set", but I don't think that scale normalization should be necessary for Agilent data, and an expression set is not fully suitable for two colour data. The following code assumes that you have normalized the data into an MAList object using the limma package. I will call it 'MA'. I strongly recommend that you use normexp background correction and loess normalization as described in the limma User's Guide for GenePix data. The first step is to put your targets data into separate channel format: targets <- read.delim("Design_Brain.txt") Treat <- as.vector(t(targets[,c("Cy3","Cy5")])) Treat <- ifelse(Treat>0,"Exp","Ctrl") Treat <- factor(Treat) Brain <- rep(targets$Brain.Part,each=2,len=48) Fish <- factor(rep(targets$Fish.Number,each=2,len=48)) Dye <- rep(0:1,len=48) data.frame(Fish,Brain,Treat,Dye) Now you can fit a linear model, correcting for probe-specific dye- bias, correcting for any differences between the fish, and computing Treatment vs Control effects for each brain part: design <- model.matrix(~Dye+Fish+Brain+Brain:Treat) fit <- lmscFit(MA,design) fit <- eBayes(fit, trend=TRUE) Now you can extract genes that have a treatment effect. To find genes that have a treatment vs control effect in telencephalon, you would use: topTable(fit, coef="BrainT:TreatExp") For genes DE for treatment vs control in brain stem, it would be topTable(fit, coef="BrainBS:TreatExp") And so on. Best wishes Gordon --------------------------------------------- Professor Gordon K Smyth, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia. http://www.statsci.org/smyth On Tue, 11 Sep 2012, Sanogo, Yibayiri O wrote: Dear Gordon, Attached please find the revised design of my experiment. Thank you for your willingness to help me analyze these data. Best regards, Osee ________________________________________ From: Sanogo, Yibayiri O Sent: Monday, September 10, 2012 9:51 PM To: Gordon K Smyth; Osee Sanogo Cc: Bioconductor mailing list Subject: RE: Advice with RemoveBatchEffect and Rank Product package Dear Gordon, Please ignore the last two columns (weight and length) of the design for now: I think they are screwed up due to some kind of sorting. I will have to double check them (look in the notebook which is not here now). Only different brain regions (T, D, C, BS) belonging to the same fish should have same weight, length. The other columns are: Fish.Number= corresponds to fish ID, the individual fish (biological replicates) Slide= the Agilent 4X44k slides on each the arrays were located Brain.Part= the different brain regions that were dissected and hybridized (within region). They were T (telencephalon), D (diencephalon), C (cerebellum), BS (brain stem). in the cy3, c5 columns, "-1" is control and "1" is experimental. FileName Cy3 Cy5 Fish Number Slide Brain Part 1T.gpr 1 -1 1 2 T 2T.gpr -1 1 2 1 T 3T.gpr 1 -1 3 4 T 4T.gpr -1 1 4 3 T 5T.gpr 1 -1 5 6 T 6T.gpr -1 1 6 5 T 1D.gpr -1 1 1 5 D 2D.gpr 1 -1 2 4 D 3D.gpr -1 1 3 1 D 4D.gpr 1 -1 4 6 D 5D.gpr -1 1 5 3 D 6D.gpr 1 -1 6 2 D 1C.gpr 1 -1 1 4 C 2C.gpr -1 1 2 3 C 3C.gpr 1 -1 3 6 C 4C.gpr -1 1 4 5 C 5C.gpr 1 -1 5 2 C 6C.gpr -1 1 6 1 C 1BS.gpr -1 1 1 1 BS 2BS.gpr 1 -1 2 2 BS 3BS.gpr -1 1 3 3 BS 4BS.gpr 1 -1 4 4 BS 6BS.gpr 1 -1 6 6 BS Thank you for your willingness to help. I will stay up a bit so that I can respond faster to your questions given the time difference between us. Thank you so much. Osee ________________________________________ From: Gordon K Smyth [smyth@wehi.EDU.AU<mailto:smyth@wehi.edu.au>] Sent: Monday, September 10, 2012 6:34 PM To: Osee Sanogo Cc: Sanogo, Yibayiri O; Bioconductor mailing list Subject: Re: Advice with RemoveBatchEffect and Rank Product package Dear Osee, Can you explain what the columns Fish.Number, Slide, Brain.Part, Weight and Length mean in your experiment? If the first six arrays are from biologically independent samples, why do they have the same weight and length? Best wishes Gordon On Mon, 10 Sep 2012, Osee Sanogo wrote: Dear Gordon, >From what you said, it seems that I am oversimplifying my experiment by attempting to analyze it with RankProd, which doesn't offer the option for complex modeling. Could you please explain to me how I could analyze the experiment using Limma? Please let me know if you'd like me to provide further details of the experiment. Thank you so much. Osee On 9/10/12 2:04 AM, "Gordon K Smyth" <smyth@wehi.edu.au<mailto:smyth@wehi.edu.au>> wrote: Dear Osee, No, you can't use removeBatchEffect to control for dye bias. Can you ignore the dye effect? Not in general, but who knows? Your experiment seems too complex to be properly analysed using RankProd. For one thing, it seems clear that you have obtained multiple parts of the brain from the same biological replicates, meaning that your samples are paired by fish number. I could explain how to analyse this experiment using limma. However, if you are determined that you will use RankProd, it might be best to email the authors of that package for advice. Best wishes Gordon --------------------------------------------- Professor Gordon K Smyth, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia. http://www.statsci.org/smyth On Sun, 9 Sep 2012, Osee Sanogo wrote: Dear Gordon, Thank you for getting back to me about my questions. My experiment is trying to identify differentially expressed genes in four regions of the brain in response to a stressor. I have 6 biol. replicates in each brain region for the control and experimental groups in each region, and the comparison is being done within brain region (i.e., T control vs T exp, D ctrl vs D exp, C ctrl vs C exp, BS ctrl vs BS exp). The sample were run in two-color Agilent Array. You're right that the design I sent was from the separate channel analysis, in which I attempted to account for array and dye effect, and then run the data in RankProd. Now I know that this is not right. Ok, I will use the single channel analysis in Limma. I still would like to run the two-channel data (ratios) in RankProd, as my previous experience found this useful for my dara (low replicate numbers). My questions: 1) Could I use RemoveBatchEffect to control for dye bias before running the two-channel data in RankProd? If yes, how should I do this using the RemoveBatch Effect function? 2) I found that about 3% of my probes have dye effect. Can I omit controlling for dye effect without compromising the result of my analysis? The data were loess/scale normalized into an expression set (Data_RP). Here is the design of the experiment FileName Cy3 Cy5 Fish.Number Slide Brain.Part Weight Length 1 1T.gpr 1 -1 1 2 T 39 0.63 2 2T.gpr -1 1 2 1 T 39 0.63 3 3T.gpr 1 -1 3 4 T 39 0.63 4 4T.gpr -1 1 4 3 T 39 0.63 5 5T.gpr 1 -1 5 6 T 39 0.63 6 6T.gpr -1 1 6 5 T NA NA 7 1D.gpr -1 1 1 5 D 47 1.21 8 2D.gpr 1 -1 2 4 D 47 1.21 9 3D.gpr -1 1 3 1 D 47 1.21 10 4D.gpr 1 -1 4 6 D 47 1.21 11 5D.gpr -1 1 5 3 D 47 1.21 12 6D.gpr 1 -1 6 2 D NA NA 13 1C.gpr 1 -1 1 4 C 47 1.31 14 2C.gpr -1 1 2 3 C 47 1.31 15 3C.gpr 1 -1 3 6 C 47 1.31 16 4C.gpr -1 1 4 5 C 47 1.31 17 5C.gpr 1 -1 5 2 C 47 1.31 18 6C.gpr -1 1 6 1 C NA NA 19 1BS.gpr -1 1 1 1 BS 89 1.44 20 2BS.gpr 1 -1 2 2 BS 89 1.44 21 3BS.gpr -1 1 3 3 BS 89 1.44 22 4BS.gpr 1 -1 4 4 BS NA NA 23 5BS.gpr -1 1 5 5 BS NA NA 24 6BS.gpr 1 -1 6 6 BS NA NA Thank you for your help and please let me know if you need further explanation of the experiment. Best regards, Osee On 9/9/12 7:24 PM, "Gordon K Smyth" <smyth@wehi.edu.au<mailto:smyth@wehi.edu.au>> wrote: Dear Osee, You are attempting to do a number of things that don't seem correct to me. First, you seem to attempting a separate channel analysis of two color microarray data, but ignoring the pairing of the red and green channels. It isn't correct to do this. I don't see any way to use RankProd, or any other package designed for independent samples, correctly in this context. If you must do a separate channel analysis, you would be better off using the separate channel analysis facilities of the limma package. Second, when you set batch=rep(1,24), you are defining a batch that consists of every array in your data set. Obviously it doesn't make sense to remove batch effects unless there are at least two batches. Third, I don't follow your design matrix. It would be better to go back to the start, and describe in more basic terms what is the nature of your data and what comparison you are trying to make. Best wishes Gordon Date: Sat, 8 Sep 2012 11:40:45 +0000 From: "Sanogo, Yibayiri O" <sanogo@illinois.edu<mailto:sanogo@illinois.edu>> To: "bioconductor@r-project.org<mailto:bioconductor@r-project.org>" <bioconductor@r-project.org<mailto:bioconductor@r-project.org>> Subject: [BioC] Advice with RemoveBatchEffect and Rank Product package Dear Members of the list, (I apologize for posting this again -I sent it earlier to the list but from another account and I was listed me as non-Member -and Member I am since 2008:-)). I have been using Rank Prod to analyze Agilent two-color data. However, I would like to remove the dye effect prior to analysis. I read on the forum that RemoveBatchEffect should be used in the Limma linear model, a type of modeling that is not in Rank Product. I have two questions: 1) Would it be appropriate to use RemoveBatchEffect to correct for dye effect prior to running the expression data using Rank Prod? 2) a) If no, what other function could I use to do this? b) If yes, I would like a help with the correct design and how to properly indicate the batch. Here is my design indicating the two dyes (cy3=-1, cy5=1; T, D, C, BS =are different areas of the brain): design1 BS C D T 1 0 0 0 1 2 0 0 0 -1 3 0 0 0 1 4 0 0 0 -1 5 0 0 0 1 6 0 0 0 -1 7 0 0 -1 0 8 0 0 1 0 9 0 0 -1 0 10 0 0 1 0 11 0 0 -1 0 12 0 0 1 0 13 0 1 0 0 14 0 -1 0 0 15 0 1 0 0 16 0 -1 0 0 17 0 1 0 0 18 0 -1 0 0 19 -1 0 0 0 20 1 0 0 0 21 -1 0 0 0 22 1 0 0 0 23 -1 0 0 0 24 1 0 0 0 attr(,"assign") [1] 1 1 1 1 I've tried this (Data_RP are my data, the M values of the expression set): DYE_RP<-removeBatchEffect(Data_RP, batch=rep(1,24), batch2=NULL, design=design1) but it is returning an error message " Error in contr.sum(levels(batch)) : not enough degrees of freedom to define contrasts" Please help me correct this code. Thank you so much for your help. Osee -- -- -- Y. Osee Sanogo Integrative Biology Institute for Genomic Biology University of Illinois at Urbana 505 S. Goodwin Ave Urbana, IL-61801 Tel: 217-333 2308 (Office) 217-417 9593 (Cell) ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:9}} ------------------------------ Message: 2 Date: Thu, 13 Sep 2012 05:58:35 -0700 From: Martin Morgan <mtmorgan@fhcrc.org<mailto:mtmorgan@fhcrc.org>> To: "Jack [guest]" <guest@bioconductor.org<mailto:guest@bioconductor.org>> Cc: mailme842@gmail.com<mailto:mailme842@gmail.com>, bioconductor@r-project.org<mailto:bioconductor@r-project.org> Subject: Re: [BioC] finding end of file in FASTA file Message-ID: <5051D87B.7060307@fhcrc.org<mailto:5051d87b.7060307@fhcrc.org>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed On 09/13/2012 01:42 AM, Jack [guest] wrote: library(ShortRead) fastadata <- readFasta("fastafolder", "fa$") file <- tempfile() writeFasta(fastadata, file) var1 <- readLines(file) while(countlength(tmp <- readLines(file, n = -1)) > 0) { #do something } I want the while loop to run till the end of file is reached, but the while statement dosent work. Thanks for help. Hi Jack -- if the goal is to read the fasta file in chunks, use a 'connection' that can remember the current location. After running the following to get a reproducible example fasta file library(ShortRead) example(readFasta) fl = dir(analysisPath(sp), "s_1_sequence.txt", full=TRUE) we can create a connection and open it, and the do our loop reading 500 lines at a time con <- file(fl); open(con) while(length(res <- readLines(con, n=500))) cat(length(res), "\n") close(con) which prints out 500 500 24 Unfortunately, readFasta doesn't work on connections (that would be a worthwhile feature request). There is also FaFile in Rsamtools, try example(FaFile) FaFile is most useful when the fasta file would benefit from being indexed, e.g., hundreds of contigs, but might also be useful for your purposes. Martin Regards Jack -- output of sessionInfo(): sessionInfo() R version 2.15.1 (2012-06-22) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] ShortRead_1.14.4 latticeExtra_0.6-24 RColorBrewer_1.0-5 Rsamtools_1.8.6 lattice_0.20-10 Biostrings_2.24.1 GenomicRanges_1.8.13 [8] IRanges_1.14.4 BiocGenerics_0.2.0 loaded via a namespace (and not attached): [1] Biobase_2.16.0 bitops_1.0-4.1 grid_2.15.1 hwriter_1.3 stats4_2.15.1 tools_2.15.1 zlibbioc_1.2.0 -- Sent via the guest posting facility at bioconductor.org. _______________________________________________ Bioconductor mailing list Bioconductor@r-project.org<mailto:bioconductor@r-project.org> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793 ------------------------------ Message: 3 Date: Thu, 13 Sep 2012 15:03:09 +0200 From: Ant?nio Miguel de Jesus Domingues <amjdomingues@gmail.com<mailto:amjdomingues@gmail.com>> To: bioconductor@r-project.org<mailto:bioconductor@r-project.org> Subject: [BioC] Set plot title in makeVennDiagram (ChIPpeakAnno) Message-ID: <capacvobkwtdhug9m0e1prpk3qpfctmaqc6vemqzpi4npago7ca@mail.gmai l.com<mailto:capacvobkwtdhug9m0e1prpk3qpfctmaqc6vemqzpi4npago7ca@mail.="" gmail.com="">> Content-Type: text/plain Hi Bioconducters, I am comparing 2 list of ChIP peaks obtained with different peak callers and would like to change some of the graphical options of the plot created by makeVennDiagram. Crucially I would like to add a title to the plot. Is there any way to change the graphical options of makeVennDiagram? Cheers, Ant?nio -- -- Ant?nio Miguel de Jesus Domingues, PhD Neugebauer group Max Planck Institute of Molecular Cell Biology and Genetics, Dresden Pfotenhauerstrasse 108 01307 Dresden Germany e-mail: domingue@mpi-cbg.de<mailto:domingue@mpi-cbg.de> tel. +49 351 210 2481 The Unbearable Lightness of Molecular Biology [[alternative HTML version deleted]] ------------------------------ Message: 4 Date: Thu, 13 Sep 2012 13:35:40 +0000 From: "Zhu, Lihua (Julie)" <julie.zhu@umassmed.edu<mailto:julie.zhu@umassmed.edu>> To: Ant?nio Miguel de Jesus Domingues <amjdomingues@gmail.com<mailto:amjdomingues@gmail.com>>, "bioconductor@r-project.org<mailto:bioconductor@r-project.org>" <bioconductor@r-project.org<mailto:bioconductor@r-project.org>> Subject: Re: [BioC] Set plot title in makeVennDiagram (ChIPpeakAnno) Message-ID: <cc775a0a.2227%julie.zhu@umassmed.edu<mailto:cc775a0a.2227 %julie.zhu@umassmed.edu="">> Content-Type: text/plain; charset="iso-8859-1" Anotonio, Thanks for the great suggestion! Currently makeVennDiagram only supports limited plotting options, i.e., cex and count.col. I will add the title in my to do list. Best regards, Julie On 9/13/12 9:03 AM, "Ant?nio Miguel de Jesus Domingues" <amjdomingues@gmail.com<mailto:amjdomingues@gmail.com>> wrote: Hi Bioconducters, I am comparing 2 list of ChIP peaks obtained with different peak callers and would like to change some of the graphical options of the plot created by makeVennDiagram. Crucially I would like to add a title to the plot. Is there any way to change the graphical options of makeVennDiagram? Cheers, Ant?nio ------------------------------ Message: 5 Date: Thu, 13 Sep 2012 15:41:32 +0200 From: Ant?nio Miguel de Jesus Domingues <amjdomingues@gmail.com<mailto:amjdomingues@gmail.com>> To: "Zhu, Lihua (Julie)" <julie.zhu@umassmed.edu<mailto:julie.zhu@umassmed.edu>> Cc: "bioconductor@r-project.org<mailto:bioconductor@r-project.org>" <bioconductor@r-project.org<mailto:bioconductor@r-project.org>> Subject: [BioC] Set plot title in makeVennDiagram (ChIPpeakAnno) Message-ID: <capacvoak_6meyap5j+kcwhjx- u2nfwm3dpmql="_-zCdNfbG3_A@mail.gmail.com&lt;mailto:CAPaCvoAK_6MEYAP5J" +kcwhjx-u2nfwm3dpmql="_-zCdNfbG3_A@mail.gmail.com">> Content-Type: text/plain Hi Julie, Thanks! If it is not asking too much and how much work it takes, but adding colours to the background of the circles would also be nice for presentations and papers. Basically some possibilities along the lines of the R package VennDiagram (http://www.biomedcentral.com/1471-2105/12/35) Good package you've developed btw. Ant?nio On 13 September 2012 15:35, Zhu, Lihua (Julie) <julie.zhu@umassmed.edu<mailto:julie.zhu@umassmed.edu>>wrote: Anotonio, Thanks for the great suggestion! Currently makeVennDiagram only supports limited plotting options, i.e., cex and count.col. I will add the title in my to do list. Best regards, Julie On 9/13/12 9:03 AM, "Ant?nio Miguel de Jesus Domingues" <amjdomingues@gmail.com<mailto:amjdomingues@gmail.com>> wrote: > Hi Bioconducters, > > I am comparing 2 list of ChIP peaks obtained with different peak callers > and would like to change some of the graphical options of the plot created > by makeVennDiagram. Crucially I would like to add a title to the plot. > > Is there any way to change the graphical options of makeVennDiagram? > > Cheers, > Ant?nio -- -- Ant?nio Miguel de Jesus Domingues, PhD Neugebauer group Max Planck Institute of Molecular Cell Biology and Genetics, Dresden Pfotenhauerstrasse 108 01307 Dresden Germany e-mail: domingue@mpi-cbg.de<mailto:domingue@mpi-cbg.de> tel. +49 351 210 2481 The Unbearable Lightness of Molecular Biology [[alternative HTML version deleted]] ------------------------------ Message: 6 Date: Thu, 13 Sep 2012 13:45:32 +0000 From: "Zhu, Lihua (Julie)" <julie.zhu@umassmed.edu<mailto:julie.zhu@umassmed.edu>> To: Ant?nio Miguel de Jesus Domingues <amjdomingues@gmail.com<mailto:amjdomingues@gmail.com>> Cc: "bioconductor@r-project.org<mailto:bioconductor@r-project.org>" <bioconductor@r-project.org<mailto:bioconductor@r-project.org>> Subject: Re: [BioC] Set plot title in makeVennDiagram (ChIPpeakAnno) Message-ID: <cc775c59.222c%julie.zhu@umassmed.edu<mailto:cc775c59.222c %julie.zhu@umassmed.edu="">> Content-Type: text/plain Will do! Thanks for the positive feedback and great suggestion! Best regards, Julie On 9/13/12 9:41 AM, "Ant?nio Miguel de Jesus Domingues" <amjdomingues@gmail.com<mailto:amjdomingues@gmail.com>> wrote: Hi Julie, Thanks! If it is not asking too much and how much work it takes, but adding colours to the background of the circles would also be nice for presentations and papers. Basically some possibilities along the lines of the R package VennDiagram (http://www.biomedcentral.com/1471-2105/12/35) Good package you've developed btw. Ant?nio On 13 September 2012 15:35, Zhu, Lihua (Julie) <julie.zhu@umassmed.edu<mailto:julie.zhu@umassmed.edu>> wrote: Anotonio, Thanks for the great suggestion! Currently makeVennDiagram only supports limited plotting options, i.e., cex and count.col. I will add the title in my to do list. Best regards, Julie On 9/13/12 9:03 AM, "Ant?nio Miguel de Jesus Domingues" <amjdomingues@gmail.com<mailto:amjdomingues@gmail.com>> wrote: Hi Bioconducters, I am comparing 2 list of ChIP peaks obtained with different peak callers and would like to change some of the graphical options of the plot created by makeVennDiagram. Crucially I would like to add a title to the plot. Is there any way to change the graphical options of makeVennDiagram? Cheers, Ant?nio [[alternative HTML version deleted]] ------------------------------ Message: 7 Date: Thu, 13 Sep 2012 11:29:06 -0400 From: Juliet Hannah <juliet.hannah@gmail.com<mailto:juliet.hannah@gmail.com>> To: bioconductor@r-project.org<mailto:bioconductor@r-project.org> Subject: [BioC] creating GSEA files using biomart Message-ID: <calzuzrtgppvifd4+n43bg7efgvrl8uq0nghxe4wd+gouow3fug@mail.gmai l.com<mailto:calzuzrtgppvifd4+n43bg7efgvrl8uq0nghxe4wd+gouow3fug@mail.="" gmail.com="">> Content-Type: text/plain; charset=ISO-8859-1 All, I am trying to create the GSEA chip file. This example uses Affy data, and the chip file is already available. I'm doing this as an exercise in preparation for other platforms. The chip file should look like: Probe Set ID Gene Symbol Gene Title 244901_at ORF25 hypothetical protein 244902_at NAD4L NADH dehydrogenase subunit 4L 244912_at CCB382 cytochrome c biogenesis orf382 244919_at CCB203 cytochrome c biogenesis orf203 244925_at NAD7 NADH dehydrogenase subunit 7 How can I obtain the third column from biomart. I tried searching the attributes, but couldn't find the right name. Is it a matter of trial and error to find the correct attribute, or are there systematic ways to find it. Here is what I have so far: library("biomaRt") probeSets <- c("219666_at", "220547_s_at", "218034_at") ensembl = useMart("ensembl") ensembl = useDataset("hsapiens_gene_ensembl",mart=ensembl) idens <- getBM(attributes = c("affy_hg_u133a","hgnc_symbol"), filters = "affy_hg_u133a",values = probeSets, mart = ensembl) Also, does anyone have any suggestions regarding how to handle the duplicates (seen in this example) with respect to GSEA. Thanks, Juliet Hannah ------------------------------ Message: 8 Date: Thu, 13 Sep 2012 08:42:45 -0700 From: Steffen Durinck <durinck.steffen@gene.com<mailto:durinck.steffen@gene.com>> To: Juliet Hannah <juliet.hannah@gmail.com<mailto:juliet.hannah@gmail.com>> Cc: bioconductor@r-project.org<mailto:bioconductor@r-project.org> Subject: Re: [BioC] creating GSEA files using biomart Message-ID: <capb0=vm67-3=paowcbk18etqjuukbre5+zbnonmqoqcbsyfpvq@mail.gmai l.com<mailto:capb0="Vm67-3=PaOWcBK18EtQJuUKBRE5+zBNONMQoqCbSYfPvQ@mail." gmail.com="">> Content-Type: text/plain Hi Juliet, The third attribute you're looking for is 'description': idens <- getBM(attributes = c("affy_hg_u133a","hgnc_symbol","description"), filters ="affy_hg_u133a",values = probeSets, mart = ensembl) Gives: affy_hg_u133a hgnc_symbol description 1 219666_at MS4A6A membrane-spanning 4-domains, subfamily A, member 6A [Source:HGNC Symbol;Acc:13375] 2 220547_s_at FAM35B family with sequence similarity 35, member B [Source:HGNC Symbol;Acc:31425] 3 218034_at FIS1 fission 1 (mitochondrial outer membrane) homolog (S. cerevisiae) [Source:HGNC Symbol;Acc:21689] 4 220547_s_at FAM35B2 family with sequence similarity 35, member B2 (pseudogene) [Source:HGNC Symbol;Acc:34038] 5 220547_s_at FAM35A family with sequence similarity 35, member A [Source:HGNC Symbol;Acc:28773] There is no systematic way to figure out with attribute name you need to use all you have is the attribute name and a description of the attribute. The more you get used to looking at those, the easier it gets to figure out which one you need and once you know the attributes you need, often you'll be using a similar set of attributes most of the time It is interesting to see in your example that one probeset maps to three different but closely related genes. In the past I thought Ensembl would remove such unambiguous mappers. I think the best to do in this case is to remove all probes that map to multiple genes as there is no way to tell which gene you'll be measuring. I'll report this example to the Ensembll team as they used to do this for us. Cheers, Steffen On Thu, Sep 13, 2012 at 8:29 AM, Juliet Hannah <juliet.hannah@gmail.com<mailto:juliet.hannah@gmail.com>>wrote: All, I am trying to create the GSEA chip file. This example uses Affy data, and the chip file is already available. I'm doing this as an exercise in preparation for other platforms. The chip file should look like: Probe Set ID Gene Symbol Gene Title 244901_at ORF25 hypothetical protein 244902_at NAD4L NADH dehydrogenase subunit 4L 244912_at CCB382 cytochrome c biogenesis orf382 244919_at CCB203 cytochrome c biogenesis orf203 244925_at NAD7 NADH dehydrogenase subunit 7 How can I obtain the third column from biomart. I tried searching the attributes, but couldn't find the right name. Is it a matter of trial and error to find the correct attribute, or are there systematic ways to find it. Here is what I have so far: library("biomaRt") probeSets <- c("219666_at", "220547_s_at", "218034_at") ensembl = useMart("ensembl") ensembl = useDataset("hsapiens_gene_ensembl",mart=ensembl) idens <- getBM(attributes = c("affy_hg_u133a","hgnc_symbol"), filters = "affy_hg_u133a",values = probeSets, mart = ensembl) Also, does anyone have any suggestions regarding how to handle the duplicates (seen in this example) with respect to GSEA. Thanks, Juliet Hannah _______________________________________________ Bioconductor mailing list Bioconductor@r-project.org<mailto:bioconductor@r-project.org> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor [[alternative HTML version deleted]] ------------------------------ Message: 9 Date: Thu, 13 Sep 2012 11:53:28 -0400 From: Samuel Younkin <syounkin@jhsph.edu<mailto:syounkin@jhsph.edu>> To: "bioconductor@r-project.org<mailto:bioconductor@r-project.org>" <bioconductor@r-project.org<mailto:bioconductor@r-project.org>> Subject: Re: [BioC] GWASTools mendelList Constructor Message-ID: <50520178.7020205@jhsph.edu<mailto:50520178.7020205@jhsph.edu>> Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Thanks Stephanie, your response was very helpful. One thing I found was that the offspring, father and mother vectors must be character vectors, not factor vectors. If they are factors mendelList() will not work. One must be careful not to let them be coerced into factors, say during construction of the data frame. The inclusion of stringsAsFactors=FALSE is necessary in the below code. Thanks, Sam On 09/12/2012 11:37 AM, Stephanie M. Gogarten wrote: Hi Sam, Sorry that the documentation for this function is not very clear - I will work on revising it for the next release at the beginning of October. The offspring, mother, and father vectors should be subject IDs, which are not required to be integers. The scanID vector should be the scan ID of the offspring, which should be integers. I suspect your problem might be that all subjects with genotypes must have an entry in all the vector arguments to mendelList - think of the arguments as being columns of a data frame that contains all your subjects. This is because the mendelList function needs to know the scan IDs of the father and mother as well, and all the vector arguments must have the same length. Here is a toy example of some IDs that produce a valid mendelList object: > dat <- data.frame(family=c(1,1,1), offspring=c("a","b","c"), father=c("b",0,0), mother=c("c",0,0), sex=c("M","M","F"), scanID=1:3, stringsAsFactors=FALSE) > dat family offspring father mother sex scanID 1 1 a b c M 1 2 1 b 0 0 M 2 3 1 c 0 0 F 3 > mendelList(dat$family, dat$offspring, dat$father, dat$mother, dat$sex, dat$scanID)$1 $1$a offspring father mother 1 1 2 3 attr(,"class") [1] "mendelList" If you only include the first row, without separate entries for the father and mother, you get a NULL: > dat <- dat[1,] > dat family offspring father mother sex scanID 1 1 a b c M 1 > mendelList(dat$family, dat$offspring, dat$father, dat$mother, dat$sex, dat$scanID) NULL If this doesn't solve your problem, please send a small example of your code that gives a NULL result. Stephanie -------- Original Message -------- Subject: [BioC] GWASTools mendelList Constructor Date: Wed, 12 Sep 2012 10:36:58 -0400 From: Samuel Younkin <syounkin@jhsph.edu<mailto:syounkin@jhsph.edu>> To: bioconductor@r-project.org<mailto:bioconductor@r-project.org> <bioconductor@r-project.org<mailto:bioconductor@r-project.org>> BioC, I am trying to use the Mendelian error checking feature of the Bioconductor package GWASTools. When I try to create a mendelList object using the constructor function mendelList(), it consistently returns NULL with no warning or error message. I have tried changing the subject ids to character, or integer, or numeric, but I continue to get NULL. Without any error message to work with I am at a loss. Any advice? Should the offspring, mother and father vectors be scan ids or scan/subject names? My scan names are not numbers and so these vectors would not satisfy the requirement in the manual of "a vector of offspring/father/mother ID numbers." Should the scanID vector be the scan id of the offspring? Any advice would be greatly appreciated. Thanks, Sam > sessionInfo() R version 2.15.1 Patched (2012-07-01 r59713) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.iso885915 LC_NUMERIC=C [3] LC_TIME=en_US.iso885915 LC_COLLATE=en_US.iso885915 [5] LC_MONETARY=en_US.iso885915 LC_MESSAGES=en_US.iso885915 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices datasets utils methods base other attached packages: [1] MinimumDistance_1.1.20 CleftCNVAssoc_0.0.1 GenomicRanges_1.9.63 [4] IRanges_1.15.43 GWASTools_1.2.1 sandwich_2.2-9 [7] zoo_1.7-7 GWASExactHW_1.0 ncdf_1.6.6 [10] Biobase_2.17.6 BiocGenerics_0.3.1 BiocInstaller_1.4.7 loaded via a namespace (and not attached): [1] affyio_1.25.0 annotate_1.35.3 AnnotationDbi_1.19.35 [4] Biostrings_2.25.8 bit_1.1-8 codetools_0.2-8 [7] crlmm_1.15.18 DBI_0.2-5 DNAcopy_1.31.1 [10] ellipse_0.3-7 ff_2.2-7 foreach_1.4.0 [13] genefilter_1.39.0 grid_2.15.1 iterators_1.0.6 [16] lattice_0.20-6 lmtest_0.9-30 msm_1.1.1 [19] mvtnorm_0.9-9992 oligoClasses_1.19.41 preprocessCore_1.19.0 [22] RSQLite_0.11.1 SNPchip_2.3.13 splines_2.15.1 [25] stats4_2.15.1 survival_2.36-14 tools_2.15.1 [28] VanillaICE_1.19.23 XML_3.9-4 xtable_1.7-0 [31] zlibbioc_1.3.0 > _______________________________________________ Bioconductor mailing list Bioconductor@r-project.org<mailto:bioconductor@r-project.org> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ------------------------------ Message: 10 Date: Thu, 13 Sep 2012 18:06:05 +0200 From: Ant?nio Miguel de Jesus Domingues <amjdomingues@gmail.com<mailto:amjdomingues@gmail.com>> To: bioconductor@r-project.org<mailto:bioconductor@r-project.org> Subject: [BioC] DiffBind - sample sheet for multiple replicates and peak callers Message-ID: <capacvobu89nekmnu- 2jojczybqwdhd="ut=b5O4hL10k+axXgNQ@mail.gmail.com&lt;mailto" :capacvobu89nekmnu-2jojczybqwdhd="ut=b5O4hL10k+axXgNQ@mail.gmail.com">> Content-Type: text/plain Hi all, I am trying to use DiffBind to compare peaks called in control vs condition. I have 2 replicates for each and I've also called peaks using 2 different peak callers (to wi, MACS and QuEST). I've also prepared a sample data sheet that looks like this: SampleID Tissue Factor Condition Replicate Peak.caller bamReads bamControl Peaks control Hela TF wt 1 MACS path path path control Hela TF wt 1 QuEST path path path control2 Hela TF wt 2 MACS path path path control 2 Hela TF wt 2 QuEST path path path (and the same for the conditions) My plan was to load all the data and then using diffbind selecte a set of common peaks for the peak callers before proceeding with the analysis. However, when I load the data (data = dba(sampleSheet="samplesheet.csv")) the peaks for each caller are not recognized as a different variable. How I can do that and is this silly? I could also derive a set of common peaks independently but it would be neat to do it all with the same package and that seems to be possible but I could not find how to do it in the documentation. Thanks, Ant?nio -- -- Ant?nio Miguel de Jesus Domingues, PhD Neugebauer group Max Planck Institute of Molecular Cell Biology and Genetics, Dresden Pfotenhauerstrasse 108 01307 Dresden Germany e-mail: domingue@mpi-cbg.de<mailto:domingue@mpi-cbg.de> tel. +49 351 210 2481 The Unbearable Lightness of Molecular Biology [[alternative HTML version deleted]] ------------------------------ Message: 11 Date: Thu, 13 Sep 2012 12:08:32 -0400 From: Juliet Hannah <juliet.hannah@gmail.com<mailto:juliet.hannah@gmail.com>> To: Steffen Durinck <durinck.steffen@gene.com<mailto:durinck.steffen@gene.com>> Cc: bioconductor@r-project.org<mailto:bioconductor@r-project.org> Subject: Re: [BioC] creating GSEA files using biomart Message-ID: <calzuzrqhues-gbza- q1jth0pgm6ohggbha_dqvwuug3edv1wbg@mail.gmail.com<mailto:calzuzrqhues-="" gbza-q1jth0pgm6ohggbha_dqvwuug3edv1wbg@mail.gmail.com="">> Content-Type: text/plain; charset=ISO-8859-1 Thanks Steffen for the helpful answers. "description", how embarrassing! On Thu, Sep 13, 2012 at 11:42 AM, Steffen Durinck <durinck.steffen@gene.com<mailto:durinck.steffen@gene.com>> wrote: Hi Juliet, The third attribute you're looking for is 'description': idens <- getBM(attributes = c("affy_hg_u133a","hgnc_symbol","description"), filters ="affy_hg_u133a",values = probeSets, mart = ensembl) Gives: affy_hg_u133a hgnc_symbol description 1 219666_at MS4A6A membrane-spanning 4-domains, subfamily A, member 6A [Source:HGNC Symbol;Acc:13375] 2 220547_s_at FAM35B family with sequence similarity 35, member B [Source:HGNC Symbol;Acc:31425] 3 218034_at FIS1 fission 1 (mitochondrial outer membrane) homolog (S. cerevisiae) [Source:HGNC Symbol;Acc:21689] 4 220547_s_at FAM35B2 family with sequence similarity 35, member B2 (pseudogene) [Source:HGNC Symbol;Acc:34038] 5 220547_s_at FAM35A family with sequence similarity 35, member A [Source:HGNC Symbol;Acc:28773] There is no systematic way to figure out with attribute name you need to use all you have is the attribute name and a description of the attribute. The more you get used to looking at those, the easier it gets to figure out which one you need and once you know the attributes you need, often you'll be using a similar set of attributes most of the time It is interesting to see in your example that one probeset maps to three different but closely related genes. In the past I thought Ensembl would remove such unambiguous mappers. I think the best to do in this case is to remove all probes that map to multiple genes as there is no way to tell which gene you'll be measuring. I'll report this example to the Ensembll team as they used to do this for us. Cheers, Steffen On Thu, Sep 13, 2012 at 8:29 AM, Juliet Hannah <juliet.hannah@gmail.com<mailto:juliet.hannah@gmail.com>> wrote: All, I am trying to create the GSEA chip file. This example uses Affy data, and the chip file is already available. I'm doing this as an exercise in preparation for other platforms. The chip file should look like: Probe Set ID Gene Symbol Gene Title 244901_at ORF25 hypothetical protein 244902_at NAD4L NADH dehydrogenase subunit 4L 244912_at CCB382 cytochrome c biogenesis orf382 244919_at CCB203 cytochrome c biogenesis orf203 244925_at NAD7 NADH dehydrogenase subunit 7 How can I obtain the third column from biomart. I tried searching the attributes, but couldn't find the right name. Is it a matter of trial and error to find the correct attribute, or are there systematic ways to find it. Here is what I have so far: library("biomaRt") probeSets <- c("219666_at", "220547_s_at", "218034_at") ensembl = useMart("ensembl") ensembl = useDataset("hsapiens_gene_ensembl",mart=ensembl) idens <- getBM(attributes = c("affy_hg_u133a","hgnc_symbol"), filters = "affy_hg_u133a",values = probeSets, mart = ensembl) Also, does anyone have any suggestions regarding how to handle the duplicates (seen in this example) with respect to GSEA. Thanks, Juliet Hannah _______________________________________________ Bioconductor mailing list Bioconductor@r-project.org<mailto:bioconductor@r-project.org> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ------------------------------ Message: 12 Date: Thu, 13 Sep 2012 13:34:24 -0400 From: "James W. MacDonald" <jmacdon@uw.edu<mailto:jmacdon@uw.edu>> To: "Frank Schwach [guest]" <guest@bioconductor.org<mailto:guest@bioconductor.org>> Cc: bioconductor@r-project.org<mailto:bioconductor@r-project.org> Subject: Re: [BioC] topGO: full-length descriptions in graph output Message-ID: <50521920.5000104@uw.edu<mailto:50521920.5000104@uw.edu>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Hi Frank, On 9/13/2012 4:29 AM, Frank Schwach [guest] wrote: Hi, does anybody know if there is any way in the topGO package to force full-length descriptions of the GO terms to be printed in the graph? By default, only about 20 characters of the descriptions are printed. If you look at the help page for showSigOfNodes(), there is an argument .NO.CHAR that is listed as an internal function. This controls the number of characters that are shown in the graph, and defaults to 20. You can bump that up to whatever value you see fit, but note that it *is* listed as an internal function, so you are stepping into the deep end of the pool - if things don't work the way you want, it will likely be up to you to figure out why and fix it. Best, Jim Thanks for your help! Frank -- output of sessionInfo(): R version 2.11.1 (2010-05-31) x86_64-unknown-linux-gnu locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=C [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] topGO_1.16.2 SparseM_0.85 GO.db_2.4.1 [4] RSQLite_0.9-1 DBI_0.2-5 AnnotationDbi_1.10.1 [7] Biobase_2.8.0 graph_1.26.0 loaded via a namespace (and not attached): [1] grid_2.11.1 lattice_0.18-8 tools_2.11.1 -- Sent via the guest posting facility at bioconductor.org. _______________________________________________ Bioconductor mailing list Bioconductor@r-project.org<mailto:bioconductor@r-project.org> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099 ------------------------------ Message: 13 Date: Thu, 13 Sep 2012 14:29:12 -0400 From: Sermsawat Tunlaya-Anukit <stunlay@ncsu.edu<mailto:stunlay@ncsu.edu>> To: bioconductor@r-project.org<mailto:bioconductor@r-project.org> Subject: [BioC] How to design matrix that account to effect of transgenic event in edgeR ? Message-ID: <cahvcje69smyfrui_r1cqd33fe_ygm7drut7dx468wjae6vgueq@mail.gmai l.com<mailto:cahvcje69smyfrui_r1cqd33fe_ygm7drut7dx468wjae6vgueq@mail.="" gmail.com="">> Content-Type: text/plain I would like to ask about design matrix in GLM model in edgeR. I have group of transgenic down regulate gene. We have 3 lines of transgnics per down regulate gene which different level (event) of down regulate gene and in each line we have 3 replicate. This means we have 9 sample per 1 gene downregulate. This experiment also have double down regulate (2 genes down regulate at the same time). We would like to find gene that change when we down regulate target gene. I create event1-9 to seperate the transgenic event. I try design matrix as show below and can't perform GLM. It show " Error in solve.default(R, t(beta)) : system is computationally singular: reciprocal condition number = 3.13057e-17 If i change the design metric to column1-3 (effect of mean,geneA and geneB), this new design matric work well. How can i desing matric that account for effect of transgnic event? mean geneA geneB event1 event2 event3 event4 event5 event6 event7 event8 event9 wt 1 0 0 0 0 0 0 0 0 0 0 0 wt 1 0 0 0 0 0 0 0 0 0 0 0 wt 1 0 0 0 0 0 0 0 0 0 0 0 down_a 1 1 0 1 0 0 0 0 0 0 0 0 down_a 1 1 0 1 0 0 0 0 0 0 0 0 down_a 1 1 0 1 0 0 0 0 0 0 0 0 down_a 1 1 0 0 1 0 0 0 0 0 0 0 down_a 1 1 0 0 1 0 0 0 0 0 0 0 down_a 1 1 0 0 1 0 0 0 0 0 0 0 down_a 1 1 0 0 0 1 0 0 0 0 0 0 down_a 1 1 0 0 0 1 0 0 0 0 0 0 down_a 1 1 0 0 0 1 0 0 0 0 0 0 down_b 1 0 1 0 0 0 1 0 0 0 0 0 down_b 1 0 1 0 0 0 1 0 0 0 0 0 down_b 1 0 1 0 0 0 1 0 0 0 0 0 down_b 1 0 1 0 0 0 0 1 0 0 0 0 down_b 1 0 1 0 0 0 0 1 0 0 0 0 down_b 1 0 1 0 0 0 0 1 0 0 0 0 down_b 1 0 1 0 0 0 0 0 1 0 0 0 down_b 1 0 1 0 0 0 0 0 1 0 0 0 down_b 1 0 1 0 0 0 0 0 1 0 0 0 down_ab 1 1 1 0 0 0 0 0 0 1 0 0 down_ab 1 1 1 0 0 0 0 0 0 1 0 0 down_ab 1 1 1 0 0 0 0 0 0 1 0 0 down_ab 1 1 1 0 0 0 0 0 0 0 1 0 down_ab 1 1 1 0 0 0 0 0 0 0 1 0 down_ab 1 1 1 0 0 0 0 0 0 0 1 0 down_ab 1 1 1 0 0 0 0 0 0 0 0 1 down_ab 1 1 1 0 0 0 0 0 0 0 0 1 down_ab 1 1 1 0 0 0 0 0 0 0 0 1 Thank you very much Sermsawat T. [[alternative HTML version deleted]] ------------------------------ Message: 14 Date: Fri, 14 Sep 2012 08:57:36 +1000 (AUS Eastern Standard Time) From: Gordon K Smyth <smyth@wehi.edu.au<mailto:smyth@wehi.edu.au>> To: "Sanogo, Yibayiri O" <sanogo@illinois.edu<mailto:sanogo@illinois.edu>> Cc: Bioconductor mailing list <bioconductor@r-project.org<mailto:bioconductor@r-project.org>>, Osee Sanogo <sanogo@life.illinois.edu<mailto:sanogo@life.illinois.edu>> Subject: Re: [BioC] limma separate channel analysis of your data Message-ID: <pine.wnt.4.64.1209140856110.6004@pc765.wehi.edu.au<mailto :pine.wnt.4.64.1209140856110.6004@pc765.wehi.edu.au="">> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Dear Osee, Yes, your code is correct. I forgot to add the intraspotCorrelation estimation. Best wishes Gordon On Thu, 13 Sep 2012, Sanogo, Yibayiri O wrote: Dear Gordon, In the code below, design <- model.matrix(~Dye+Fish+Brain+Brain:Treat) fit <- lmscFit(MA,design) fit <- eBayes(fit, trend=TRUE) shouldn't we include the intracorrelation coefficient in the single channel model? Is the following code correct? corfit <- intraspotCorrelation(MA, design) fit <- lmscFit(MA,design, correlation=corfit$consensus) Please let me know if this is the right way of doing it. Thank you. Osee ________________________________________ From: Gordon K Smyth [smyth@wehi.EDU.AU<mailto:smyth@wehi.edu.au>] Sent: Tuesday, September 11, 2012 8:13 PM To: Sanogo, Yibayiri O Cc: Osee Sanogo; Bioconductor mailing list Subject: limma separate channel analysis of your data Dear Osee, Thank you for attaching the revised design of your experiment. I found it somewhat tricky to analyse, so my recommendations are more involved than I usually give. I think that you will need to perform a separate channel analysis, otherwise it will not be possible to make baseline comparisons between the brain parts. I also think that you may need to re-normalize your data using the limma package. You say that the data has been "loess/scale normalized into an expression set", but I don't think that scale normalization should be necessary for Agilent data, and an expression set is not fully suitable for two colour data. The following code assumes that you have normalized the data into an MAList object using the limma package. I will call it 'MA'. I strongly recommend that you use normexp background correction and loess normalization as described in the limma User's Guide for GenePix data. The first step is to put your targets data into separate channel format: targets <- read.delim("Design_Brain.txt") Treat <- as.vector(t(targets[,c("Cy3","Cy5")])) Treat <- ifelse(Treat>0,"Exp","Ctrl") Treat <- factor(Treat) Brain <- rep(targets$Brain.Part,each=2,len=48) Fish <- factor(rep(targets$Fish.Number,each=2,len=48)) Dye <- rep(0:1,len=48) data.frame(Fish,Brain,Treat,Dye) Now you can fit a linear model, correcting for probe-specific dye- bias, correcting for any differences between the fish, and computing Treatment vs Control effects for each brain part: design <- model.matrix(~Dye+Fish+Brain+Brain:Treat) fit <- lmscFit(MA,design) fit <- eBayes(fit, trend=TRUE) Now you can extract genes that have a treatment effect. To find genes that have a treatment vs control effect in telencephalon, you would use: topTable(fit, coef="BrainT:TreatExp") For genes DE for treatment vs control in brain stem, it would be topTable(fit, coef="BrainBS:TreatExp") And so on. Best wishes Gordon --------------------------------------------- Professor Gordon K Smyth, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia. http://www.statsci.org/smyth On Tue, 11 Sep 2012, Sanogo, Yibayiri O wrote: Dear Gordon, Attached please find the revised design of my experiment. Thank you for your willingness to help me analyze these data. Best regards, Osee ________________________________________ From: Sanogo, Yibayiri O Sent: Monday, September 10, 2012 9:51 PM To: Gordon K Smyth; Osee Sanogo Cc: Bioconductor mailing list Subject: RE: Advice with RemoveBatchEffect and Rank Product package Dear Gordon, Please ignore the last two columns (weight and length) of the design for now: I think they are screwed up due to some kind of sorting. I will have to double check them (look in the notebook which is not here now). Only different brain regions (T, D, C, BS) belonging to the same fish should have same weight, length. The other columns are: Fish.Number= corresponds to fish ID, the individual fish (biological replicates) Slide= the Agilent 4X44k slides on each the arrays were located Brain.Part= the different brain regions that were dissected and hybridized (within region). They were T (telencephalon), D (diencephalon), C (cerebellum), BS (brain stem). in the cy3, c5 columns, "-1" is control and "1" is experimental. FileName Cy3 Cy5 Fish Number Slide Brain Part 1T.gpr 1 -1 1 2 T 2T.gpr -1 1 2 1 T 3T.gpr 1 -1 3 4 T 4T.gpr -1 1 4 3 T 5T.gpr 1 -1 5 6 T 6T.gpr -1 1 6 5 T 1D.gpr -1 1 1 5 D 2D.gpr 1 -1 2 4 D 3D.gpr -1 1 3 1 D 4D.gpr 1 -1 4 6 D 5D.gpr -1 1 5 3 D 6D.gpr 1 -1 6 2 D 1C.gpr 1 -1 1 4 C 2C.gpr -1 1 2 3 C 3C.gpr 1 -1 3 6 C 4C.gpr -1 1 4 5 C 5C.gpr 1 -1 5 2 C 6C.gpr -1 1 6 1 C 1BS.gpr -1 1 1 1 BS 2BS.gpr 1 -1 2 2 BS 3BS.gpr -1 1 3 3 BS 4BS.gpr 1 -1 4 4 BS 6BS.gpr 1 -1 6 6 BS Thank you for your willingness to help. I will stay up a bit so that I can respond faster to your questions given the time difference between us. Thank you so much. Osee ________________________________________ From: Gordon K Smyth [smyth@wehi.EDU.AU<mailto:smyth@wehi.edu.au>] Sent: Monday, September 10, 2012 6:34 PM To: Osee Sanogo Cc: Sanogo, Yibayiri O; Bioconductor mailing list Subject: Re: Advice with RemoveBatchEffect and Rank Product package Dear Osee, Can you explain what the columns Fish.Number, Slide, Brain.Part, Weight and Length mean in your experiment? If the first six arrays are from biologically independent samples, why do they have the same weight and length? Best wishes Gordon On Mon, 10 Sep 2012, Osee Sanogo wrote: Dear Gordon, >From what you said, it seems that I am oversimplifying my experiment by attempting to analyze it with RankProd, which doesn't offer the option for complex modeling. Could you please explain to me how I could analyze the experiment using Limma? Please let me know if you'd like me to provide further details of the experiment. Thank you so much. Osee On 9/10/12 2:04 AM, "Gordon K Smyth" <smyth@wehi.edu.au<mailto:smyth@wehi.edu.au>> wrote: Dear Osee, No, you can't use removeBatchEffect to control for dye bias. Can you ignore the dye effect? Not in general, but who knows? Your experiment seems too complex to be properly analysed using RankProd. For one thing, it seems clear that you have obtained multiple parts of the brain from the same biological replicates, meaning that your samples are paired by fish number. I could explain how to analyse this experiment using limma. However, if you are determined that you will use RankProd, it might be best to email the authors of that package for advice. Best wishes Gordon --------------------------------------------- Professor Gordon K Smyth, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia. http://www.statsci.org/smyth On Sun, 9 Sep 2012, Osee Sanogo wrote: Dear Gordon, Thank you for getting back to me about my questions. My experiment is trying to identify differentially expressed genes in four regions of the brain in response to a stressor. I have 6 biol. replicates in each brain region for the control and experimental groups in each region, and the comparison is being done within brain region (i.e., T control vs T exp, D ctrl vs D exp, C ctrl vs C exp, BS ctrl vs BS exp). The sample were run in two-color Agilent Array. You're right that the design I sent was from the separate channel analysis, in which I attempted to account for array and dye effect, and then run the data in RankProd. Now I know that this is not right. Ok, I will use the single channel analysis in Limma. I still would like to run the two-channel data (ratios) in RankProd, as my previous experience found this useful for my dara (low replicate numbers). My questions: 1) Could I use RemoveBatchEffect to control for dye bias before running the two-channel data in RankProd? If yes, how should I do this using the RemoveBatch Effect function? 2) I found that about 3% of my probes have dye effect. Can I omit controlling for dye effect without compromising the result of my analysis? The data were loess/scale normalized into an expression set (Data_RP). Here is the design of the experiment FileName Cy3 Cy5 Fish.Number Slide Brain.Part Weight Length 1 1T.gpr 1 -1 1 2 T 39 0.63 2 2T.gpr -1 1 2 1 T 39 0.63 3 3T.gpr 1 -1 3 4 T 39 0.63 4 4T.gpr -1 1 4 3 T 39 0.63 5 5T.gpr 1 -1 5 6 T 39 0.63 6 6T.gpr -1 1 6 5 T NA NA 7 1D.gpr -1 1 1 5 D 47 1.21 8 2D.gpr 1 -1 2 4 D 47 1.21 9 3D.gpr -1 1 3 1 D 47 1.21 10 4D.gpr 1 -1 4 6 D 47 1.21 11 5D.gpr -1 1 5 3 D 47 1.21 12 6D.gpr 1 -1 6 2 D NA NA 13 1C.gpr 1 -1 1 4 C 47 1.31 14 2C.gpr -1 1 2 3 C 47 1.31 15 3C.gpr 1 -1 3 6 C 47 1.31 16 4C.gpr -1 1 4 5 C 47 1.31 17 5C.gpr 1 -1 5 2 C 47 1.31 18 6C.gpr -1 1 6 1 C NA NA 19 1BS.gpr -1 1 1 1 BS 89 1.44 20 2BS.gpr 1 -1 2 2 BS 89 1.44 21 3BS.gpr -1 1 3 3 BS 89 1.44 22 4BS.gpr 1 -1 4 4 BS NA NA 23 5BS.gpr -1 1 5 5 BS NA NA 24 6BS.gpr 1 -1 6 6 BS NA NA Thank you for your help and please let me know if you need further explanation of the experiment. Best regards, Osee On 9/9/12 7:24 PM, "Gordon K Smyth" <smyth@wehi.edu.au<mailto:smyth@wehi.edu.au>> wrote: Dear Osee, You are attempting to do a number of things that don't seem correct to me. First, you seem to attempting a separate channel analysis of two color microarray data, but ignoring the pairing of the red and green channels. It isn't correct to do this. I don't see any way to use RankProd, or any other package designed for independent samples, correctly in this context. If you must do a separate channel analysis, you would be better off using the separate channel analysis facilities of the limma package. Second, when you set batch=rep(1,24), you are defining a batch that consists of every array in your data set. Obviously it doesn't make sense to remove batch effects unless there are at least two batches. Third, I don't follow your design matrix. It would be better to go back to the start, and describe in more basic terms what is the nature of your data and what comparison you are trying to make. Best wishes Gordon Date: Sat, 8 Sep 2012 11:40:45 +0000 From: "Sanogo, Yibayiri O" <sanogo@illinois.edu<mailto:sanogo@illinois.edu>> To: "bioconductor@r-project.org<mailto:bioconductor@r-project.org>" <bioconductor@r-project.org<mailto:bioconductor@r-project.org>> Subject: [BioC] Advice with RemoveBatchEffect and Rank Product package Dear Members of the list, (I apologize for posting this again -I sent it earlier to the list but from another account and I was listed me as non-Member -and Member I am since 2008:-)). I have been using Rank Prod to analyze Agilent two-color data. However, I would like to remove the dye effect prior to analysis. I read on the forum that RemoveBatchEffect should be used in the Limma linear model, a type of modeling that is not in Rank Product. I have two questions: 1) Would it be appropriate to use RemoveBatchEffect to correct for dye effect prior to running the expression data using Rank Prod? 2) a) If no, what other function could I use to do this? b) If yes, I would like a help with the correct design and how to properly indicate the batch. Here is my design indicating the two dyes (cy3=-1, cy5=1; T, D, C, BS =are different areas of the brain): design1 BS C D T 1 0 0 0 1 2 0 0 0 -1 3 0 0 0 1 4 0 0 0 -1 5 0 0 0 1 6 0 0 0 -1 7 0 0 -1 0 8 0 0 1 0 9 0 0 -1 0 10 0 0 1 0 11 0 0 -1 0 12 0 0 1 0 13 0 1 0 0 14 0 -1 0 0 15 0 1 0 0 16 0 -1 0 0 17 0 1 0 0 18 0 -1 0 0 19 -1 0 0 0 20 1 0 0 0 21 -1 0 0 0 22 1 0 0 0 23 -1 0 0 0 24 1 0 0 0 attr(,"assign") [1] 1 1 1 1 I've tried this (Data_RP are my data, the M values of the expression set): DYE_RP<-removeBatchEffect(Data_RP, batch=rep(1,24), batch2=NULL, design=design1) but it is returning an error message " Error in contr.sum(levels(batch)) : not enough degrees of freedom to define contrasts" Please help me correct this code. Thank you so much for your help. Osee -- -- -- Y. Osee Sanogo Integrative Biology Institute for Genomic Biology University of Illinois at Urbana 505 S. Goodwin Ave Urbana, IL-61801 Tel: 217-333 2308 (Office) 217-417 9593 (Cell) ______________________________________________________________________ The information in this email is confidential and inte...{{dropped:10}} ------------------------------ Message: 15 Date: Fri, 14 Sep 2012 09:09:55 +1000 (AUS Eastern Standard Time) From: Gordon K Smyth <smyth@wehi.edu.au<mailto:smyth@wehi.edu.au>> To: Natasha Sahgal <nsahgal@well.ox.ac.uk<mailto:nsahgal@well.ox.ac.uk>> Cc: Bioconductor mailing list <bioconductor@r-project.org<mailto:bioconductor@r-project.org>> Subject: [BioC] edgeR: confusing BCV plot Message-ID: <pine.wnt.4.64.1209140905340.6004@pc765.wehi.edu.au<mailto :pine.wnt.4.64.1209140905340.6004@pc765.wehi.edu.au="">> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Date: Wed, 12 Sep 2012 13:43:25 +0000 From: Natasha Sahgal <nsahgal@well.ox.ac.uk<mailto:nsahgal@well.ox.ac.uk>> To: "James W. MacDonald" <jmacdon@uw.edu<mailto:jmacdon@uw.edu>> Cc: "'bioconductor@r-project.org<mailto:'bioconductor@r-project.org>'" <bioconductor@r-project.org<mailto:bioconductor@r-project.org>> Subject: Re: [BioC] edgeR: confusing BCV plot Dear Jim, Regarding the BCV plots, what I did not understand was the strange profile (at least strange to me), and the low coefficients of BV. Based on some figures from the user guide, it appeared to be very different - an increase towards the higher logCPM. 1) I'm not sure how to interpret these and if it is a good thing or not? (perhaps I have misunderstood the concept of the BCV) Suggests to me that there is something unusual about your data. Especially the J shape at very low counts, which I have not seen before for RNA-seq data. 2) How does this affect the differential expression, if at all? Genes that have larger estimated dispersions are less likely to be judged to be differentially expressed. Gordon Re the filtering, for some reason I was under the impression increasing the counts per million would reduce (if not remove) zero counts in all samples. I agree with what you say about half the samples being unconstrained. I had 3 filters here, just to see what the difference would be. I still need to figure out the best or optimal one to use. Many Thanks and Best Wishes, Natasha ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}} ------------------------------ Message: 16 Date: Fri, 14 Sep 2012 09:15:03 +1000 (AUS Eastern Standard Time) From: Gordon K Smyth <smyth@wehi.edu.au<mailto:smyth@wehi.edu.au>> To: Javier Sim?n-S?nchez <simonsanchezj@gmail.com<mailto:simonsanchezj@gmail.com>> Cc: Bioconductor mailing list <bioconductor@r-project.org<mailto:bioconductor@r-project.org>> Subject: [BioC] Install devel version of edgeR Message-ID: <pine.wnt.4.64.1209140912460.6004@pc765.wehi.edu.au<mailto :pine.wnt.4.64.1209140912460.6004@pc765.wehi.edu.au="">> Content-Type: text/plain; charset="x-unknown"; Format="flowed" Dear Javier, This is a question for the Bioc managers, but the following page might answer your question: http://www.bioconductor.org/developers/useDevel/ Best wishes Gordon On Thu, 13 Sep 2012, Javier Sim?n-S?nchez wrote: Hello Gordon, Thanks a lot for your reply. How can I upgrade to the latest version of edge R? I have checked on the list and it looks like i need version 2.99.3. However, the one you provide at http://www.bioconductor.org/packages/2.10/bioc/html/edgeR.html is edgeR_2.6.12. How can I get the needed version? Thanks a lot for your help. On Thu, Sep 13, 2012 at 2:43 AM, Gordon K Smyth <smyth@wehi.edu.au<mailto:smyth@wehi.edu.au>> wrote: Dear Javier, This error has been discused a number of times on this list. The solution is to upgrade edgeR to the current devel version. Also please see the Bioconductor posting guide: http://www.bioconductor.org/**help/mailing- list/posting-**guide/<http: www.bioconductor.org="" help="" mailing-list="" posting-guide=""/> Best wishes Gordon Date: Tue, 11 Sep 2012 13:04:55 +0200 From: Javier Sim?n-S?nchez <simonsanchezj@gmail.com<mailto:simonsanchezj@gmail.com>> To: bioconductor@r-project.org<mailto:bioconductor@r-project.org> Subject: [BioC] edgeR GLM error Hello, My name is Javier Sim?n Seanchez and I'm a post-doc at the VUmc in Amsterdam. The reason of this e-mail is that im running edgeR in an expression dataset and getting the following error when calculating the GLM common dispersion: *Error in beta[k, ] <- betaj[decr, ] : NAs are not allowed in subscripted assignments * Im running cases versus controls and I want to modulate for different tissues. How can I overcome this error? Thanks a lot in advance -- Javier Simon-Sanchez ______________________________**______________________________**______ ____ The information in this email is confidential and intended solely for the addressee. You must not disclose, forward, print or use it without the permission of the sender. ______________________________**______________________________**______ ____ -- Javier Simon-Sanchez ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:5}} ------------------------------ Message: 17 Date: Fri, 14 Sep 2012 00:26:10 +0100 From: "Heidi Dvinge" <heidi@ebi.ac.uk<mailto:heidi@ebi.ac.uk>> To: "Juan Fern?ndez Tajes" <jfernandezt@udc.es<mailto:jfernandezt@udc.es>> Cc: "bioconductor@r-project.org<mailto:bioconductor@r-project.org>" <bioconductor@r-project.org<mailto:bioconductor@r-project.org>> Subject: Re: [BioC] HTqPCR error Message-ID: <dab804a28f53356c21982e871dcfe0d7.squirrel@webmail.ebi.ac.uk<m ailto:dab804a28f53356c21982e871dcfe0d7.squirrel@webmail.ebi.ac.uk="">> Content-Type: text/plain;charset=utf-8 Dear Heidi, After trying the two combinations that you talked about, just the second one worked, i.e: raw1 <- readCtData(files="Prueba_Blanco001.txt", n.features=96, n.data=96, column.info=list(flag=9, feature=5, type=6, Ct=7, position=1), skip=12, sep="\t") the first one, gave the same previous error. temp <- readCtData(files="Prueba_Blanco001.txt", n.features=96*96, + format="BioMark") Error en $<-.data.frame(*tmp*, "Call", value = character(0)) : replacement has 0 rows, data has 9216 Okay, so it looks like the order and/or column names in your file is different from the Fluidigm files the readCtData is based on, and the n.features/n.data parameters are working as they should (phew). Could you perhaps send me the header of your file, so I can see what the different is? \Heidi Thanks Juan --------------------------------------------------------------- Juan Fernandez Tajes, ph. D Grupo XENOMAR Departamento de Biolog?a Celular y Molecular Facultad de Ciencias-Universidade da Coru?a Tlf. +34 981 167000 ext 2030 e-mail: jfernandezt@udc.es<mailto:jfernandezt@udc.es> ---------------------------------------------------------------- De: "Heidi Dvinge" <heidi@ebi.ac.uk<mailto:heidi@ebi.ac.uk>> Para: "\"Juan Fern?ndez Tajes\"" <jfernandezt@udc.es<mailto:jfernandezt@udc.es>> Enviados: Jueves, 13 de Septiembre 2012 2:56:43 Asunto: Re: HTqPCR error Hi Juan, Dear Heidi, My name is Juan Fernandez-Tajes and I?m using HTqPCR for analyzing some Biomark 96*96 data. When I tried the following command I got an error: raw1 <- readCtData(files="Prueba_Blanco001.txt", format="BioMark", n.features=96, n.data=96) Error en $<-.data.frame(*tmp*, "Call", value = character(0)) : replacement has 0 rows, data has 9216 However when I import the data with the following command it works: temp <- readCtData(files="Prueba_Blanco001.txt", n.features=96*96, column.info=list(flag=9, feature=5, type=6, Ct=7, position=1), skip=12, sep="\t") raw <- changeCtLayout(temp, sample.order=rep(1:96, each=96)) Any ideas? Not at the top of my head. However, you actually change two things here, namely both the specification of n.features/n.data and using 'format' vs 'column.info'. To help me figure out which of these two things causes the error, can you try running the remaining 2 combinations? I.e.: temp <- readCtData(files="Prueba_Blanco001.txt", n.features=96*96, format="BioMark") raw1 <- readCtData(files="Prueba_Blanco001.txt", n.features=96, n.data=96, column.info=list(flag=9, feature=5, type=6, Ct=7, position=1), skip=12, sep="\t") Thanks \Heidi Many thanks in advance Juan --------------------------------------------------------------- Juan Fernandez Tajes, ph. D Grupo XENOMAR Departamento de Biolog?a Celular y Molecular Facultad de Ciencias-Universidade da Coru?a Tlf. +34 981 167000 ext 2030 e-mail: jfernandezt@udc.es<mailto:jfernandezt@udc.es> ---------------------------------------------------------------- ------------------------------ Message: 18 Date: Fri, 14 Sep 2012 09:50:35 +1000 (AUS Eastern Standard Time) From: Gordon K Smyth <smyth@wehi.edu.au<mailto:smyth@wehi.edu.au>> To: Javier Sim?n-S?nchez <simonsanchezj@gmail.com<mailto:simonsanchezj@gmail.com>> Cc: Bioconductor mailing list <bioconductor@r-project.org<mailto:bioconductor@r-project.org>> Subject: Re: [BioC] Install devel version of edgeR Message-ID: <pine.wnt.4.64.1209140949370.6004@pc765.wehi.edu.au<mailto :pine.wnt.4.64.1209140949370.6004@pc765.wehi.edu.au="">> Content-Type: text/plain; charset="x-unknown"; Format="flowed" Alternatively you can download from: http://bioconductor.org/packages/2.11/bioc/html/edgeR.html Gordon On Fri, 14 Sep 2012, Gordon K Smyth wrote: Dear Javier, This is a question for the Bioc managers, but the following page might answer your question: http://www.bioconductor.org/developers/useDevel/ Best wishes Gordon On Thu, 13 Sep 2012, Javier Sim?n-S?nchez wrote: Hello Gordon, Thanks a lot for your reply. How can I upgrade to the latest version of edge R? I have checked on the list and it looks like i need version 2.99.3. However, the one you provide at http://www.bioconductor.org/packages/2.10/bioc/html/edgeR.html is edgeR_2.6.12. How can I get the needed version? Thanks a lot for your help. On Thu, Sep 13, 2012 at 2:43 AM, Gordon K Smyth <smyth@wehi.edu.au<mailto:smyth@wehi.edu.au>> wrote: Dear Javier, This error has been discused a number of times on this list. The solution is to upgrade edgeR to the current devel version. Also please see the Bioconductor posting guide: http://www.bioconductor.org/**help/mailing- list/posting-**guide/<http: www.bioconductor.org="" help="" mailing-list="" posting-guide=""/> Best wishes Gordon Date: Tue, 11 Sep 2012 13:04:55 +0200 From: Javier Sim?n-S?nchez <simonsanchezj@gmail.com<mailto:simonsanchezj@gmail.com>> To: bioconductor@r-project.org<mailto:bioconductor@r-project.org> Subject: [BioC] edgeR GLM error Hello, My name is Javier Sim?n Seanchez and I'm a post-doc at the VUmc in Amsterdam. The reason of this e-mail is that im running edgeR in an expression dataset and getting the following error when calculating the GLM common dispersion: *Error in beta[k, ] <- betaj[decr, ] : NAs are not allowed in subscripted assignments * Im running cases versus controls and I want to modulate for different tissues. How can I overcome this error? Thanks a lot in advance -- Javier Simon-Sanchez ______________________________**______________________________**______ ____ The information in this email is confidential and intended solely for the addressee. You must not disclose, forward, print or use it without the permission of the sender. ______________________________**______________________________**______ ____ -- Javier Simon-Sanchez ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:5}} ------------------------------ Message: 19 Date: Fri, 14 Sep 2012 11:08:11 +0200 From: Paolo Kunderfranco <paolo.kunderfranco@gmail.com<mailto:paolo.kunderfranco@gmail.com>> To: Rory Stark <rory.stark@cancer.org.uk<mailto:rory.stark@cancer.org.uk>> Cc: "bioconductor@r-project.org<mailto:bioconductor@r-project.org>" <bioconductor@r-project.org<mailto:bioconductor@r-project.org>> Subject: Re: [BioC] DiffBind - overlap between different peak callers Message-ID: <cagxwfc9ztp+3qqyw7ds4+qcyufnbmij+7hzlcnrzohzamhzirg@mail.gmai l.com<mailto:cagxwfc9ztp+3qqyw7ds4+qcyufnbmij+7hzlcnrzohzamhzirg@mail.="" gmail.com="">> Content-Type: text/plain; charset=ISO-8859-1 Dear Rory, I finally analyze my data with DiffBind and it seems working well, I was wondering whether is possible or it will be possible in the next release to export in a bed file (wig file) consensus peaks overlapping between two conditions, in my example consensus peaks called by two different peak callers. Cheers, Paolo ------------------------------ Message: 20 Date: Fri, 14 Sep 2012 11:23:26 +0200 From: Paolo Kunderfranco <paolo.kunderfranco@gmail.com<mailto:paolo.kunderfranco@gmail.com>> To: bioconductor@r-project.org<mailto:bioconductor@r-project.org> Subject: Re: [BioC] DiffBind - sample sheet for multiple replicates and peak Message-ID: <cagxwfc- bywww9mjmr7nadcofnlj="ZyM3sssbc2mYNyDhE5+haQ@mail.gmail.com&lt;mailto" :cagxwfc-bywww9mjmr7nadcofnlj="ZyM3sssbc2mYNyDhE5+haQ@mail.gmail.com">> Content-Type: text/plain; charset=ISO-8859-1 Dear Antonio I think that this problem was resolved in previous messagges, have a look to: https://stat.ethz.ch/pipermail/bioconductor/2012-August/047351.html? For the complete code you could browse August mailing list. SampleID should be unique for each sample, and moreover also bam file file should be unique, you should make a copy of all your bamReads and bamControl In my example, I compared 2 different peak callers in 3 cell lines: SampleID,Tissue,Factor,Condition,Replicate,bamReads,bamControl,Peaks mES_H3K27me3_m,ES,H3K27,mES_H3K27me3,1,reads/H3K27me3/ES_H3K27me3_m.be d,reads/H3K27me3/ES_input_m.bed,peaks/H3K27me3_ES_M.bed CMp_H3K27me3_m,CMN,H3K27,CMp_H3K27me3,1,reads/H3K27me3/CMN_H3K27me3_m. bed,reads/H3K27me3/CMN_input_m.bed,peaks/H3K27me3_CMN_M.bed CMa_H3K27me3_m,CMA,H3K27,CMa_H3K27me3,1,reads/H3K27me3/CMA_H3K27me3_m. bed,reads/H3K27me3/CMA_input_m.bed,peaks/H3K27me3_CMA_M.bed mES_H3K27me3_s,ES,H3K27,mES_H3K27me3,2,reads/H3K27me3/ES_H3K27me3_s.be d,reads/H3K27me3/ES_input_s.bed,peaks/H3K27me3_ES_S.bed CMp_H3K27me3_s,CMN,H3K27,CMp_H3K27me3,2,reads/H3K27me3/CMN_H3K27me3_s. bed,reads/H3K27me3/CMN_input_s.bed,peaks/H3K27me3_CMN_S.bed CMa_H3K27me3_s,CMA,H3K27,CMa_H3K27me3,2,reads/H3K27me3/CMA_H3K27me3_s. bed,reads/H3K27me3/CMA_input_s.bed,peaks/H3K27me3_CMA_S.bed Like this it should work, Cheers, Paolo ------------------------------ Message: 21 Date: Fri, 14 Sep 2012 11:58:09 +0200 From: Ant?nio Miguel de Jesus Domingues <amjdomingues@gmail.com<mailto:amjdomingues@gmail.com>> To: bioconductor@r-project.org<mailto:bioconductor@r-project.org> Subject: [BioC] DiffBind - error in dba.count Message-ID: <capacvoc+9hgaxzatgj7j3guktemaummn6sjhoxm1b4ygwvr4rq@mail.gmai l.com<mailto:capacvoc+9hgaxzatgj7j3guktemaummn6sjhoxm1b4ygwvr4rq@mail.="" gmail.com="">> Content-Type: text/plain Hi again, I am trying DiffBind and loaded my data that looks like this: H3K4m3 4 Samples, 13203 sites in matrix (13792 total): ID Tissue Factor Condition Peak.caller Replicate Intervals 1 wt1 Hela H3K4me3 control1 raw 1 14111 2 wt2 Hela H3K4me3 control2 raw 2 13771 3 treat1 Hela H3K4me3 condition1 raw 1 14865 4 treat2 Hela H3K4me3 condition2 raw 2 13393 But I ran into problems trying to calculate the affinity scores with dba.count: H3K4m3 = dba.count(H3K4m3) Error in cond$counts : \$ operator is invalid for atomic vectors In addition: Warning message: In mclapply(arglist, fn, ..., mc.preschedule = FALSE) : 6 function calls resulted in an error The peaks are in bed files (chr, start, end, score) and the reads are in SAM format. Can anyone help me with this? Cheers. Ant?nio sessionInfo() R version 2.14.1 (2011-12-22) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] DiffBind_1.0.9 Biobase_2.14.0 loaded via a namespace (and not attached): [1] IRanges_1.12.6 RColorBrewer_1.0-5 amap_0.8-7 edgeR_2.4.6 [5] gdata_2.11.0 gplots_2.11.0 gtools_2.7.0 limma_3.10.3 [9] zlibbioc_1.0.1 On 13 September 2012 18:06, Ant?nio Miguel de Jesus Domingues < amjdomingues@gmail.com<mailto:amjdomingues@gmail.com>> wrote: Hi all, I am trying to use DiffBind to compare peaks called in control vs condition. I have 2 replicates for each and I've also called peaks using 2 different peak callers (to wi, MACS and QuEST). I've also prepared a sample data sheet that looks like this: SampleID Tissue Factor Condition Replicate Peak.caller bamReads bamControl Peaks control Hela TF wt 1 MACS path path path control Hela TF wt 1 QuEST path path path control2 Hela TF wt 2 MACS path path path control 2 Hela TF wt 2 QuEST path path path (and the same for the conditions) My plan was to load all the data and then using diffbind selecte a set of common peaks for the peak callers before proceeding with the analysis. However, when I load the data (data = dba(sampleSheet="samplesheet.csv")) the peaks for each caller are not recognized as a different variable. How I can do that and is this silly? I could also derive a set of common peaks independently but it would be neat to do it all with the same package and that seems to be possible but I could not find how to do it in the documentation. Thanks, Ant?nio -- -- Ant?nio Miguel de Jesus Domingues, PhD Neugebauer group Max Planck Institute of Molecular Cell Biology and Genetics, Dresden Pfotenhauerstrasse 108 01307 Dresden Germany e-mail: domingue@mpi-cbg.de<mailto:domingue@mpi-cbg.de> tel. +49 351 210 2481 The Unbearable Lightness of Molecular Biology -- -- Ant?nio Miguel de Jesus Domingues, PhD Neugebauer group Max Planck Institute of Molecular Cell Biology and Genetics, Dresden Pfotenhauerstrasse 108 01307 Dresden Germany e-mail: domingue@mpi-cbg.de<mailto:domingue@mpi-cbg.de> tel. +49 351 210 2481 The Unbearable Lightness of Molecular Biology [[alternative HTML version deleted]] ------------------------------ _______________________________________________ Bioconductor mailing list Bioconductor@r-project.org<mailto:bioconductor@r-project.org> https://stat.ethz.ch/mailman/listinfo/bioconductor End of Bioconductor Digest, Vol 115, Issue 14 ********************************************* NOTICE AND DISCLAIMER This e-mail (including any attachments) is intended for the above- named person(s). If you are not the intended recipient, notify the sender immediately, delete this email from your system and do not disclose or use for any purpose. We may monitor all incoming and outgoing emails in line with current legislation. We have taken steps to ensure that this email and attachments are free from any virus, but it remains your responsibility to ensure that viruses do not adversely affect you. Cancer Research UK Registered charity in England and Wales (1089464), Scotland (SC041666) and the Isle of Man (1103) A company limited by guarantee. Registered company in England and Wales (4325234) and the Isle of Man (5713F). Registered Office Address: Angel Building, 407 St John Street, London EC1V 4AD. [[alternative HTML version deleted]]