Bioconductor Digest, Vol 121, Issue 13

Entering edit mode

Matthew Thornton ▴ 350

@matthew-thornton-5564

Last seen 6 weeks ago

USA, Los Angeles, USC

"bioconductor-request at r-project.org" <bioconductor-request at="" r-project.org=""> wrote: Send Bioconductor mailing list submissions to bioconductor at r-project.org To subscribe or unsubscribe via the World Wide Web, visit https://stat.ethz.ch/mailman/listinfo/bioconductor or, via email, send a message with subject or body 'help' to bioconductor-request at r-project.org You can reach the person managing the list at bioconductor-owner at r-project.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Bioconductor digest..." Today's Topics: 1. edgeR: paired samples DGE and GLM (Alessandra [guest]) 2. Re: methyAnalysis - changes to GenoSet (Kasper Daniel Hansen) 3. QuasiSeq vs DSS (Richard Friedman) 4. Re: methyAnalysis - changes to GenoSet (Tim Triche, Jr.) 5. displaying the image with EBImage (Nima [guest]) 6. Help on Illumina HumanHT12 v4 chromosome filtering (Kemal Akat) 7. Re: QuasiSeq vs DSS (Ryan C. Thompson) 8. Re: QuasiSeq vs DSS (Richard Friedman) 9. Re: QuasiSeq vs DSS (Ryan C. Thompson) 10. Re: edgeR: paired samples DGE and GLM (Ryan C. Thompson) 11. Re: edgeR: new defaults of estimateTagwiseDisp and exactTest (Zhuzhu Zhang) 12. Subsampling of BAM/SAM alignments (Julian Gehring) 13. DESeq2 (Wolfgang Huber) 14. Re: displaying the image with EBImage (Wolfgang Huber) 15. Re: displaying the image with EBImage (Nima Ahmadian) 16. Re: displaying the image with EBImage (Andrzej Ole?) 17. Re: edgeR: new defaults of estimateTagwiseDisp and exactTest (Gordon K Smyth) 18. negative correlation from limma's duplicateCorrelation (Gordon K Smyth) 19. anova like test in edgeR (Gordon K Smyth) 20. Re: displaying the image with EBImage (Nima Ahmadian) 21. Re: Negative parameter error when using goseq (Nadia Davidson) 22. Re: anova like test in edgeR (Gordon K Smyth) 23. Re: Subsampling of BAM/SAM alignments (Martin Morgan) 24. ArrayExpress Question (Amy Vandiver) 25. Error in getGEO command with GSEMatrix=TRUE (Marwaha, Shruti (marwahsi)) 26. biomaRt access for NCBIM37 build of mouse genome (chris_utah) 27. Re: anova like test in edgeR (Ryan C. Thompson) 28. Re: DESeq2 (Ryan C. Thompson) 29. Re: biomaRt access for NCBIM37 build of mouse genome (Hans-Rudolf Hotz) 30. Re: DESeq2 (Michael Love) 31. Re: Error in getGEO command with GSEMatrix=TRUE (Sean Davis) ---------------------------------------------------------------------- Message: 1 Date: Tue, 12 Mar 2013 04:20:08 -0700 (PDT) From: "Alessandra [guest]" <guest@bioconductor.org> To: bioconductor at r-project.org, alessandra.breschi at crg.es Subject: [BioC] edgeR: paired samples DGE and GLM Message-ID: <20130312112008.0529C1434F2 at mamba.fhcrc.org> Hi, I am trying to use edgeR to compute differential gene expression. I have a quite simple experimental design: labExpId Patient sex Treatment sample.4 1 F A sample.3 2 M A sample.5 2 M B sample.7 3 F A sample.0 4 M A sample.8 3 F B sample.1 4 M B sample.2 1 F B I am comparing the effect of treatment across all patients, and it is fine when I apply exactTest. Then I wanted to include also the Patient in my model and I try using GLM, but I found several problems. So, I tried to apply GLM only including the Treatment just to troubleshoot. I follow the instructions on page 22 from the user's guide revised in December 2012. My counts are stored in a matrix called m. >colnames(m) [1] "sample.2" "sample.3" "sample.4" "sample.5" "sample.7" sample.8" [7] "sample.0" "sample.1" >design=model.matrix(~0+Treatment) > design TreatmentA TreatmentB sample.4 1 0 sample.3 1 0 sample.5 0 1 sample.7 1 0 sample.0 1 0 sample.8 0 1 sample.1 0 1 sample.2 0 1 >M = DGEList(counts=na.omit(m)) >cpm.m <- cpm(M) >M <- M[rowSums(cpm.m >1) >=1,] >M <- calcNormFactors(M) >M <- estimateGLMCommonDisp(M, design, verbose=T) > names(M) [1] "counts" "samples" "abundance" [4] "logCPM" "common.dispersion" First thing I notice, I don't see pseudo.counts and other attributes I saw when calling estimateCommonDisp(). > M <- estimateGLMTrendedDisp(M, design) > M <- estimateGLMTagwiseDisp(M, design) > fit <- glmFit(M, design) > lrt <- glmLRT(fit, contrast=c(-1,1)) > res = topTags(lrt, n=nrow(lrt))$table > head(res) logFC logCPM LR PValue FDR ENSG00000244115 15.512665 2.266789 35.30878 2.813602e-09 3.482958e-05 ENSG00000256329 -17.760869 4.514903 32.89653 9.719681e-09 6.015996e-05 ENSG00000223638 11.777617 -1.467638 18.46673 1.728967e-05 7.134295e-02 ENSG00000134321 -5.328058 5.959204 16.76318 4.234703e-05 1.310535e-01 ENSG00000196861 7.776972 -1.722111 15.83143 6.924259e-05 1.494412e-01 ENSG00000229292 11.368590 -1.876601 15.74621 7.243290e-05 1.494412e-01 > m[rownames(head(res)),] sample.2 sample.3 sample.4 sample.5 sample.7 sample.8 ENSG00000244115 0 0 0 0 0 6412 ENSG00000256329 0 0 0 32070 0 0 ENSG00000223638 0 0 2 0 0 0 ENSG00000134321 1178 590 890 83130 754 1116 ENSG00000196861 0 3 363 0 0 53 ENSG00000229292 0 0 0 0 0 0 sample.0 sample.1 ENSG00000244115 0 4415 ENSG00000256329 0 0 ENSG00000223638 434 550 ENSG00000134321 852 1092 ENSG00000196861 500 57 ENSG00000229292 344 406 I realized that it may be a problem of the ordering of the design matrix rows > colnames(m) [1] "sample.2" "sample.3" "sample.4" "sample.5" "sample.7" "sample.8" [7] "sample.0" "sample.1" > rownames(design) [1] "sample.4" "sample.3" "sample.5" "sample.7" "sample.0" "sample.8" [7] "sample.1" "sample.2" so I redo it forcing the row names of my design matrix to be the same order as the column names in the matrix count. Now my design matrix looks like: > design TreatmentA TreatmentB sample.2 0 1 sample.3 1 0 sample.4 1 0 sample.5 0 1 sample.7 1 0 sample.8 0 1 sample.0 1 0 sample.1 0 1 I execute exactly the same commands as before. Again I cannot see the pseudocounts. > names(M) [1] "counts" "samples" "abundance" [4] "logCPM" "common.dispersion" > head(res) logFC logCPM LR PValue FDR ENSG00000074855 -32.17749 0.2759965 2121.104 0 0 ENSG00000076351 -32.17749 -0.4306713 1549.345 0 0 ENSG00000105552 -32.17749 -0.4864164 1502.658 0 0 ENSG00000106991 -32.17749 0.4622641 2144.947 0 0 ENSG00000123009 -32.17749 -0.2422835 1719.625 0 0 ENSG00000133216 -32.17749 0.2206127 2127.575 0 0 I get very different results still from the exactTest with no GLM. I attribute this to the missing pseudocounts in the DGEList object? > m[rownames(head(res)),] sample.2 sample.3 sample.4 sample.5 sample.7 sample.8 ENSG00000074855 0 948 902 0 676 0 ENSG00000076351 0 494 522 0 538 0 ENSG00000105552 0 490 454 0 486 0 ENSG00000106991 0 1094 798 0 698 0 ENSG00000123009 0 556 520 0 566 0 ENSG00000133216 0 808 762 0 730 0 sample.0 sample.1 ENSG00000074855 1166 0 ENSG00000076351 698 0 ENSG00000105552 764 0 ENSG00000106991 1733 0 ENSG00000123009 980 0 ENSG00000133216 1304 0 I see that these genes have all zero counts for TreatmentB and this explains the low logFC, but I don't get this when I don't use GLM at all (exactTest). So in summary, my questions are: 1) Does the ordering of rows in the design matrix have to be the same as the column order in the count matrix, or the sample name is taken into account so the order should not matter? 2) Do you have any idea why I don't get pseudocounts after estimating the dispersion, and is it really this that causes such low logFC? I am looking forward to hearing from you. Many thanks in advance, Ale -- output of sessionInfo(): R version 2.15.0 (2012-03-30) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] C attached base packages: [1] splines stats graphics grDevices utils datasets methods [8] base other attached packages: [1] plyr_1.7.1 optparse_1.0.1 ggplot2_0.9.3.1 reshape2_1.2.1 [5] edgeR_3.0.8 limma_3.14.4 loaded via a namespace (and not attached): [1] MASS_7.3-17 RColorBrewer_1.0-5 colorspace_1.1-1 dichromat_1.2-4 [5] digest_0.5.2 getopt_1.19.1 grid_2.15.0 gtable_0.1.2 [9] labeling_0.1 munsell_0.3 proto_0.3-9.2 scales_0.2.3 [13] stringr_0.6 tools_2.15.0 -- Sent via the guest posting facility at bioconductor.org. ------------------------------ Message: 2 Date: Tue, 12 Mar 2013 09:57:52 -0400 From: Kasper Daniel Hansen <kasperdanielhansen@gmail.com> To: Lavinia Gordon <lavinia.gordon at="" mcri.edu.au=""> Cc: Zilbauer Matthias <mz304 at="" medschl.cam.ac.uk="">, "dupan at gmal.com" <dupan at="" gmal.com="">, "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> Subject: Re: [BioC] methyAnalysis - changes to GenoSet Message-ID: <cac2h7usdeop-zsi_oxp3asneqkg5msoeuys62hh96w9qhqupfa at="" mail.gmail.com=""> Content-Type: text/plain; charset=ISO-8859-1 Lavina, I don't really have time to really look into all of this, but note the function mapToGenome which associates a MethylSet with genomic coordinates. So you can do gmSet = mapToGenome(mSet) and now you have granges(gmSet) # locations getMeth(gmSet) # methylation channel etc. Note that the transformation is non-invertible because a few CpGs have no associated genomic locations and they are dropped (well, you can use drop=FALSE). Kasper On Mon, Mar 11, 2013 at 5:13 PM, Lavinia Gordon <lavinia.gordon at="" mcri.edu.au=""> wrote: > Argh!, thanks Martin, sorry about that. > > So in that case I think I've worked out a (rather ugly) way to force minfi data into a MethyGenoSet object for use within the package methyAnalysis: > > Library(minfi) > # mset is minfi object, e.g. used in dmpFinder code > myexprs <- getM(mSet) > mymeth <- getMeth(mSet) > myunmeth <- getUnmeth(mSet) > > # using Tim Triches helpful code, thanks! > library(IlluminaHumanMethylation450kprobe) > data(IlluminaHumanMethylation450kprobe) > library(GenomicRanges) > chs = levels(IlluminaHumanMethylation450kprobe$chr) > names(chs) = paste('chr',chs,sep='') > CpGs.450k = with(IlluminaHumanMethylation450kprobe, > GRanges(paste('chr',chr,sep=''), > IRanges(start=site, width=2, names=Probe_ID), > strand=strand)) > > CpGs.450k.df <- as.data.frame(CpGs.450k) > > mylocData <- RangedData(ranges=IRanges(start=CpGs.450k.df$start, end=CpGs.450k.df$end, names=row.names(CpGs.450k.df)), space=CpGs.450k.df$seqnames) > > mymethy.obj <- new("MethyGenoSet", locData=mylocData, assayData=assayDataNew(exprs=new("matrix"), methylated=new("matrix"), unmethylated=new("matrix"), detection=new("matrix"))) > > # make sure the CpG ids in mylocData agree with what is in CpGs.450k.df > > exprs(mymethy.obj) <- myexprs > methylated(mymethy.obj) <- mymeth > unmethylated(mymethy.obj) <- myunmeth > # had to add this afterwards > > methyGenoSet.sm <- smoothMethyData(mymethy.obj, winSize = 250) > > I haven't run through all of the MethyAnalysis functions so the MethyGenoSet object may need more information added to it, in other slots, for it to work properly. > > > Lavinia Gordon > Senior Research Officer > Quantitative Sciences Core, Bioinformatics > > Murdoch Childrens Research Institute > The Royal Children's Hospital > Flemington Road Parkville Victoria 3052 Australia > T 03 8341 6221 > www.mcri.edu.au > > > -----Original Message----- > From: Martin Morgan [mailto:mtmorgan at fhcrc.org] > Sent: Saturday, 9 March 2013 1:49 AM > To: Lavinia Gordon > Cc: bioconductor at r-project.org; dupan at gmal.com > Subject: Re: [BioC] methyAnalysis - changes to GenoSet > > On 03/07/2013 08:23 PM, Lavinia Gordon wrote: >> Dear Bioc users, >> I have just tried the example code supplied with the package methyAnalysis and am getting an error: >> >>> ################################################### >>> ### code chunk number 3: Slide-window smoothing of the DNA >>> methylation data ################################################### >>> methyGenoSet.sm <- smoothMethyData(exampleMethyGenoSet, winSize = >>> 250) >> Smoothing Chromosome 21 ... >> >> Note: Method with signature ?GenoSet#character#ANY#ANY? chosen for >> function ?[?, target signature ?MethyGenoSet#character#character#missing?. >> "MethyGenoSet#ANY#ANY#ANY" would also be valid Warning message: >> The ranges method on a GenoSet is depricated. Please use space(locData(x)) or seqnames(locData(x)) as appropriate for RangedData or GRanges. >>> methyGenoset.sm >> Error: object 'methyGenoset.sm' not found > > typo -- methyGenoSet.sm > > Martin > >>> sessionInfo() >> R version 2.15.2 (2012-10-26) >> Platform: x86_64-redhat-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 >> [2] LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF-8 >> [4] LC_COLLATE=en_US.UTF-8 >> [5] LC_MONETARY=en_US.UTF-8 >> [6] LC_MESSAGES=en_US.UTF-8 >> [7] LC_PAPER=C >> [8] LC_NAME=C >> [9] LC_ADDRESS=C >> [10] LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF-8 >> [12] LC_IDENTIFICATION=C >> >> attached base packages: >> [1] grid stats graphics grDevices >> [5] utils datasets methods base >> >> other attached packages: >> [1] methyAnalysis_1.0.0 org.Hs.eg.db_2.8.0 >> [3] RSQLite_0.11.2 DBI_0.2-5 >> [5] AnnotationDbi_1.20.5 Biobase_2.18.0 >> [7] IRanges_1.16.6 BiocGenerics_0.4.0 >> >> loaded via a namespace (and not attached): >> [1] affy_1.36.1 affyio_1.26.0 >> [3] annotate_1.36.0 BiocInstaller_1.8.3 >> [5] biomaRt_2.14.0 Biostrings_2.26.3 >> [7] biovizBase_1.6.2 bitops_1.0-5 >> [9] BSgenome_1.26.1 cluster_1.14.3 >> [11] colorspace_1.2-1 dichromat_2.0-0 >> [13] genefilter_1.40.0 GenomicFeatures_1.10.2 >> [15] GenomicRanges_1.10.7 genoset_1.10.1 >> [17] Gviz_1.2.1 Hmisc_3.10-1 >> [19] KernSmooth_2.23-9 labeling_0.1 >> [21] lattice_0.20-13 lumi_2.10.0 >> [23] MASS_7.3-23 Matrix_1.0-11 >> [25] methylumi_2.4.0 mgcv_1.7-22 >> [27] munsell_0.4 nleqslv_2.0 >> [29] nlme_3.1-108 parallel_2.15.2 >> [31] plyr_1.8 preprocessCore_1.20.0 >> [33] RColorBrewer_1.0-5 RCurl_1.95-4.1 >> [35] Rsamtools_1.10.2 rtracklayer_1.18.2 >> [37] scales_0.2.3 splines_2.15.2 >> [39] stats4_2.15.2 stringr_0.6.2 >> [41] survival_2.37-4 tools_2.15.2 >> [43] XML_3.95-0.2 xtable_1.7-1 >> [45] zlibbioc_1.4.0 >>> >> >> With regards, >> >> Lavinia Gordon >> Senior Research Officer >> Quantitative Sciences Core, Bioinformatics >> >> Murdoch Childrens Research Institute >> The Royal Children's Hospital >> Flemington Road Parkville Victoria 3052 Australia T 03 8341 6221 >> www.mcri.edu.au<http: www.mcri.edu.au=""/> >> >> >> ______________________________________________________________________ >> This email has been scanned by the Symantec Email Security.cloud service. >> For more information please visit http://www.symanteccloud.com >> ______________________________________________________________________ >> [[alternative HTML version deleted]] >> >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > > -- > Computational Biology / Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. > PO Box 19024 Seattle, WA 98109 > > Location: Arnold Building M1 B861 > Phone: (206) 667-2793 > > ______________________________________________________________________ > This email has been scanned by the Symantec Email Security.cloud service. > For more information please visit http://www.symanteccloud.com > > If you have any question, please contact MCRI IT Helpdesk for further assistance. > ______________________________________________________________________ > > ______________________________________________________________________ > This email has been scanned by the Symantec Email Security.cloud service. > For more information please visit http://www.symanteccloud.com > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ------------------------------ Message: 3 Date: Tue, 12 Mar 2013 10:05:02 -0400 From: Richard Friedman <friedman@cancercenter.columbia.edu> To: Bioconductor mailing list <bioconductor at="" r-project.org=""> Subject: [BioC] QuasiSeq vs DSS Message-ID: <e41ffaa1-4082-41d3-88b1-034f5586683e at="" cancercenter.columbia.edu=""> Content-Type: text/plain; charset=us-ascii Dear List. The papers on DSS (included in Bioconductor): Wu H, Wang C, Wu Z. A new shrinkage estimator for dispersion improves differential expression detection in RNA-seq data. Biostatistics. 2013 Apr;14(2):232-43. and QuasiSeq (included in CRAN): Lund SP, Nettleton D, McCarthy DJ, Smyth GK. Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates. Stat Appl Genet Mol Biol. 2012 both give evidence of superior performance to edgeR (if I understand them correctly). Have the two methods been compared? Can the 2 methods been combined (with DSS estimating the dispersion used in the quasi-negative bionomial disribution used in QuasiSeq)? I would appreciate any insight with respect to what is the overall best method for differential expression in RNASeq available at present. Thanks and best wishes, Rich Richard A. Friedman, PhD Associate Research Scientist, Biomedical Informatics Shared Resource Herbert Irving Comprehensive Cancer Center (HICCC) Lecturer, Department of Biomedical Informatics (DBMI) Educational Coordinator, Center for Computational Biology and Bioinformatics (C2B2)/ National Center for Multiscale Analysis of Genomic Networks (MAGNet)/ Columbia Initiative in Systems Biology Room 824 Irving Cancer Research Center Columbia University 1130 St. Nicholas Ave New York, NY 10032 (212)851-4765 (voice) friedman at cancercenter.columbia.edu http://cancercenter.columbia.edu/~friedman/ Fritz Lang. Didn't he do "Star Trek". -Rose Friedman, age 16 ------------------------------ Message: 4 Date: Tue, 12 Mar 2013 08:32:19 -0700 From: "Tim Triche, Jr." <tim.triche@gmail.com> To: Kasper Daniel Hansen <kasperdanielhansen at="" gmail.com=""> Cc: Zilbauer Matthias <mz304 at="" medschl.cam.ac.uk="">, Lavinia Gordon <lavinia.gordon at="" mcri.edu.au="">, "bioconductor at r-project.org" <bioconductor at="" r-project.org="">, "dupan at gmal.com" <dupan at="" gmal.com=""> Subject: Re: [BioC] methyAnalysis - changes to GenoSet Message-ID: <fa98bc57-3830-4b9c-8c77-df5b0cc9a177 at="" gmail.com=""> Content-Type: text/plain; charset=utf-8 Right, my original intent was simply to coerce the SummarizedExperiment-based containers from methylumi and minfi into a genoset for Pan's package. That reminds me, I need to update the documentation for 450kProbe to note that it is essentially obsolete, since both methylumi and minfi can emit SummarizedExperiments. The only mop-up now is for me to send in a few minfi patches for processing. The innards of minfi are tidier than... Others. --t On Mar 12, 2013, at 6:57 AM, Kasper Daniel Hansen <kasperdanielhansen at="" gmail.com=""> wrote: > Lavina, > > I don't really have time to really look into all of this, but note the > function mapToGenome which associates a MethylSet with genomic > coordinates. So you can do > > gmSet = mapToGenome(mSet) > > and now you have > > granges(gmSet) # locations > getMeth(gmSet) # methylation channel > > etc. Note that the transformation is non-invertible because a few > CpGs have no associated genomic locations and they are dropped (well, > you can use drop=FALSE). > > Kasper > > On Mon, Mar 11, 2013 at 5:13 PM, Lavinia Gordon > <lavinia.gordon at="" mcri.edu.au=""> wrote: >> Argh!, thanks Martin, sorry about that. >> >> So in that case I think I've worked out a (rather ugly) way to force minfi data into a MethyGenoSet object for use within the package methyAnalysis: >> >> Library(minfi) >> # mset is minfi object, e.g. used in dmpFinder code >> myexprs <- getM(mSet) >> mymeth <- getMeth(mSet) >> myunmeth <- getUnmeth(mSet) >> >> # using Tim Triches helpful code, thanks! >> library(IlluminaHumanMethylation450kprobe) >> data(IlluminaHumanMethylation450kprobe) >> library(GenomicRanges) >> chs = levels(IlluminaHumanMethylation450kprobe$chr) >> names(chs) = paste('chr',chs,sep='') >> CpGs.450k = with(IlluminaHumanMethylation450kprobe, >> GRanges(paste('chr',chr,sep=''), >> IRanges(start=site, width=2, names=Probe_ID), >> strand=strand)) >> >> CpGs.450k.df <- as.data.frame(CpGs.450k) >> >> mylocData <- RangedData(ranges=IRanges(start=CpGs.450k.df$start, end=CpGs.450k.df$end, names=row.names(CpGs.450k.df)), space=CpGs.450k.df$seqnames) >> >> mymethy.obj <- new("MethyGenoSet", locData=mylocData, assayData=assayDataNew(exprs=new("matrix"), methylated=new("matrix"), unmethylated=new("matrix"), detection=new("matrix"))) >> >> # make sure the CpG ids in mylocData agree with what is in CpGs.450k.df >> >> exprs(mymethy.obj) <- myexprs >> methylated(mymethy.obj) <- mymeth >> unmethylated(mymethy.obj) <- myunmeth >> # had to add this afterwards >> >> methyGenoSet.sm <- smoothMethyData(mymethy.obj, winSize = 250) >> >> I haven't run through all of the MethyAnalysis functions so the MethyGenoSet object may need more information added to it, in other slots, for it to work properly. >> >> >> Lavinia Gordon >> Senior Research Officer >> Quantitative Sciences Core, Bioinformatics >> >> Murdoch Childrens Research Institute >> The Royal Children's Hospital >> Flemington Road Parkville Victoria 3052 Australia >> T 03 8341 6221 >> www.mcri.edu.au >> >> >> -----Original Message----- >> From: Martin Morgan [mailto:mtmorgan at fhcrc.org] >> Sent: Saturday, 9 March 2013 1:49 AM >> To: Lavinia Gordon >> Cc: bioconductor at r-project.org; dupan at gmal.com >> Subject: Re: [BioC] methyAnalysis - changes to GenoSet >> >> On 03/07/2013 08:23 PM, Lavinia Gordon wrote: >>> Dear Bioc users, >>> I have just tried the example code supplied with the package methyAnalysis and am getting an error: >>> >>>> ################################################### >>>> ### code chunk number 3: Slide-window smoothing of the DNA >>>> methylation data ################################################### >>>> methyGenoSet.sm <- smoothMethyData(exampleMethyGenoSet, winSize = >>>> 250) >>> Smoothing Chromosome 21 ... >>> >>> Note: Method with signature ?GenoSet#character#ANY#ANY? chosen for >>> function ?[?, target signature ?MethyGenoSet#character#character#missing?. >>> "MethyGenoSet#ANY#ANY#ANY" would also be valid Warning message: >>> The ranges method on a GenoSet is depricated. Please use space(locData(x)) or seqnames(locData(x)) as appropriate for RangedData or GRanges. >>>> methyGenoset.sm >>> Error: object 'methyGenoset.sm' not found >> >> typo -- methyGenoSet.sm >> >> Martin >> >>>> sessionInfo() >>> R version 2.15.2 (2012-10-26) >>> Platform: x86_64-redhat-linux-gnu (64-bit) >>> >>> locale: >>> [1] LC_CTYPE=en_US.UTF-8 >>> [2] LC_NUMERIC=C >>> [3] LC_TIME=en_US.UTF-8 >>> [4] LC_COLLATE=en_US.UTF-8 >>> [5] LC_MONETARY=en_US.UTF-8 >>> [6] LC_MESSAGES=en_US.UTF-8 >>> [7] LC_PAPER=C >>> [8] LC_NAME=C >>> [9] LC_ADDRESS=C >>> [10] LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=en_US.UTF-8 >>> [12] LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] grid stats graphics grDevices >>> [5] utils datasets methods base >>> >>> other attached packages: >>> [1] methyAnalysis_1.0.0 org.Hs.eg.db_2.8.0 >>> [3] RSQLite_0.11.2 DBI_0.2-5 >>> [5] AnnotationDbi_1.20.5 Biobase_2.18.0 >>> [7] IRanges_1.16.6 BiocGenerics_0.4.0 >>> >>> loaded via a namespace (and not attached): >>> [1] affy_1.36.1 affyio_1.26.0 >>> [3] annotate_1.36.0 BiocInstaller_1.8.3 >>> [5] biomaRt_2.14.0 Biostrings_2.26.3 >>> [7] biovizBase_1.6.2 bitops_1.0-5 >>> [9] BSgenome_1.26.1 cluster_1.14.3 >>> [11] colorspace_1.2-1 dichromat_2.0-0 >>> [13] genefilter_1.40.0 GenomicFeatures_1.10.2 >>> [15] GenomicRanges_1.10.7 genoset_1.10.1 >>> [17] Gviz_1.2.1 Hmisc_3.10-1 >>> [19] KernSmooth_2.23-9 labeling_0.1 >>> [21] lattice_0.20-13 lumi_2.10.0 >>> [23] MASS_7.3-23 Matrix_1.0-11 >>> [25] methylumi_2.4.0 mgcv_1.7-22 >>> [27] munsell_0.4 nleqslv_2.0 >>> [29] nlme_3.1-108 parallel_2.15.2 >>> [31] plyr_1.8 preprocessCore_1.20.0 >>> [33] RColorBrewer_1.0-5 RCurl_1.95-4.1 >>> [35] Rsamtools_1.10.2 rtracklayer_1.18.2 >>> [37] scales_0.2.3 splines_2.15.2 >>> [39] stats4_2.15.2 stringr_0.6.2 >>> [41] survival_2.37-4 tools_2.15.2 >>> [43] XML_3.95-0.2 xtable_1.7-1 >>> [45] zlibbioc_1.4.0 >>> >>> With regards, >>> >>> Lavinia Gordon >>> Senior Research Officer >>> Quantitative Sciences Core, Bioinformatics >>> >>> Murdoch Childrens Research Institute >>> The Royal Children's Hospital >>> Flemington Road Parkville Victoria 3052 Australia T 03 8341 6221 >>> www.mcri.edu.au<http: www.mcri.edu.au=""/> >>> >>> >>> ______________________________________________________________________ >>> This email has been scanned by the Symantec Email Security.cloud service. >>> For more information please visit http://www.symanteccloud.com >>> ______________________________________________________________________ >>> [[alternative HTML version deleted]] >>> >>> >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >> -- >> Computational Biology / Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N. >> PO Box 19024 Seattle, WA 98109 >> >> Location: Arnold Building M1 B861 >> Phone: (206) 667-2793 >> >> ______________________________________________________________________ >> This email has been scanned by the Symantec Email Security.cloud service. >> For more information please visit http://www.symanteccloud.com >> >> If you have any question, please contact MCRI IT Helpdesk for further assistance. >> ______________________________________________________________________ >> >> ______________________________________________________________________ >> This email has been scanned by the Symantec Email Security.cloud service. >> For more information please visit http://www.symanteccloud.com >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ------------------------------ Message: 5 Date: Tue, 12 Mar 2013 09:30:32 -0700 (PDT) From: "Nima [guest]" <guest@bioconductor.org> To: bioconductor at r-project.org, ahmadian.n at gmail.com Subject: [BioC] displaying the image with EBImage Message-ID: <20130312163032.69E5714352C at mamba.fhcrc.org> Dear All I have problem with displaying my images in R, all functions are working except display my OS is windows 7 but Mozilla Firefox gives this message: Oops the page you are looking for could not be found would you please help me in this regard?? Regards Nima -- output of sessionInfo(): R version 2.15.3 (2013-03-01) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] EBImage_4.0.0 loaded via a namespace (and not attached): [1] abind_1.4-0 jpeg_0.1-2 png_0.1-4 tiff_0.1-4 -- Sent via the guest posting facility at bioconductor.org. ------------------------------ Message: 6 Date: Tue, 12 Mar 2013 16:35:10 +0000 From: Kemal Akat <kakat@mail.rockefeller.edu> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> Subject: [BioC] Help on Illumina HumanHT12 v4 chromosome filtering Message-ID: <25FA2F2F-5525-4206-9515-C45CF82AE54F at rockefeller.edu> Content-Type: text/plain; charset="us-ascii" Hi all, I am running into problems when I want to remove probes targeting chromosome Y. I can reproduce this behavior with the data from the beadarrayExampleData package: library(beadarray) library(illuminaHumanv4.db) library(beadarrayExampleData) data(exampleSummaryData) ## filter for probe quality ids = as.character(featureNames(exampleSummaryData)) qual = unlist(mget(ids, illuminaHumanv4PROBEQUALITY, ifnotfound = NA)) rem = qual == "No match" | qual == "Bad" | is.na(qual) exampleSummaryData_filt = exampleSummaryData[!rem] ## get chromosome location for remaining probes ids = as.character(featureNames(exampleSummaryData_filt)) chr = unlist(mget(ids, illuminaHumanv4CHR, ifnotfound = NA)) ## filter out probes targeting the Y chromosome rem_chr= chr == "Y" exampleSummaryData_filt = exampleSummaryData_filt[!rem_chr] Error in obj[i, , ..., drop = drop] : (subscript) logical subscript too long Calls: [ ... [ -> callNextMethod -> .nextMethod -> lapply -> FUN The eSet (Illumina) object starts with R> dim(exampleSummaryData) Features Samples Channels 49576 12 2 probes. After quality filtering R> dim(exampleSummaryData_filt) ## after filtering for probe quality Features Samples Channels 30084 12 2 R> Now, for the second filtering (ids = as.character(featureNames(exampleSummaryData_filt) etc.): R> length(ids) [1] 30084 R> length(chr) [1] 30109 Where are the extra probes coming from? I tried to match only the ones in exampleSummaryData_filt), idx = match(ids, names(chr)) chr = chr[idx] but this led to other problems. Maybe I am missing something obvious? Thank you for any hints or help! Kemal R> sessionInfo() R Under development (unstable) (2013-02-06 r61857) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] parallel stats graphics grDevices utils datasets methods base other attached packages: [1] beadarrayExampleData_1.0.5 illuminaHumanv4.db_1.16.0 org.Hs.eg.db_2.9.0 RSQLite_0.11.2 [5] DBI_0.2-5 AnnotationDbi_1.21.13 beadarray_2.9.2 ggplot2_0.9.3 [9] Biobase_2.19.3 BiocGenerics_0.5.6 setwidth_1.0-1 loaded via a namespace (and not attached): [1] AnnotationForge_1.1.10 BeadDataPackR_1.11.0 colorspace_1.2-0 dichromat_1.2-4 digest_0.6.0 [6] grid_3.0.0 gtable_0.1.2 IRanges_1.17.36 labeling_0.1 limma_3.15.15 [11] MASS_7.3-23 munsell_0.4 plyr_1.8 proto_0.3-10 RColorBrewer_1.0-5 [16] reshape2_1.2.2 scales_0.2.3 stats4_3.0.0 stringr_0.6.2 tools_3.0.0 R> ------------------------------ Message: 7 Date: Tue, 12 Mar 2013 09:59:41 -0700 From: "Ryan C. Thompson" <rct@thompsonclan.org> To: Richard Friedman <friedman at="" cancercenter.columbia.edu=""> Cc: Bioconductor mailing list <bioconductor at="" r-project.org=""> Subject: Re: [BioC] QuasiSeq vs DSS Message-ID: <513F5EFD.3060606 at thompsonclan.org> Content-Type: text/plain Dear Rich, From what I can tell, it should be possible. The development version of DESeq2 implements the DSS "squeezing" method combined with edgeR's Cox-Reid dispersion estimation. You could use DESeq2 to estimate dispersions, and then copy those dispersion values into an edgeR DGEList object. Then you can use edgeR::glmQLFTest, which implements (approximately) the QuasiSeq method. I have not had time yet to investigate putting these packages together in this way, but it is something I plan to look at. I'm certain that the combination is technically possible, and I'm reasonably sure that the result would be statistically meaningful. -Ryan Thompson On Mar 12, 2013 7:06 AM, "Richard Friedman" <friedman at="" cancercenter.columbia.edu="" <mailto:friedman="" at="" cancercenter.columbia.edu="">> wrote: Dear List. The papers on DSS (included in Bioconductor): Wu H, Wang C, Wu Z. A new shrinkage estimator for dispersion improves differential expression detection in RNA-seq data. Biostatistics. 2013 Apr;14(2):232-43. and QuasiSeq (included in CRAN): Lund SP, Nettleton D, McCarthy DJ, Smyth GK. Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates. Stat Appl Genet Mol Biol. 2012 both give evidence of superior performance to edgeR (if I understand them correctly). Have the two methods been compared? Can the 2 methods been combined (with DSS estimating the dispersion used in the quasi-negative bionomial disribution used in QuasiSeq)? I would appreciate any insight with respect to what is the overall best method for differential expression in RNASeq available at present. Thanks and best wishes, Rich Richard A. Friedman, PhD Associate Research Scientist, Biomedical Informatics Shared Resource Herbert Irving Comprehensive Cancer Center (HICCC) Lecturer, Department of Biomedical Informatics (DBMI) Educational Coordinator, Center for Computational Biology and Bioinformatics (C2B2)/ National Center for Multiscale Analysis of Genomic Networks (MAGNet)/ Columbia Initiative in Systems Biology Room 824 Irving Cancer Research Center Columbia University 1130 St. Nicholas Ave New York, NY 10032 (212)851-4765 (voice) friedman at cancercenter.columbia.edu <mailto:friedman at="" cancercenter.columbia.edu=""> http://cancercenter.columbia.edu/~friedman/ <http: cancercenter.columbia.edu="" %7efriedman=""/> Fritz Lang. Didn't he do "Star Trek". -Rose Friedman, age 16 _______________________________________________ Bioconductor mailing list Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor [[alternative HTML version deleted]] ------------------------------ Message: 8 Date: Tue, 12 Mar 2013 13:11:25 -0400 From: Richard Friedman <friedman@cancercenter.columbia.edu> To: "Ryan C. Thompson" <rct at="" thompsonclan.org=""> Cc: Bioconductor mailing list <bioconductor at="" r-project.org=""> Subject: Re: [BioC] QuasiSeq vs DSS Message-ID: <961B56D4-1114-47E9-B6E0-935BA09E9EDF at cancercenter.columbia.edu> Content-Type: text/plain; charset=us-ascii Dear Ryan, Thank you for your response. 3 questions: 1. If I had just a simple pairwise comparison is it known DSS or QuasiSeq better? 2. I was unaware that an approximate implementation of QuasiSeq was available in edgeR. If so, is it known hor it compare to the ordinairy EdgeR on the one hand and the full QuasiSeq on the other. 3. And I guess that the third question is for Gordon - Is using DSS and QuasiSeq (or EdgeR) together desireable and if so, are there plans to incorporate DSS into QuasiSeq (EdgeR). My note was planning ahead. I will still be in the microarray world for a more few weeks before I return to learning RNASeq. I wanted to know what the best practice is. If you (or anybody out there) develops a script to meld the two methods, I am sure that it would be interesting to the list. Best wishes, Rich On Mar 12, 2013, at 12:59 PM, Ryan C. Thompson wrote: > Dear Rich, > From what I can tell, it should be possible. The development version of DESeq2 implements the DSS "squeezing" method combined with edgeR's Cox-Reid dispersion estimation. You could use DESeq2 to estimate dispersions, and then copy those dispersion values into an edgeR DGEList object. Then you can use edgeR::glmQLFTest, which implements (approximately) the QuasiSeq method. > I have not had time yet to investigate putting these packages together in this way, but it is something I plan to look at. I'm certain that the combination is technically possible, and I'm reasonably sure that the result would be statistically meaningful. > -Ryan Thompson > On Mar 12, 2013 7:06 AM, "Richard Friedman" <friedman at="" cancercenter.columbia.edu=""> wrote: > Dear List. > > The papers on DSS (included in Bioconductor): > > Wu H, Wang C, Wu Z. A new shrinkage estimator for dispersion improves > differential expression detection in RNA-seq data. Biostatistics. 2013 > Apr;14(2):232-43. > > and QuasiSeq (included in CRAN): > > Lund SP, Nettleton D, McCarthy DJ, Smyth GK. Detecting differential expression > in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates. > Stat Appl Genet Mol Biol. 2012 > > both give evidence of superior performance to edgeR (if I understand them correctly). > > Have the two methods been compared? > Can the 2 methods been combined (with DSS estimating the dispersion used in > the quasi-negative bionomial disribution used in QuasiSeq)? > > I would appreciate any insight with respect to what is the overall best > method for differential expression in RNASeq available at present. > > Thanks and best wishes, > Rich > > > Richard A. Friedman, PhD > Associate Research Scientist, > Biomedical Informatics Shared Resource > Herbert Irving Comprehensive Cancer Center (HICCC) > Lecturer, > Department of Biomedical Informatics (DBMI) > Educational Coordinator, > Center for Computational Biology and Bioinformatics (C2B2)/ > National Center for Multiscale Analysis of Genomic Networks (MAGNet)/ > Columbia Initiative in Systems Biology > Room 824 > Irving Cancer Research Center > Columbia University > 1130 St. Nicholas Ave > New York, NY 10032 > (212)851-4765 (voice) > friedman at cancercenter.columbia.edu > http://cancercenter.columbia.edu/~friedman/ > > Fritz Lang. Didn't he do "Star Trek". > -Rose Friedman, age 16 > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ------------------------------ Message: 9 Date: Tue, 12 Mar 2013 12:15:25 -0700 From: "Ryan C. Thompson" <rct@thompsonclan.org> To: Richard Friedman <friedman at="" cancercenter.columbia.edu=""> Cc: Bioconductor mailing list <bioconductor at="" r-project.org=""> Subject: Re: [BioC] QuasiSeq vs DSS Message-ID: <513F7ECD.8090402 at thompsonclan.org> Content-Type: text/plain; charset=UTF-8; format=flowed Here is a quote from Gordon Smyth a few months ago in response to a question of mine, which I think neatly summarizes the relationship between QuasiSeq and edgeR::glmQLFTest: > glmQLFTest() and QuasiSeq were developed independently with the same > idea, hence we got together to write the paper. If you use common > dispersion to fit the linear model, then glmQLFTest() is like > NegBinQLShrink. If you use trended dispersion to fit the linear model > (recommended), then glmQLFTest() is like NegBinQLSpline. They are not > quite identical however. glmQLTest() leverages the functionality of > the edgeR and limma packages, whereas QuasiSeq has used its own > implementations of everything. The latter are described in the paper. (Gordon, I hope you don't mind me posting this publically. Hopefully it saves you the trouble of rewriting it.) -Ryan On Tue 12 Mar 2013 10:11:25 AM PDT, Richard Friedman wrote: > > Dear Ryan, > > Thank you for your response. > 3 questions: > 1. If I had just a simple pairwise comparison is it known DSS or > QuasiSeq better? > 2. I was unaware that an approximate implementation of QuasiSeq was > available in > edgeR. If so, is it known hor it compare to the ordinairy EdgeR on the > one hand and the > full QuasiSeq on the other. > 3. And I guess that the third question is for Gordon - Is using DSS > and QuasiSeq (or EdgeR) together > desireable and if so, are there plans to incorporate DSS into QuasiSeq > (EdgeR). > > My note was planning ahead. I will still be in the microarray world > for a more few weeks > before I return to learning RNASeq. I wanted to know what the best > practice is. > If you (or anybody out there) develops a script to meld the two > methods, I am sure that > it would be interesting to the list. > > Best wishes, > Rich > > > > > > On Mar 12, 2013, at 12:59 PM, Ryan C. Thompson wrote: > >> >> Dear Rich, >> From what I can tell, it should be possible. The development version >> of DESeq2 implements the DSS "squeezing" method combined with edgeR's >> Cox-Reid dispersion estimation. You could use DESeq2 to estimate >> dispersions, and then copy those dispersion values into an edgeR >> DGEList object. Then you can use edgeR::glmQLFTest, which implements >> (approximately) the QuasiSeq method. >> I have not had time yet to investigate putting these packages >> together in this way, but it is something I plan to look at. I'm >> certain that the combination is technically possible, and I'm >> reasonably sure that the result would be statistically meaningful. >> -Ryan Thompson >> On Mar 12, 2013 7:06 AM, "Richard Friedman" >> <friedman at="" cancercenter.columbia.edu=""> wrote: >> Dear List. >> >> The papers on DSS (included in Bioconductor): >> >> Wu H, Wang C, Wu Z. A new shrinkage estimator for dispersion improves >> differential expression detection in RNA-seq data. Biostatistics. 2013 >> Apr;14(2):232-43. >> >> and QuasiSeq (included in CRAN): >> >> Lund SP, Nettleton D, McCarthy DJ, Smyth GK. Detecting differential >> expression >> in RNA-sequence data using quasi-likelihood with shrunken dispersion >> estimates. >> Stat Appl Genet Mol Biol. 2012 >> >> both give evidence of superior performance to edgeR (if I understand >> them correctly). >> >> Have the two methods been compared? >> Can the 2 methods been combined (with DSS estimating the dispersion >> used in >> the quasi-negative bionomial disribution used in QuasiSeq)? >> >> I would appreciate any insight with respect to what is the overall best >> method for differential expression in RNASeq available at present. >> >> Thanks and best wishes, >> Rich >> >> >> Richard A. Friedman, PhD >> Associate Research Scientist, >> Biomedical Informatics Shared Resource >> Herbert Irving Comprehensive Cancer Center (HICCC) >> Lecturer, >> Department of Biomedical Informatics (DBMI) >> Educational Coordinator, >> Center for Computational Biology and Bioinformatics (C2B2)/ >> National Center for Multiscale Analysis of Genomic Networks (MAGNet)/ >> Columbia Initiative in Systems Biology >> Room 824 >> Irving Cancer Research Center >> Columbia University >> 1130 St. Nicholas Ave >> New York, NY 10032 >> (212)851-4765 (voice) >> friedman at cancercenter.columbia.edu >> http://cancercenter.columbia.edu/~friedman/ >> >> Fritz Lang. Didn't he do "Star Trek". >> -Rose Friedman, age 16 >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > ------------------------------ Message: 10 Date: Tue, 12 Mar 2013 12:26:31 -0700 From: "Ryan C. Thompson" <rct@thompsonclan.org> To: "Alessandra [guest]" <guest at="" bioconductor.org=""> Cc: bioconductor at r-project.org, alessandra.breschi at crg.es Subject: Re: [BioC] edgeR: paired samples DGE and GLM Message-ID: <513F8167.8090900 at thompsonclan.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Hi, See this previous discussion for a discussion of the pseudocounts: https://stat.ethz.ch/pipermail/bioconductor/2013-March/051182.html Only the non-GLM method makes use of pseudocounts, so it's not important that they are missing from the GLM analysis. I believe the design rows and count columns *do* need to match up in the same order. To do a paired test, you can just use model.matrix(~0+Treatment+Patient) and proceed as you already have. The GLM method does give non-identical results, and in general it seems that people are assuming that the GLM method is better overall, though as always, proof is scarce when sequencing is so expensive. Hope this answers your questions. -Ryan Thompson On 03/12/2013 04:20 AM, Alessandra [guest] wrote: > Hi, > > I am trying to use edgeR to compute differential gene expression. > > I have a quite simple experimental design: > labExpId Patient sex Treatment > sample.4 1 F A > sample.3 2 M A > sample.5 2 M B > sample.7 3 F A > sample.0 4 M A > sample.8 3 F B > sample.1 4 M B > sample.2 1 F B > > > I am comparing the effect of treatment across all patients, and it is fine when I apply exactTest. Then I wanted to include also the Patient in my model and I try using GLM, but I found several problems. So, I tried to apply GLM only including the Treatment just to troubleshoot. I follow the instructions on page 22 from the user's guide revised in December 2012. > > My counts are stored in a matrix called m. > >> colnames(m) > [1] "sample.2" "sample.3" "sample.4" "sample.5" "sample.7" sample.8" > [7] "sample.0" "sample.1" > >> design=model.matrix(~0+Treatment) >> design > TreatmentA TreatmentB > sample.4 1 0 > sample.3 1 0 > sample.5 0 1 > sample.7 1 0 > sample.0 1 0 > sample.8 0 1 > sample.1 0 1 > sample.2 0 1 > >> M = DGEList(counts=na.omit(m)) >> cpm.m <- cpm(M) >> M <- M[rowSums(cpm.m >1) >=1,] >> M <- calcNormFactors(M) >> M <- estimateGLMCommonDisp(M, design, verbose=T) >> names(M) > [1] "counts" "samples" "abundance" > [4] "logCPM" "common.dispersion" > > First thing I notice, I don't see pseudo.counts and other attributes I saw when calling estimateCommonDisp(). > >> M <- estimateGLMTrendedDisp(M, design) >> M <- estimateGLMTagwiseDisp(M, design) >> fit <- glmFit(M, design) >> lrt <- glmLRT(fit, contrast=c(-1,1)) >> res = topTags(lrt, n=nrow(lrt))$table >> head(res) > logFC logCPM LR PValue FDR > ENSG00000244115 15.512665 2.266789 35.30878 2.813602e-09 3.482958e-05 > ENSG00000256329 -17.760869 4.514903 32.89653 9.719681e-09 6.015996e-05 > ENSG00000223638 11.777617 -1.467638 18.46673 1.728967e-05 7.134295e-02 > ENSG00000134321 -5.328058 5.959204 16.76318 4.234703e-05 1.310535e-01 > ENSG00000196861 7.776972 -1.722111 15.83143 6.924259e-05 1.494412e-01 > ENSG00000229292 11.368590 -1.876601 15.74621 7.243290e-05 1.494412e-01 > >> m[rownames(head(res)),] > sample.2 sample.3 sample.4 sample.5 sample.7 sample.8 > ENSG00000244115 0 0 0 0 0 6412 > ENSG00000256329 0 0 0 32070 0 0 > ENSG00000223638 0 0 2 0 0 0 > ENSG00000134321 1178 590 890 83130 754 1116 > ENSG00000196861 0 3 363 0 0 53 > ENSG00000229292 0 0 0 0 0 0 > > sample.0 sample.1 > ENSG00000244115 0 4415 > ENSG00000256329 0 0 > ENSG00000223638 434 550 > ENSG00000134321 852 1092 > ENSG00000196861 500 57 > ENSG00000229292 344 406 > > I realized that it may be a problem of the ordering of the design matrix rows > >> colnames(m) > [1] "sample.2" "sample.3" "sample.4" "sample.5" "sample.7" "sample.8" > [7] "sample.0" "sample.1" >> rownames(design) > [1] "sample.4" "sample.3" "sample.5" "sample.7" "sample.0" "sample.8" > [7] "sample.1" "sample.2" > > so I redo it forcing the row names of my design matrix to be the same order as the column names in the matrix count. > Now my design matrix looks like: > >> design > TreatmentA TreatmentB > sample.2 0 1 > sample.3 1 0 > sample.4 1 0 > sample.5 0 1 > sample.7 1 0 > sample.8 0 1 > sample.0 1 0 > sample.1 0 1 > > I execute exactly the same commands as before. > Again I cannot see the pseudocounts. >> names(M) > [1] "counts" "samples" "abundance" > [4] "logCPM" "common.dispersion" > >> head(res) > logFC logCPM LR PValue FDR > ENSG00000074855 -32.17749 0.2759965 2121.104 0 0 > ENSG00000076351 -32.17749 -0.4306713 1549.345 0 0 > ENSG00000105552 -32.17749 -0.4864164 1502.658 0 0 > ENSG00000106991 -32.17749 0.4622641 2144.947 0 0 > ENSG00000123009 -32.17749 -0.2422835 1719.625 0 0 > ENSG00000133216 -32.17749 0.2206127 2127.575 0 0 > > I get very different results still from the exactTest with no GLM. > I attribute this to the missing pseudocounts in the DGEList object? > >> m[rownames(head(res)),] > sample.2 sample.3 sample.4 sample.5 sample.7 sample.8 > ENSG00000074855 0 948 902 0 676 0 > ENSG00000076351 0 494 522 0 538 0 > ENSG00000105552 0 490 454 0 486 0 > ENSG00000106991 0 1094 798 0 698 0 > ENSG00000123009 0 556 520 0 566 0 > ENSG00000133216 0 808 762 0 730 0 > sample.0 sample.1 > ENSG00000074855 1166 0 > ENSG00000076351 698 0 > ENSG00000105552 764 0 > ENSG00000106991 1733 0 > ENSG00000123009 980 0 > ENSG00000133216 1304 0 > > I see that these genes have all zero counts for TreatmentB and this explains the low logFC, but I don't get this when I don't use GLM at all (exactTest). > > So in summary, my questions are: > > 1) Does the ordering of rows in the design matrix have to be the same as the column order in the count matrix, > or the sample name is taken into account so the order should not matter? > > 2) Do you have any idea why I don't get pseudocounts after estimating the dispersion, and is it really this that causes such low logFC? > > I am looking forward to hearing from you. > > Many thanks in advance, > Ale > > > -- output of sessionInfo(): > > R version 2.15.0 (2012-03-30) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] C > > attached base packages: > [1] splines stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] plyr_1.7.1 optparse_1.0.1 ggplot2_0.9.3.1 reshape2_1.2.1 > [5] edgeR_3.0.8 limma_3.14.4 > > loaded via a namespace (and not attached): > [1] MASS_7.3-17 RColorBrewer_1.0-5 colorspace_1.1-1 dichromat_1.2-4 > [5] digest_0.5.2 getopt_1.19.1 grid_2.15.0 gtable_0.1.2 > [9] labeling_0.1 munsell_0.3 proto_0.3-9.2 scales_0.2.3 > [13] stringr_0.6 tools_2.15.0 > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ------------------------------ Message: 11 Date: Tue, 12 Mar 2013 16:14:58 -0400 From: Zhuzhu Zhang <zhuzhuz@email.unc.edu> To: Gordon K Smyth <smyth at="" wehi.edu.au=""> Cc: Bioconductor mailing list <bioconductor at="" r-project.org=""> Subject: Re: [BioC] edgeR: new defaults of estimateTagwiseDisp and exactTest Message-ID: <513F8CC2.5070300 at email.unc.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Dear Gordon, Thank you very much for your reply. That was greatly helpful. For prion.df, would you recommend that the user use the default value in the current version, or choose it differently for different datasets since it varies as the data size changes? If the latter, would the same principle of choosing prior.n be applied? Thanks, Zhuzhu On 3/8/13 2:06 AM, Gordon K Smyth wrote: > Dear Zhuzhu, > > The edgeR User's guide gives case studies of how we intend edgeR to be > used. You will see that we expect users to generally use the default > parameters. Although there are a great many parameters that can be > varied in principle in the edgeR functions, we not expect them to be > changed in most analyses. > > One exception was the prior.n argument to estimateTagwiseDisp. It was > never our intention that a fixed value for prior.n would be used for > datasets of different sizes, so previous versions of the User's Guide > used to explain how to set this parameter. Several versions ago, we > eliminated the prior.n argument and replaced it with prior.df so that > the default behavior was what we intended. > > Over time we have reduced the default value for prior.df somewhat. > Larger values of prior.df give more priority to genes with large fold > changes between groups. Smaller values of prior.df give more priority > to genes with small dispersion values, i.e., to genes that are > consistent between replicates. > > You ask about exactTest(). To learn more about the different > rejection regions, you should start by reading the help page for > exactTest(), but I doubt that this is causing you a problem. The > theory behind the original exact negative binomial test of Robinson > and Smyth (2008) presupposed that the conditional distibution of the > counts given the genewise total was unimodal (like a binomial > distribution). This is true for typical dispersion values, but is not > true for very large dispersion values. Hence the original exactTest > can give totally innappropriate results when the dispersion is very, > very large. The new rejection region is similar to the original when > the dispersion is smallish, but gives sensible results in any > situation. Hence it should be used. The old rejection region is > preserved as an option only for backward compatability. > > Best wishes > Gordon > > >> Date: Wed, 06 Mar 2013 19:22:06 -0500 >> From: Zhuzhu Zhang <zhuzhuz at="" email.unc.edu=""> >> To: bioconductor at r-project.org >> Subject: [BioC] edgeR: new defaults of estimateTagwiseDisp and >> exactTest >> >> Dear All, >> >> I'm running edgeR 3.0.8 and notice that the results are considerably >> different than those from older versions. I realized that it was likely >> due to the different prior.df I used in Function estimateTagwiseDisp, >> thanks to an earlier discussion on the list- >> https://stat.ethz.ch/pipermail/bioconductor/2012-December/049644.html >> >> I have a few more questions: >> >> 1. How to choose a proper prior.df (or prior.n)? >> >> 2. How is the new default method of exactText different from the old >> default (rejection.region = "smallp")? How does it improve the >> performance? >> >> 3. In general, what parameters should I tune for different datasets, >> when using function estimateTagwiseDisp and exactTest? I used the >> defaults but realized that they may not be most appropriate. >> >> Thank you for your time and attention. Any suggestions and comments >> would be extremely helpful and appreciated. >> >> Thanks, >> Zhuzhu >> > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:6}} ------------------------------ Message: 12 Date: Tue, 12 Mar 2013 21:32:03 +0100 From: Julian Gehring <julian.gehring@embl.de> To: bioconductor at r-project.org Subject: [BioC] Subsampling of BAM/SAM alignments Message-ID: <513F90C3.3020301 at embl.de> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Hi, Is there an efficient function to randomly subsample alignments from a BAM/SAM within R? In the best case, an equivalent to 'FastqSampler' in 'ShortRead'. If not, what would be a clever way of doing this with the bioc framework, without having to read the whole file first? Best wishes Julian ------------------------------ Message: 13 Date: Tue, 12 Mar 2013 21:51:39 +0100 From: Wolfgang Huber <whuber@embl.de> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> Subject: [BioC] DESeq2 Message-ID: <592E98F1-5FEF-47F0-B9F8-FA485F4489B0 at embl.de> Content-Type: text/plain; charset=us-ascii Dear DESeq users, Mike Love, Simon Anders and I have been updating the DESeq package. This resulted in the package DESeq2, which is available from the development branch, and scheduled for the next release: http://www.bioconductor.org/packages/devel/bioc/html/DESeq2.html For several release cycles, the original package (DESeq) will be maintained at its current functionality, in order to not disrupt the workflows of DESeq users. For new projects, we recommend using DESeq2. Major innovations are: * Base class: SummarizedExperiment is used as the superclass for storing the data, rather than eSet. This allows closer integration with upstream workflows involving GRanges and summarizeOverlaps, and facilitates downstream analyses of the genomic regions of interest. * Simplified workflow: the wrapper function DESeq() performs all steps for a differential expression analysis. The individual steps are of course also accessible. * More powerful statistics: incorporation of prior distributions into the estimation of dispersions and fold changes (empirical-Bayes shrinkage). The dispersion shrinkage improves power compared to the old DESeq. The fold changes shrinkage help moderate the otherwise large spread in log fold changes for genes with low counts, while it has negligible effect on genes with high counts; it may be particularly useful for visualisation, clustering, classification, ordination (PCA, MDS), similar to the variance-stabilizing transformation in the old DESeq. A Wald test for significance is provided as the default inference method, with the chi-squared test of the previous version is also available. A manuscript is in preparation. * Normalization: it is possible to provide a matrix of sample- *and* gene-specific normalization factors, which allows the use of normalisation factors from Bioconductor packages such as cqn and EDASeq. Examples of usage are provided in the vignette, and more details are available in the manual pages (specifically, the DESeq function and estimateDispersions function). Enjoy - Mike, Simon, Wolfgang. ------------------------------ Message: 14 Date: Tue, 12 Mar 2013 21:56:17 +0100 From: Wolfgang Huber <whuber@embl.de> To: "Nima [guest]" <guest at="" bioconductor.org=""> Cc: ahmadian.n at gmail.com, bioconductor at r-project.org Subject: Re: [BioC] displaying the image with EBImage Message-ID: <9F4C0F37-22AE-49BF-99CE-51F2BE3CCB4A at embl.de> Content-Type: text/plain; charset=us-ascii Nima Thank you. Can you provide a (minimal) transcript of the commands you issued, and the exact error message you get (e.g. what is the URL looked for but not found by firefox?) Best wishes Wolfgang Il giorno Mar 12, 2013, alle ore 5:30 PM, Nima [guest] <guest at="" bioconductor.org=""> ha scritto: > > Dear All > > I have problem with displaying my images in R, all functions are working except display > my OS is windows 7 but Mozilla Firefox gives this message: > Oops the page you are looking for could not be found > > would you please help me in this regard?? > > Regards > Nima > > -- output of sessionInfo(): > > R version 2.15.3 (2013-03-01) > Platform: x86_64-w64-mingw32/x64 (64-bit) > > locale: > [1] LC_COLLATE=English_United States.1252 > [2] LC_CTYPE=English_United States.1252 > [3] LC_MONETARY=English_United States.1252 > [4] LC_NUMERIC=C > [5] LC_TIME=English_United States.1252 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] EBImage_4.0.0 > > loaded via a namespace (and not attached): > [1] abind_1.4-0 jpeg_0.1-2 png_0.1-4 tiff_0.1-4 > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ------------------------------ Message: 15 Date: Tue, 12 Mar 2013 22:27:58 +0100 From: Nima Ahmadian <ahmadian.n@gmail.com> To: Wolfgang Huber <whuber at="" embl.de=""> Cc: bioconductor at r-project.org Subject: Re: [BioC] displaying the image with EBImage Message-ID: <can+vrqnyqc1cuu=4xml_p1q_sxnmnquicy1s=bb7fzoqjc7klg at="" mail.gmail.com=""> Content-Type: text/plain ^Dear Wolfgang Thank you so much but I solved the issue somehow, there is no any error message, but the image doesn't show automatically with my browser, I have to go to my Temp folder and open it manually so if you have any hint, I would be really grateful Cheers Nima On Tue, Mar 12, 2013 at 9:56 PM, Wolfgang Huber <whuber at="" embl.de=""> wrote: > > Nima > > Thank you. Can you provide a (minimal) transcript of the commands you > issued, and the exact error message you get (e.g. what is the URL looked > for but not found by firefox?) > > Best wishes > Wolfgang > > Il giorno Mar 12, 2013, alle ore 5:30 PM, Nima [guest] < > guest at bioconductor.org> ha scritto: > > > > > Dear All > > > > I have problem with displaying my images in R, all functions are working > except display > > my OS is windows 7 but Mozilla Firefox gives this message: > > Oops the page you are looking for could not be found > > > > would you please help me in this regard?? > > > > Regards > > Nima > > > > -- output of sessionInfo(): > > > > R version 2.15.3 (2013-03-01) > > Platform: x86_64-w64-mingw32/x64 (64-bit) > > > > locale: > > [1] LC_COLLATE=English_United States.1252 > > [2] LC_CTYPE=English_United States.1252 > > [3] LC_MONETARY=English_United States.1252 > > [4] LC_NUMERIC=C > > [5] LC_TIME=English_United States.1252 > > > > attached base packages: > > [1] stats graphics grDevices utils datasets methods base > > > > other attached packages: > > [1] EBImage_4.0.0 > > > > loaded via a namespace (and not attached): > > [1] abind_1.4-0 jpeg_0.1-2 png_0.1-4 tiff_0.1-4 > > > > -- > > Sent via the guest posting facility at bioconductor.org. > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor at r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > [[alternative HTML version deleted]] ------------------------------ Message: 16 Date: Tue, 12 Mar 2013 22:53:39 +0100 From: Andrzej Ole? <andrzej.oles@gmail.com> To: Nima Ahmadian <ahmadian.n at="" gmail.com=""> Cc: bioconductor at r-project.org Subject: Re: [BioC] displaying the image with EBImage Message-ID: <cah7mkkj-pzkd1a6vqkiu8tsybh5o5kc67fbaxzb3jhgcfvytrg at="" mail.gmail.com=""> Content-Type: text/plain; charset=ISO-8859-1 Dear Nima, many thanks for reporting this issue! Could you please check whether the example from the documentation of the 'display' function results in the same error? Try running the following commands: f = system.file("images", "lena-color.png", package="EBImage") x = readImage(f) display(x) If you still keep getting the error please copy the contents of the address bar of your browser and send it to me. Additionally, could you provide the complete path (i.e., beginning with c:\ ... or similar) to the Temp folder containing the generated images and the listing of its contents? I look forward to hearing from you soon. Best, Andrzej On Tue, Mar 12, 2013 at 10:27 PM, Nima Ahmadian <ahmadian.n at="" gmail.com=""> wrote: > ^Dear Wolfgang > > Thank you so much but I solved the issue somehow, there is no any error > message, but the image doesn't show automatically with my browser, I have > to go to my Temp folder and open it manually so if you have any hint, I > would be really grateful > > Cheers > Nima > > On Tue, Mar 12, 2013 at 9:56 PM, Wolfgang Huber <whuber at="" embl.de=""> wrote: > >> >> Nima >> >> Thank you. Can you provide a (minimal) transcript of the commands you >> issued, and the exact error message you get (e.g. what is the URL looked >> for but not found by firefox?) >> >> Best wishes >> Wolfgang >> >> Il giorno Mar 12, 2013, alle ore 5:30 PM, Nima [guest] < >> guest at bioconductor.org> ha scritto: >> >> > >> > Dear All >> > >> > I have problem with displaying my images in R, all functions are working >> except display >> > my OS is windows 7 but Mozilla Firefox gives this message: >> > Oops the page you are looking for could not be found >> > >> > would you please help me in this regard?? >> > >> > Regards >> > Nima >> > >> > -- output of sessionInfo(): >> > >> > R version 2.15.3 (2013-03-01) >> > Platform: x86_64-w64-mingw32/x64 (64-bit) >> > >> > locale: >> > [1] LC_COLLATE=English_United States.1252 >> > [2] LC_CTYPE=English_United States.1252 >> > [3] LC_MONETARY=English_United States.1252 >> > [4] LC_NUMERIC=C >> > [5] LC_TIME=English_United States.1252 >> > >> > attached base packages: >> > [1] stats graphics grDevices utils datasets methods base >> > >> > other attached packages: >> > [1] EBImage_4.0.0 >> > >> > loaded via a namespace (and not attached): >> > [1] abind_1.4-0 jpeg_0.1-2 png_0.1-4 tiff_0.1-4 >> > >> > -- >> > Sent via the guest posting facility at bioconductor.org. >> > >> > _______________________________________________ >> > Bioconductor mailing list >> > Bioconductor at r-project.org >> > https://stat.ethz.ch/mailman/listinfo/bioconductor >> > Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ------------------------------ Message: 17 Date: Wed, 13 Mar 2013 10:37:03 +1100 (AUS Eastern Daylight Time) From: Gordon K Smyth <smyth@wehi.edu.au> To: Zhuzhu Zhang <zhuzhuz at="" email.unc.edu=""> Cc: Bioconductor mailing list <bioconductor at="" r-project.org=""> Subject: Re: [BioC] edgeR: new defaults of estimateTagwiseDisp and exactTest Message-ID: <pine.wnt.4.64.1303131033340.6032 at="" pc765.wehi.edu.au=""> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed On Tue, 12 Mar 2013, Zhuzhu Zhang wrote: > Dear Gordon, > > Thank you very much for your reply. That was greatly helpful. > > For prion.df, would you recommend that the user use the default value in the > current version, or choose it differently for different datasets since it > varies as the data size changes? Use the default value. As I tried to explain, the whole point of using prior.df instead of prior.n is that the optimal value for prior.df does not depend on data size. > If the latter, would the same principle of > choosing prior.n be applied? prior.n is a function of prior.df. See ?getPriorN. Gordon > Thanks, > Zhuzhu > > > > > On 3/8/13 2:06 AM, Gordon K Smyth wrote: >> Dear Zhuzhu, >> >> The edgeR User's guide gives case studies of how we intend edgeR to be >> used. You will see that we expect users to generally use the default >> parameters. Although there are a great many parameters that can be >> varied in principle in the edgeR functions, we not expect them to be >> changed in most analyses. >> >> One exception was the prior.n argument to estimateTagwiseDisp. It was >> never our intention that a fixed value for prior.n would be used for >> datasets of different sizes, so previous versions of the User's Guide >> used to explain how to set this parameter. Several versions ago, we >> eliminated the prior.n argument and replaced it with prior.df so that >> the default behavior was what we intended. >> >> Over time we have reduced the default value for prior.df somewhat. >> Larger values of prior.df give more priority to genes with large fold >> changes between groups. Smaller values of prior.df give more priority >> to genes with small dispersion values, i.e., to genes that are >> consistent between replicates. >> >> You ask about exactTest(). To learn more about the different rejection >> regions, you should start by reading the help page for exactTest(), but >> I doubt that this is causing you a problem. The theory behind the >> original exact negative binomial test of Robinson and Smyth (2008) >> presupposed that the conditional distibution of the counts given the >> genewise total was unimodal (like a binomial distribution). This is >> true for typical dispersion values, but is not true for very large >> dispersion values. Hence the original exactTest can give totally >> innappropriate results when the dispersion is very, very large. The new >> rejection region is similar to the original when the dispersion is >> smallish, but gives sensible results in any situation. Hence it should >> be used. The old rejection region is preserved as an option only for >> backward compatability. >> >> Best wishes >> Gordon >> >> >>> Date: Wed, 06 Mar 2013 19:22:06 -0500 >>> From: Zhuzhu Zhang <zhuzhuz at="" email.unc.edu=""> >>> To: bioconductor at r-project.org >>> Subject: [BioC] edgeR: new defaults of estimateTagwiseDisp and >>> exactTest >>> >>> Dear All, >>> >>> I'm running edgeR 3.0.8 and notice that the results are considerably >>> different than those from older versions. I realized that it was >>> likely due to the different prior.df I used in Function >>> estimateTagwiseDisp, thanks to an earlier discussion on the list- >>> https://stat.ethz.ch/pipermail/bioconductor/2012-December/049644.html >>> >>> I have a few more questions: >>> >>> 1. How to choose a proper prior.df (or prior.n)? >>> >>> 2. How is the new default method of exactText different from the old >>> default (rejection.region = "smallp")? How does it improve the >>> performance? >>> >>> 3. In general, what parameters should I tune for different datasets, >>> when using function estimateTagwiseDisp and exactTest? I used the >>> defaults but realized that they may not be most appropriate. >>> >>> Thank you for your time and attention. Any suggestions and comments >>> would be extremely helpful and appreciated. >>> >>> Thanks, >>> Zhuzhu ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}} ------------------------------ Message: 18 Date: Wed, 13 Mar 2013 11:17:30 +1100 (AUS Eastern Daylight Time) From: Gordon K Smyth <smyth@wehi.edu.au> To: Ian Sudbery <ian.sudbery at="" dpag.ox.ac.uk=""> Cc: Bioconductor mailing list <bioconductor at="" r-project.org=""> Subject: [BioC] negative correlation from limma's duplicateCorrelation Message-ID: <pine.wnt.4.64.1303131037280.6032 at="" pc765.wehi.edu.au=""> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Dear Ian, The help page for duplicateCorrelation() says "For this function to return statistically useful results, there must be at least two more arrays than the number of coefficients to be estimated, i.e., two more than the column rank of design." You have just two arrays, only one more than the number of columns in the design matrix. So there is not enough meaningful information from which to estimate the duplicate correlation. The correlations that you are computing using cor() are not comparable to the quantity estimated by duplicateCorrelation. You are computing correlations between pairs of spots for the same gene across different genes. This correlation arises from the fact that different genes have different expression levels. You would get a positive value for this correlation even if there were no spot duplicates. The duplicateCorrelation() function on the other hand computes a correlation across technical replicates for the same gene. These two correlations are not related. Best wishes Gordon > Date: Mon, 11 Mar 2013 18:27:25 +0000 > From: Ian Sudbery <ian.sudbery at="" dpag.ox.ac.uk=""> > To: "bioconductor at stat.math.ethz.ch" <bioconductor at="" stat.math.ethz.ch=""> > Subject: [BioC] negative correlation from limma's duplicateCorrelation > > Hi all, > > I might be misunderstanding something here, but I think I'm having > trouble with Limma's duplicateCorrelation function. > > I am analysing a set of custom spotted two color microarrays. Each > comparison contains two arrays. Each array contains two copies of each > spot. The design table for each comparison looks something like this: > >> targets > SlideNumber FileName Cy3 Cy5 Date Name > yeast_wt_v_dd_rep1 823 s823_bwp17y+ddy.txt wty ddy 17/12/2012 yeast_wt_v_dd_rep1 > yeast_wt_v_dd_rep2 828 s828_wty+ddy.txt wty ddy 18/01/2013 yeast_wt_v_dd_rep2 > > > After background correction and normalisation, I run > duplicateCorrelation before fitting the model: > >> corfit <- duplicateCorrelation(MA_norm, design = c(1,1), ndups=2, spacing = 8324) > > When I look at the results, I find that the consensus correlation is > negative: > >> corfit$consensus.correlation > [1] -0.4163954 > > Surely this is wrong? Examining the correlation for duplicate spots on > each slide manually suggests a good correlation, particularly if one > ignores flagged spots: > >> good_spots <- MA_norm$weights[1:8324,] == 1 & MA_norm$weights[8325:16648,] ==1 >> cor(MA_norm$M[1:8324,1][good_spots[,1]],MA_norm$M[8325:16648,1][goo d_spots[,1]]) > [1] 0.8769604 > >> cor(MA_norm$M[1:8324,2][good_spots[,2]],MA_norm$M[8325:16648,2][goo d_spots[,2]]) > [1] 0.858891 > > Even without removing the flagged spots, the correlation is still positive: >> cor(MA_norm$M[1:8324,1],MA_norm$M[8325:16648,1]) > [1] 0.2669379 > >> cor(MA_norm$M[1:8324,2],MA_norm$M[8325:16648,2]) > [1] 0.1843577 > > I know that this isn't the way that duplicateCorrelation calculates rho, > but one would expect that a good correlation in this way might indicate > at least a positive correlation from duplicateCorrelation? Or am I > interpreting the consensus correlation incorrectly? > > Here is the rest of my analysis script (I've left out some diagnostic plot generation): > >> targets <- readTargets(row.names="Name") >> RG <- read.maimages(targets$FileName, source = "genepix", wt.fun=wtflags(weight=0, cutoff=-50)) >> spottypes <- readSpotTypes() >> RG$genes$Status <- controlStatus(spottypes,RG) >> anno <- read.delim("anno.txt", comment.char="!", header = T) >> RG_background <- backgroundCorrect(RG,method = "normexp", offset =10) >> >> #flag out spots where intentity isn't much above background in both channels >> RG_background$weights[RG_background$R < 50 & RG_background$G < 50] = 0 >> MA <- normalizeWithinArrays(RG_background) >> MA_norm <- MA[MA$genes$Status == "cDNA",] >> MA_norm$genes$Status <- NULL >> corfit <- duplicateCorrelation(MA_norm, design = c(1,1), ndups=2, spacing = 8324) >> fit <- lmFit(MA_norm, design = design, ndups = 2, correlation=corfit$consensus.correlation,spacing = 8324) >> fit<- eBayes(fit) > >> sesssionInfo() > R version 2.15.1 (2012-06-22) > > Platform: x86_64-pc-mingw32/x64 (64-bit) > > locale: > [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 > [4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252 > > attached base packages: > [1] grid stats graphics grDevices utils datasets methods base > > other attached packages: > [1] dichromat_2.0-0 gplots_2.11.0 MASS_7.3-18 KernSmooth_2.23-7 caTools_1.14 gdata_2.12.0 gtools_2.7.0 > [8] reshape2_1.2.2 ggplot2_0.9.3 statmod_1.4.16 limma_3.14.4 > > loaded via a namespace (and not attached): > [1] bitops_1.0-5 colorspace_1.2-1 digest_0.6.2 gtable_0.1.2 labeling_0.1 munsell_0.4 plyr_1.8 > [8] proto_0.3-10 RColorBrewer_1.0-5 scales_0.2.3 stringr_0.6.2 tools_2.15.1 > > Any advice appreciated > > Ian > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}} ------------------------------ Message: 19 Date: Wed, 13 Mar 2013 11:27:50 +1100 (AUS Eastern Daylight Time) From: Gordon K Smyth <smyth@wehi.edu.au> To: Luis alberto Martinez lopez <lewisluis2013 at="" yahoo.com=""> Cc: Bioconductor mailing list <bioconductor at="" r-project.org=""> Subject: [BioC] anova like test in edgeR Message-ID: <pine.wnt.4.64.1303131122400.6032 at="" pc765.wehi.edu.au=""> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed > Date: Mon, 11 Mar 2013 17:18:09 -0700 (PDT) > From: Luis alberto Martinez lopez <lewisluis2013 at="" yahoo.com=""> > To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> > Subject: [BioC] anova like test in edgeR > > Hello everyone: > > a wish to know if there is one way of doing an anova like test in edgeR, > comparing all the possibilities between the groups defined in the > experimental design. > > in the users guide the authors explain one way anova test, but the > comparison is done only against the intercept group, is there a way of > doing a general test comparing all the posibilities and not only against > the intercept? The example in the User's Guide is a general test comparing all possibilities. Best wishes Gordon ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}} ------------------------------ Message: 20 Date: Wed, 13 Mar 2013 01:38:44 +0100 From: Nima Ahmadian <ahmadian.n@gmail.com> To: Andrzej Ole? <andrzej.oles at="" gmail.com=""> Cc: bioconductor at r-project.org Subject: Re: [BioC] displaying the image with EBImage Message-ID: <can+vrqml+9hs0iq+khoocisgxw7nncxk1zbnxxv_gk5pgkcs0q at="" mail.gmail.com=""> Content-Type: text/plain Dear Andrzej I solved the issue, but I came to know that if you have searchqu toolbar installed on your browser it prevents the images to get open, so I removed it and it worked out cheers Nima On Tue, Mar 12, 2013 at 10:53 PM, Andrzej Ole?? <andrzej.oles at="" gmail.com="">wrote: > Dear Nima, > > many thanks for reporting this issue! Could you please check whether > the example from the documentation of the 'display' function results > in the same error? Try running the following commands: > > f = system.file("images", "lena-color.png", package="EBImage") > x = readImage(f) > display(x) > > If you still keep getting the error please copy the contents of the > address bar of your browser and send it to me. Additionally, could you > provide the complete path (i.e., beginning with c:\ ... or similar) to > the Temp folder containing the generated images and the listing of its > contents? > > I look forward to hearing from you soon. > > Best, > Andrzej > > > On Tue, Mar 12, 2013 at 10:27 PM, Nima Ahmadian <ahmadian.n at="" gmail.com=""> > wrote: > > ^Dear Wolfgang > > > > Thank you so much but I solved the issue somehow, there is no any error > > message, but the image doesn't show automatically with my browser, I have > > to go to my Temp folder and open it manually so if you have any hint, I > > would be really grateful > > > > Cheers > > Nima > > > > On Tue, Mar 12, 2013 at 9:56 PM, Wolfgang Huber <whuber at="" embl.de=""> wrote: > > > >> > >> Nima > >> > >> Thank you. Can you provide a (minimal) transcript of the commands you > >> issued, and the exact error message you get (e.g. what is the URL looked > >> for but not found by firefox?) > >> > >> Best wishes > >> Wolfgang > >> > >> Il giorno Mar 12, 2013, alle ore 5:30 PM, Nima [guest] < > >> guest at bioconductor.org> ha scritto: > >> > >> > > >> > Dear All > >> > > >> > I have problem with displaying my images in R, all functions are > working > >> except display > >> > my OS is windows 7 but Mozilla Firefox gives this message: > >> > Oops the page you are looking for could not be found > >> > > >> > would you please help me in this regard?? > >> > > >> > Regards > >> > Nima > >> > > >> > -- output of sessionInfo(): > >> > > >> > R version 2.15.3 (2013-03-01) > >> > Platform: x86_64-w64-mingw32/x64 (64-bit) > >> > > >> > locale: > >> > [1] LC_COLLATE=English_United States.1252 > >> > [2] LC_CTYPE=English_United States.1252 > >> > [3] LC_MONETARY=English_United States.1252 > >> > [4] LC_NUMERIC=C > >> > [5] LC_TIME=English_United States.1252 > >> > > >> > attached base packages: > >> > [1] stats graphics grDevices utils datasets methods base > >> > > >> > other attached packages: > >> > [1] EBImage_4.0.0 > >> > > >> > loaded via a namespace (and not attached): > >> > [1] abind_1.4-0 jpeg_0.1-2 png_0.1-4 tiff_0.1-4 > >> > > >> > -- > >> > Sent via the guest posting facility at bioconductor.org. > >> > > >> > _______________________________________________ > >> > Bioconductor mailing list > >> > Bioconductor at r-project.org > >> > https://stat.ethz.ch/mailman/listinfo/bioconductor > >> > Search the archives: > >> http://news.gmane.org/gmane.science.biology.informatics.conductor > >> > >> > > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor at r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]] ------------------------------ Message: 21 Date: Wed, 13 Mar 2013 01:21:49 +0000 From: Nadia Davidson <nadia.davidson@mcri.edu.au> To: <bioconductor at="" stat.math.ethz.ch=""> Subject: Re: [BioC] Negative parameter error when using goseq Message-ID: <loom.20130313t015334-216 at="" post.gmane.org=""> Content-Type: text/plain; charset="us-ascii" Tarca, Adi <atarca at="" ...=""> writes: > > Dear Matthew, > I am getting the following error when using goseq. Any idea what may cause it? > Thanks, > Adi > > > pwf=nullp(genes,bias.data=cntbias) > > gocats=as.list(org.Hs.egGO2ALLEGS) > > GOr=goseq(pwf,gene2cat=gocats) > Using manually entered categories. > Calculating the p-values... > Error in dWNCHypergeo(num_de_incat, num_incat, num_genes - num_incat, : > Negative parameter > > head(pwf) > DEgenes bias.data pwf > 642819 0 3764 0.05775376 > 414235 0 124 0.02107342 > 57107 0 14702 0.06023892 > 649159 1 248 0.02542837 > 3127 0 44633 0.06357525 > 100507412 1 1337 0.05268580 > > head(gocats[3:5]) > $`GO:0000012` > IDA IDA IEA IMP IDA IDA > "3981" "7141" "7515" "23411" "54840" "55775" > IMP IMP IEA > "55775" "200558" "100133315" > > Adi Laurentiu TARCA, Ph.D. Dear Adi, I think the trouble is that "org.Hs.egGO2ALLEGS" lists some GO groups twice for the same gene. This is making goseq overestimate the number of genes in a particular GO category, to the extent that the number of genes in the category is more than the total number of genes. It should be simple to fix with: gocats<-sapply(gocats,function(x){unique(x)}) before the "goseq" command. I'll update the goseq code so it checks for this scenario in future. Cheers, Nadia. ------------------------------ Message: 22 Date: Wed, 13 Mar 2013 12:57:47 +1100 (AUS Eastern Daylight Time) From: Gordon K Smyth <smyth@wehi.edu.au> To: Luis alberto Martinez lopez <lewisluis2013 at="" yahoo.com=""> Cc: Bioconductor mailing list <bioconductor at="" r-project.org=""> Subject: Re: [BioC] anova like test in edgeR Message-ID: <pine.wnt.4.64.1303131254070.6032 at="" pc765.wehi.edu.au=""> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed The example is in Section 3.2.5, which is titled "An ANOVA-like test for any differences". Gordon > Date: Mon, 11 Mar 2013 17:18:09 -0700 (PDT) > From: Luis alberto Martinez lopez <lewisluis2013 at="" yahoo.com=""> > To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> > Subject: [BioC] anova like test in edgeR > > Hello everyone: > > a wish to know if there is one way of doing an anova like test in edgeR, > comparing all the possibilities between the groups defined in the > experimental design. > > in the users guide the authors explain one way anova test, but the > comparison is done only against the intercept group, is there a way of > doing a general test comparing all the posibilities and not only against > the intercept? The example in the User's Guide is a general test comparing all possibilities. Best wishes Gordon ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}} ------------------------------ Message: 23 Date: Tue, 12 Mar 2013 19:45:01 -0700 From: Martin Morgan <mtmorgan@fhcrc.org> To: Julian Gehring <julian.gehring at="" embl.de=""> Cc: bioconductor at r-project.org Subject: Re: [BioC] Subsampling of BAM/SAM alignments Message-ID: <513FE82D.6050104 at fhcrc.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed On 03/12/2013 01:32 PM, Julian Gehring wrote: > Hi, > > Is there an efficient function to randomly subsample alignments from a BAM/SAM > within R? In the best case, an equivalent to 'FastqSampler' in 'ShortRead'. If > not, what would be a clever way of doing this with the bioc framework, without > having to read the whole file first? Hi Julian -- there isn't anything at the moment; below is a function that will have at most yieldSize * 2 elements in memory at one time. All the elements in the bam file (satisfying ScanBamParam) will eventually pass through memory, so not super efficient. sampleBam <- function(file, index=file, ..., yieldSize, verbose=FALSE, param=ScanBamParam(what=scanBamWhat())) { qunlist <- Rsamtools:::.quickUnlist tot <- sampleSize <- yieldSize bf <- open(BamFile(file, index, yieldSize=yieldSize)) smpl <- qunlist(scanBam(bf, param=param)) repeat { yld <- qunlist(scanBam(bf, param=param)) yld_n <- length(yld[[1]]) if (length(yld[[1]]) == 0L) break tot <- tot + yld_n keep <- rbinom(1L, yld_n, yld_n/ tot) if (verbose) message(sprintf("sampling %d of %d", keep, tot)) if (keep == 0L) next i <- sample(sampleSize, keep) j <- sample(yld_n, keep) smpl <- Map(function(x, y, i, j) { x[i] <- y[j] x }, smpl, yld, MoreArgs=list(i=i, j=j)) } close(bf) smpl } with fl <- system.file("extdata", "ex1.bam", package="Rsamtools") param <- ScanBamParam( flag=scanBamFlag(isUnmappedQuery=FALSE), what=c("rname", "pos", "cigar", "strand")) lst <- sampleBam(fl, yieldSize=1000, param=param, verbose=TRUE) The end result could be fed to a more friendly structure names(lst)[names(lst) == "rname"] = "seqname" do.call(GappedAlignments, lst) Lightly tested, especially when ScanBamParam() is non-trivial. Martin > > Best wishes > Julian > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793 ------------------------------ Message: 24 Date: Tue, 12 Mar 2013 17:28:57 -0400 From: Amy Vandiver <amyruthvandiver@gmail.com> To: <bioconductor at="" stat.math.ethz.ch=""> Subject: [BioC] ArrayExpress Question Message-ID: <caf4aqnhqpohkepghenuknje+cx2e3d8ts-vlp=wu_oki3fk7ta at="" mail.gmail.com=""> Content-Type: text/plain Hello, I've been trying to access Array Expression data using the Array Expression Package, however I am running into an error, seemingly due to the format of the SDRF file. I have been trying to run through the ae2bioc function line by line (ae2bioc) to determine if I can work around this error, however many of the called functions (readPhenoData, getSDRFcolumn, etc.) can not be found. Am I missing a specific package containing these functions? Is there a way to work around the original error? Thank you, Amy *My script:* source("http://bioconductor.org/biocLite.R") biocLite("ArrayExpress") library("ArrayExpress") AEset=ArrayExpress("E-MTAB-202") *Error:* Error in .subset2(x, i, exact = exact) : attempt to select less than one element In addition: Warning message: In readPhenoData(sdrf, path) : ArrayExpress: Cannot find 'Label' column in SDRF. Object might not be created correctly. *SessionInfo:* R Under development (unstable) (2013-03-10 r62201) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.iso885915 LC_NUMERIC=C [3] LC_TIME=en_US.iso885915 LC_COLLATE=en_US.iso885915 [5] LC_MONETARY=en_US.iso885915 LC_MESSAGES=en_US.iso885915 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices datasets utils methods [8] base other attached packages: [1] minfiLocal_0.1.20 [2] minfiData_0.3.1 [3] IlluminaHumanMethylation450kannotation.ilmn.v1.2_0.1.3 [4] IlluminaHumanMethylation450kmanifest_0.4.0 [5] bumphunter_0.99.33 [6] iterators_1.0.6 [7] foreach_1.4.0 [8] minfi_1.5.3 [9] Biostrings_2.27.11 [10] GenomicRanges_1.11.35 [11] IRanges_1.17.35 [12] reshape_0.8.4 [13] plyr_1.8 [14] lattice_0.20-13 [15] limma_3.15.14 [16] ArrayExpress_1.19.0 [17] Biobase_2.19.3 [18] BiocGenerics_0.5.6 [19] BiocInstaller_1.9.6 loaded via a namespace (and not attached): [1] affy_1.37.4 affyio_1.27.1 beanplot_1.1 [4] codetools_0.2-8 DNAcopy_1.33.1 doRNG_1.5 [7] grid_3.1.0 illuminaio_0.1.5 itertools_0.1-1 [10] MASS_7.3-24 matrixStats_0.6.2 mclust_4.0 [13] multtest_2.15.0 nor1mix_1.1-3 preprocessCore_1.21.1 [16] RColorBrewer_1.0-5 R.methodsS3_1.4.2 siggenes_1.33.1 [19] splines_3.1.0 stats4_3.1.0 survival_2.37-4 [22] tools_3.1.0 XML_3.95-0.2 zlibbioc_1.5.0 -- Amy Ruth Vandiver MSII, GSIII Johns Hopkins School of Medicine, MD-PhD Program [[alternative HTML version deleted]] ------------------------------ Message: 25 Date: Wed, 13 Mar 2013 01:55:49 +0000 From: "Marwaha, Shruti (marwahsi)" <marwahsi@mail.uc.edu> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> Subject: [BioC] Error in getGEO command with GSEMatrix=TRUE Message-ID: <39BA056F90A9A14C8067B39997BFEDCB5867150A at SN2PRD0102MB117.prod.exchangelabs.com> Content-Type: text/plain; charset="us-ascii" Dear Developers, I am trying to get an ExpressionSet from a GSEfile using getGEO command with GSEMatrix=TRUE. It sometimes gives an error (Connection was reset) if I use GSEMatrix=TRUE however sometimes it works smooth. I have the latest version of R, Bioconductor and GEOquery. Please advise if I am missing some library or need to change some settings. Thanks in advance. My Code library(Biobase) library(GEOquery) library(limma) gset <- getGEO("GSE27347", GSEMatrix =TRUE) Error: Error in function (type, msg, asError = TRUE) : Send failure: Connection was reset Session Info > sessionInfo() R version 2.15.2 (2012-10-26) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_India.1252 LC_CTYPE=English_India.1252 [3] LC_MONETARY=English_India.1252 LC_NUMERIC=C [5] LC_TIME=English_India.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] limma_3.14.4 GEOquery_2.24.1 Biobase_2.18.0 BiocGenerics_0.4.0 loaded via a namespace (and not attached): [1] RCurl_1.95-4.1 tools_2.15.2 XML_3.95-0.2 System Properties OS: Windows 7 RAM: 4GB Processor: 2GHz Methods already attempted * I have tried it on both RStudio and GUI * I have also tried using the command "setInternet2(use=FALSE)" but it doesn't help. * I have tried this command with other GSEfiles also and get the same error. * GEOquery works fine and downloads the GSEfiles if I give GSEMatrix=FALSE * I have also tried downloading the file and provide the file path of the file to getGEO with GSEMatrix=TRUE but it does not give an ExpressionSet as output. Thanks & Regards, Shruti Marwaha Graduate Student Systems Biology and Physiology University of Cincinnati Medical Science Building, 231 Albert Sabin Way Cincinnati, OH, USA 45267 ------------------------------ Message: 26 Date: Tue, 12 Mar 2013 21:35:26 -0600 From: chris_utah <chris.gregg@neuro.utah.edu> To: Bioconductor mailing list <bioconductor at="" r-project.org=""> Subject: [BioC] biomaRt access for NCBIM37 build of mouse genome Message-ID: <6D52F2F6-3564-4DA7-9101-2AA9B1D15BF8 at neuro.utah.edu> Content-Type: text/plain Hello, I am using biomaRt. I want to retrieve transcript start and stop info based on the NCBIM37 (mm9) build using getBM. I can't find this annotation in listMarts(archive =T). Is anyone aware of a simple solution? Thank you. best wishes, Chris > sessionInfo() R version 2.15.2 (2012-10-26) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] grid stats graphics grDevices utils datasets methods base other attached packages: [1] edgeR_3.0.8 limma_3.14.1 rtracklayer_1.18.0 biomaRt_2.14.0 GenomicRanges_1.10.5 IRanges_1.16.4 BiocGenerics_0.4.0 [8] Gviz_1.2.0 loaded via a namespace (and not attached): [1] annotate_1.36.0 AnnotationDbi_1.20.2 Biobase_2.18.0 Biostrings_2.26.2 biovizBase_1.6.0 bitops_1.0-4.2 [7] BSgenome_1.26.1 cluster_1.14.3 colorspace_1.2-0 DBI_0.2-5 DESeq_1.10.1 dichromat_1.2-4 [13] genefilter_1.40.0 geneplotter_1.36.0 GenomicFeatures_1.10.0 Hmisc_3.10-1 labeling_0.1 lattice_0.20-10 [19] munsell_0.4 parallel_2.15.2 plyr_1.7.1 RColorBrewer_1.0-5 RCurl_1.95-3 Rsamtools_1.10.1 [25] RSQLite_0.11.2 scales_0.2.2 splines_2.15.2 stats4_2.15.2 stringr_0.6.1 survival_2.36-14 [31] tools_2.15.2 XML_3.95-0.1 xtable_1.7-0 zlibbioc_1.4.0 [[alternative HTML version deleted]] ------------------------------ Message: 27 Date: Tue, 12 Mar 2013 23:39:06 -0700 From: "Ryan C. Thompson" <rct@thompsonclan.org> To: Luis alberto Martinez lopez <lewisluis2013 at="" yahoo.com=""> Cc: Bioconductor mailing list <bioconductor at="" r-project.org=""> Subject: Re: [BioC] anova like test in edgeR Message-ID: <51401F0A.6070803 at thompsonclan.org> Content-Type: text/plain Luis, Your design has 4 groups,which means that the inter-group ANOVA-like test has 4 - 1 = 3 degrees of freedom. Hence, the ANOVA-like test is performed by testing 3 coefficients/contrasts equal to zero. The procedures I have described will yield will test where the null hypothesis is all group means being equal, and the alternative hypothesis is that any two or more group means differ significantly from each other. It is true that between 4 groups, there are 6 possible pairwise comparisons to be made. If you want individual significance rankings for each of the 6 possible contrasts, you can do that too, by making 6 separate calls to glmLRT, each with a single coefficient or contrast. Doing so will give you an appropriate rank order of significance for each contrast, but the exact FDR values will probably be too liberal, since there is no correction for the fact that you are performing 6 tests on each gene instead of just 1. Ideally, you would use a post-hoc testing method to assess individual contrasts within genes that are called significant by their FDR in the ANOVA-like test. However, I'm not sure how to perform such tests with the negative binomial distribution. You might consider using the limma-voom method instead of edgeR, since that will perform an actual ANOVA based on the normal distribution, so standard ANOVA post-hoc analysis is possible. If anyone has a take on how to do rigorous post-hoc testing on negative binomial GLMs, I'd be interested in hearing about it. If you just want the logFC values for the missing 3 contrasts, then simply note that they can all be written in terms of the X vs 10 contrasts. For example, the 20 vs 40 contrast would be the 10 vs 40 contrast minus the 10 vs 20 contrast. -Ryan On 03/12/2013 06:42 PM, Luis alberto Martinez lopez wrote: > thank you Ryan for your clear answer: > but in your answer again only considered the comparison versus the 10 > group(the first) > i dont't have a control group so i'm looking for a way of doing an > anova like test in edgeR that compares 10 vs 20 ,20 vs 40, 40 vs 60,10 > vs 40, 10 vs 60,20 vs 60 ie all the posibilities at the same time and > that give me a list with the genes more differential in any condition, > thanks in advance, > Luis > *De:* Ryan C. Thompson <rct at="" thompsonclan.org=""> > *Para:* Luis alberto Martinez lopez <lewisluis2013 at="" yahoo.com=""> > *CC:* "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> > *Enviado:* Lunes, 11 de marzo, 2013 22:46:19 > *Asunto:* Re: [BioC] anova like test in edgeR > > Luis, > > Note that your experimental design is the same whether or not you > include an intercept in the design matrix. If you include an > intercept, then the intercept represents the level in the D10 group, > and the other three coefficients represent the differences between > D20/40/60 and D10. So with an intercept, you would specify coef=2:4 to > do an ANOVA-like test for any differences among all four groups. > > If you choose not to use an intercept term, then the design matrix is > just a different parametrization of the same design. Instead of > representing differences, each coefficient simply represents the > expression level in a group. So you can still perform the same test, > you simply have to specify the contrasts (differences) explicitly: > > > anova.contrasts <- makeContrasts(contrasts=c("grpD20-grpD10", > "grpD40-grpD10", "grpD60-grpD10"), levels=design) > > print(anovalike.contrasts) > Contrasts > Levels grpD20-grpD10 grpD40-grpD10 grpD60-grpD10 > grpD10 -1 -1 -1 > grpD20 1 0 0 > grpD40 0 1 0 > grpD60 0 0 1 > > glmLRT(fit2, contrast=anova.contrasts) > > Hope this helps, > > -Ryan Thompson > > > On 03/11/2013 05:18 PM, Luis alberto Martinez lopez wrote: >> Hello everyone: >> >> a wish to know if there is one way of doing an anova like test in edgeR, comparing all the possibilities between the groups defined in the experimental design. >> >> in the users guide the authors explain one way anova test, but the comparison is done only against the intercept group, is there a way of doing a general test comparing all the posibilities and not only against the intercept? >> >> also i'd like to know what does the following differential expression analysis does because the output is different when i include an intercept, i can't deduce what comparison are being doing.... >> >>> grp<-c("D10","D10","D20","D20","D40","D40","D60","D60") >>> dge = DGEList(counts=general.m, group=grp) >>> dge = calcNormFactors(dge) >>> design <- model.matrix(~0+grp, data=dge$samples) >> Warning message: >> In model.matrix.default(~0 + grp, data = dge$samples) : >> variable 'grp' converted to a factor >>> design >> grpD10 grpD20 grpD40 grpD60 >> i10re1 1 0 0 0 >> i10re2 1 0 0 0 >> i20re1 0 1 0 0 >> i20re2 0 1 0 0 >> i40re1 0 0 1 0 >> i40re2 0 0 1 0 >> i60re1 0 0 0 1 >> i60re2 0 0 0 1 >> attr(,"assign") >> [1] 1 1 1 1 >> attr(,"contrasts") >> attr(,"contrasts")$grp >> [1] "contr.treatment" >> >>> dge <- estimateGLMCommonDisp(dge,design) >>> dge <- estimateGLMTrendedDisp(dge,design) >> Loading required package: splines >>> dge <- estimateGLMTagwiseDisp(dge,design) >>> fit2 <- glmFit(dge, design) >>> lrt.anova_like <- glmLRT(fit2, coef=2:4) >>> lrt.anova_like >> An object of class "DGELRT" >> $coefficients >> grpD10 grpD20 grpD40 grpD60 >> comp101_c0 -12.55010 -16.12974 -13.27081 -16.12974 >> comp103_c0 -13.85721 -13.35290 -12.61155 -13.96824 >> comp120_c0 -12.33347 -13.98877 -16.12974 -16.12974 >> comp179_c0 -12.55135 -13.98566 -16.12974 -13.96827 >> comp192_c0 -13.85721 -13.98569 -13.27548 -12.67228 >> 23484 more rows ... >> >> $fitted.values >> i10re1 i10re2 i20re1 i20re2 i40re1 i40re2 i60re1 >> comp101_c0 2.0108643 1.9941877 0.0000000 0.0000000 0.841309 1.168177 0.0000000 >> comp103_c0 0.5020816 0.4979177 1.1140306 0.8859780 1.674705 2.325366 0.5414430 >> comp120_c0 2.5112747 2.4904480 0.5552870 0.4416145 0.000000 0.000000 0.0000000 >> comp179_c0 2.0083306 1.9916749 0.5570245 0.4429963 0.000000 0.000000 0.5414282 >> comp192_c0 0.5020816 0.4979177 0.5570036 0.4429797 0.837345 1.162673 2.1657497 >> i60re2 >> comp101_c0 0.0000000 >> comp103_c0 0.4585719 >> comp120_c0 0.0000000 >> comp179_c0 0.4585593 >> comp192_c0 1.8342685 >> 23484 more rows ... >> >> $deviance >> comp101_c0 comp103_c0 comp120_c0 comp179_c0 comp192_c0 >> 4.386443 3.070288 2.848548 3.917710 2.629182 >> 23484 more elements ... >> >> $df.residual >> [1] 4 4 4 4 4 >> 23484 more elements ... >> >> $abundance >> [1] -13.63381 -13.35715 -13.64240 -13.64482 -13.35716 >> 23484 more elements ... >> >> $design >> grpD10 grpD20 grpD40 grpD60 >> i10re1 1 0 0 0 >> i10re2 1 0 0 0 >> i20re1 0 1 0 0 >> i20re2 0 1 0 0 >> i40re1 0 0 1 0 >> i40re2 0 0 1 0 >> i60re1 0 0 0 1 >> i60re2 0 0 0 1 >> attr(,"assign") >> [1] 1 1 1 1 >> attr(,"contrasts") >> attr(,"contrasts")$grp >> [1] "contr.treatment" >> >> >> $offset >> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] >> [1,] 13.27698 13.26865 13.52514 13.29609 13.15729 13.48553 13.47707 13.31095 >> [2,] 13.27698 13.26865 13.52514 13.29609 13.15729 13.48553 13.47707 13.31095 >> [3,] 13.27698 13.26865 13.52514 13.29609 13.15729 13.48553 13.47707 13.31095 >> [4,] 13.27698 13.26865 13.52514 13.29609 13.15729 13.48553 13.47707 13.31095 >> [5,] 13.27698 13.26865 13.52514 13.29609 13.15729 13.48553 13.47707 13.31095 >> 23484 more rows ... >> >> $dispersion >> [1] 0.2161256308 0.0003318302 0.0634519684 0.0003270568 0.0003318016 >> 23484 more elements ... >> >> $method >> [1] "oneway" >> >> $samples >> group lib.size norm.factors >> i10re1 D10 594988 0.9808651 >> i10re2 D10 590850 0.9795431 >> i20re1 D20 723182 1.0343009 >> i20re2 D20 567524 1.0481805 >> i40re1 D40 452242 1.1449003 >> i40re2 D40 671656 1.0703965 >> i60re1 D60 801682 0.8892299 >> i60re2 D60 685351 0.8809633 >> >> $table >> logFC.grpD20 logFC.grpD40 logFC.grpD60 logCPM LR >> comp101_c0 -23.27030 -19.14573 -23.27030 0.2621324 647.0115 >> comp103_c0 -19.26416 -18.19462 -20.15192 0.6612798 194043.3654 >> comp120_c0 -20.18153 -23.27030 -23.27030 0.2497509 1999.3276 >> comp179_c0 -20.17704 -23.27030 -20.15196 0.2462508 196431.3868 >> comp192_c0 -20.17709 -19.15248 -18.28224 0.6612611 194057.4372 >> PValue >> comp101_c0 6.475824e-140 >> comp103_c0 0.000000e+00 >> comp120_c0 0.000000e+00 >> comp179_c0 0.000000e+00 >> comp192_c0 0.000000e+00 >> 23484 more rows ... >> >> $comparison >> [1] "grpD20" "grpD40" "grpD60" >> >> $df.test >> [1] 3 3 3 3 3 >> 23484 more elements ... >> >> >>> top<-topTags(lrt.anova_like) >>> top >> Coefficient: grpD20 grpD40 grpD60 >> logFC.grpD20 logFC.grpD40 logFC.grpD60 logCPM LR PValue >> comp628_c0 -23.2703 -23.27030 -19.23749 0.2462499 196434.924 0 >> comp2607_c0 -23.2703 -20.07173 -18.68223 0.4686227 195146.814 0 >> comp2858_c0 -23.2703 -23.27030 -16.85200 1.1206869 192585.615 0 >> comp2885_c0 -23.2703 -18.60303 -17.96733 0.6593429 4326.878 0 >> comp2904_c0 -23.2703 -23.27030 -16.21482 1.8318474 2065.059 0 >> comp2912_c0 -23.2703 -17.88139 -20.15196 1.1207038 192646.480 0 >> comp2933_c0 -23.2703 -18.59551 -18.68219 0.4686259 195111.455 0 >> comp2950_c0 -23.2703 -19.16586 -17.97899 0.4607410 1715.721 0 >> comp2999_c0 -23.2703 -23.27030 -23.27030 1.2524134 3853.539 0 >> comp3028_c0 -23.2703 -19.15248 -17.30519 0.9831770 192775.649 0 >> FDR >> comp628_c0 0 >> comp2607_c0 0 >> comp2858_c0 0 >> comp2885_c0 0 >> comp2904_c0 0 >> comp2912_c0 0 >> comp2933_c0 0 >> comp2950_c0 0 >> comp2999_c0 0 >> comp3028_c0 0 >>> lrt.anova_like <- glmLRT(fit2, coef=1:4) >> Error in mglmLevenberg(y, design = design, dispersion = dispersion, offset = offset, : >> BLAS/LAPACK routine 'DGEMM ' gave error code -13 >> >> >> >> >> >> The DGELRT object says that is comparing >> >> $comparison >> [1] "grpD20" "grpD40" "grpD60" >> >> but what exactly this means? >> >> Luis >> [[alternative HTML version deleted]] >> >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives:http://news.gmane.org/gmane.science.biology.inf ormatics.conductor > > > [[alternative HTML version deleted]] ------------------------------ Message: 28 Date: Wed, 13 Mar 2013 00:59:20 -0700 From: "Ryan C. Thompson" <rct@thompsonclan.org> To: Wolfgang Huber <whuber at="" embl.de=""> Cc: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> Subject: Re: [BioC] DESeq2 Message-ID: <514031D8.6010006 at thompsonclan.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Hello, I noticed the addition of the DESeq2 package a few days ago and had a look at the new additions. Overall, it looks like an excellent package (and it runs a lot faster than DESeq 1, too). I have a few questions for clarification of exactly what methods DESeq2 is using. Specifically, I notice that the fold change shrinkage is performed in nbinomWaldTest but not in nbinomChisqTest. Is this just for reasons of backward-compatibility of results, or is the Chi-squared test logically incompatible with shrunken GLM coefficients? Secondly, is there any plan to extend the Wald test to testing contrasts of multiple coefficients or testing multiple coefficients/contrasts at once in an ANOVA-like test? Thanks, -Ryan Thompson On 03/12/2013 01:51 PM, Wolfgang Huber wrote: > Dear DESeq users, > > Mike Love, Simon Anders and I have been updating the DESeq package. This resulted in the package DESeq2, which is available from the development branch, and scheduled for the next release: http://www.bioconductor.org/packages/devel/bioc/html/DESeq2.html > > For several release cycles, the original package (DESeq) will be maintained at its current functionality, in order to not disrupt the workflows of DESeq users. For new projects, we recommend using DESeq2. Major innovations are: > > * Base class: SummarizedExperiment is used as the superclass for storing the data, rather than eSet. This allows closer integration with upstream workflows involving GRanges and summarizeOverlaps, and facilitates downstream analyses of the genomic regions of interest. > > * Simplified workflow: the wrapper function DESeq() performs all steps for a differential expression analysis. The individual steps are of course also accessible. > > * More powerful statistics: incorporation of prior distributions into the estimation of dispersions and fold changes (empirical-Bayes shrinkage). The dispersion shrinkage improves power compared to the old DESeq. The fold changes shrinkage help moderate the otherwise large spread in log fold changes for genes with low counts, while it has negligible effect on genes with high counts; it may be particularly useful for visualisation, clustering, classification, ordination (PCA, MDS), similar to the variance-stabilizing transformation in the old DESeq. A Wald test for significance is provided as the default inference method, with the chi-squared test of the previous version is also available. A manuscript is in preparation. > > * Normalization: it is possible to provide a matrix of sample- *and* gene-specific normalization factors, which allows the use of normalisation factors from Bioconductor packages such as cqn and EDASeq. > > Examples of usage are provided in the vignette, and more details are available in the manual pages (specifically, the DESeq function and estimateDispersions function). > > Enjoy - > > Mike, Simon, Wolfgang. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ------------------------------ Message: 29 Date: Wed, 13 Mar 2013 09:40:11 +0100 From: Hans-Rudolf Hotz <hrh@fmi.ch> To: chris_utah <chris.gregg at="" neuro.utah.edu=""> Cc: Bioconductor mailing list <bioconductor at="" r-project.org=""> Subject: Re: [BioC] biomaRt access for NCBIM37 build of mouse genome Message-ID: <51403B6B.9020700 at fmi.ch> Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Hi Chris use one of the archives. 'ensembl67' ("may2012.archive.ensembl.org") was the last ensembl release containing NCBIM37 (if I remember correctly). So you can try: > library(biomaRt) > ensembl67=useMart(host='may2012.archive.ensembl.org', biomart='ENSEMBL_MART_ENSEMBL') > listDatasets(ensembl67)[grep("Mus", listDatasets(ensembl67)$description),] dataset description version 48 mmusculus_gene_ensembl Mus musculus genes (NCBIM37) NCBIM37 > Regards, Hans-Rudolf On 03/13/2013 04:35 AM, chris_utah wrote: > Hello, > > I am using biomaRt. I want to retrieve transcript start and stop info based on the NCBIM37 (mm9) build using getBM. I can't find this annotation in listMarts(archive =T). Is anyone aware of a simple solution? > > Thank you. > > best wishes, > Chris > > >> sessionInfo() > R version 2.15.2 (2012-10-26) > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) > > locale: > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] grid stats graphics grDevices utils datasets methods base > > other attached packages: > [1] edgeR_3.0.8 limma_3.14.1 rtracklayer_1.18.0 biomaRt_2.14.0 GenomicRanges_1.10.5 IRanges_1.16.4 BiocGenerics_0.4.0 > [8] Gviz_1.2.0 > > loaded via a namespace (and not attached): > [1] annotate_1.36.0 AnnotationDbi_1.20.2 Biobase_2.18.0 Biostrings_2.26.2 biovizBase_1.6.0 bitops_1.0-4.2 > [7] BSgenome_1.26.1 cluster_1.14.3 colorspace_1.2-0 DBI_0.2-5 DESeq_1.10.1 dichromat_1.2-4 > [13] genefilter_1.40.0 geneplotter_1.36.0 GenomicFeatures_1.10.0 Hmisc_3.10-1 labeling_0.1 lattice_0.20-10 > [19] munsell_0.4 parallel_2.15.2 plyr_1.7.1 RColorBrewer_1.0-5 RCurl_1.95-3 Rsamtools_1.10.1 > [25] RSQLite_0.11.2 scales_0.2.2 splines_2.15.2 stats4_2.15.2 stringr_0.6.1 survival_2.36-14 > [31] tools_2.15.2 XML_3.95-0.1 xtable_1.7-0 zlibbioc_1.4.0 > > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > ------------------------------ Message: 30 Date: Wed, 13 Mar 2013 09:45:43 +0100 From: Michael Love <michaelisaiahlove@gmail.com> To: rct at thompsonclan.org, Wolfgang Huber <whuber at="" embl.de=""> Cc: bioconductor at r-project.org Subject: Re: [BioC] DESeq2 Message-ID: <51403CB7.7080002 at gmail.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed hi Ryan, Thanks for the comments. On 03/13/13 08:59, Ryan C. Thompson wrote: > Hello, > > I noticed the addition of the DESeq2 package a few days ago and had a > look at the new additions. Overall, it looks like an excellent package > (and it runs a lot faster than DESeq 1, too). I have a few questions > for clarification of exactly what methods DESeq2 is using. > Specifically, I notice that the fold change shrinkage is performed in > nbinomWaldTest but not in nbinomChisqTest. Is this just for reasons of > backward-compatibility of results, or is the Chi-squared test > logically incompatible with shrunken GLM coefficients? The latter was our reason. With shrunken coefficients, the differences in deviances are no longer distributed as a chi-square. For example, with simulated null data (non-intercept betas equal to zero), the differences in deviances will pile up near zero and the resulting p-values will not be uniformly distributed. > Secondly, is there any plan to extend the Wald test to testing > contrasts of multiple coefficients or testing multiple > coefficients/contrasts at once in an ANOVA-like test? > Yes, we are looking into a convenient interface for this. best, Mike > Thanks, > > -Ryan Thompson > > On 03/12/2013 01:51 PM, Wolfgang Huber wrote: >> Dear DESeq users, >> >> Mike Love, Simon Anders and I have been updating the DESeq package. >> This resulted in the package DESeq2, which is available from the >> development branch, and scheduled for the next release: >> http://www.bioconductor.org/packages/devel/bioc/html/DESeq2.html >> >> For several release cycles, the original package (DESeq) will be >> maintained at its current functionality, in order to not disrupt the >> workflows of DESeq users. For new projects, we recommend using >> DESeq2. Major innovations are: >> >> * Base class: SummarizedExperiment is used as the superclass for >> storing the data, rather than eSet. This allows closer integration >> with upstream workflows involving GRanges and summarizeOverlaps, and >> facilitates downstream analyses of the genomic regions of interest. >> >> * Simplified workflow: the wrapper function DESeq() performs all >> steps for a differential expression analysis. The individual steps >> are of course also accessible. >> >> * More powerful statistics: incorporation of prior distributions into >> the estimation of dispersions and fold changes (empirical-Bayes >> shrinkage). The dispersion shrinkage improves power compared to the >> old DESeq. The fold changes shrinkage help moderate the otherwise >> large spread in log fold changes for genes with low counts, while it >> has negligible effect on genes with high counts; it may be >> particularly useful for visualisation, clustering, classification, >> ordination (PCA, MDS), similar to the variance-stabilizing >> transformation in the old DESeq. A Wald test for significance is >> provided as the default inference method, with the chi-squared test >> of the previous version is also available. A manuscript is in >> preparation. >> >> * Normalization: it is possible to provide a matrix of sample- *and* >> gene-specific normalization factors, which allows the use of >> normalisation factors from Bioconductor packages such as cqn and EDASeq. >> >> Examples of usage are provided in the vignette, and more details are >> available in the manual pages (specifically, the DESeq function and >> estimateDispersions function). >> >> Enjoy - >> >> Mike, Simon, Wolfgang. >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor ------------------------------ Message: 31 Date: Wed, 13 Mar 2013 06:26:46 -0400 From: Sean Davis <sdavis2@mail.nih.gov> To: "Marwaha, Shruti (marwahsi)" <marwahsi at="" mail.uc.edu=""> Cc: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> Subject: Re: [BioC] Error in getGEO command with GSEMatrix=TRUE Message-ID: <caneavbn94wf=w_+r-qy+49e=fwk11kcfwza_wakz-myohbwtya at="" mail.gmail.com=""> Content-Type: text/plain On Tue, Mar 12, 2013 at 9:55 PM, Marwaha, Shruti (marwahsi) < marwahsi at mail.uc.edu> wrote: > Dear Developers, > > I am trying to get an ExpressionSet from a GSEfile using getGEO command > with GSEMatrix=TRUE. It sometimes gives an error (Connection was reset) if > I use GSEMatrix=TRUE however sometimes it works smooth. I have the latest > version of R, Bioconductor and GEOquery. Please advise if I am missing some > library or need to change some settings. Thanks in advance. > > My Code > library(Biobase) > library(GEOquery) > library(limma) > gset <- getGEO("GSE27347", GSEMatrix =TRUE) > > Error: > Error in function (type, msg, asError = TRUE) : > Send failure: Connection was reset > > Hi, Marwaha. While I cannot be sure, I suspect that these are intermittent connection problems between you and NCBI. You could try wrapping your getGEO call in something like this to catch this condition : gse = NULL while(!inherits(gse,'list')) {gse=tryCatch( getGEO('GSE27347'), error=function(e) {Sys.sleep(5); message('retrying'); return(e)})} This will continuously retry the getGEO call every 5 seconds until the outcome is a list, which is what we expect getGEO to return in this case. Hopefully, this will make your code a bit more robust. Sean > Session Info > > > sessionInfo() > > R version 2.15.2 (2012-10-26) > > Platform: x86_64-w64-mingw32/x64 (64-bit) > > > > locale: > > [1] LC_COLLATE=English_India.1252 LC_CTYPE=English_India.1252 > > [3] LC_MONETARY=English_India.1252 LC_NUMERIC=C > > [5] LC_TIME=English_India.1252 > > > > attached base packages: > > [1] stats graphics grDevices utils datasets methods base > > > > other attached packages: > > [1] limma_3.14.4 GEOquery_2.24.1 Biobase_2.18.0 > BiocGenerics_0.4.0 > > > > loaded via a namespace (and not attached): > > [1] RCurl_1.95-4.1 tools_2.15.2 XML_3.95-0.2 > > System Properties > OS: Windows 7 > RAM: 4GB > Processor: 2GHz > > Methods already attempted > > * I have tried it on both RStudio and GUI > > * I have also tried using the command "setInternet2(use=FALSE)" > but it doesn't help. > > * I have tried this command with other GSEfiles also and get the > same error. > > * GEOquery works fine and downloads the GSEfiles if I give > GSEMatrix=FALSE > > * I have also tried downloading the file and provide the file path > of the file to getGEO with GSEMatrix=TRUE but it does not give an > ExpressionSet as output. > > Thanks & Regards, > Shruti Marwaha > > Graduate Student > Systems Biology and Physiology > University of Cincinnati > Medical Science Building, > 231 Albert Sabin Way > Cincinnati, OH, USA 45267 > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]] ------------------------------ _______________________________________________ Bioconductor mailing list Bioconductor at r-project.org https://stat.ethz.ch/mailman/listinfo/bioconductor End of Bioconductor Digest, Vol 121, Issue 13

RNASeq Microarray Normalization GO Classification Clustering Cancer Mus musculus probe GO • 2.1k views

ADD COMMENT • link 11.2 years ago Matthew Thornton ▴ 350