Entering edit mode
"bioconductor-request at r-project.org" <bioconductor-request at="" r-project.org=""> wrote:
Send Bioconductor mailing list submissions to
bioconductor at r-project.org
To subscribe or unsubscribe via the World Wide Web, visit
https://stat.ethz.ch/mailman/listinfo/bioconductor
or, via email, send a message with subject or body 'help' to
bioconductor-request at r-project.org
You can reach the person managing the list at
bioconductor-owner at r-project.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Bioconductor digest..."
Today's Topics:
1. edgeR: paired samples DGE and GLM (Alessandra [guest])
2. Re: methyAnalysis - changes to GenoSet (Kasper Daniel Hansen)
3. QuasiSeq vs DSS (Richard Friedman)
4. Re: methyAnalysis - changes to GenoSet (Tim Triche, Jr.)
5. displaying the image with EBImage (Nima [guest])
6. Help on Illumina HumanHT12 v4 chromosome filtering (Kemal Akat)
7. Re: QuasiSeq vs DSS (Ryan C. Thompson)
8. Re: QuasiSeq vs DSS (Richard Friedman)
9. Re: QuasiSeq vs DSS (Ryan C. Thompson)
10. Re: edgeR: paired samples DGE and GLM (Ryan C. Thompson)
11. Re: edgeR: new defaults of estimateTagwiseDisp and exactTest
(Zhuzhu Zhang)
12. Subsampling of BAM/SAM alignments (Julian Gehring)
13. DESeq2 (Wolfgang Huber)
14. Re: displaying the image with EBImage (Wolfgang Huber)
15. Re: displaying the image with EBImage (Nima Ahmadian)
16. Re: displaying the image with EBImage (Andrzej Ole?)
17. Re: edgeR: new defaults of estimateTagwiseDisp and exactTest
(Gordon K Smyth)
18. negative correlation from limma's duplicateCorrelation
(Gordon K Smyth)
19. anova like test in edgeR (Gordon K Smyth)
20. Re: displaying the image with EBImage (Nima Ahmadian)
21. Re: Negative parameter error when using goseq (Nadia Davidson)
22. Re: anova like test in edgeR (Gordon K Smyth)
23. Re: Subsampling of BAM/SAM alignments (Martin Morgan)
24. ArrayExpress Question (Amy Vandiver)
25. Error in getGEO command with GSEMatrix=TRUE
(Marwaha, Shruti (marwahsi))
26. biomaRt access for NCBIM37 build of mouse genome (chris_utah)
27. Re: anova like test in edgeR (Ryan C. Thompson)
28. Re: DESeq2 (Ryan C. Thompson)
29. Re: biomaRt access for NCBIM37 build of mouse genome
(Hans-Rudolf Hotz)
30. Re: DESeq2 (Michael Love)
31. Re: Error in getGEO command with GSEMatrix=TRUE (Sean Davis)
----------------------------------------------------------------------
Message: 1
Date: Tue, 12 Mar 2013 04:20:08 -0700 (PDT)
From: "Alessandra [guest]" <guest@bioconductor.org>
To: bioconductor at r-project.org, alessandra.breschi at crg.es
Subject: [BioC] edgeR: paired samples DGE and GLM
Message-ID: <20130312112008.0529C1434F2 at mamba.fhcrc.org>
Hi,
I am trying to use edgeR to compute differential gene expression.
I have a quite simple experimental design:
labExpId Patient sex Treatment
sample.4 1 F A
sample.3 2 M A
sample.5 2 M B
sample.7 3 F A
sample.0 4 M A
sample.8 3 F B
sample.1 4 M B
sample.2 1 F B
I am comparing the effect of treatment across all patients, and it is
fine when I apply exactTest. Then I wanted to include also the Patient
in my model and I try using GLM, but I found several problems. So, I
tried to apply GLM only including the Treatment just to troubleshoot.
I follow the instructions on page 22 from the user's guide revised in
December 2012.
My counts are stored in a matrix called m.
>colnames(m)
[1] "sample.2" "sample.3" "sample.4" "sample.5" "sample.7" sample.8"
[7] "sample.0" "sample.1"
>design=model.matrix(~0+Treatment)
> design
TreatmentA TreatmentB
sample.4 1 0
sample.3 1 0
sample.5 0 1
sample.7 1 0
sample.0 1 0
sample.8 0 1
sample.1 0 1
sample.2 0 1
>M = DGEList(counts=na.omit(m))
>cpm.m <- cpm(M)
>M <- M[rowSums(cpm.m >1) >=1,]
>M <- calcNormFactors(M)
>M <- estimateGLMCommonDisp(M, design, verbose=T)
> names(M)
[1] "counts" "samples" "abundance"
[4] "logCPM" "common.dispersion"
First thing I notice, I don't see pseudo.counts and other attributes I
saw when calling estimateCommonDisp().
> M <- estimateGLMTrendedDisp(M, design)
> M <- estimateGLMTagwiseDisp(M, design)
> fit <- glmFit(M, design)
> lrt <- glmLRT(fit, contrast=c(-1,1))
> res = topTags(lrt, n=nrow(lrt))$table
> head(res)
logFC logCPM LR PValue
FDR
ENSG00000244115 15.512665 2.266789 35.30878 2.813602e-09
3.482958e-05
ENSG00000256329 -17.760869 4.514903 32.89653 9.719681e-09
6.015996e-05
ENSG00000223638 11.777617 -1.467638 18.46673 1.728967e-05
7.134295e-02
ENSG00000134321 -5.328058 5.959204 16.76318 4.234703e-05
1.310535e-01
ENSG00000196861 7.776972 -1.722111 15.83143 6.924259e-05
1.494412e-01
ENSG00000229292 11.368590 -1.876601 15.74621 7.243290e-05
1.494412e-01
> m[rownames(head(res)),]
sample.2 sample.3 sample.4 sample.5 sample.7 sample.8
ENSG00000244115 0 0 0 0 0
6412
ENSG00000256329 0 0 0 32070 0
0
ENSG00000223638 0 0 2 0 0
0
ENSG00000134321 1178 590 890 83130 754 1116
ENSG00000196861 0 3 363 0 0
53
ENSG00000229292 0 0 0 0 0
0
sample.0 sample.1
ENSG00000244115 0 4415
ENSG00000256329 0 0
ENSG00000223638 434 550
ENSG00000134321 852 1092
ENSG00000196861 500 57
ENSG00000229292 344 406
I realized that it may be a problem of the ordering of the design
matrix rows
> colnames(m)
[1] "sample.2" "sample.3" "sample.4" "sample.5" "sample.7" "sample.8"
[7] "sample.0" "sample.1"
> rownames(design)
[1] "sample.4" "sample.3" "sample.5" "sample.7" "sample.0" "sample.8"
[7] "sample.1" "sample.2"
so I redo it forcing the row names of my design matrix to be the same
order as the column names in the matrix count.
Now my design matrix looks like:
> design
TreatmentA TreatmentB
sample.2 0 1
sample.3 1 0
sample.4 1 0
sample.5 0 1
sample.7 1 0
sample.8 0 1
sample.0 1 0
sample.1 0 1
I execute exactly the same commands as before.
Again I cannot see the pseudocounts.
> names(M)
[1] "counts" "samples" "abundance"
[4] "logCPM" "common.dispersion"
> head(res)
logFC logCPM LR PValue FDR
ENSG00000074855 -32.17749 0.2759965 2121.104 0 0
ENSG00000076351 -32.17749 -0.4306713 1549.345 0 0
ENSG00000105552 -32.17749 -0.4864164 1502.658 0 0
ENSG00000106991 -32.17749 0.4622641 2144.947 0 0
ENSG00000123009 -32.17749 -0.2422835 1719.625 0 0
ENSG00000133216 -32.17749 0.2206127 2127.575 0 0
I get very different results still from the exactTest with no GLM.
I attribute this to the missing pseudocounts in the DGEList object?
> m[rownames(head(res)),]
sample.2 sample.3 sample.4 sample.5 sample.7 sample.8
ENSG00000074855 0 948 902 0 676
0
ENSG00000076351 0 494 522 0 538
0
ENSG00000105552 0 490 454 0 486
0
ENSG00000106991 0 1094 798 0 698
0
ENSG00000123009 0 556 520 0 566
0
ENSG00000133216 0 808 762 0 730
0
sample.0 sample.1
ENSG00000074855 1166 0
ENSG00000076351 698 0
ENSG00000105552 764 0
ENSG00000106991 1733 0
ENSG00000123009 980 0
ENSG00000133216 1304 0
I see that these genes have all zero counts for TreatmentB and this
explains the low logFC, but I don't get this when I don't use GLM at
all (exactTest).
So in summary, my questions are:
1) Does the ordering of rows in the design matrix have to be the same
as the column order in the count matrix,
or the sample name is taken into account so the order should not
matter?
2) Do you have any idea why I don't get pseudocounts after estimating
the dispersion, and is it really this that causes such low logFC?
I am looking forward to hearing from you.
Many thanks in advance,
Ale
-- output of sessionInfo():
R version 2.15.0 (2012-03-30)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] C
attached base packages:
[1] splines stats graphics grDevices utils datasets
methods
[8] base
other attached packages:
[1] plyr_1.7.1 optparse_1.0.1 ggplot2_0.9.3.1 reshape2_1.2.1
[5] edgeR_3.0.8 limma_3.14.4
loaded via a namespace (and not attached):
[1] MASS_7.3-17 RColorBrewer_1.0-5 colorspace_1.1-1
dichromat_1.2-4
[5] digest_0.5.2 getopt_1.19.1 grid_2.15.0
gtable_0.1.2
[9] labeling_0.1 munsell_0.3 proto_0.3-9.2
scales_0.2.3
[13] stringr_0.6 tools_2.15.0
--
Sent via the guest posting facility at bioconductor.org.
------------------------------
Message: 2
Date: Tue, 12 Mar 2013 09:57:52 -0400
From: Kasper Daniel Hansen <kasperdanielhansen@gmail.com>
To: Lavinia Gordon <lavinia.gordon at="" mcri.edu.au="">
Cc: Zilbauer Matthias <mz304 at="" medschl.cam.ac.uk="">, "dupan at
gmal.com"
<dupan at="" gmal.com="">, "bioconductor at r-project.org"
<bioconductor at="" r-project.org="">
Subject: Re: [BioC] methyAnalysis - changes to GenoSet
Message-ID:
<cac2h7usdeop-zsi_oxp3asneqkg5msoeuys62hh96w9qhqupfa at="" mail.gmail.com="">
Content-Type: text/plain; charset=ISO-8859-1
Lavina,
I don't really have time to really look into all of this, but note the
function mapToGenome which associates a MethylSet with genomic
coordinates. So you can do
gmSet = mapToGenome(mSet)
and now you have
granges(gmSet) # locations
getMeth(gmSet) # methylation channel
etc. Note that the transformation is non-invertible because a few
CpGs have no associated genomic locations and they are dropped (well,
you can use drop=FALSE).
Kasper
On Mon, Mar 11, 2013 at 5:13 PM, Lavinia Gordon
<lavinia.gordon at="" mcri.edu.au=""> wrote:
> Argh!, thanks Martin, sorry about that.
>
> So in that case I think I've worked out a (rather ugly) way to force
minfi data into a MethyGenoSet object for use within the package
methyAnalysis:
>
> Library(minfi)
> # mset is minfi object, e.g. used in dmpFinder code
> myexprs <- getM(mSet)
> mymeth <- getMeth(mSet)
> myunmeth <- getUnmeth(mSet)
>
> # using Tim Triches helpful code, thanks!
> library(IlluminaHumanMethylation450kprobe)
> data(IlluminaHumanMethylation450kprobe)
> library(GenomicRanges)
> chs = levels(IlluminaHumanMethylation450kprobe$chr)
> names(chs) = paste('chr',chs,sep='')
> CpGs.450k = with(IlluminaHumanMethylation450kprobe,
> GRanges(paste('chr',chr,sep=''),
> IRanges(start=site, width=2, names=Probe_ID),
> strand=strand))
>
> CpGs.450k.df <- as.data.frame(CpGs.450k)
>
> mylocData <- RangedData(ranges=IRanges(start=CpGs.450k.df$start,
end=CpGs.450k.df$end, names=row.names(CpGs.450k.df)),
space=CpGs.450k.df$seqnames)
>
> mymethy.obj <- new("MethyGenoSet", locData=mylocData,
assayData=assayDataNew(exprs=new("matrix"), methylated=new("matrix"),
unmethylated=new("matrix"), detection=new("matrix")))
>
> # make sure the CpG ids in mylocData agree with what is in
CpGs.450k.df
>
> exprs(mymethy.obj) <- myexprs
> methylated(mymethy.obj) <- mymeth
> unmethylated(mymethy.obj) <- myunmeth
> # had to add this afterwards
>
> methyGenoSet.sm <- smoothMethyData(mymethy.obj, winSize = 250)
>
> I haven't run through all of the MethyAnalysis functions so the
MethyGenoSet object may need more information added to it, in other
slots, for it to work properly.
>
>
> Lavinia Gordon
> Senior Research Officer
> Quantitative Sciences Core, Bioinformatics
>
> Murdoch Childrens Research Institute
> The Royal Children's Hospital
> Flemington Road Parkville Victoria 3052 Australia
> T 03 8341 6221
> www.mcri.edu.au
>
>
> -----Original Message-----
> From: Martin Morgan [mailto:mtmorgan at fhcrc.org]
> Sent: Saturday, 9 March 2013 1:49 AM
> To: Lavinia Gordon
> Cc: bioconductor at r-project.org; dupan at gmal.com
> Subject: Re: [BioC] methyAnalysis - changes to GenoSet
>
> On 03/07/2013 08:23 PM, Lavinia Gordon wrote:
>> Dear Bioc users,
>> I have just tried the example code supplied with the package
methyAnalysis and am getting an error:
>>
>>> ###################################################
>>> ### code chunk number 3: Slide-window smoothing of the DNA
>>> methylation data
###################################################
>>> methyGenoSet.sm <- smoothMethyData(exampleMethyGenoSet, winSize =
>>> 250)
>> Smoothing Chromosome 21 ...
>>
>> Note: Method with signature ?GenoSet#character#ANY#ANY? chosen for
>> function ?[?, target signature
?MethyGenoSet#character#character#missing?.
>> "MethyGenoSet#ANY#ANY#ANY" would also be valid Warning message:
>> The ranges method on a GenoSet is depricated. Please use
space(locData(x)) or seqnames(locData(x)) as appropriate for
RangedData or GRanges.
>>> methyGenoset.sm
>> Error: object 'methyGenoset.sm' not found
>
> typo -- methyGenoSet.sm
>
> Martin
>
>>> sessionInfo()
>> R version 2.15.2 (2012-10-26)
>> Platform: x86_64-redhat-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8
>> [2] LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8
>> [4] LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8
>> [6] LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=C
>> [8] LC_NAME=C
>> [9] LC_ADDRESS=C
>> [10] LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8
>> [12] LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] grid stats graphics grDevices
>> [5] utils datasets methods base
>>
>> other attached packages:
>> [1] methyAnalysis_1.0.0 org.Hs.eg.db_2.8.0
>> [3] RSQLite_0.11.2 DBI_0.2-5
>> [5] AnnotationDbi_1.20.5 Biobase_2.18.0
>> [7] IRanges_1.16.6 BiocGenerics_0.4.0
>>
>> loaded via a namespace (and not attached):
>> [1] affy_1.36.1 affyio_1.26.0
>> [3] annotate_1.36.0 BiocInstaller_1.8.3
>> [5] biomaRt_2.14.0 Biostrings_2.26.3
>> [7] biovizBase_1.6.2 bitops_1.0-5
>> [9] BSgenome_1.26.1 cluster_1.14.3
>> [11] colorspace_1.2-1 dichromat_2.0-0
>> [13] genefilter_1.40.0 GenomicFeatures_1.10.2
>> [15] GenomicRanges_1.10.7 genoset_1.10.1
>> [17] Gviz_1.2.1 Hmisc_3.10-1
>> [19] KernSmooth_2.23-9 labeling_0.1
>> [21] lattice_0.20-13 lumi_2.10.0
>> [23] MASS_7.3-23 Matrix_1.0-11
>> [25] methylumi_2.4.0 mgcv_1.7-22
>> [27] munsell_0.4 nleqslv_2.0
>> [29] nlme_3.1-108 parallel_2.15.2
>> [31] plyr_1.8 preprocessCore_1.20.0
>> [33] RColorBrewer_1.0-5 RCurl_1.95-4.1
>> [35] Rsamtools_1.10.2 rtracklayer_1.18.2
>> [37] scales_0.2.3 splines_2.15.2
>> [39] stats4_2.15.2 stringr_0.6.2
>> [41] survival_2.37-4 tools_2.15.2
>> [43] XML_3.95-0.2 xtable_1.7-1
>> [45] zlibbioc_1.4.0
>>>
>>
>> With regards,
>>
>> Lavinia Gordon
>> Senior Research Officer
>> Quantitative Sciences Core, Bioinformatics
>>
>> Murdoch Childrens Research Institute
>> The Royal Children's Hospital
>> Flemington Road Parkville Victoria 3052 Australia T 03 8341 6221
>> www.mcri.edu.au<http: www.mcri.edu.au=""/>
>>
>>
>>
______________________________________________________________________
>> This email has been scanned by the Symantec Email Security.cloud
service.
>> For more information please visit http://www.symanteccloud.com
>>
______________________________________________________________________
>> [[alternative HTML version deleted]]
>>
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>
>
______________________________________________________________________
> This email has been scanned by the Symantec Email Security.cloud
service.
> For more information please visit http://www.symanteccloud.com
>
> If you have any question, please contact MCRI IT Helpdesk for
further assistance.
>
______________________________________________________________________
>
>
______________________________________________________________________
> This email has been scanned by the Symantec Email Security.cloud
service.
> For more information please visit http://www.symanteccloud.com
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
------------------------------
Message: 3
Date: Tue, 12 Mar 2013 10:05:02 -0400
From: Richard Friedman <friedman@cancercenter.columbia.edu>
To: Bioconductor mailing list <bioconductor at="" r-project.org="">
Subject: [BioC] QuasiSeq vs DSS
Message-ID:
<e41ffaa1-4082-41d3-88b1-034f5586683e at="" cancercenter.columbia.edu="">
Content-Type: text/plain; charset=us-ascii
Dear List.
The papers on DSS (included in Bioconductor):
Wu H, Wang C, Wu Z. A new shrinkage estimator for dispersion improves
differential expression detection in RNA-seq data. Biostatistics. 2013
Apr;14(2):232-43.
and QuasiSeq (included in CRAN):
Lund SP, Nettleton D, McCarthy DJ, Smyth GK. Detecting differential
expression
in RNA-sequence data using quasi-likelihood with shrunken dispersion
estimates.
Stat Appl Genet Mol Biol. 2012
both give evidence of superior performance to edgeR (if I understand
them correctly).
Have the two methods been compared?
Can the 2 methods been combined (with DSS estimating the dispersion
used in
the quasi-negative bionomial disribution used in QuasiSeq)?
I would appreciate any insight with respect to what is the overall
best
method for differential expression in RNASeq available at present.
Thanks and best wishes,
Rich
Richard A. Friedman, PhD
Associate Research Scientist,
Biomedical Informatics Shared Resource
Herbert Irving Comprehensive Cancer Center (HICCC)
Lecturer,
Department of Biomedical Informatics (DBMI)
Educational Coordinator,
Center for Computational Biology and Bioinformatics (C2B2)/
National Center for Multiscale Analysis of Genomic Networks (MAGNet)/
Columbia Initiative in Systems Biology
Room 824
Irving Cancer Research Center
Columbia University
1130 St. Nicholas Ave
New York, NY 10032
(212)851-4765 (voice)
friedman at cancercenter.columbia.edu
http://cancercenter.columbia.edu/~friedman/
Fritz Lang. Didn't he do "Star Trek".
-Rose Friedman, age 16
------------------------------
Message: 4
Date: Tue, 12 Mar 2013 08:32:19 -0700
From: "Tim Triche, Jr." <tim.triche@gmail.com>
To: Kasper Daniel Hansen <kasperdanielhansen at="" gmail.com="">
Cc: Zilbauer Matthias <mz304 at="" medschl.cam.ac.uk="">, Lavinia
Gordon
<lavinia.gordon at="" mcri.edu.au="">, "bioconductor at
r-project.org"
<bioconductor at="" r-project.org="">, "dupan at gmal.com" <dupan at="" gmal.com="">
Subject: Re: [BioC] methyAnalysis - changes to GenoSet
Message-ID: <fa98bc57-3830-4b9c-8c77-df5b0cc9a177 at="" gmail.com="">
Content-Type: text/plain; charset=utf-8
Right, my original intent was simply to coerce the
SummarizedExperiment-based containers from methylumi and minfi into a
genoset for Pan's package.
That reminds me, I need to update the documentation for 450kProbe to
note that it is essentially obsolete, since both methylumi and minfi
can emit SummarizedExperiments. The only mop-up now is for me to send
in a few minfi patches for processing. The innards of minfi are tidier
than... Others.
--t
On Mar 12, 2013, at 6:57 AM, Kasper Daniel Hansen <kasperdanielhansen at="" gmail.com=""> wrote:
> Lavina,
>
> I don't really have time to really look into all of this, but note
the
> function mapToGenome which associates a MethylSet with genomic
> coordinates. So you can do
>
> gmSet = mapToGenome(mSet)
>
> and now you have
>
> granges(gmSet) # locations
> getMeth(gmSet) # methylation channel
>
> etc. Note that the transformation is non-invertible because a few
> CpGs have no associated genomic locations and they are dropped
(well,
> you can use drop=FALSE).
>
> Kasper
>
> On Mon, Mar 11, 2013 at 5:13 PM, Lavinia Gordon
> <lavinia.gordon at="" mcri.edu.au=""> wrote:
>> Argh!, thanks Martin, sorry about that.
>>
>> So in that case I think I've worked out a (rather ugly) way to
force minfi data into a MethyGenoSet object for use within the package
methyAnalysis:
>>
>> Library(minfi)
>> # mset is minfi object, e.g. used in dmpFinder code
>> myexprs <- getM(mSet)
>> mymeth <- getMeth(mSet)
>> myunmeth <- getUnmeth(mSet)
>>
>> # using Tim Triches helpful code, thanks!
>> library(IlluminaHumanMethylation450kprobe)
>> data(IlluminaHumanMethylation450kprobe)
>> library(GenomicRanges)
>> chs = levels(IlluminaHumanMethylation450kprobe$chr)
>> names(chs) = paste('chr',chs,sep='')
>> CpGs.450k = with(IlluminaHumanMethylation450kprobe,
>> GRanges(paste('chr',chr,sep=''),
>> IRanges(start=site, width=2, names=Probe_ID),
>> strand=strand))
>>
>> CpGs.450k.df <- as.data.frame(CpGs.450k)
>>
>> mylocData <- RangedData(ranges=IRanges(start=CpGs.450k.df$start,
end=CpGs.450k.df$end, names=row.names(CpGs.450k.df)),
space=CpGs.450k.df$seqnames)
>>
>> mymethy.obj <- new("MethyGenoSet", locData=mylocData,
assayData=assayDataNew(exprs=new("matrix"), methylated=new("matrix"),
unmethylated=new("matrix"), detection=new("matrix")))
>>
>> # make sure the CpG ids in mylocData agree with what is in
CpGs.450k.df
>>
>> exprs(mymethy.obj) <- myexprs
>> methylated(mymethy.obj) <- mymeth
>> unmethylated(mymethy.obj) <- myunmeth
>> # had to add this afterwards
>>
>> methyGenoSet.sm <- smoothMethyData(mymethy.obj, winSize = 250)
>>
>> I haven't run through all of the MethyAnalysis functions so the
MethyGenoSet object may need more information added to it, in other
slots, for it to work properly.
>>
>>
>> Lavinia Gordon
>> Senior Research Officer
>> Quantitative Sciences Core, Bioinformatics
>>
>> Murdoch Childrens Research Institute
>> The Royal Children's Hospital
>> Flemington Road Parkville Victoria 3052 Australia
>> T 03 8341 6221
>> www.mcri.edu.au
>>
>>
>> -----Original Message-----
>> From: Martin Morgan [mailto:mtmorgan at fhcrc.org]
>> Sent: Saturday, 9 March 2013 1:49 AM
>> To: Lavinia Gordon
>> Cc: bioconductor at r-project.org; dupan at gmal.com
>> Subject: Re: [BioC] methyAnalysis - changes to GenoSet
>>
>> On 03/07/2013 08:23 PM, Lavinia Gordon wrote:
>>> Dear Bioc users,
>>> I have just tried the example code supplied with the package
methyAnalysis and am getting an error:
>>>
>>>> ###################################################
>>>> ### code chunk number 3: Slide-window smoothing of the DNA
>>>> methylation data
###################################################
>>>> methyGenoSet.sm <- smoothMethyData(exampleMethyGenoSet, winSize =
>>>> 250)
>>> Smoothing Chromosome 21 ...
>>>
>>> Note: Method with signature ?GenoSet#character#ANY#ANY? chosen for
>>> function ?[?, target signature
?MethyGenoSet#character#character#missing?.
>>> "MethyGenoSet#ANY#ANY#ANY" would also be valid Warning message:
>>> The ranges method on a GenoSet is depricated. Please use
space(locData(x)) or seqnames(locData(x)) as appropriate for
RangedData or GRanges.
>>>> methyGenoset.sm
>>> Error: object 'methyGenoset.sm' not found
>>
>> typo -- methyGenoSet.sm
>>
>> Martin
>>
>>>> sessionInfo()
>>> R version 2.15.2 (2012-10-26)
>>> Platform: x86_64-redhat-linux-gnu (64-bit)
>>>
>>> locale:
>>> [1] LC_CTYPE=en_US.UTF-8
>>> [2] LC_NUMERIC=C
>>> [3] LC_TIME=en_US.UTF-8
>>> [4] LC_COLLATE=en_US.UTF-8
>>> [5] LC_MONETARY=en_US.UTF-8
>>> [6] LC_MESSAGES=en_US.UTF-8
>>> [7] LC_PAPER=C
>>> [8] LC_NAME=C
>>> [9] LC_ADDRESS=C
>>> [10] LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8
>>> [12] LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] grid stats graphics grDevices
>>> [5] utils datasets methods base
>>>
>>> other attached packages:
>>> [1] methyAnalysis_1.0.0 org.Hs.eg.db_2.8.0
>>> [3] RSQLite_0.11.2 DBI_0.2-5
>>> [5] AnnotationDbi_1.20.5 Biobase_2.18.0
>>> [7] IRanges_1.16.6 BiocGenerics_0.4.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] affy_1.36.1 affyio_1.26.0
>>> [3] annotate_1.36.0 BiocInstaller_1.8.3
>>> [5] biomaRt_2.14.0 Biostrings_2.26.3
>>> [7] biovizBase_1.6.2 bitops_1.0-5
>>> [9] BSgenome_1.26.1 cluster_1.14.3
>>> [11] colorspace_1.2-1 dichromat_2.0-0
>>> [13] genefilter_1.40.0 GenomicFeatures_1.10.2
>>> [15] GenomicRanges_1.10.7 genoset_1.10.1
>>> [17] Gviz_1.2.1 Hmisc_3.10-1
>>> [19] KernSmooth_2.23-9 labeling_0.1
>>> [21] lattice_0.20-13 lumi_2.10.0
>>> [23] MASS_7.3-23 Matrix_1.0-11
>>> [25] methylumi_2.4.0 mgcv_1.7-22
>>> [27] munsell_0.4 nleqslv_2.0
>>> [29] nlme_3.1-108 parallel_2.15.2
>>> [31] plyr_1.8 preprocessCore_1.20.0
>>> [33] RColorBrewer_1.0-5 RCurl_1.95-4.1
>>> [35] Rsamtools_1.10.2 rtracklayer_1.18.2
>>> [37] scales_0.2.3 splines_2.15.2
>>> [39] stats4_2.15.2 stringr_0.6.2
>>> [41] survival_2.37-4 tools_2.15.2
>>> [43] XML_3.95-0.2 xtable_1.7-1
>>> [45] zlibbioc_1.4.0
>>>
>>> With regards,
>>>
>>> Lavinia Gordon
>>> Senior Research Officer
>>> Quantitative Sciences Core, Bioinformatics
>>>
>>> Murdoch Childrens Research Institute
>>> The Royal Children's Hospital
>>> Flemington Road Parkville Victoria 3052 Australia T 03 8341 6221
>>> www.mcri.edu.au<http: www.mcri.edu.au=""/>
>>>
>>>
>>>
______________________________________________________________________
>>> This email has been scanned by the Symantec Email Security.cloud
service.
>>> For more information please visit http://www.symanteccloud.com
>>>
______________________________________________________________________
>>> [[alternative HTML version deleted]]
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>> --
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>>
>> Location: Arnold Building M1 B861
>> Phone: (206) 667-2793
>>
>>
______________________________________________________________________
>> This email has been scanned by the Symantec Email Security.cloud
service.
>> For more information please visit http://www.symanteccloud.com
>>
>> If you have any question, please contact MCRI IT Helpdesk for
further assistance.
>>
______________________________________________________________________
>>
>>
______________________________________________________________________
>> This email has been scanned by the Symantec Email Security.cloud
service.
>> For more information please visit http://www.symanteccloud.com
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
------------------------------
Message: 5
Date: Tue, 12 Mar 2013 09:30:32 -0700 (PDT)
From: "Nima [guest]" <guest@bioconductor.org>
To: bioconductor at r-project.org, ahmadian.n at gmail.com
Subject: [BioC] displaying the image with EBImage
Message-ID: <20130312163032.69E5714352C at mamba.fhcrc.org>
Dear All
I have problem with displaying my images in R, all functions are
working except display
my OS is windows 7 but Mozilla Firefox gives this message:
Oops the page you are looking for could not be found
would you please help me in this regard??
Regards
Nima
-- output of sessionInfo():
R version 2.15.3 (2013-03-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] EBImage_4.0.0
loaded via a namespace (and not attached):
[1] abind_1.4-0 jpeg_0.1-2 png_0.1-4 tiff_0.1-4
--
Sent via the guest posting facility at bioconductor.org.
------------------------------
Message: 6
Date: Tue, 12 Mar 2013 16:35:10 +0000
From: Kemal Akat <kakat@mail.rockefeller.edu>
To: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
Subject: [BioC] Help on Illumina HumanHT12 v4 chromosome filtering
Message-ID: <25FA2F2F-5525-4206-9515-C45CF82AE54F at rockefeller.edu>
Content-Type: text/plain; charset="us-ascii"
Hi all,
I am running into problems when I want to remove probes targeting
chromosome Y.
I can reproduce this behavior with the data from the
beadarrayExampleData package:
library(beadarray)
library(illuminaHumanv4.db)
library(beadarrayExampleData)
data(exampleSummaryData)
## filter for probe quality
ids = as.character(featureNames(exampleSummaryData))
qual = unlist(mget(ids, illuminaHumanv4PROBEQUALITY, ifnotfound = NA))
rem = qual == "No match" | qual == "Bad" | is.na(qual)
exampleSummaryData_filt = exampleSummaryData[!rem]
## get chromosome location for remaining probes
ids = as.character(featureNames(exampleSummaryData_filt))
chr = unlist(mget(ids, illuminaHumanv4CHR, ifnotfound = NA))
## filter out probes targeting the Y chromosome
rem_chr= chr == "Y"
exampleSummaryData_filt = exampleSummaryData_filt[!rem_chr]
Error in obj[i, , ..., drop = drop] :
(subscript) logical subscript too long
Calls: [ ... [ -> callNextMethod -> .nextMethod -> lapply -> FUN
The eSet (Illumina) object starts with
R> dim(exampleSummaryData)
Features Samples Channels
49576 12 2
probes. After quality filtering
R> dim(exampleSummaryData_filt) ## after filtering for probe quality
Features Samples Channels
30084 12 2
R>
Now, for the second filtering (ids =
as.character(featureNames(exampleSummaryData_filt) etc.):
R> length(ids)
[1] 30084
R> length(chr)
[1] 30109
Where are the extra probes coming from? I tried to match only the ones
in exampleSummaryData_filt),
idx = match(ids, names(chr))
chr = chr[idx]
but this led to other problems.
Maybe I am missing something obvious?
Thank you for any hints or help!
Kemal
R> sessionInfo()
R Under development (unstable) (2013-02-06 r61857)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets
methods base
other attached packages:
[1] beadarrayExampleData_1.0.5 illuminaHumanv4.db_1.16.0
org.Hs.eg.db_2.9.0 RSQLite_0.11.2
[5] DBI_0.2-5 AnnotationDbi_1.21.13
beadarray_2.9.2 ggplot2_0.9.3
[9] Biobase_2.19.3 BiocGenerics_0.5.6
setwidth_1.0-1
loaded via a namespace (and not attached):
[1] AnnotationForge_1.1.10 BeadDataPackR_1.11.0 colorspace_1.2-0
dichromat_1.2-4 digest_0.6.0
[6] grid_3.0.0 gtable_0.1.2 IRanges_1.17.36
labeling_0.1 limma_3.15.15
[11] MASS_7.3-23 munsell_0.4 plyr_1.8
proto_0.3-10 RColorBrewer_1.0-5
[16] reshape2_1.2.2 scales_0.2.3 stats4_3.0.0
stringr_0.6.2 tools_3.0.0
R>
------------------------------
Message: 7
Date: Tue, 12 Mar 2013 09:59:41 -0700
From: "Ryan C. Thompson" <rct@thompsonclan.org>
To: Richard Friedman <friedman at="" cancercenter.columbia.edu="">
Cc: Bioconductor mailing list <bioconductor at="" r-project.org="">
Subject: Re: [BioC] QuasiSeq vs DSS
Message-ID: <513F5EFD.3060606 at thompsonclan.org>
Content-Type: text/plain
Dear Rich,
From what I can tell, it should be possible. The development version
of
DESeq2 implements the DSS "squeezing" method combined with edgeR's
Cox-Reid dispersion estimation. You could use DESeq2 to estimate
dispersions, and then copy those dispersion values into an edgeR
DGEList
object. Then you can use edgeR::glmQLFTest, which implements
(approximately) the QuasiSeq method.
I have not had time yet to investigate putting these packages together
in this way, but it is something I plan to look at. I'm certain that
the
combination is technically possible, and I'm reasonably sure that the
result would be statistically meaningful.
-Ryan Thompson
On Mar 12, 2013 7:06 AM, "Richard Friedman"
<friedman at="" cancercenter.columbia.edu="" <mailto:friedman="" at="" cancercenter.columbia.edu="">> wrote:
Dear List.
The papers on DSS (included in Bioconductor):
Wu H, Wang C, Wu Z. A new shrinkage estimator for dispersion
improves
differential expression detection in RNA-seq data. Biostatistics.
2013
Apr;14(2):232-43.
and QuasiSeq (included in CRAN):
Lund SP, Nettleton D, McCarthy DJ, Smyth GK. Detecting
differential expression
in RNA-sequence data using quasi-likelihood with shrunken
dispersion
estimates.
Stat Appl Genet Mol Biol. 2012
both give evidence of superior performance to edgeR (if I
understand
them correctly).
Have the two methods been compared?
Can the 2 methods been combined (with DSS estimating the
dispersion
used in
the quasi-negative bionomial disribution used in QuasiSeq)?
I would appreciate any insight with respect to what is the overall
best
method for differential expression in RNASeq available at present.
Thanks and best wishes,
Rich
Richard A. Friedman, PhD
Associate Research Scientist,
Biomedical Informatics Shared Resource
Herbert Irving Comprehensive Cancer Center (HICCC)
Lecturer,
Department of Biomedical Informatics (DBMI)
Educational Coordinator,
Center for Computational Biology and Bioinformatics (C2B2)/
National Center for Multiscale Analysis of Genomic Networks
(MAGNet)/
Columbia Initiative in Systems Biology
Room 824
Irving Cancer Research Center
Columbia University
1130 St. Nicholas Ave
New York, NY 10032
(212)851-4765 (voice)
friedman at cancercenter.columbia.edu
<mailto:friedman at="" cancercenter.columbia.edu="">
http://cancercenter.columbia.edu/~friedman/
<http: cancercenter.columbia.edu="" %7efriedman=""/>
Fritz Lang. Didn't he do "Star Trek".
-Rose Friedman, age 16
_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org="">
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
[[alternative HTML version deleted]]
------------------------------
Message: 8
Date: Tue, 12 Mar 2013 13:11:25 -0400
From: Richard Friedman <friedman@cancercenter.columbia.edu>
To: "Ryan C. Thompson" <rct at="" thompsonclan.org="">
Cc: Bioconductor mailing list <bioconductor at="" r-project.org="">
Subject: Re: [BioC] QuasiSeq vs DSS
Message-ID:
<961B56D4-1114-47E9-B6E0-935BA09E9EDF at
cancercenter.columbia.edu>
Content-Type: text/plain; charset=us-ascii
Dear Ryan,
Thank you for your response.
3 questions:
1. If I had just a simple pairwise comparison is it known DSS or
QuasiSeq better?
2. I was unaware that an approximate implementation of QuasiSeq was
available in
edgeR. If so, is it known hor it compare to the ordinairy EdgeR on
the one hand and the
full QuasiSeq on the other.
3. And I guess that the third question is for Gordon - Is using DSS
and QuasiSeq (or EdgeR) together
desireable and if so, are there plans to incorporate DSS into QuasiSeq
(EdgeR).
My note was planning ahead. I will still be in the microarray world
for a more few weeks
before I return to learning RNASeq. I wanted to know what the best
practice is.
If you (or anybody out there) develops a script to meld the two
methods, I am sure that
it would be interesting to the list.
Best wishes,
Rich
On Mar 12, 2013, at 12:59 PM, Ryan C. Thompson wrote:
> Dear Rich,
> From what I can tell, it should be possible. The development version
of DESeq2 implements the DSS "squeezing" method combined with edgeR's
Cox-Reid dispersion estimation. You could use DESeq2 to estimate
dispersions, and then copy those dispersion values into an edgeR
DGEList object. Then you can use edgeR::glmQLFTest, which
implements (approximately) the QuasiSeq method.
> I have not had time yet to investigate putting these packages
together in this way, but it is something I plan to look at. I'm
certain that the combination is technically possible, and I'm
reasonably sure that the result would be statistically meaningful.
> -Ryan Thompson
> On Mar 12, 2013 7:06 AM, "Richard Friedman" <friedman at="" cancercenter.columbia.edu=""> wrote:
> Dear List.
>
> The papers on DSS (included in Bioconductor):
>
> Wu H, Wang C, Wu Z. A new shrinkage estimator for dispersion
improves
> differential expression detection in RNA-seq data. Biostatistics.
2013
> Apr;14(2):232-43.
>
> and QuasiSeq (included in CRAN):
>
> Lund SP, Nettleton D, McCarthy DJ, Smyth GK. Detecting differential
expression
> in RNA-sequence data using quasi-likelihood with shrunken dispersion
estimates.
> Stat Appl Genet Mol Biol. 2012
>
> both give evidence of superior performance to edgeR (if I understand
them correctly).
>
> Have the two methods been compared?
> Can the 2 methods been combined (with DSS estimating the dispersion
used in
> the quasi-negative bionomial disribution used in QuasiSeq)?
>
> I would appreciate any insight with respect to what is the overall
best
> method for differential expression in RNASeq available at present.
>
> Thanks and best wishes,
> Rich
>
>
> Richard A. Friedman, PhD
> Associate Research Scientist,
> Biomedical Informatics Shared Resource
> Herbert Irving Comprehensive Cancer Center (HICCC)
> Lecturer,
> Department of Biomedical Informatics (DBMI)
> Educational Coordinator,
> Center for Computational Biology and Bioinformatics (C2B2)/
> National Center for Multiscale Analysis of Genomic Networks
(MAGNet)/
> Columbia Initiative in Systems Biology
> Room 824
> Irving Cancer Research Center
> Columbia University
> 1130 St. Nicholas Ave
> New York, NY 10032
> (212)851-4765 (voice)
> friedman at cancercenter.columbia.edu
> http://cancercenter.columbia.edu/~friedman/
>
> Fritz Lang. Didn't he do "Star Trek".
> -Rose Friedman, age 16
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
------------------------------
Message: 9
Date: Tue, 12 Mar 2013 12:15:25 -0700
From: "Ryan C. Thompson" <rct@thompsonclan.org>
To: Richard Friedman <friedman at="" cancercenter.columbia.edu="">
Cc: Bioconductor mailing list <bioconductor at="" r-project.org="">
Subject: Re: [BioC] QuasiSeq vs DSS
Message-ID: <513F7ECD.8090402 at thompsonclan.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Here is a quote from Gordon Smyth a few months ago in response to a
question of mine, which I think neatly summarizes the relationship
between QuasiSeq and edgeR::glmQLFTest:
> glmQLFTest() and QuasiSeq were developed independently with the same
> idea, hence we got together to write the paper. If you use common
> dispersion to fit the linear model, then glmQLFTest() is like
> NegBinQLShrink. If you use trended dispersion to fit the linear
model
> (recommended), then glmQLFTest() is like NegBinQLSpline. They are
not
> quite identical however. glmQLTest() leverages the functionality of
> the edgeR and limma packages, whereas QuasiSeq has used its own
> implementations of everything. The latter are described in the
paper.
(Gordon, I hope you don't mind me posting this publically. Hopefully
it
saves you the trouble of rewriting it.)
-Ryan
On Tue 12 Mar 2013 10:11:25 AM PDT, Richard Friedman wrote:
>
> Dear Ryan,
>
> Thank you for your response.
> 3 questions:
> 1. If I had just a simple pairwise comparison is it known DSS or
> QuasiSeq better?
> 2. I was unaware that an approximate implementation of QuasiSeq was
> available in
> edgeR. If so, is it known hor it compare to the ordinairy EdgeR on
the
> one hand and the
> full QuasiSeq on the other.
> 3. And I guess that the third question is for Gordon - Is using DSS
> and QuasiSeq (or EdgeR) together
> desireable and if so, are there plans to incorporate DSS into
QuasiSeq
> (EdgeR).
>
> My note was planning ahead. I will still be in the microarray world
> for a more few weeks
> before I return to learning RNASeq. I wanted to know what the best
> practice is.
> If you (or anybody out there) develops a script to meld the two
> methods, I am sure that
> it would be interesting to the list.
>
> Best wishes,
> Rich
>
>
>
>
>
> On Mar 12, 2013, at 12:59 PM, Ryan C. Thompson wrote:
>
>>
>> Dear Rich,
>> From what I can tell, it should be possible. The development
version
>> of DESeq2 implements the DSS "squeezing" method combined with
edgeR's
>> Cox-Reid dispersion estimation. You could use DESeq2 to estimate
>> dispersions, and then copy those dispersion values into an edgeR
>> DGEList object. Then you can use edgeR::glmQLFTest, which
implements
>> (approximately) the QuasiSeq method.
>> I have not had time yet to investigate putting these packages
>> together in this way, but it is something I plan to look at. I'm
>> certain that the combination is technically possible, and I'm
>> reasonably sure that the result would be statistically meaningful.
>> -Ryan Thompson
>> On Mar 12, 2013 7:06 AM, "Richard Friedman"
>> <friedman at="" cancercenter.columbia.edu=""> wrote:
>> Dear List.
>>
>> The papers on DSS (included in Bioconductor):
>>
>> Wu H, Wang C, Wu Z. A new shrinkage estimator for dispersion
improves
>> differential expression detection in RNA-seq data. Biostatistics.
2013
>> Apr;14(2):232-43.
>>
>> and QuasiSeq (included in CRAN):
>>
>> Lund SP, Nettleton D, McCarthy DJ, Smyth GK. Detecting differential
>> expression
>> in RNA-sequence data using quasi-likelihood with shrunken
dispersion
>> estimates.
>> Stat Appl Genet Mol Biol. 2012
>>
>> both give evidence of superior performance to edgeR (if I
understand
>> them correctly).
>>
>> Have the two methods been compared?
>> Can the 2 methods been combined (with DSS estimating the dispersion
>> used in
>> the quasi-negative bionomial disribution used in QuasiSeq)?
>>
>> I would appreciate any insight with respect to what is the overall
best
>> method for differential expression in RNASeq available at present.
>>
>> Thanks and best wishes,
>> Rich
>>
>>
>> Richard A. Friedman, PhD
>> Associate Research Scientist,
>> Biomedical Informatics Shared Resource
>> Herbert Irving Comprehensive Cancer Center (HICCC)
>> Lecturer,
>> Department of Biomedical Informatics (DBMI)
>> Educational Coordinator,
>> Center for Computational Biology and Bioinformatics (C2B2)/
>> National Center for Multiscale Analysis of Genomic Networks
(MAGNet)/
>> Columbia Initiative in Systems Biology
>> Room 824
>> Irving Cancer Research Center
>> Columbia University
>> 1130 St. Nicholas Ave
>> New York, NY 10032
>> (212)851-4765 (voice)
>> friedman at cancercenter.columbia.edu
>> http://cancercenter.columbia.edu/~friedman/
>>
>> Fritz Lang. Didn't he do "Star Trek".
>> -Rose Friedman, age 16
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
------------------------------
Message: 10
Date: Tue, 12 Mar 2013 12:26:31 -0700
From: "Ryan C. Thompson" <rct@thompsonclan.org>
To: "Alessandra [guest]" <guest at="" bioconductor.org="">
Cc: bioconductor at r-project.org, alessandra.breschi at crg.es
Subject: Re: [BioC] edgeR: paired samples DGE and GLM
Message-ID: <513F8167.8090900 at thompsonclan.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hi,
See this previous discussion for a discussion of the pseudocounts:
https://stat.ethz.ch/pipermail/bioconductor/2013-March/051182.html
Only the non-GLM method makes use of pseudocounts, so it's not
important
that they are missing from the GLM analysis. I believe the design rows
and count columns *do* need to match up in the same order. To do a
paired test, you can just use model.matrix(~0+Treatment+Patient) and
proceed as you already have. The GLM method does give non-identical
results, and in general it seems that people are assuming that the GLM
method is better overall, though as always, proof is scarce when
sequencing is so expensive.
Hope this answers your questions.
-Ryan Thompson
On 03/12/2013 04:20 AM, Alessandra [guest] wrote:
> Hi,
>
> I am trying to use edgeR to compute differential gene expression.
>
> I have a quite simple experimental design:
> labExpId Patient sex Treatment
> sample.4 1 F A
> sample.3 2 M A
> sample.5 2 M B
> sample.7 3 F A
> sample.0 4 M A
> sample.8 3 F B
> sample.1 4 M B
> sample.2 1 F B
>
>
> I am comparing the effect of treatment across all patients, and it
is fine when I apply exactTest. Then I wanted to include also the
Patient in my model and I try using GLM, but I found several problems.
So, I tried to apply GLM only including the Treatment just to
troubleshoot. I follow the instructions on page 22 from the user's
guide revised in December 2012.
>
> My counts are stored in a matrix called m.
>
>> colnames(m)
> [1] "sample.2" "sample.3" "sample.4" "sample.5" "sample.7" sample.8"
> [7] "sample.0" "sample.1"
>
>> design=model.matrix(~0+Treatment)
>> design
> TreatmentA TreatmentB
> sample.4 1 0
> sample.3 1 0
> sample.5 0 1
> sample.7 1 0
> sample.0 1 0
> sample.8 0 1
> sample.1 0 1
> sample.2 0 1
>
>> M = DGEList(counts=na.omit(m))
>> cpm.m <- cpm(M)
>> M <- M[rowSums(cpm.m >1) >=1,]
>> M <- calcNormFactors(M)
>> M <- estimateGLMCommonDisp(M, design, verbose=T)
>> names(M)
> [1] "counts" "samples" "abundance"
> [4] "logCPM" "common.dispersion"
>
> First thing I notice, I don't see pseudo.counts and other attributes
I saw when calling estimateCommonDisp().
>
>> M <- estimateGLMTrendedDisp(M, design)
>> M <- estimateGLMTagwiseDisp(M, design)
>> fit <- glmFit(M, design)
>> lrt <- glmLRT(fit, contrast=c(-1,1))
>> res = topTags(lrt, n=nrow(lrt))$table
>> head(res)
> logFC logCPM LR PValue
FDR
> ENSG00000244115 15.512665 2.266789 35.30878 2.813602e-09
3.482958e-05
> ENSG00000256329 -17.760869 4.514903 32.89653 9.719681e-09
6.015996e-05
> ENSG00000223638 11.777617 -1.467638 18.46673 1.728967e-05
7.134295e-02
> ENSG00000134321 -5.328058 5.959204 16.76318 4.234703e-05
1.310535e-01
> ENSG00000196861 7.776972 -1.722111 15.83143 6.924259e-05
1.494412e-01
> ENSG00000229292 11.368590 -1.876601 15.74621 7.243290e-05
1.494412e-01
>
>> m[rownames(head(res)),]
> sample.2 sample.3 sample.4 sample.5 sample.7
sample.8
> ENSG00000244115 0 0 0 0 0
6412
> ENSG00000256329 0 0 0 32070 0
0
> ENSG00000223638 0 0 2 0 0
0
> ENSG00000134321 1178 590 890 83130 754 1116
> ENSG00000196861 0 3 363 0 0
53
> ENSG00000229292 0 0 0 0 0
0
>
> sample.0 sample.1
> ENSG00000244115 0 4415
> ENSG00000256329 0 0
> ENSG00000223638 434 550
> ENSG00000134321 852 1092
> ENSG00000196861 500 57
> ENSG00000229292 344 406
>
> I realized that it may be a problem of the ordering of the design
matrix rows
>
>> colnames(m)
> [1] "sample.2" "sample.3" "sample.4" "sample.5" "sample.7"
"sample.8"
> [7] "sample.0" "sample.1"
>> rownames(design)
> [1] "sample.4" "sample.3" "sample.5" "sample.7" "sample.0"
"sample.8"
> [7] "sample.1" "sample.2"
>
> so I redo it forcing the row names of my design matrix to be the
same order as the column names in the matrix count.
> Now my design matrix looks like:
>
>> design
> TreatmentA TreatmentB
> sample.2 0 1
> sample.3 1 0
> sample.4 1 0
> sample.5 0 1
> sample.7 1 0
> sample.8 0 1
> sample.0 1 0
> sample.1 0 1
>
> I execute exactly the same commands as before.
> Again I cannot see the pseudocounts.
>> names(M)
> [1] "counts" "samples" "abundance"
> [4] "logCPM" "common.dispersion"
>
>> head(res)
> logFC logCPM LR PValue FDR
> ENSG00000074855 -32.17749 0.2759965 2121.104 0 0
> ENSG00000076351 -32.17749 -0.4306713 1549.345 0 0
> ENSG00000105552 -32.17749 -0.4864164 1502.658 0 0
> ENSG00000106991 -32.17749 0.4622641 2144.947 0 0
> ENSG00000123009 -32.17749 -0.2422835 1719.625 0 0
> ENSG00000133216 -32.17749 0.2206127 2127.575 0 0
>
> I get very different results still from the exactTest with no GLM.
> I attribute this to the missing pseudocounts in the DGEList object?
>
>> m[rownames(head(res)),]
> sample.2 sample.3 sample.4 sample.5 sample.7
sample.8
> ENSG00000074855 0 948 902 0 676
0
> ENSG00000076351 0 494 522 0 538
0
> ENSG00000105552 0 490 454 0 486
0
> ENSG00000106991 0 1094 798 0 698
0
> ENSG00000123009 0 556 520 0 566
0
> ENSG00000133216 0 808 762 0 730
0
> sample.0 sample.1
> ENSG00000074855 1166 0
> ENSG00000076351 698 0
> ENSG00000105552 764 0
> ENSG00000106991 1733 0
> ENSG00000123009 980 0
> ENSG00000133216 1304 0
>
> I see that these genes have all zero counts for TreatmentB and this
explains the low logFC, but I don't get this when I don't use GLM at
all (exactTest).
>
> So in summary, my questions are:
>
> 1) Does the ordering of rows in the design matrix have to be the
same as the column order in the count matrix,
> or the sample name is taken into account so the order should not
matter?
>
> 2) Do you have any idea why I don't get pseudocounts after
estimating the dispersion, and is it really this that causes such low
logFC?
>
> I am looking forward to hearing from you.
>
> Many thanks in advance,
> Ale
>
>
> -- output of sessionInfo():
>
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] splines stats graphics grDevices utils datasets
methods
> [8] base
>
> other attached packages:
> [1] plyr_1.7.1 optparse_1.0.1 ggplot2_0.9.3.1 reshape2_1.2.1
> [5] edgeR_3.0.8 limma_3.14.4
>
> loaded via a namespace (and not attached):
> [1] MASS_7.3-17 RColorBrewer_1.0-5 colorspace_1.1-1
dichromat_1.2-4
> [5] digest_0.5.2 getopt_1.19.1 grid_2.15.0
gtable_0.1.2
> [9] labeling_0.1 munsell_0.3 proto_0.3-9.2
scales_0.2.3
> [13] stringr_0.6 tools_2.15.0
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
------------------------------
Message: 11
Date: Tue, 12 Mar 2013 16:14:58 -0400
From: Zhuzhu Zhang <zhuzhuz@email.unc.edu>
To: Gordon K Smyth <smyth at="" wehi.edu.au="">
Cc: Bioconductor mailing list <bioconductor at="" r-project.org="">
Subject: Re: [BioC] edgeR: new defaults of estimateTagwiseDisp and
exactTest
Message-ID: <513F8CC2.5070300 at email.unc.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Dear Gordon,
Thank you very much for your reply. That was greatly helpful.
For prion.df, would you recommend that the user use the default value
in
the current version, or choose it differently for different datasets
since it varies as the data size changes? If the latter, would the
same
principle of choosing prior.n be applied?
Thanks,
Zhuzhu
On 3/8/13 2:06 AM, Gordon K Smyth wrote:
> Dear Zhuzhu,
>
> The edgeR User's guide gives case studies of how we intend edgeR to
be
> used. You will see that we expect users to generally use the
default
> parameters. Although there are a great many parameters that can be
> varied in principle in the edgeR functions, we not expect them to be
> changed in most analyses.
>
> One exception was the prior.n argument to estimateTagwiseDisp. It
was
> never our intention that a fixed value for prior.n would be used for
> datasets of different sizes, so previous versions of the User's
Guide
> used to explain how to set this parameter. Several versions ago, we
> eliminated the prior.n argument and replaced it with prior.df so
that
> the default behavior was what we intended.
>
> Over time we have reduced the default value for prior.df somewhat.
> Larger values of prior.df give more priority to genes with large
fold
> changes between groups. Smaller values of prior.df give more
priority
> to genes with small dispersion values, i.e., to genes that are
> consistent between replicates.
>
> You ask about exactTest(). To learn more about the different
> rejection regions, you should start by reading the help page for
> exactTest(), but I doubt that this is causing you a problem. The
> theory behind the original exact negative binomial test of Robinson
> and Smyth (2008) presupposed that the conditional distibution of the
> counts given the genewise total was unimodal (like a binomial
> distribution). This is true for typical dispersion values, but is
not
> true for very large dispersion values. Hence the original exactTest
> can give totally innappropriate results when the dispersion is very,
> very large. The new rejection region is similar to the original when
> the dispersion is smallish, but gives sensible results in any
> situation. Hence it should be used. The old rejection region is
> preserved as an option only for backward compatability.
>
> Best wishes
> Gordon
>
>
>> Date: Wed, 06 Mar 2013 19:22:06 -0500
>> From: Zhuzhu Zhang <zhuzhuz at="" email.unc.edu="">
>> To: bioconductor at r-project.org
>> Subject: [BioC] edgeR: new defaults of estimateTagwiseDisp and
>> exactTest
>>
>> Dear All,
>>
>> I'm running edgeR 3.0.8 and notice that the results are
considerably
>> different than those from older versions. I realized that it was
likely
>> due to the different prior.df I used in Function
estimateTagwiseDisp,
>> thanks to an earlier discussion on the list-
>>
https://stat.ethz.ch/pipermail/bioconductor/2012-December/049644.html
>>
>> I have a few more questions:
>>
>> 1. How to choose a proper prior.df (or prior.n)?
>>
>> 2. How is the new default method of exactText different from the
old
>> default (rejection.region = "smallp")? How does it improve the
>> performance?
>>
>> 3. In general, what parameters should I tune for different
datasets,
>> when using function estimateTagwiseDisp and exactTest? I used the
>> defaults but realized that they may not be most appropriate.
>>
>> Thank you for your time and attention. Any suggestions and comments
>> would be extremely helpful and appreciated.
>>
>> Thanks,
>> Zhuzhu
>>
>
>
______________________________________________________________________
> The information in this email is confidential and
inte...{{dropped:6}}
------------------------------
Message: 12
Date: Tue, 12 Mar 2013 21:32:03 +0100
From: Julian Gehring <julian.gehring@embl.de>
To: bioconductor at r-project.org
Subject: [BioC] Subsampling of BAM/SAM alignments
Message-ID: <513F90C3.3020301 at embl.de>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hi,
Is there an efficient function to randomly subsample alignments from a
BAM/SAM within R? In the best case, an equivalent to 'FastqSampler'
in
'ShortRead'. If not, what would be a clever way of doing this with
the
bioc framework, without having to read the whole file first?
Best wishes
Julian
------------------------------
Message: 13
Date: Tue, 12 Mar 2013 21:51:39 +0100
From: Wolfgang Huber <whuber@embl.de>
To: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
Subject: [BioC] DESeq2
Message-ID: <592E98F1-5FEF-47F0-B9F8-FA485F4489B0 at embl.de>
Content-Type: text/plain; charset=us-ascii
Dear DESeq users,
Mike Love, Simon Anders and I have been updating the DESeq package.
This resulted in the package DESeq2, which is available from the
development branch, and scheduled for the next release:
http://www.bioconductor.org/packages/devel/bioc/html/DESeq2.html
For several release cycles, the original package (DESeq) will be
maintained at its current functionality, in order to not disrupt the
workflows of DESeq users. For new projects, we recommend using DESeq2.
Major innovations are:
* Base class: SummarizedExperiment is used as the superclass for
storing the data, rather than eSet. This allows closer integration
with upstream workflows involving GRanges and summarizeOverlaps, and
facilitates downstream analyses of the genomic regions of interest.
* Simplified workflow: the wrapper function DESeq() performs all steps
for a differential expression analysis. The individual steps are of
course also accessible.
* More powerful statistics: incorporation of prior distributions into
the estimation of dispersions and fold changes (empirical-Bayes
shrinkage). The dispersion shrinkage improves power compared to the
old DESeq. The fold changes shrinkage help moderate the otherwise
large spread in log fold changes for genes with low counts, while it
has negligible effect on genes with high counts; it may be
particularly useful for visualisation, clustering, classification,
ordination (PCA, MDS), similar to the variance-stabilizing
transformation in the old DESeq. A Wald test for significance is
provided as the default inference method, with the chi-squared test of
the previous version is also available. A manuscript is in
preparation.
* Normalization: it is possible to provide a matrix of sample- *and*
gene-specific normalization factors, which allows the use of
normalisation factors from Bioconductor packages such as cqn and
EDASeq.
Examples of usage are provided in the vignette, and more details are
available in the manual pages (specifically, the DESeq function and
estimateDispersions function).
Enjoy -
Mike, Simon, Wolfgang.
------------------------------
Message: 14
Date: Tue, 12 Mar 2013 21:56:17 +0100
From: Wolfgang Huber <whuber@embl.de>
To: "Nima [guest]" <guest at="" bioconductor.org="">
Cc: ahmadian.n at gmail.com, bioconductor at r-project.org
Subject: Re: [BioC] displaying the image with EBImage
Message-ID: <9F4C0F37-22AE-49BF-99CE-51F2BE3CCB4A at embl.de>
Content-Type: text/plain; charset=us-ascii
Nima
Thank you. Can you provide a (minimal) transcript of the commands you
issued, and the exact error message you get (e.g. what is the URL
looked for but not found by firefox?)
Best wishes
Wolfgang
Il giorno Mar 12, 2013, alle ore 5:30 PM, Nima [guest] <guest at="" bioconductor.org=""> ha scritto:
>
> Dear All
>
> I have problem with displaying my images in R, all functions are
working except display
> my OS is windows 7 but Mozilla Firefox gives this message:
> Oops the page you are looking for could not be found
>
> would you please help me in this regard??
>
> Regards
> Nima
>
> -- output of sessionInfo():
>
> R version 2.15.3 (2013-03-01)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252
> [2] LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] EBImage_4.0.0
>
> loaded via a namespace (and not attached):
> [1] abind_1.4-0 jpeg_0.1-2 png_0.1-4 tiff_0.1-4
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
------------------------------
Message: 15
Date: Tue, 12 Mar 2013 22:27:58 +0100
From: Nima Ahmadian <ahmadian.n@gmail.com>
To: Wolfgang Huber <whuber at="" embl.de="">
Cc: bioconductor at r-project.org
Subject: Re: [BioC] displaying the image with EBImage
Message-ID:
<can+vrqnyqc1cuu=4xml_p1q_sxnmnquicy1s=bb7fzoqjc7klg at="" mail.gmail.com="">
Content-Type: text/plain
^Dear Wolfgang
Thank you so much but I solved the issue somehow, there is no any
error
message, but the image doesn't show automatically with my browser, I
have
to go to my Temp folder and open it manually so if you have any hint,
I
would be really grateful
Cheers
Nima
On Tue, Mar 12, 2013 at 9:56 PM, Wolfgang Huber <whuber at="" embl.de="">
wrote:
>
> Nima
>
> Thank you. Can you provide a (minimal) transcript of the commands
you
> issued, and the exact error message you get (e.g. what is the URL
looked
> for but not found by firefox?)
>
> Best wishes
> Wolfgang
>
> Il giorno Mar 12, 2013, alle ore 5:30 PM, Nima [guest] <
> guest at bioconductor.org> ha scritto:
>
> >
> > Dear All
> >
> > I have problem with displaying my images in R, all functions are
working
> except display
> > my OS is windows 7 but Mozilla Firefox gives this message:
> > Oops the page you are looking for could not be found
> >
> > would you please help me in this regard??
> >
> > Regards
> > Nima
> >
> > -- output of sessionInfo():
> >
> > R version 2.15.3 (2013-03-01)
> > Platform: x86_64-w64-mingw32/x64 (64-bit)
> >
> > locale:
> > [1] LC_COLLATE=English_United States.1252
> > [2] LC_CTYPE=English_United States.1252
> > [3] LC_MONETARY=English_United States.1252
> > [4] LC_NUMERIC=C
> > [5] LC_TIME=English_United States.1252
> >
> > attached base packages:
> > [1] stats graphics grDevices utils datasets methods
base
> >
> > other attached packages:
> > [1] EBImage_4.0.0
> >
> > loaded via a namespace (and not attached):
> > [1] abind_1.4-0 jpeg_0.1-2 png_0.1-4 tiff_0.1-4
> >
> > --
> > Sent via the guest posting facility at bioconductor.org.
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
[[alternative HTML version deleted]]
------------------------------
Message: 16
Date: Tue, 12 Mar 2013 22:53:39 +0100
From: Andrzej Ole? <andrzej.oles@gmail.com>
To: Nima Ahmadian <ahmadian.n at="" gmail.com="">
Cc: bioconductor at r-project.org
Subject: Re: [BioC] displaying the image with EBImage
Message-ID:
<cah7mkkj-pzkd1a6vqkiu8tsybh5o5kc67fbaxzb3jhgcfvytrg at="" mail.gmail.com="">
Content-Type: text/plain; charset=ISO-8859-1
Dear Nima,
many thanks for reporting this issue! Could you please check whether
the example from the documentation of the 'display' function results
in the same error? Try running the following commands:
f = system.file("images", "lena-color.png", package="EBImage")
x = readImage(f)
display(x)
If you still keep getting the error please copy the contents of the
address bar of your browser and send it to me. Additionally, could you
provide the complete path (i.e., beginning with c:\ ... or similar) to
the Temp folder containing the generated images and the listing of its
contents?
I look forward to hearing from you soon.
Best,
Andrzej
On Tue, Mar 12, 2013 at 10:27 PM, Nima Ahmadian <ahmadian.n at="" gmail.com=""> wrote:
> ^Dear Wolfgang
>
> Thank you so much but I solved the issue somehow, there is no any
error
> message, but the image doesn't show automatically with my browser, I
have
> to go to my Temp folder and open it manually so if you have any
hint, I
> would be really grateful
>
> Cheers
> Nima
>
> On Tue, Mar 12, 2013 at 9:56 PM, Wolfgang Huber <whuber at="" embl.de="">
wrote:
>
>>
>> Nima
>>
>> Thank you. Can you provide a (minimal) transcript of the commands
you
>> issued, and the exact error message you get (e.g. what is the URL
looked
>> for but not found by firefox?)
>>
>> Best wishes
>> Wolfgang
>>
>> Il giorno Mar 12, 2013, alle ore 5:30 PM, Nima [guest] <
>> guest at bioconductor.org> ha scritto:
>>
>> >
>> > Dear All
>> >
>> > I have problem with displaying my images in R, all functions are
working
>> except display
>> > my OS is windows 7 but Mozilla Firefox gives this message:
>> > Oops the page you are looking for could not be found
>> >
>> > would you please help me in this regard??
>> >
>> > Regards
>> > Nima
>> >
>> > -- output of sessionInfo():
>> >
>> > R version 2.15.3 (2013-03-01)
>> > Platform: x86_64-w64-mingw32/x64 (64-bit)
>> >
>> > locale:
>> > [1] LC_COLLATE=English_United States.1252
>> > [2] LC_CTYPE=English_United States.1252
>> > [3] LC_MONETARY=English_United States.1252
>> > [4] LC_NUMERIC=C
>> > [5] LC_TIME=English_United States.1252
>> >
>> > attached base packages:
>> > [1] stats graphics grDevices utils datasets methods
base
>> >
>> > other attached packages:
>> > [1] EBImage_4.0.0
>> >
>> > loaded via a namespace (and not attached):
>> > [1] abind_1.4-0 jpeg_0.1-2 png_0.1-4 tiff_0.1-4
>> >
>> > --
>> > Sent via the guest posting facility at bioconductor.org.
>> >
>> > _______________________________________________
>> > Bioconductor mailing list
>> > Bioconductor at r-project.org
>> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> > Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
------------------------------
Message: 17
Date: Wed, 13 Mar 2013 10:37:03 +1100 (AUS Eastern Daylight Time)
From: Gordon K Smyth <smyth@wehi.edu.au>
To: Zhuzhu Zhang <zhuzhuz at="" email.unc.edu="">
Cc: Bioconductor mailing list <bioconductor at="" r-project.org="">
Subject: Re: [BioC] edgeR: new defaults of estimateTagwiseDisp and
exactTest
Message-ID: <pine.wnt.4.64.1303131033340.6032 at="" pc765.wehi.edu.au="">
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
On Tue, 12 Mar 2013, Zhuzhu Zhang wrote:
> Dear Gordon,
>
> Thank you very much for your reply. That was greatly helpful.
>
> For prion.df, would you recommend that the user use the default
value in the
> current version, or choose it differently for different datasets
since it
> varies as the data size changes?
Use the default value. As I tried to explain, the whole point of
using
prior.df instead of prior.n is that the optimal value for prior.df
does
not depend on data size.
> If the latter, would the same principle of
> choosing prior.n be applied?
prior.n is a function of prior.df. See ?getPriorN.
Gordon
> Thanks,
> Zhuzhu
>
>
>
>
> On 3/8/13 2:06 AM, Gordon K Smyth wrote:
>> Dear Zhuzhu,
>>
>> The edgeR User's guide gives case studies of how we intend edgeR to
be
>> used. You will see that we expect users to generally use the
default
>> parameters. Although there are a great many parameters that can be
>> varied in principle in the edgeR functions, we not expect them to
be
>> changed in most analyses.
>>
>> One exception was the prior.n argument to estimateTagwiseDisp. It
was
>> never our intention that a fixed value for prior.n would be used
for
>> datasets of different sizes, so previous versions of the User's
Guide
>> used to explain how to set this parameter. Several versions ago,
we
>> eliminated the prior.n argument and replaced it with prior.df so
that
>> the default behavior was what we intended.
>>
>> Over time we have reduced the default value for prior.df somewhat.
>> Larger values of prior.df give more priority to genes with large
fold
>> changes between groups. Smaller values of prior.df give more
priority
>> to genes with small dispersion values, i.e., to genes that are
>> consistent between replicates.
>>
>> You ask about exactTest(). To learn more about the different
rejection
>> regions, you should start by reading the help page for exactTest(),
but
>> I doubt that this is causing you a problem. The theory behind the
>> original exact negative binomial test of Robinson and Smyth (2008)
>> presupposed that the conditional distibution of the counts given
the
>> genewise total was unimodal (like a binomial distribution). This
is
>> true for typical dispersion values, but is not true for very large
>> dispersion values. Hence the original exactTest can give totally
>> innappropriate results when the dispersion is very, very large. The
new
>> rejection region is similar to the original when the dispersion is
>> smallish, but gives sensible results in any situation. Hence it
should
>> be used. The old rejection region is preserved as an option only
for
>> backward compatability.
>>
>> Best wishes
>> Gordon
>>
>>
>>> Date: Wed, 06 Mar 2013 19:22:06 -0500
>>> From: Zhuzhu Zhang <zhuzhuz at="" email.unc.edu="">
>>> To: bioconductor at r-project.org
>>> Subject: [BioC] edgeR: new defaults of estimateTagwiseDisp and
>>> exactTest
>>>
>>> Dear All,
>>>
>>> I'm running edgeR 3.0.8 and notice that the results are
considerably
>>> different than those from older versions. I realized that it was
>>> likely due to the different prior.df I used in Function
>>> estimateTagwiseDisp, thanks to an earlier discussion on the list-
>>>
https://stat.ethz.ch/pipermail/bioconductor/2012-December/049644.html
>>>
>>> I have a few more questions:
>>>
>>> 1. How to choose a proper prior.df (or prior.n)?
>>>
>>> 2. How is the new default method of exactText different from the
old
>>> default (rejection.region = "smallp")? How does it improve the
>>> performance?
>>>
>>> 3. In general, what parameters should I tune for different
datasets,
>>> when using function estimateTagwiseDisp and exactTest? I used the
>>> defaults but realized that they may not be most appropriate.
>>>
>>> Thank you for your time and attention. Any suggestions and
comments
>>> would be extremely helpful and appreciated.
>>>
>>> Thanks,
>>> Zhuzhu
______________________________________________________________________
The information in this email is confidential and
intend...{{dropped:4}}
------------------------------
Message: 18
Date: Wed, 13 Mar 2013 11:17:30 +1100 (AUS Eastern Daylight Time)
From: Gordon K Smyth <smyth@wehi.edu.au>
To: Ian Sudbery <ian.sudbery at="" dpag.ox.ac.uk="">
Cc: Bioconductor mailing list <bioconductor at="" r-project.org="">
Subject: [BioC] negative correlation from limma's duplicateCorrelation
Message-ID: <pine.wnt.4.64.1303131037280.6032 at="" pc765.wehi.edu.au="">
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Dear Ian,
The help page for duplicateCorrelation() says
"For this function to return statistically useful results, there must
be
at least two more arrays than the number of coefficients to be
estimated,
i.e., two more than the column rank of design."
You have just two arrays, only one more than the number of columns in
the
design matrix. So there is not enough meaningful information from
which
to estimate the duplicate correlation.
The correlations that you are computing using cor() are not comparable
to
the quantity estimated by duplicateCorrelation. You are computing
correlations between pairs of spots for the same gene across different
genes. This correlation arises from the fact that different genes
have
different expression levels. You would get a positive value for this
correlation even if there were no spot duplicates. The
duplicateCorrelation() function on the other hand computes a
correlation
across technical replicates for the same gene. These two correlations
are
not related.
Best wishes
Gordon
> Date: Mon, 11 Mar 2013 18:27:25 +0000
> From: Ian Sudbery <ian.sudbery at="" dpag.ox.ac.uk="">
> To: "bioconductor at stat.math.ethz.ch" <bioconductor at="" stat.math.ethz.ch="">
> Subject: [BioC] negative correlation from limma's
duplicateCorrelation
>
> Hi all,
>
> I might be misunderstanding something here, but I think I'm having
> trouble with Limma's duplicateCorrelation function.
>
> I am analysing a set of custom spotted two color microarrays. Each
> comparison contains two arrays. Each array contains two copies of
each
> spot. The design table for each comparison looks something like
this:
>
>> targets
> SlideNumber FileName Cy3 Cy5
Date Name
> yeast_wt_v_dd_rep1 823 s823_bwp17y+ddy.txt wty ddy
17/12/2012 yeast_wt_v_dd_rep1
> yeast_wt_v_dd_rep2 828 s828_wty+ddy.txt wty ddy
18/01/2013 yeast_wt_v_dd_rep2
>
>
> After background correction and normalisation, I run
> duplicateCorrelation before fitting the model:
>
>> corfit <- duplicateCorrelation(MA_norm, design = c(1,1), ndups=2,
spacing = 8324)
>
> When I look at the results, I find that the consensus correlation is
> negative:
>
>> corfit$consensus.correlation
> [1] -0.4163954
>
> Surely this is wrong? Examining the correlation for duplicate spots
on
> each slide manually suggests a good correlation, particularly if one
> ignores flagged spots:
>
>> good_spots <- MA_norm$weights[1:8324,] == 1 &
MA_norm$weights[8325:16648,] ==1
>> cor(MA_norm$M[1:8324,1][good_spots[,1]],MA_norm$M[8325:16648,1][goo
d_spots[,1]])
> [1] 0.8769604
>
>> cor(MA_norm$M[1:8324,2][good_spots[,2]],MA_norm$M[8325:16648,2][goo
d_spots[,2]])
> [1] 0.858891
>
> Even without removing the flagged spots, the correlation is still
positive:
>> cor(MA_norm$M[1:8324,1],MA_norm$M[8325:16648,1])
> [1] 0.2669379
>
>> cor(MA_norm$M[1:8324,2],MA_norm$M[8325:16648,2])
> [1] 0.1843577
>
> I know that this isn't the way that duplicateCorrelation calculates
rho,
> but one would expect that a good correlation in this way might
indicate
> at least a positive correlation from duplicateCorrelation? Or am I
> interpreting the consensus correlation incorrectly?
>
> Here is the rest of my analysis script (I've left out some
diagnostic plot generation):
>
>> targets <- readTargets(row.names="Name")
>> RG <- read.maimages(targets$FileName, source = "genepix",
wt.fun=wtflags(weight=0, cutoff=-50))
>> spottypes <- readSpotTypes()
>> RG$genes$Status <- controlStatus(spottypes,RG)
>> anno <- read.delim("anno.txt", comment.char="!", header = T)
>> RG_background <- backgroundCorrect(RG,method = "normexp", offset
=10)
>>
>> #flag out spots where intentity isn't much above background in both
channels
>> RG_background$weights[RG_background$R < 50 & RG_background$G < 50]
= 0
>> MA <- normalizeWithinArrays(RG_background)
>> MA_norm <- MA[MA$genes$Status == "cDNA",]
>> MA_norm$genes$Status <- NULL
>> corfit <- duplicateCorrelation(MA_norm, design = c(1,1), ndups=2,
spacing = 8324)
>> fit <- lmFit(MA_norm, design = design, ndups = 2,
correlation=corfit$consensus.correlation,spacing = 8324)
>> fit<- eBayes(fit)
>
>> sesssionInfo()
> R version 2.15.1 (2012-06-22)
>
> Platform: x86_64-pc-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United
Kingdom.1252 LC_MONETARY=English_United Kingdom.1252
> [4] LC_NUMERIC=C LC_TIME=English_United
Kingdom.1252
>
> attached base packages:
> [1] grid stats graphics grDevices utils datasets
methods base
>
> other attached packages:
> [1] dichromat_2.0-0 gplots_2.11.0 MASS_7.3-18
KernSmooth_2.23-7 caTools_1.14 gdata_2.12.0 gtools_2.7.0
> [8] reshape2_1.2.2 ggplot2_0.9.3 statmod_1.4.16
limma_3.14.4
>
> loaded via a namespace (and not attached):
> [1] bitops_1.0-5 colorspace_1.2-1 digest_0.6.2
gtable_0.1.2 labeling_0.1 munsell_0.4 plyr_1.8
> [8] proto_0.3-10 RColorBrewer_1.0-5 scales_0.2.3
stringr_0.6.2 tools_2.15.1
>
> Any advice appreciated
>
> Ian
>
______________________________________________________________________
The information in this email is confidential and
intend...{{dropped:4}}
------------------------------
Message: 19
Date: Wed, 13 Mar 2013 11:27:50 +1100 (AUS Eastern Daylight Time)
From: Gordon K Smyth <smyth@wehi.edu.au>
To: Luis alberto Martinez lopez <lewisluis2013 at="" yahoo.com="">
Cc: Bioconductor mailing list <bioconductor at="" r-project.org="">
Subject: [BioC] anova like test in edgeR
Message-ID: <pine.wnt.4.64.1303131122400.6032 at="" pc765.wehi.edu.au="">
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
> Date: Mon, 11 Mar 2013 17:18:09 -0700 (PDT)
> From: Luis alberto Martinez lopez <lewisluis2013 at="" yahoo.com="">
> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
> Subject: [BioC] anova like test in edgeR
>
> Hello everyone:
>
> a wish to know if there is one way of doing an anova like test in
edgeR,
> comparing all the possibilities between the groups defined in the
> experimental design.
>
> in the users guide the authors explain one way anova test, but the
> comparison is done only against the intercept group, is there a way
of
> doing a general test comparing all the posibilities and not only
against
> the intercept?
The example in the User's Guide is a general test comparing all
possibilities.
Best wishes
Gordon
______________________________________________________________________
The information in this email is confidential and
intend...{{dropped:4}}
------------------------------
Message: 20
Date: Wed, 13 Mar 2013 01:38:44 +0100
From: Nima Ahmadian <ahmadian.n@gmail.com>
To: Andrzej Ole? <andrzej.oles at="" gmail.com="">
Cc: bioconductor at r-project.org
Subject: Re: [BioC] displaying the image with EBImage
Message-ID:
<can+vrqml+9hs0iq+khoocisgxw7nncxk1zbnxxv_gk5pgkcs0q at="" mail.gmail.com="">
Content-Type: text/plain
Dear Andrzej
I solved the issue, but I came to know that if you have searchqu
toolbar
installed on your browser it prevents the images to get open, so I
removed
it and it worked out
cheers
Nima
On Tue, Mar 12, 2013 at 10:53 PM, Andrzej Ole?? <andrzej.oles at="" gmail.com="">wrote:
> Dear Nima,
>
> many thanks for reporting this issue! Could you please check whether
> the example from the documentation of the 'display' function results
> in the same error? Try running the following commands:
>
> f = system.file("images", "lena-color.png", package="EBImage")
> x = readImage(f)
> display(x)
>
> If you still keep getting the error please copy the contents of the
> address bar of your browser and send it to me. Additionally, could
you
> provide the complete path (i.e., beginning with c:\ ... or similar)
to
> the Temp folder containing the generated images and the listing of
its
> contents?
>
> I look forward to hearing from you soon.
>
> Best,
> Andrzej
>
>
> On Tue, Mar 12, 2013 at 10:27 PM, Nima Ahmadian <ahmadian.n at="" gmail.com="">
> wrote:
> > ^Dear Wolfgang
> >
> > Thank you so much but I solved the issue somehow, there is no any
error
> > message, but the image doesn't show automatically with my browser,
I have
> > to go to my Temp folder and open it manually so if you have any
hint, I
> > would be really grateful
> >
> > Cheers
> > Nima
> >
> > On Tue, Mar 12, 2013 at 9:56 PM, Wolfgang Huber <whuber at="" embl.de=""> wrote:
> >
> >>
> >> Nima
> >>
> >> Thank you. Can you provide a (minimal) transcript of the commands
you
> >> issued, and the exact error message you get (e.g. what is the URL
looked
> >> for but not found by firefox?)
> >>
> >> Best wishes
> >> Wolfgang
> >>
> >> Il giorno Mar 12, 2013, alle ore 5:30 PM, Nima [guest] <
> >> guest at bioconductor.org> ha scritto:
> >>
> >> >
> >> > Dear All
> >> >
> >> > I have problem with displaying my images in R, all functions
are
> working
> >> except display
> >> > my OS is windows 7 but Mozilla Firefox gives this message:
> >> > Oops the page you are looking for could not be found
> >> >
> >> > would you please help me in this regard??
> >> >
> >> > Regards
> >> > Nima
> >> >
> >> > -- output of sessionInfo():
> >> >
> >> > R version 2.15.3 (2013-03-01)
> >> > Platform: x86_64-w64-mingw32/x64 (64-bit)
> >> >
> >> > locale:
> >> > [1] LC_COLLATE=English_United States.1252
> >> > [2] LC_CTYPE=English_United States.1252
> >> > [3] LC_MONETARY=English_United States.1252
> >> > [4] LC_NUMERIC=C
> >> > [5] LC_TIME=English_United States.1252
> >> >
> >> > attached base packages:
> >> > [1] stats graphics grDevices utils datasets methods
base
> >> >
> >> > other attached packages:
> >> > [1] EBImage_4.0.0
> >> >
> >> > loaded via a namespace (and not attached):
> >> > [1] abind_1.4-0 jpeg_0.1-2 png_0.1-4 tiff_0.1-4
> >> >
> >> > --
> >> > Sent via the guest posting facility at bioconductor.org.
> >> >
> >> > _______________________________________________
> >> > Bioconductor mailing list
> >> > Bioconductor at r-project.org
> >> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> >> > Search the archives:
> >> http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>
> >>
> >
> > [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
[[alternative HTML version deleted]]
------------------------------
Message: 21
Date: Wed, 13 Mar 2013 01:21:49 +0000
From: Nadia Davidson <nadia.davidson@mcri.edu.au>
To: <bioconductor at="" stat.math.ethz.ch="">
Subject: Re: [BioC] Negative parameter error when using goseq
Message-ID: <loom.20130313t015334-216 at="" post.gmane.org="">
Content-Type: text/plain; charset="us-ascii"
Tarca, Adi <atarca at="" ...=""> writes:
>
> Dear Matthew,
> I am getting the following error when using goseq. Any idea what may
cause it?
> Thanks,
> Adi
>
> > pwf=nullp(genes,bias.data=cntbias)
> > gocats=as.list(org.Hs.egGO2ALLEGS)
> > GOr=goseq(pwf,gene2cat=gocats)
> Using manually entered categories.
> Calculating the p-values...
> Error in dWNCHypergeo(num_de_incat, num_incat, num_genes -
num_incat, :
> Negative parameter
> > head(pwf)
> DEgenes bias.data pwf
> 642819 0 3764 0.05775376
> 414235 0 124 0.02107342
> 57107 0 14702 0.06023892
> 649159 1 248 0.02542837
> 3127 0 44633 0.06357525
> 100507412 1 1337 0.05268580
> > head(gocats[3:5])
> $`GO:0000012`
> IDA IDA IEA IMP IDA
IDA
> "3981" "7141" "7515" "23411" "54840"
"55775"
> IMP IMP IEA
> "55775" "200558" "100133315"
>
> Adi Laurentiu TARCA, Ph.D.
Dear Adi,
I think the trouble is that "org.Hs.egGO2ALLEGS" lists some GO groups
twice for the same gene. This is making goseq overestimate the number
of genes in a particular GO category, to the extent that the number of
genes in the category is more than the total number of genes.
It should be simple to fix with:
gocats<-sapply(gocats,function(x){unique(x)})
before the "goseq" command.
I'll update the goseq code so it checks for this scenario in future.
Cheers,
Nadia.
------------------------------
Message: 22
Date: Wed, 13 Mar 2013 12:57:47 +1100 (AUS Eastern Daylight Time)
From: Gordon K Smyth <smyth@wehi.edu.au>
To: Luis alberto Martinez lopez <lewisluis2013 at="" yahoo.com="">
Cc: Bioconductor mailing list <bioconductor at="" r-project.org="">
Subject: Re: [BioC] anova like test in edgeR
Message-ID: <pine.wnt.4.64.1303131254070.6032 at="" pc765.wehi.edu.au="">
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
The example is in Section 3.2.5, which is titled "An ANOVA-like test
for
any differences".
Gordon
> Date: Mon, 11 Mar 2013 17:18:09 -0700 (PDT)
> From: Luis alberto Martinez lopez <lewisluis2013 at="" yahoo.com="">
> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
> Subject: [BioC] anova like test in edgeR
>
> Hello everyone:
>
> a wish to know if there is one way of doing an anova like test in
edgeR,
> comparing all the possibilities between the groups defined in the
> experimental design.
>
> in the users guide the authors explain one way anova test, but the
> comparison is done only against the intercept group, is there a way
of
> doing a general test comparing all the posibilities and not only
against
> the intercept?
The example in the User's Guide is a general test comparing all
possibilities.
Best wishes
Gordon
______________________________________________________________________
The information in this email is confidential and
intend...{{dropped:4}}
------------------------------
Message: 23
Date: Tue, 12 Mar 2013 19:45:01 -0700
From: Martin Morgan <mtmorgan@fhcrc.org>
To: Julian Gehring <julian.gehring at="" embl.de="">
Cc: bioconductor at r-project.org
Subject: Re: [BioC] Subsampling of BAM/SAM alignments
Message-ID: <513FE82D.6050104 at fhcrc.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
On 03/12/2013 01:32 PM, Julian Gehring wrote:
> Hi,
>
> Is there an efficient function to randomly subsample alignments from
a BAM/SAM
> within R? In the best case, an equivalent to 'FastqSampler' in
'ShortRead'. If
> not, what would be a clever way of doing this with the bioc
framework, without
> having to read the whole file first?
Hi Julian -- there isn't anything at the moment; below is a function
that will
have at most yieldSize * 2 elements in memory at one time. All the
elements in
the bam file (satisfying ScanBamParam) will eventually pass through
memory, so
not super efficient.
sampleBam <-
function(file, index=file, ..., yieldSize, verbose=FALSE,
param=ScanBamParam(what=scanBamWhat()))
{
qunlist <- Rsamtools:::.quickUnlist
tot <- sampleSize <- yieldSize
bf <- open(BamFile(file, index, yieldSize=yieldSize))
smpl <- qunlist(scanBam(bf, param=param))
repeat {
yld <- qunlist(scanBam(bf, param=param))
yld_n <- length(yld[[1]])
if (length(yld[[1]]) == 0L)
break
tot <- tot + yld_n
keep <- rbinom(1L, yld_n, yld_n/ tot)
if (verbose)
message(sprintf("sampling %d of %d", keep, tot))
if (keep == 0L)
next
i <- sample(sampleSize, keep)
j <- sample(yld_n, keep)
smpl <- Map(function(x, y, i, j) {
x[i] <- y[j]
x
}, smpl, yld, MoreArgs=list(i=i, j=j))
}
close(bf)
smpl
}
with
fl <- system.file("extdata", "ex1.bam", package="Rsamtools")
param <- ScanBamParam(
flag=scanBamFlag(isUnmappedQuery=FALSE),
what=c("rname", "pos", "cigar", "strand"))
lst <- sampleBam(fl, yieldSize=1000, param=param, verbose=TRUE)
The end result could be fed to a more friendly structure
names(lst)[names(lst) == "rname"] = "seqname"
do.call(GappedAlignments, lst)
Lightly tested, especially when ScanBamParam() is non-trivial.
Martin
>
> Best wishes
> Julian
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
------------------------------
Message: 24
Date: Tue, 12 Mar 2013 17:28:57 -0400
From: Amy Vandiver <amyruthvandiver@gmail.com>
To: <bioconductor at="" stat.math.ethz.ch="">
Subject: [BioC] ArrayExpress Question
Message-ID:
<caf4aqnhqpohkepghenuknje+cx2e3d8ts-vlp=wu_oki3fk7ta at="" mail.gmail.com="">
Content-Type: text/plain
Hello,
I've been trying to access Array Expression data using the Array
Expression
Package, however I am running into an error, seemingly due to the
format of
the SDRF file. I have been trying to run through the ae2bioc function
line
by line (ae2bioc) to determine if I can work around this error,
however
many of the called functions (readPhenoData, getSDRFcolumn, etc.) can
not
be found. Am I missing a specific package containing these functions?
Is
there a way to work around the original error?
Thank you,
Amy
*My script:*
source("http://bioconductor.org/biocLite.R")
biocLite("ArrayExpress")
library("ArrayExpress")
AEset=ArrayExpress("E-MTAB-202")
*Error:*
Error in .subset2(x, i, exact = exact) :
attempt to select less than one element
In addition: Warning message:
In readPhenoData(sdrf, path) :
ArrayExpress: Cannot find 'Label' column in SDRF. Object might not
be
created correctly.
*SessionInfo:*
R Under development (unstable) (2013-03-10 r62201)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.iso885915 LC_NUMERIC=C
[3] LC_TIME=en_US.iso885915 LC_COLLATE=en_US.iso885915
[5] LC_MONETARY=en_US.iso885915 LC_MESSAGES=en_US.iso885915
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices datasets utils
methods
[8] base
other attached packages:
[1] minfiLocal_0.1.20
[2] minfiData_0.3.1
[3] IlluminaHumanMethylation450kannotation.ilmn.v1.2_0.1.3
[4] IlluminaHumanMethylation450kmanifest_0.4.0
[5] bumphunter_0.99.33
[6] iterators_1.0.6
[7] foreach_1.4.0
[8] minfi_1.5.3
[9] Biostrings_2.27.11
[10] GenomicRanges_1.11.35
[11] IRanges_1.17.35
[12] reshape_0.8.4
[13] plyr_1.8
[14] lattice_0.20-13
[15] limma_3.15.14
[16] ArrayExpress_1.19.0
[17] Biobase_2.19.3
[18] BiocGenerics_0.5.6
[19] BiocInstaller_1.9.6
loaded via a namespace (and not attached):
[1] affy_1.37.4 affyio_1.27.1 beanplot_1.1
[4] codetools_0.2-8 DNAcopy_1.33.1 doRNG_1.5
[7] grid_3.1.0 illuminaio_0.1.5 itertools_0.1-1
[10] MASS_7.3-24 matrixStats_0.6.2 mclust_4.0
[13] multtest_2.15.0 nor1mix_1.1-3 preprocessCore_1.21.1
[16] RColorBrewer_1.0-5 R.methodsS3_1.4.2 siggenes_1.33.1
[19] splines_3.1.0 stats4_3.1.0 survival_2.37-4
[22] tools_3.1.0 XML_3.95-0.2 zlibbioc_1.5.0
--
Amy Ruth Vandiver
MSII, GSIII
Johns Hopkins School of Medicine, MD-PhD Program
[[alternative HTML version deleted]]
------------------------------
Message: 25
Date: Wed, 13 Mar 2013 01:55:49 +0000
From: "Marwaha, Shruti (marwahsi)" <marwahsi@mail.uc.edu>
To: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
Subject: [BioC] Error in getGEO command with GSEMatrix=TRUE
Message-ID:
<39BA056F90A9A14C8067B39997BFEDCB5867150A at
SN2PRD0102MB117.prod.exchangelabs.com>
Content-Type: text/plain; charset="us-ascii"
Dear Developers,
I am trying to get an ExpressionSet from a GSEfile using getGEO
command with GSEMatrix=TRUE. It sometimes gives an error (Connection
was reset) if I use GSEMatrix=TRUE however sometimes it works smooth.
I have the latest version of R, Bioconductor and GEOquery. Please
advise if I am missing some library or need to change some settings.
Thanks in advance.
My Code
library(Biobase)
library(GEOquery)
library(limma)
gset <- getGEO("GSE27347", GSEMatrix =TRUE)
Error:
Error in function (type, msg, asError = TRUE) :
Send failure: Connection was reset
Session Info
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_India.1252 LC_CTYPE=English_India.1252
[3] LC_MONETARY=English_India.1252 LC_NUMERIC=C
[5] LC_TIME=English_India.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] limma_3.14.4 GEOquery_2.24.1 Biobase_2.18.0
BiocGenerics_0.4.0
loaded via a namespace (and not attached):
[1] RCurl_1.95-4.1 tools_2.15.2 XML_3.95-0.2
System Properties
OS: Windows 7
RAM: 4GB
Processor: 2GHz
Methods already attempted
* I have tried it on both RStudio and GUI
* I have also tried using the command
"setInternet2(use=FALSE)" but it doesn't help.
* I have tried this command with other GSEfiles also and get
the same error.
* GEOquery works fine and downloads the GSEfiles if I give
GSEMatrix=FALSE
* I have also tried downloading the file and provide the file
path of the file to getGEO with GSEMatrix=TRUE but it does not give an
ExpressionSet as output.
Thanks & Regards,
Shruti Marwaha
Graduate Student
Systems Biology and Physiology
University of Cincinnati
Medical Science Building,
231 Albert Sabin Way
Cincinnati, OH, USA 45267
------------------------------
Message: 26
Date: Tue, 12 Mar 2013 21:35:26 -0600
From: chris_utah <chris.gregg@neuro.utah.edu>
To: Bioconductor mailing list <bioconductor at="" r-project.org="">
Subject: [BioC] biomaRt access for NCBIM37 build of mouse genome
Message-ID: <6D52F2F6-3564-4DA7-9101-2AA9B1D15BF8 at neuro.utah.edu>
Content-Type: text/plain
Hello,
I am using biomaRt. I want to retrieve transcript start and stop info
based on the NCBIM37 (mm9) build using getBM. I can't find this
annotation in listMarts(archive =T). Is anyone aware of a simple
solution?
Thank you.
best wishes,
Chris
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] grid stats graphics grDevices utils datasets
methods base
other attached packages:
[1] edgeR_3.0.8 limma_3.14.1 rtracklayer_1.18.0
biomaRt_2.14.0 GenomicRanges_1.10.5 IRanges_1.16.4
BiocGenerics_0.4.0
[8] Gviz_1.2.0
loaded via a namespace (and not attached):
[1] annotate_1.36.0 AnnotationDbi_1.20.2 Biobase_2.18.0
Biostrings_2.26.2 biovizBase_1.6.0 bitops_1.0-4.2
[7] BSgenome_1.26.1 cluster_1.14.3 colorspace_1.2-0
DBI_0.2-5 DESeq_1.10.1 dichromat_1.2-4
[13] genefilter_1.40.0 geneplotter_1.36.0
GenomicFeatures_1.10.0 Hmisc_3.10-1 labeling_0.1
lattice_0.20-10
[19] munsell_0.4 parallel_2.15.2 plyr_1.7.1
RColorBrewer_1.0-5 RCurl_1.95-3 Rsamtools_1.10.1
[25] RSQLite_0.11.2 scales_0.2.2 splines_2.15.2
stats4_2.15.2 stringr_0.6.1 survival_2.36-14
[31] tools_2.15.2 XML_3.95-0.1 xtable_1.7-0
zlibbioc_1.4.0
[[alternative HTML version deleted]]
------------------------------
Message: 27
Date: Tue, 12 Mar 2013 23:39:06 -0700
From: "Ryan C. Thompson" <rct@thompsonclan.org>
To: Luis alberto Martinez lopez <lewisluis2013 at="" yahoo.com="">
Cc: Bioconductor mailing list <bioconductor at="" r-project.org="">
Subject: Re: [BioC] anova like test in edgeR
Message-ID: <51401F0A.6070803 at thompsonclan.org>
Content-Type: text/plain
Luis,
Your design has 4 groups,which means that the inter-group ANOVA-like
test has 4 - 1 = 3 degrees of freedom. Hence, the ANOVA-like test is
performed by testing 3 coefficients/contrasts equal to zero. The
procedures I have described will yield will test where the null
hypothesis is all group means being equal, and the alternative
hypothesis is that any two or more group means differ significantly
from
each other. It is true that between 4 groups, there are 6 possible
pairwise comparisons to be made. If you want individual significance
rankings for each of the 6 possible contrasts, you can do that too, by
making 6 separate calls to glmLRT, each with a single coefficient or
contrast. Doing so will give you an appropriate rank order of
significance for each contrast, but the exact FDR values will probably
be too liberal, since there is no correction for the fact that you are
performing 6 tests on each gene instead of just 1. Ideally, you would
use a post-hoc testing method to assess individual contrasts within
genes that are called significant by their FDR in the ANOVA-like test.
However, I'm not sure how to perform such tests with the negative
binomial distribution. You might consider using the limma-voom method
instead of edgeR, since that will perform an actual ANOVA based on the
normal distribution, so standard ANOVA post-hoc analysis is possible.
If
anyone has a take on how to do rigorous post-hoc testing on negative
binomial GLMs, I'd be interested in hearing about it.
If you just want the logFC values for the missing 3 contrasts, then
simply note that they can all be written in terms of the X vs 10
contrasts. For example, the 20 vs 40 contrast would be the 10 vs 40
contrast minus the 10 vs 20 contrast.
-Ryan
On 03/12/2013 06:42 PM, Luis alberto Martinez lopez wrote:
> thank you Ryan for your clear answer:
> but in your answer again only considered the comparison versus the
10
> group(the first)
> i dont't have a control group so i'm looking for a way of doing an
> anova like test in edgeR that compares 10 vs 20 ,20 vs 40, 40 vs
60,10
> vs 40, 10 vs 60,20 vs 60 ie all the posibilities at the same time
and
> that give me a list with the genes more differential in any
condition,
> thanks in advance,
> Luis
> *De:* Ryan C. Thompson <rct at="" thompsonclan.org="">
> *Para:* Luis alberto Martinez lopez <lewisluis2013 at="" yahoo.com="">
> *CC:* "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
> *Enviado:* Lunes, 11 de marzo, 2013 22:46:19
> *Asunto:* Re: [BioC] anova like test in edgeR
>
> Luis,
>
> Note that your experimental design is the same whether or not you
> include an intercept in the design matrix. If you include an
> intercept, then the intercept represents the level in the D10 group,
> and the other three coefficients represent the differences between
> D20/40/60 and D10. So with an intercept, you would specify coef=2:4
to
> do an ANOVA-like test for any differences among all four groups.
>
> If you choose not to use an intercept term, then the design matrix
is
> just a different parametrization of the same design. Instead of
> representing differences, each coefficient simply represents the
> expression level in a group. So you can still perform the same test,
> you simply have to specify the contrasts (differences) explicitly:
>
> > anova.contrasts <- makeContrasts(contrasts=c("grpD20-grpD10",
> "grpD40-grpD10", "grpD60-grpD10"), levels=design)
> > print(anovalike.contrasts)
> Contrasts
> Levels grpD20-grpD10 grpD40-grpD10 grpD60-grpD10
> grpD10 -1 -1 -1
> grpD20 1 0 0
> grpD40 0 1 0
> grpD60 0 0 1
> > glmLRT(fit2, contrast=anova.contrasts)
>
> Hope this helps,
>
> -Ryan Thompson
>
>
> On 03/11/2013 05:18 PM, Luis alberto Martinez lopez wrote:
>> Hello everyone:
>>
>> a wish to know if there is one way of doing an anova like test in
edgeR, comparing all the possibilities between the groups defined in
the experimental design.
>>
>> in the users guide the authors explain one way anova test, but the
comparison is done only against the intercept group, is there a way
of doing a general test comparing all the posibilities and not only
against the intercept?
>>
>> also i'd like to know what does the following differential
expression analysis does because the output is different when i
include an intercept, i can't deduce what comparison are being
doing....
>>
>>> grp<-c("D10","D10","D20","D20","D40","D40","D60","D60")
>>> dge = DGEList(counts=general.m, group=grp)
>>> dge = calcNormFactors(dge)
>>> design <- model.matrix(~0+grp, data=dge$samples)
>> Warning message:
>> In model.matrix.default(~0 + grp, data = dge$samples) :
>> variable 'grp' converted to a factor
>>> design
>> grpD10 grpD20 grpD40 grpD60
>> i10re1 1 0 0 0
>> i10re2 1 0 0 0
>> i20re1 0 1 0 0
>> i20re2 0 1 0 0
>> i40re1 0 0 1 0
>> i40re2 0 0 1 0
>> i60re1 0 0 0 1
>> i60re2 0 0 0 1
>> attr(,"assign")
>> [1] 1 1 1 1
>> attr(,"contrasts")
>> attr(,"contrasts")$grp
>> [1] "contr.treatment"
>>
>>> dge <- estimateGLMCommonDisp(dge,design)
>>> dge <- estimateGLMTrendedDisp(dge,design)
>> Loading required package: splines
>>> dge <- estimateGLMTagwiseDisp(dge,design)
>>> fit2 <- glmFit(dge, design)
>>> lrt.anova_like <- glmLRT(fit2, coef=2:4)
>>> lrt.anova_like
>> An object of class "DGELRT"
>> $coefficients
>> grpD10 grpD20 grpD40 grpD60
>> comp101_c0 -12.55010 -16.12974 -13.27081 -16.12974
>> comp103_c0 -13.85721 -13.35290 -12.61155 -13.96824
>> comp120_c0 -12.33347 -13.98877 -16.12974 -16.12974
>> comp179_c0 -12.55135 -13.98566 -16.12974 -13.96827
>> comp192_c0 -13.85721 -13.98569 -13.27548 -12.67228
>> 23484 more rows ...
>>
>> $fitted.values
>> i10re1 i10re2 i20re1 i20re2 i40re1
i40re2 i60re1
>> comp101_c0 2.0108643 1.9941877 0.0000000 0.0000000 0.841309
1.168177 0.0000000
>> comp103_c0 0.5020816 0.4979177 1.1140306 0.8859780 1.674705
2.325366 0.5414430
>> comp120_c0 2.5112747 2.4904480 0.5552870 0.4416145 0.000000
0.000000 0.0000000
>> comp179_c0 2.0083306 1.9916749 0.5570245 0.4429963 0.000000
0.000000 0.5414282
>> comp192_c0 0.5020816 0.4979177 0.5570036 0.4429797 0.837345
1.162673 2.1657497
>> i60re2
>> comp101_c0 0.0000000
>> comp103_c0 0.4585719
>> comp120_c0 0.0000000
>> comp179_c0 0.4585593
>> comp192_c0 1.8342685
>> 23484 more rows ...
>>
>> $deviance
>> comp101_c0 comp103_c0 comp120_c0 comp179_c0 comp192_c0
>> 4.386443 3.070288 2.848548 3.917710 2.629182
>> 23484 more elements ...
>>
>> $df.residual
>> [1] 4 4 4 4 4
>> 23484 more elements ...
>>
>> $abundance
>> [1] -13.63381 -13.35715 -13.64240 -13.64482 -13.35716
>> 23484 more elements ...
>>
>> $design
>> grpD10 grpD20 grpD40 grpD60
>> i10re1 1 0 0 0
>> i10re2 1 0 0 0
>> i20re1 0 1 0 0
>> i20re2 0 1 0 0
>> i40re1 0 0 1 0
>> i40re2 0 0 1 0
>> i60re1 0 0 0 1
>> i60re2 0 0 0 1
>> attr(,"assign")
>> [1] 1 1 1 1
>> attr(,"contrasts")
>> attr(,"contrasts")$grp
>> [1] "contr.treatment"
>>
>>
>> $offset
>> [,1] [,2] [,3] [,4] [,5] [,6]
[,7] [,8]
>> [1,] 13.27698 13.26865 13.52514 13.29609 13.15729 13.48553 13.47707
13.31095
>> [2,] 13.27698 13.26865 13.52514 13.29609 13.15729 13.48553 13.47707
13.31095
>> [3,] 13.27698 13.26865 13.52514 13.29609 13.15729 13.48553 13.47707
13.31095
>> [4,] 13.27698 13.26865 13.52514 13.29609 13.15729 13.48553 13.47707
13.31095
>> [5,] 13.27698 13.26865 13.52514 13.29609 13.15729 13.48553 13.47707
13.31095
>> 23484 more rows ...
>>
>> $dispersion
>> [1] 0.2161256308 0.0003318302 0.0634519684 0.0003270568
0.0003318016
>> 23484 more elements ...
>>
>> $method
>> [1] "oneway"
>>
>> $samples
>> group lib.size norm.factors
>> i10re1 D10 594988 0.9808651
>> i10re2 D10 590850 0.9795431
>> i20re1 D20 723182 1.0343009
>> i20re2 D20 567524 1.0481805
>> i40re1 D40 452242 1.1449003
>> i40re2 D40 671656 1.0703965
>> i60re1 D60 801682 0.8892299
>> i60re2 D60 685351 0.8809633
>>
>> $table
>> logFC.grpD20 logFC.grpD40 logFC.grpD60 logCPM
LR
>> comp101_c0 -23.27030 -19.14573 -23.27030 0.2621324
647.0115
>> comp103_c0 -19.26416 -18.19462 -20.15192 0.6612798
194043.3654
>> comp120_c0 -20.18153 -23.27030 -23.27030 0.2497509
1999.3276
>> comp179_c0 -20.17704 -23.27030 -20.15196 0.2462508
196431.3868
>> comp192_c0 -20.17709 -19.15248 -18.28224 0.6612611
194057.4372
>> PValue
>> comp101_c0 6.475824e-140
>> comp103_c0 0.000000e+00
>> comp120_c0 0.000000e+00
>> comp179_c0 0.000000e+00
>> comp192_c0 0.000000e+00
>> 23484 more rows ...
>>
>> $comparison
>> [1] "grpD20" "grpD40" "grpD60"
>>
>> $df.test
>> [1] 3 3 3 3 3
>> 23484 more elements ...
>>
>>
>>> top<-topTags(lrt.anova_like)
>>> top
>> Coefficient: grpD20 grpD40 grpD60
>> logFC.grpD20 logFC.grpD40 logFC.grpD60 logCPM
LR PValue
>> comp628_c0 -23.2703 -23.27030 -19.23749 0.2462499
196434.924 0
>> comp2607_c0 -23.2703 -20.07173 -18.68223 0.4686227
195146.814 0
>> comp2858_c0 -23.2703 -23.27030 -16.85200 1.1206869
192585.615 0
>> comp2885_c0 -23.2703 -18.60303 -17.96733 0.6593429
4326.878 0
>> comp2904_c0 -23.2703 -23.27030 -16.21482 1.8318474
2065.059 0
>> comp2912_c0 -23.2703 -17.88139 -20.15196 1.1207038
192646.480 0
>> comp2933_c0 -23.2703 -18.59551 -18.68219 0.4686259
195111.455 0
>> comp2950_c0 -23.2703 -19.16586 -17.97899 0.4607410
1715.721 0
>> comp2999_c0 -23.2703 -23.27030 -23.27030 1.2524134
3853.539 0
>> comp3028_c0 -23.2703 -19.15248 -17.30519 0.9831770
192775.649 0
>> FDR
>> comp628_c0 0
>> comp2607_c0 0
>> comp2858_c0 0
>> comp2885_c0 0
>> comp2904_c0 0
>> comp2912_c0 0
>> comp2933_c0 0
>> comp2950_c0 0
>> comp2999_c0 0
>> comp3028_c0 0
>>> lrt.anova_like <- glmLRT(fit2, coef=1:4)
>> Error in mglmLevenberg(y, design = design, dispersion = dispersion,
offset = offset, :
>> BLAS/LAPACK routine 'DGEMM ' gave error code -13
>>
>>
>>
>>
>>
>> The DGELRT object says that is comparing
>>
>> $comparison
>> [1] "grpD20" "grpD40" "grpD60"
>>
>> but what exactly this means?
>>
>> Luis
>> [[alternative HTML version deleted]]
>>
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org="">
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:http://news.gmane.org/gmane.science.biology.inf
ormatics.conductor
>
>
>
[[alternative HTML version deleted]]
------------------------------
Message: 28
Date: Wed, 13 Mar 2013 00:59:20 -0700
From: "Ryan C. Thompson" <rct@thompsonclan.org>
To: Wolfgang Huber <whuber at="" embl.de="">
Cc: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
Subject: Re: [BioC] DESeq2
Message-ID: <514031D8.6010006 at thompsonclan.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hello,
I noticed the addition of the DESeq2 package a few days ago and had a
look at the new additions. Overall, it looks like an excellent package
(and it runs a lot faster than DESeq 1, too). I have a few questions
for
clarification of exactly what methods DESeq2 is using. Specifically, I
notice that the fold change shrinkage is performed in nbinomWaldTest
but
not in nbinomChisqTest. Is this just for reasons of
backward-compatibility of results, or is the Chi-squared test
logically
incompatible with shrunken GLM coefficients? Secondly, is there any
plan
to extend the Wald test to testing contrasts of multiple coefficients
or
testing multiple coefficients/contrasts at once in an ANOVA-like test?
Thanks,
-Ryan Thompson
On 03/12/2013 01:51 PM, Wolfgang Huber wrote:
> Dear DESeq users,
>
> Mike Love, Simon Anders and I have been updating the DESeq package.
This resulted in the package DESeq2, which is available from the
development branch, and scheduled for the next release:
http://www.bioconductor.org/packages/devel/bioc/html/DESeq2.html
>
> For several release cycles, the original package (DESeq) will be
maintained at its current functionality, in order to not disrupt the
workflows of DESeq users. For new projects, we recommend using DESeq2.
Major innovations are:
>
> * Base class: SummarizedExperiment is used as the superclass for
storing the data, rather than eSet. This allows closer integration
with upstream workflows involving GRanges and summarizeOverlaps, and
facilitates downstream analyses of the genomic regions of interest.
>
> * Simplified workflow: the wrapper function DESeq() performs all
steps for a differential expression analysis. The individual steps are
of course also accessible.
>
> * More powerful statistics: incorporation of prior distributions
into the estimation of dispersions and fold changes (empirical-Bayes
shrinkage). The dispersion shrinkage improves power compared to the
old DESeq. The fold changes shrinkage help moderate the otherwise
large spread in log fold changes for genes with low counts, while it
has negligible effect on genes with high counts; it may be
particularly useful for visualisation, clustering, classification,
ordination (PCA, MDS), similar to the variance-stabilizing
transformation in the old DESeq. A Wald test for significance is
provided as the default inference method, with the chi-squared test of
the previous version is also available. A manuscript is in
preparation.
>
> * Normalization: it is possible to provide a matrix of sample- *and*
gene-specific normalization factors, which allows the use of
normalisation factors from Bioconductor packages such as cqn and
EDASeq.
>
> Examples of usage are provided in the vignette, and more details are
available in the manual pages (specifically, the DESeq function and
estimateDispersions function).
>
> Enjoy -
>
> Mike, Simon, Wolfgang.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
------------------------------
Message: 29
Date: Wed, 13 Mar 2013 09:40:11 +0100
From: Hans-Rudolf Hotz <hrh@fmi.ch>
To: chris_utah <chris.gregg at="" neuro.utah.edu="">
Cc: Bioconductor mailing list <bioconductor at="" r-project.org="">
Subject: Re: [BioC] biomaRt access for NCBIM37 build of mouse genome
Message-ID: <51403B6B.9020700 at fmi.ch>
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Hi Chris
use one of the archives. 'ensembl67' ("may2012.archive.ensembl.org")
was
the last ensembl release containing NCBIM37 (if I remember correctly).
So you can try:
> library(biomaRt)
> ensembl67=useMart(host='may2012.archive.ensembl.org',
biomart='ENSEMBL_MART_ENSEMBL')
> listDatasets(ensembl67)[grep("Mus",
listDatasets(ensembl67)$description),]
dataset description version
48 mmusculus_gene_ensembl Mus musculus genes (NCBIM37) NCBIM37
>
Regards, Hans-Rudolf
On 03/13/2013 04:35 AM, chris_utah wrote:
> Hello,
>
> I am using biomaRt. I want to retrieve transcript start and stop
info based on the NCBIM37 (mm9) build using getBM. I can't find this
annotation in listMarts(archive =T). Is anyone aware of a simple
solution?
>
> Thank you.
>
> best wishes,
> Chris
>
>
>> sessionInfo()
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] grid stats graphics grDevices utils datasets
methods base
>
> other attached packages:
> [1] edgeR_3.0.8 limma_3.14.1 rtracklayer_1.18.0
biomaRt_2.14.0 GenomicRanges_1.10.5 IRanges_1.16.4
BiocGenerics_0.4.0
> [8] Gviz_1.2.0
>
> loaded via a namespace (and not attached):
> [1] annotate_1.36.0 AnnotationDbi_1.20.2 Biobase_2.18.0
Biostrings_2.26.2 biovizBase_1.6.0 bitops_1.0-4.2
> [7] BSgenome_1.26.1 cluster_1.14.3 colorspace_1.2-0
DBI_0.2-5 DESeq_1.10.1 dichromat_1.2-4
> [13] genefilter_1.40.0 geneplotter_1.36.0
GenomicFeatures_1.10.0 Hmisc_3.10-1 labeling_0.1
lattice_0.20-10
> [19] munsell_0.4 parallel_2.15.2 plyr_1.7.1
RColorBrewer_1.0-5 RCurl_1.95-3 Rsamtools_1.10.1
> [25] RSQLite_0.11.2 scales_0.2.2 splines_2.15.2
stats4_2.15.2 stringr_0.6.1 survival_2.36-14
> [31] tools_2.15.2 XML_3.95-0.1 xtable_1.7-0
zlibbioc_1.4.0
>
>
>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
------------------------------
Message: 30
Date: Wed, 13 Mar 2013 09:45:43 +0100
From: Michael Love <michaelisaiahlove@gmail.com>
To: rct at thompsonclan.org, Wolfgang Huber <whuber at="" embl.de="">
Cc: bioconductor at r-project.org
Subject: Re: [BioC] DESeq2
Message-ID: <51403CB7.7080002 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
hi Ryan,
Thanks for the comments.
On 03/13/13 08:59, Ryan C. Thompson wrote:
> Hello,
>
> I noticed the addition of the DESeq2 package a few days ago and had
a
> look at the new additions. Overall, it looks like an excellent
package
> (and it runs a lot faster than DESeq 1, too). I have a few questions
> for clarification of exactly what methods DESeq2 is using.
> Specifically, I notice that the fold change shrinkage is performed
in
> nbinomWaldTest but not in nbinomChisqTest. Is this just for reasons
of
> backward-compatibility of results, or is the Chi-squared test
> logically incompatible with shrunken GLM coefficients?
The latter was our reason. With shrunken coefficients, the
differences
in deviances are no longer distributed as a chi-square. For example,
with simulated null data (non-intercept betas equal to zero), the
differences in deviances will pile up near zero and the resulting
p-values will not be uniformly distributed.
> Secondly, is there any plan to extend the Wald test to testing
> contrasts of multiple coefficients or testing multiple
> coefficients/contrasts at once in an ANOVA-like test?
>
Yes, we are looking into a convenient interface for this.
best,
Mike
> Thanks,
>
> -Ryan Thompson
>
> On 03/12/2013 01:51 PM, Wolfgang Huber wrote:
>> Dear DESeq users,
>>
>> Mike Love, Simon Anders and I have been updating the DESeq package.
>> This resulted in the package DESeq2, which is available from the
>> development branch, and scheduled for the next release:
>> http://www.bioconductor.org/packages/devel/bioc/html/DESeq2.html
>>
>> For several release cycles, the original package (DESeq) will be
>> maintained at its current functionality, in order to not disrupt
the
>> workflows of DESeq users. For new projects, we recommend using
>> DESeq2. Major innovations are:
>>
>> * Base class: SummarizedExperiment is used as the superclass for
>> storing the data, rather than eSet. This allows closer integration
>> with upstream workflows involving GRanges and summarizeOverlaps,
and
>> facilitates downstream analyses of the genomic regions of interest.
>>
>> * Simplified workflow: the wrapper function DESeq() performs all
>> steps for a differential expression analysis. The individual steps
>> are of course also accessible.
>>
>> * More powerful statistics: incorporation of prior distributions
into
>> the estimation of dispersions and fold changes (empirical-Bayes
>> shrinkage). The dispersion shrinkage improves power compared to the
>> old DESeq. The fold changes shrinkage help moderate the otherwise
>> large spread in log fold changes for genes with low counts, while
it
>> has negligible effect on genes with high counts; it may be
>> particularly useful for visualisation, clustering, classification,
>> ordination (PCA, MDS), similar to the variance-stabilizing
>> transformation in the old DESeq. A Wald test for significance is
>> provided as the default inference method, with the chi-squared test
>> of the previous version is also available. A manuscript is in
>> preparation.
>>
>> * Normalization: it is possible to provide a matrix of sample-
*and*
>> gene-specific normalization factors, which allows the use of
>> normalisation factors from Bioconductor packages such as cqn and
EDASeq.
>>
>> Examples of usage are provided in the vignette, and more details
are
>> available in the manual pages (specifically, the DESeq function and
>> estimateDispersions function).
>>
>> Enjoy -
>>
>> Mike, Simon, Wolfgang.
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
------------------------------
Message: 31
Date: Wed, 13 Mar 2013 06:26:46 -0400
From: Sean Davis <sdavis2@mail.nih.gov>
To: "Marwaha, Shruti (marwahsi)" <marwahsi at="" mail.uc.edu="">
Cc: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
Subject: Re: [BioC] Error in getGEO command with GSEMatrix=TRUE
Message-ID:
<caneavbn94wf=w_+r-qy+49e=fwk11kcfwza_wakz-myohbwtya at="" mail.gmail.com="">
Content-Type: text/plain
On Tue, Mar 12, 2013 at 9:55 PM, Marwaha, Shruti (marwahsi) <
marwahsi at mail.uc.edu> wrote:
> Dear Developers,
>
> I am trying to get an ExpressionSet from a GSEfile using getGEO
command
> with GSEMatrix=TRUE. It sometimes gives an error (Connection was
reset) if
> I use GSEMatrix=TRUE however sometimes it works smooth. I have the
latest
> version of R, Bioconductor and GEOquery. Please advise if I am
missing some
> library or need to change some settings. Thanks in advance.
>
> My Code
> library(Biobase)
> library(GEOquery)
> library(limma)
> gset <- getGEO("GSE27347", GSEMatrix =TRUE)
>
> Error:
> Error in function (type, msg, asError = TRUE) :
> Send failure: Connection was reset
>
>
Hi, Marwaha.
While I cannot be sure, I suspect that these are intermittent
connection
problems between you and NCBI. You could try wrapping your getGEO
call in
something like this to catch this condition :
gse = NULL
while(!inherits(gse,'list')) {gse=tryCatch(
getGEO('GSE27347'),
error=function(e) {Sys.sleep(5); message('retrying'); return(e)})}
This will continuously retry the getGEO call every 5 seconds until the
outcome is a list, which is what we expect getGEO to return in this
case.
Hopefully, this will make your code a bit more robust.
Sean
> Session Info
>
> > sessionInfo()
>
> R version 2.15.2 (2012-10-26)
>
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
>
>
> locale:
>
> [1] LC_COLLATE=English_India.1252 LC_CTYPE=English_India.1252
>
> [3] LC_MONETARY=English_India.1252 LC_NUMERIC=C
>
> [5] LC_TIME=English_India.1252
>
>
>
> attached base packages:
>
> [1] stats graphics grDevices utils datasets methods base
>
>
>
> other attached packages:
>
> [1] limma_3.14.4 GEOquery_2.24.1 Biobase_2.18.0
> BiocGenerics_0.4.0
>
>
>
> loaded via a namespace (and not attached):
>
> [1] RCurl_1.95-4.1 tools_2.15.2 XML_3.95-0.2
>
> System Properties
> OS: Windows 7
> RAM: 4GB
> Processor: 2GHz
>
> Methods already attempted
>
> * I have tried it on both RStudio and GUI
>
> * I have also tried using the command
"setInternet2(use=FALSE)"
> but it doesn't help.
>
> * I have tried this command with other GSEfiles also and get
the
> same error.
>
> * GEOquery works fine and downloads the GSEfiles if I give
> GSEMatrix=FALSE
>
> * I have also tried downloading the file and provide the
file path
> of the file to getGEO with GSEMatrix=TRUE but it does not give an
> ExpressionSet as output.
>
> Thanks & Regards,
> Shruti Marwaha
>
> Graduate Student
> Systems Biology and Physiology
> University of Cincinnati
> Medical Science Building,
> 231 Albert Sabin Way
> Cincinnati, OH, USA 45267
>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
[[alternative HTML version deleted]]
------------------------------
_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
End of Bioconductor Digest, Vol 121, Issue 13