Works fine for me. My guess is that you can't install any Bioc package,
not just edgeR.
Gordon
On Mon, 3 Oct 2011, Helena Persson wrote:
> Well, I installed the devel version of R, but if I then do:
> source("http://www.bioconductor.org/biocLite.R")
> biocLite("edgeR")
I get the error message:
Using R version 2.14.0, biocinstall version 2.8.4.
Installing Bioconductor version 2.8 packages:
[1] "edgeR"
Please wait...
Warning: unable to access index for repository http://bioconductor.org/packages/2.8/bioc/bin/macosx/leopard/contrib/2.14
Warning: unable to access index for repository http://bioconductor.org/packages/2.8/data/annotation/bin/macosx/leopard/contrib/2.14
Warning: unable to access index for repository http://bioconductor.org/packages/2.8/data/experiment/bin/macosx/leopard/contrib/2.14
Warning: unable to access index for repository http://bioconductor.org/packages/2.8/extra/bin/macosx/leopard/contrib/2.14
Warning: unable to access index for repository http://brainarray.mbni.med.umich.edu/bioc/bin/macosx/leopard/contrib/2.14
Warning message:
In getDependencies(pkgs, dependencies, available, lib) :
package ‘edgeR’ is not available (for R Under development)
... so I guess I am doing something wrong.
Helena
________________________________________
From: Gordon K Smyth [smyth at wehi.EDU.AU]
Sent: 3 October 2011 08:53
To: Helena Persson
Cc: Bioconductor mailing list
Subject: Re: SV: edgeR on microRNA data
You need to install the devel version of R, available from CRAN. Then
you get the devel version of edgeR and other Bioconductor packages
automatically.
Gordon
On Mon, 3 Oct 2011, Helena Persson wrote:
Dear Gordon,
Upgrading sounds like a good idea. How do I install the devel version
of edgeR?
Best,
Helena
________________________________________
From: Gordon K Smyth [smyth at wehi.EDU.AU]
Sent: 3 October 2011 05:33
To: Helena Persson
Cc: Bioconductor mailing list
Subject: Re: edgeR on microRNA data
Dear Helena,
You will find it very helpful to upgrade your version of edgeR to the
current developmental version (although you will need to be using R
devel, aka R 2.14, to do so). You will find that exactTest() is now much
faster and less memory consuming. The current release version is time
consuming when the counts are large, mainly because of a change to the
way in which the rejection region is computed that we implemented two
months ago.
Fair comment about adding comments on prop.used. We had not considered
that users would generally change this.
If you choose prior.n very small, then edgeR will simply use the
genewise dispersion estimate that depends on the data from that gene
alone. This is not over-fitting in itself. However, it can lead to an
increase in the FDR because, when doing significance tests, edgeR does
not take into account the uncertainty with which the dispersion is
estimated.
Best wishes
Gordon
---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
http://www.wehi.edu.au
http://www.statsci.org/smyth
------------ original message --------------
On Mon, 3 Oct 2011, Helena Persson wrote:
> Dear Gordon,
I guess I should start with some clarifications:
> I am concerned that you have decreased prop.used from its default
> value of 0.3. I would tend to increase this rather than decrease it.
For the microRNA data I have few genes but a relatively large expression
range. My reason for decreasing prop.used was that I suspected that
using 30% would bin genes that had very different means of expression.
I did not give this a lot of thought at the time and have now rerun the
analysis using 0.3. Maybe it would be good to comment a bit more on this
parameter in the R help page or the edgeR vignette?
> On the other hand, you have increased prior.n from its default value,
> which for your data would be a little over 0.5. Is this simply because
> it gave better looking results? Anyway, increasing prior.n does not
> result in overfitting. The risk with larger prior.n is simply that it
> may start to return differentially expressed miRs that are increased
> or decreased in only a few of the samples, rather than consistently
> for all samples in a group.
I decided to remove two of the samples in the control group because they
appeared to be outliers from the rest, so my smallest group is actually
8 samples. I did not put together the control samples, but judging from
the clinical data I got, the control group is more heterogeneous than
the patient groups. Choosing 2 for prior.n was a compromise (I realised
I should go quite low for my dataset, but using 0, as suggested by
someone I talked to, produced very short lists of genes that did not
look any better judging from boxplots). Actually, I was wondering if
setting the prior very low (rather than high) could lead to overfitting
of the variance estimate.
> How large are the common and tagwise dispersions for your data?
The common dispersion varies a little depending on how I group the samples:
[1] 0.2681829 (three groups, 8 ctrl and 2 x 15 patients)
[1] 0.2788752 (two groups, 8 ctrl and 32 patients)
The tagwise dispersions (cds1 <- estimateTagwiseDisp(cds1, prior.n=1,
trend=TRUE, prop.used=0.3, grid=FALSE)):
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.09599 0.18370 0.24550 0.28160 0.31190 2.23000
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.1022  0.1894  0.2534  0.2916  0.3183  2.1890
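
For readers who find dispersions hard to interpret: under the negative
binomial model used by edgeR, the square root of the dispersion is the
biological coefficient of variation (BCV). A minimal sketch converting
the two common dispersions quoted above; the variable names are only
illustrative and are not part of the original analysis.

```r
# Common dispersions as reported in this message
# (three-group and two-group designs).
common.disp <- c(three.groups = 0.2681829, two.groups = 0.2788752)

# Under the negative binomial model, BCV = sqrt(dispersion).
bcv <- sqrt(common.disp)
print(round(bcv, 3))
```

Both values come out a little above 0.5, i.e. biological variation of
roughly 50% between replicates within a group.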
A strange thing: when I run exactTest for the miRNA data (618 genes x 38
samples), edgeR becomes extremely memory-consuming, basically using up
all of the 8 GB RAM on my laptop, and then becomes painfully slow as the
memory starts swapping. When I run exactTest for CAGE data for the same
samples (15066 genes x 40 samples) it never goes above 3 GB and finishes
rapidly. I use a grid search for the CAGE data tagwise dispersion
estimate and the library sizes are smaller (around 7 million counts vs
15 million), but otherwise the previous steps are basically the same.
Any experience (or qualified guess) of what might make the analysis use
so much memory?
Thanks again,
Helena
On Sat, 1 Oct 2011, Gordon K Smyth wrote:
> Dear Helena,
>
> Compared with mRNA-Seq, you have an unusually small number of
> transcripts but a relatively large number of biological replicates.
> This suggests that you should use a relatively small value for prior.n
> but a relatively large value for prop.used. I am concerned that you
> have decreased prop.used from its default value of 0.3. I would tend
> to increase this rather than decrease it.
>
> On the other hand, you have increased prior.n from its default value,
> which for your data would be a little over 0.5. Is this simply because
> it gave better looking results? Anyway, increasing prior.n does not
> result in overfitting. The risk with larger prior.n is simply that it
> may start to return differentially expressed miRs that are increased
> or decreased in only a few of the samples, rather than consistently
> for all samples in a group.
>
> Your experience with prior.n is unintuitive to me. Generally speaking,
> choosing prior.n small means that each miR gets to set its own
> dispersion, so that miRs with large variance will not appear in the
> topTags list. When you say "variance outliers", do you mean large or
> small variance?
>
> Since your minimum group sample size is 10, I would have required miRs
> to satisfy your cpm requirement in >= 10 samples rather than 5.
>
> Best wishes
> Gordon
>
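
The filtering rule suggested above (require the cpm threshold in at
least as many samples as the smallest group) can be sketched in base R.
This is only an illustration under stated assumptions: the count matrix
is made-up toy data, and the names counts, lib.size and min.samples are
hypothetical, not taken from the original analysis.

```r
# Toy counts: 4 miRNAs x 12 samples (hypothetical, for illustration only).
counts <- rbind(
  miR_a = rep(0, 12),                 # never expressed
  miR_b = c(rep(50, 6), rep(0, 6)),   # expressed in 6 samples only
  miR_c = rep(200, 12),               # expressed in all samples
  miR_d = c(rep(1000, 11), 0)         # expressed in 11 samples
)
colnames(counts) <- paste0("S", 1:12)

# Counts per million: divide each column by its library size.
lib.size <- colSums(counts)
cpm <- t(t(counts) / lib.size) * 1e6

# Keep miRNAs with cpm >= 0.2 in at least min.samples samples,
# where min.samples is the size of the smallest group.
min.samples <- 10
keep <- rowSums(cpm >= 0.2) >= min.samples
filtered <- counts[keep, , drop = FALSE]
print(rownames(filtered))
```

With a threshold of 5 samples miR_b would also be kept; raising the
threshold to the smallest group size removes it, since it could not be
expressed throughout any one group.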
>> Date: Thu, 29 Sep 2011 05:25:14 +0000
>> From: Helena Persson <helena.persson at ki.se>
>> To: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
>> Subject: [BioC] edgeR on microRNA data
>>
>> Hi,
>>
>> I would be grateful for some input on using edgeR for small RNA
>> sequence data. I have been testing edgeR on a set of miRNA data (3
>> groups with n=10, 15 and 15). After removing genes that are not
>> expressed at >= 0.2 cpm in >= 5 samples I have ~600 rows left. I tried
>> calculating the tagwise dispersion estimate with:
>>
>> cds1 <- estimateTagwiseDisp(cds1, prior.n=2, trend=TRUE,
>>                             prop.used=0.1, grid=FALSE)
>>
>> Increasing the prior to e.g. 10 gives more differentially expressed
>> genes that do not look bad. Decreasing the prior to 0 leaves me with
>> extremely few differentially expressed genes that are mainly variance
>> outliers. I guess that miRNA data is likely to behave differently from
>> mRNA data since there are so few genes (but still a very large dynamic
>> range). Is it possible that I am over-fitting the estimate? Would you
>> recommend changing any other parameters?
>>
>> Best regards,
>> Helena
>> _________________________________
>>
>> Helena Persson, PhD
>>
>> Karolinska Institutet
>> Dept of Biosciences and Nutrition
>> Hälsovägen 7-9
>> SE-141 83 Huddinge
>> Sweden
>>
>> Helena.Persson at ki.se
>>
>> tel. +46-(0)8-52481058