Hello Paolo-
There are a couple of ways of getting the consensus peaksets. If you
already have added them to your DBA object (e.g. by using
dba.peakset), you can retrieve the overlapping set (and/or write it to
a file) using dba.pekaset with the bRetrieve and writefile options:
peaks = dba.peakset(DBA, peaksetnum, bRetrieve=T,
writefile="overlappingpeaks.bed")
You can also get the overlapping peaks using dba.overlap:
overlaps = dba.overlap(DBA,c(peaksetnum1,peaksetnum2))
write.table(overlaps$inAll,file="overlappingpeaks.bed")
Also, I've added a new feature to dba.peakset to automatically add a
set of consensus peaksets all at once. For example, to add the
consensus peaksets for all samples that differ only in what peakcaller
was used:
DBA = dba.peakset(DBA, consensus = -DBA_CALLER)
Likewise to add consensus peaksets for each sample type by taking a
consensus of their replicates:
DBA = dba.peakset(DBA, consensus = -DBA_REPLICATE)
In this case, the minus sign means to combine peaksets that have the
same metadata EXCEPT a specific attributes (CALLER or REPLICATE). You
can also add consensus peaksets for all sets of samples that share
specific metadata, eg.:
DBA = dba.peakset(DBA, consensus= = c(DBA_TISSUE, DBA_CONDITION))
You can look at just the added peaksets by using the Consensus mask:
dba.show(DBA,DBA$masks$Consensus)
Hope this is helpful-
Rory
On 14/09/2012 11:00, "bioconductor-request@r-project.org<mailto :bioconductor-request@r-project.org="">" <bioconductor- request@r-project.org<mailto:bioconductor-request@r-project.org="">>
wrote:
Send Bioconductor mailing list submissions to
bioconductor@r-project.org<mailto:bioconductor@r-project.org>
To subscribe or unsubscribe via the World Wide Web, visit
https://stat.ethz.ch/mailman/listinfo/bioconductor
or, via email, send a message with subject or body 'help' to
bioconductor-request@r-project.org<mailto:bioconductor- request@r-project.org="">
You can reach the person managing the list at
bioconductor-owner@r-project.org<mailto:bioconductor- owner@r-project.org="">
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Bioconductor digest..."
Today's Topics:
1. Re: limma separate channel analysis of your data
(Sanogo, Yibayiri O)
2. Re: finding end of file in FASTA file (Martin Morgan)
3. Set plot title in makeVennDiagram (ChIPpeakAnno)
(Ant?nio Miguel de Jesus Domingues)
4. Re: Set plot title in makeVennDiagram (ChIPpeakAnno)
(Zhu, Lihua (Julie))
5. Set plot title in makeVennDiagram (ChIPpeakAnno)
(Ant?nio Miguel de Jesus Domingues)
6. Re: Set plot title in makeVennDiagram (ChIPpeakAnno)
(Zhu, Lihua (Julie))
7. creating GSEA files using biomart (Juliet Hannah)
8. Re: creating GSEA files using biomart (Steffen Durinck)
9. Re: GWASTools mendelList Constructor (Samuel Younkin)
10. DiffBind - sample sheet for multiple replicates and peak
callers (Ant?nio Miguel de Jesus Domingues)
11. Re: creating GSEA files using biomart (Juliet Hannah)
12. Re: topGO: full-length descriptions in graph output
(James W. MacDonald)
13. How to design matrix that account to effect of transgenic
event in edgeR ? (Sermsawat Tunlaya-Anukit)
14. Re: limma separate channel analysis of your data (Gordon K
Smyth)
15. edgeR: confusing BCV plot (Gordon K Smyth)
16. Install devel version of edgeR (Gordon K Smyth)
17. Re: HTqPCR error (Heidi Dvinge)
18. Re: Install devel version of edgeR (Gordon K Smyth)
19. Re: DiffBind - overlap between different peak callers
(Paolo Kunderfranco)
20. Re: DiffBind - sample sheet for multiple replicates and peak
(Paolo Kunderfranco)
21. DiffBind - error in dba.count (Ant?nio Miguel de Jesus
Domingues)
----------------------------------------------------------------------
Message: 1
Date: Thu, 13 Sep 2012 10:12:58 +0000
From: "Sanogo, Yibayiri O"
<sanogo@illinois.edu<mailto:sanogo@illinois.edu>>
To: Gordon K Smyth <smyth@wehi.edu.au<mailto:smyth@wehi.edu.au>>
Cc: Bioconductor mailing list
<bioconductor@r-project.org<mailto:bioconductor@r-project.org>>,
Osee
Sanogo
<sanogo@life.illinois.edu<mailto:sanogo@life.illinois.edu>>
Subject: Re: [BioC] limma separate channel analysis of your data
Message-ID:
<dd4a5aa12039fb46b3ac4eafff27533c1122fcc4@citesmbx4.ad.uillino is.edu<mailto:dd4a5aa12039fb46b3ac4eafff27533c1122fcc4@citesmbx4.ad.ui="" llinois.edu="">>
Content-Type: text/plain
Dear Gordon,
In the code below,
design <- model.matrix(~Dye+Fish+Brain+Brain:Treat)
fit <- lmscFit(MA,design)
fit <- eBayes(fit, trend=TRUE)
shouldn't we include the intracorrelation coefficient in the single
channel model?
Is the following code correct?
corfit <- intraspotCorrelation(MA, design)
fit <- lmscFit(MA,design, correlation=corfit$consensus)
Please let me know if this is the right way of doing it.
Thank you.
Osee
________________________________________
From: Gordon K Smyth [smyth@wehi.EDU.AU<mailto:smyth@wehi.edu.au>]
Sent: Tuesday, September 11, 2012 8:13 PM
To: Sanogo, Yibayiri O
Cc: Osee Sanogo; Bioconductor mailing list
Subject: limma separate channel analysis of your data
Dear Osee,
Thank you for attaching the revised design of your experiment. I found
it
somewhat tricky to analyse, so my recommendations are more involved
than I
usually give. I think that you will need to perform a separate channel
analysis, otherwise it will not be possible to make baseline
comparisons
between the brain parts.
I also think that you may need to re-normalize your data using the
limma
package. You say that the data has been "loess/scale normalized into
an
expression set", but I don't think that scale normalization should be
necessary for Agilent data, and an expression set is not fully
suitable
for two colour data. The following code assumes that you have
normalized
the data into an MAList object using the limma package. I will call it
'MA'. I strongly recommend that you use normexp background correction
and
loess normalization as described in the limma User's Guide for GenePix
data.
The first step is to put your targets data into separate channel
format:
targets <- read.delim("Design_Brain.txt")
Treat <- as.vector(t(targets[,c("Cy3","Cy5")]))
Treat <- ifelse(Treat>0,"Exp","Ctrl")
Treat <- factor(Treat)
Brain <- rep(targets$Brain.Part,each=2,len=48)
Fish <- factor(rep(targets$Fish.Number,each=2,len=48))
Dye <- rep(0:1,len=48)
data.frame(Fish,Brain,Treat,Dye)
Now you can fit a linear model, correcting for probe-specific dye-
bias,
correcting for any differences between the fish, and computing
Treatment
vs Control effects for each brain part:
design <- model.matrix(~Dye+Fish+Brain+Brain:Treat)
fit <- lmscFit(MA,design)
fit <- eBayes(fit, trend=TRUE)
Now you can extract genes that have a treatment effect. To find genes
that have a treatment vs control effect in telencephalon, you would
use:
topTable(fit, coef="BrainT:TreatExp")
For genes DE for treatment vs control in brain stem, it would be
topTable(fit, coef="BrainBS:TreatExp")
And so on.
Best wishes
Gordon
---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
http://www.statsci.org/smyth
On Tue, 11 Sep 2012, Sanogo, Yibayiri O wrote:
Dear Gordon,
Attached please find the revised design of my experiment.
Thank you for your willingness to help me analyze these data.
Best regards,
Osee
________________________________________
From: Sanogo, Yibayiri O
Sent: Monday, September 10, 2012 9:51 PM
To: Gordon K Smyth; Osee Sanogo
Cc: Bioconductor mailing list
Subject: RE: Advice with RemoveBatchEffect and Rank Product package
Dear Gordon,
Please ignore the last two columns (weight and length) of the design
for
now: I think they are screwed up due to some kind of sorting. I will
have
to double check them (look in the notebook which is not here now).
Only
different brain regions (T, D, C, BS) belonging to the same fish
should
have same weight, length.
The other columns are:
Fish.Number= corresponds to fish ID, the individual fish (biological
replicates) Slide= the Agilent 4X44k slides on each the arrays were
located Brain.Part= the different brain regions that were dissected
and
hybridized (within region). They were T (telencephalon), D
(diencephalon),
C (cerebellum), BS (brain stem). in the cy3, c5 columns, "-1" is
control
and "1" is experimental.
FileName Cy3 Cy5 Fish Number Slide Brain Part
1T.gpr 1 -1 1 2 T
2T.gpr -1 1 2 1 T
3T.gpr 1 -1 3 4 T
4T.gpr -1 1 4 3 T
5T.gpr 1 -1 5 6 T
6T.gpr -1 1 6 5 T
1D.gpr -1 1 1 5 D
2D.gpr 1 -1 2 4 D
3D.gpr -1 1 3 1 D
4D.gpr 1 -1 4 6 D
5D.gpr -1 1 5 3 D
6D.gpr 1 -1 6 2 D
1C.gpr 1 -1 1 4 C
2C.gpr -1 1 2 3 C
3C.gpr 1 -1 3 6 C
4C.gpr -1 1 4 5 C
5C.gpr 1 -1 5 2 C
6C.gpr -1 1 6 1 C
1BS.gpr -1 1 1 1 BS
2BS.gpr 1 -1 2 2 BS
3BS.gpr -1 1 3 3 BS
4BS.gpr 1 -1 4 4 BS
6BS.gpr 1 -1 6 6 BS
Thank you for your willingness to help. I will stay up a bit so that I
can
respond faster to your questions given the time difference between us.
Thank you so much.
Osee
________________________________________
From: Gordon K Smyth [smyth@wehi.EDU.AU<mailto:smyth@wehi.edu.au>]
Sent: Monday, September 10, 2012 6:34 PM
To: Osee Sanogo
Cc: Sanogo, Yibayiri O; Bioconductor mailing list
Subject: Re: Advice with RemoveBatchEffect and Rank Product package
Dear Osee,
Can you explain what the columns Fish.Number, Slide, Brain.Part,
Weight
and Length mean in your experiment? If the first six arrays are from
biologically independent samples, why do they have the same weight and
length?
Best wishes
Gordon
On Mon, 10 Sep 2012, Osee Sanogo wrote:
Dear Gordon,
>From what you said, it seems that I am oversimplifying my experiment
by
attempting to analyze it with RankProd, which doesn't offer the option
for
complex modeling.
Could you please explain to me how I could analyze the experiment
using
Limma?
Please let me know if you'd like me to provide further details of the
experiment.
Thank you so much.
Osee
On 9/10/12 2:04 AM, "Gordon K Smyth"
<smyth@wehi.edu.au<mailto:smyth@wehi.edu.au>> wrote:
Dear Osee,
No, you can't use removeBatchEffect to control for dye bias.
Can you ignore the dye effect? Not in general, but who knows?
Your experiment seems too complex to be properly analysed using
RankProd.
For one thing, it seems clear that you have obtained multiple parts of
the
brain from the same biological replicates, meaning that your samples
are
paired by fish number.
I could explain how to analyse this experiment using limma. However,
if
you are determined that you will use RankProd, it might be best to
email
the authors of that package for advice.
Best wishes
Gordon
---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
http://www.statsci.org/smyth
On Sun, 9 Sep 2012, Osee Sanogo wrote:
Dear Gordon,
Thank you for getting back to me about my questions.
My experiment is trying to identify differentially expressed genes in
four
regions of the brain in response to a stressor. I have 6 biol.
replicates in
each brain region for the control and experimental groups in each
region,
and the comparison is being done within brain region (i.e., T control
vs T
exp, D ctrl vs D exp, C ctrl vs C exp, BS ctrl vs BS exp). The sample
were
run in two-color Agilent Array.
You're right that the design I sent was from the separate channel
analysis,
in which I attempted to account for array and dye effect, and then run
the
data in RankProd. Now I know that this is not right. Ok, I will use
the
single channel analysis in Limma.
I still would like to run the two-channel data (ratios) in RankProd,
as my
previous experience found this useful for my dara (low replicate
numbers).
My questions: 1) Could I use RemoveBatchEffect to control for dye bias
before running the two-channel data in RankProd? If yes, how should I
do
this using the RemoveBatch Effect function?
2) I found that about 3% of my probes have dye effect. Can I
omit controlling for dye effect without compromising the result of my
analysis?
The data were loess/scale normalized into an expression set (Data_RP).
Here is the design of the experiment
FileName Cy3 Cy5 Fish.Number Slide Brain.Part Weight Length
1 1T.gpr 1 -1 1 2 T 39 0.63
2 2T.gpr -1 1 2 1 T 39 0.63
3 3T.gpr 1 -1 3 4 T 39 0.63
4 4T.gpr -1 1 4 3 T 39 0.63
5 5T.gpr 1 -1 5 6 T 39 0.63
6 6T.gpr -1 1 6 5 T NA NA
7 1D.gpr -1 1 1 5 D 47 1.21
8 2D.gpr 1 -1 2 4 D 47 1.21
9 3D.gpr -1 1 3 1 D 47 1.21
10 4D.gpr 1 -1 4 6 D 47 1.21
11 5D.gpr -1 1 5 3 D 47 1.21
12 6D.gpr 1 -1 6 2 D NA NA
13 1C.gpr 1 -1 1 4 C 47 1.31
14 2C.gpr -1 1 2 3 C 47 1.31
15 3C.gpr 1 -1 3 6 C 47 1.31
16 4C.gpr -1 1 4 5 C 47 1.31
17 5C.gpr 1 -1 5 2 C 47 1.31
18 6C.gpr -1 1 6 1 C NA NA
19 1BS.gpr -1 1 1 1 BS 89 1.44
20 2BS.gpr 1 -1 2 2 BS 89 1.44
21 3BS.gpr -1 1 3 3 BS 89 1.44
22 4BS.gpr 1 -1 4 4 BS NA NA
23 5BS.gpr -1 1 5 5 BS NA NA
24 6BS.gpr 1 -1 6 6 BS NA NA
Thank you for your help and please let me know if you need further
explanation of the experiment.
Best regards,
Osee
On 9/9/12 7:24 PM, "Gordon K Smyth"
<smyth@wehi.edu.au<mailto:smyth@wehi.edu.au>> wrote:
Dear Osee,
You are attempting to do a number of things that don't seem correct to
me.
First, you seem to attempting a separate channel analysis of two color
microarray data, but ignoring the pairing of the red and green
channels.
It isn't correct to do this. I don't see any way to use RankProd, or
any
other package designed for independent samples, correctly in this
context.
If you must do a separate channel analysis, you would be better off
using
the separate channel analysis facilities of the limma package.
Second, when you set batch=rep(1,24), you are defining a batch that
consists of every array in your data set. Obviously it doesn't make
sense
to remove batch effects unless there are at least two batches.
Third, I don't follow your design matrix.
It would be better to go back to the start, and describe in more basic
terms what is the nature of your data and what comparison you are
trying
to make.
Best wishes
Gordon
Date: Sat, 8 Sep 2012 11:40:45 +0000
From: "Sanogo, Yibayiri O"
<sanogo@illinois.edu<mailto:sanogo@illinois.edu>>
To: "bioconductor@r-project.org<mailto:bioconductor@r-project.org>"
<bioconductor@r-project.org<mailto:bioconductor@r-project.org>>
Subject: [BioC] Advice with RemoveBatchEffect and Rank Product package
Dear Members of the list,
(I apologize for posting this again -I sent it earlier to the list but
from another account and I was listed me as non-Member -and Member I
am
since 2008:-)).
I have been using Rank Prod to analyze Agilent two-color data.
However, I
would like to remove the dye effect prior to analysis. I read on the
forum
that RemoveBatchEffect should be used in the Limma linear model, a
type of
modeling that is not in Rank Product.
I have two questions:
1) Would it be appropriate to use RemoveBatchEffect to correct for dye
effect prior to running the expression data using Rank Prod?
2) a) If no, what other function could I use to do this?
b) If yes, I would like a help with the correct design and how to
properly indicate the batch.
Here is my design indicating the two dyes (cy3=-1, cy5=1; T, D, C, BS
=are
different areas of the brain):
design1
BS C D T
1 0 0 0 1
2 0 0 0 -1
3 0 0 0 1
4 0 0 0 -1
5 0 0 0 1
6 0 0 0 -1
7 0 0 -1 0
8 0 0 1 0
9 0 0 -1 0
10 0 0 1 0
11 0 0 -1 0
12 0 0 1 0
13 0 1 0 0
14 0 -1 0 0
15 0 1 0 0
16 0 -1 0 0
17 0 1 0 0
18 0 -1 0 0
19 -1 0 0 0
20 1 0 0 0
21 -1 0 0 0
22 1 0 0 0
23 -1 0 0 0
24 1 0 0 0
attr(,"assign")
[1] 1 1 1 1
I've tried this (Data_RP are my data, the M values of the expression
set):
DYE_RP<-removeBatchEffect(Data_RP, batch=rep(1,24), batch2=NULL,
design=design1)
but it is returning an error message
" Error in contr.sum(levels(batch)) :
not enough degrees of freedom to define contrasts"
Please help me correct this code.
Thank you so much for your help.
Osee
-- -- --
Y. Osee Sanogo
Integrative Biology
Institute for Genomic Biology
University of Illinois at Urbana
505 S. Goodwin Ave
Urbana, IL-61801
Tel: 217-333 2308 (Office)
217-417 9593 (Cell)
______________________________________________________________________
The information in this email is confidential and
intend...{{dropped:9}}
------------------------------
Message: 2
Date: Thu, 13 Sep 2012 05:58:35 -0700
From: Martin Morgan <mtmorgan@fhcrc.org<mailto:mtmorgan@fhcrc.org>>
To: "Jack [guest]"
<guest@bioconductor.org<mailto:guest@bioconductor.org>>
Cc: mailme842@gmail.com<mailto:mailme842@gmail.com>,
bioconductor@r-project.org<mailto:bioconductor@r-project.org>
Subject: Re: [BioC] finding end of file in FASTA file
Message-ID:
<5051D87B.7060307@fhcrc.org<mailto:5051d87b.7060307@fhcrc.org>>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
On 09/13/2012 01:42 AM, Jack [guest] wrote:
library(ShortRead)
fastadata <- readFasta("fastafolder", "fa$")
file <- tempfile()
writeFasta(fastadata, file)
var1 <- readLines(file)
while(countlength(tmp <- readLines(file, n = -1)) > 0) {
#do something
}
I want the while loop to run till the end of file is reached, but the
while statement dosent work. Thanks for help.
Hi Jack -- if the goal is to read the fasta file in chunks, use a
'connection' that can remember the current location. After running the
following to get a reproducible example fasta file
library(ShortRead)
example(readFasta)
fl = dir(analysisPath(sp), "s_1_sequence.txt", full=TRUE)
we can create a connection and open it, and the do our loop reading
500
lines at a time
con <- file(fl); open(con)
while(length(res <- readLines(con, n=500)))
cat(length(res), "\n")
close(con)
which prints out
500
500
24
Unfortunately, readFasta doesn't work on connections (that would be a
worthwhile feature request). There is also FaFile in Rsamtools, try
example(FaFile)
FaFile is most useful when the fasta file would benefit from being
indexed, e.g., hundreds of contigs, but might also be useful for your
purposes.
Martin
Regards
Jack
-- output of sessionInfo():
sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ShortRead_1.14.4 latticeExtra_0.6-24 RColorBrewer_1.0-5
Rsamtools_1.8.6 lattice_0.20-10 Biostrings_2.24.1
GenomicRanges_1.8.13
[8] IRanges_1.14.4 BiocGenerics_0.2.0
loaded via a namespace (and not attached):
[1] Biobase_2.16.0 bitops_1.0-4.1 grid_2.15.1 hwriter_1.3
stats4_2.15.1 tools_2.15.1 zlibbioc_1.2.0
--
Sent via the guest posting facility at bioconductor.org.
_______________________________________________
Bioconductor mailing list
Bioconductor@r-project.org<mailto:bioconductor@r-project.org>
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
------------------------------
Message: 3
Date: Thu, 13 Sep 2012 15:03:09 +0200
From: Ant?nio Miguel de Jesus Domingues
<amjdomingues@gmail.com<mailto:amjdomingues@gmail.com>>
To: bioconductor@r-project.org<mailto:bioconductor@r-project.org>
Subject: [BioC] Set plot title in makeVennDiagram (ChIPpeakAnno)
Message-ID:
<capacvobkwtdhug9m0e1prpk3qpfctmaqc6vemqzpi4npago7ca@mail.gmai l.com<mailto:capacvobkwtdhug9m0e1prpk3qpfctmaqc6vemqzpi4npago7ca@mail.="" gmail.com="">>
Content-Type: text/plain
Hi Bioconducters,
I am comparing 2 list of ChIP peaks obtained with different peak
callers
and would like to change some of the graphical options of the plot
created
by makeVennDiagram. Crucially I would like to add a title to the plot.
Is there any way to change the graphical options of makeVennDiagram?
Cheers,
Ant?nio
--
--
Ant?nio Miguel de Jesus Domingues, PhD
Neugebauer group
Max Planck Institute of Molecular Cell Biology and Genetics, Dresden
Pfotenhauerstrasse 108
01307 Dresden
Germany
e-mail: domingue@mpi-cbg.de<mailto:domingue@mpi-cbg.de>
tel. +49 351 210 2481
The Unbearable Lightness of Molecular Biology
[[alternative HTML version deleted]]
------------------------------
Message: 4
Date: Thu, 13 Sep 2012 13:35:40 +0000
From: "Zhu, Lihua (Julie)"
<julie.zhu@umassmed.edu<mailto:julie.zhu@umassmed.edu>>
To: Ant?nio Miguel de Jesus Domingues
<amjdomingues@gmail.com<mailto:amjdomingues@gmail.com>>,
"bioconductor@r-project.org<mailto:bioconductor@r-project.org>"
<bioconductor@r-project.org<mailto:bioconductor@r-project.org>>
Subject: Re: [BioC] Set plot title in makeVennDiagram (ChIPpeakAnno)
Message-ID: <cc775a0a.2227%julie.zhu@umassmed.edu<mailto:cc775a0a.2227 %julie.zhu@umassmed.edu="">>
Content-Type: text/plain; charset="iso-8859-1"
Anotonio,
Thanks for the great suggestion! Currently makeVennDiagram only
supports
limited plotting options, i.e., cex and count.col. I will add the
title in
my to do list.
Best regards,
Julie
On 9/13/12 9:03 AM, "Ant?nio Miguel de Jesus Domingues"
<amjdomingues@gmail.com<mailto:amjdomingues@gmail.com>> wrote:
Hi Bioconducters,
I am comparing 2 list of ChIP peaks obtained with different peak
callers
and would like to change some of the graphical options of the plot
created
by makeVennDiagram. Crucially I would like to add a title to the plot.
Is there any way to change the graphical options of makeVennDiagram?
Cheers,
Ant?nio
------------------------------
Message: 5
Date: Thu, 13 Sep 2012 15:41:32 +0200
From: Ant?nio Miguel de Jesus Domingues
<amjdomingues@gmail.com<mailto:amjdomingues@gmail.com>>
To: "Zhu, Lihua (Julie)"
<julie.zhu@umassmed.edu<mailto:julie.zhu@umassmed.edu>>
Cc: "bioconductor@r-project.org<mailto:bioconductor@r-project.org>"
<bioconductor@r-project.org<mailto:bioconductor@r-project.org>>
Subject: [BioC] Set plot title in makeVennDiagram (ChIPpeakAnno)
Message-ID:
<capacvoak_6meyap5j+kcwhjx- u2nfwm3dpmql="_-zCdNfbG3_A@mail.gmail.com<mailto:CAPaCvoAK_6MEYAP5J" +kcwhjx-u2nfwm3dpmql="_-zCdNfbG3_A@mail.gmail.com">>
Content-Type: text/plain
Hi Julie,
Thanks!
If it is not asking too much and how much work it takes, but adding
colours
to the background of the circles would also be nice for presentations
and
papers. Basically some possibilities along the lines of the R package
VennDiagram (
http://www.biomedcentral.com/1471-2105/12/35)
Good package you've developed btw.
Ant?nio
On 13 September 2012 15:35, Zhu, Lihua (Julie)
<julie.zhu@umassmed.edu<mailto:julie.zhu@umassmed.edu>>wrote:
Anotonio,
Thanks for the great suggestion! Currently makeVennDiagram only
supports
limited plotting options, i.e., cex and count.col. I will add the
title in
my to do list.
Best regards,
Julie
On 9/13/12 9:03 AM, "Ant?nio Miguel de Jesus Domingues"
<amjdomingues@gmail.com<mailto:amjdomingues@gmail.com>> wrote:
> Hi Bioconducters,
>
> I am comparing 2 list of ChIP peaks obtained with different peak
callers
> and would like to change some of the graphical options of the plot
created
> by makeVennDiagram. Crucially I would like to add a title to the
plot.
>
> Is there any way to change the graphical options of makeVennDiagram?
>
> Cheers,
> Ant?nio
--
--
Ant?nio Miguel de Jesus Domingues, PhD
Neugebauer group
Max Planck Institute of Molecular Cell Biology and Genetics, Dresden
Pfotenhauerstrasse 108
01307 Dresden
Germany
e-mail: domingue@mpi-cbg.de<mailto:domingue@mpi-cbg.de>
tel. +49 351 210 2481
The Unbearable Lightness of Molecular Biology
[[alternative HTML version deleted]]
------------------------------
Message: 6
Date: Thu, 13 Sep 2012 13:45:32 +0000
From: "Zhu, Lihua (Julie)"
<julie.zhu@umassmed.edu<mailto:julie.zhu@umassmed.edu>>
To: Ant?nio Miguel de Jesus Domingues
<amjdomingues@gmail.com<mailto:amjdomingues@gmail.com>>
Cc: "bioconductor@r-project.org<mailto:bioconductor@r-project.org>"
<bioconductor@r-project.org<mailto:bioconductor@r-project.org>>
Subject: Re: [BioC] Set plot title in makeVennDiagram (ChIPpeakAnno)
Message-ID: <cc775c59.222c%julie.zhu@umassmed.edu<mailto:cc775c59.222c %julie.zhu@umassmed.edu="">>
Content-Type: text/plain
Will do! Thanks for the positive feedback and great suggestion!
Best regards,
Julie
On 9/13/12 9:41 AM, "Ant?nio Miguel de Jesus Domingues"
<amjdomingues@gmail.com<mailto:amjdomingues@gmail.com>> wrote:
Hi Julie,
Thanks!
If it is not asking too much and how much work it takes, but adding
colours to the background of the circles would also be nice for
presentations and papers. Basically some possibilities along the lines
of the R package VennDiagram
(
http://www.biomedcentral.com/1471-2105/12/35)
Good package you've developed btw.
Ant?nio
On 13 September 2012 15:35, Zhu, Lihua (Julie)
<julie.zhu@umassmed.edu<mailto:julie.zhu@umassmed.edu>> wrote:
Anotonio,
Thanks for the great suggestion! Currently makeVennDiagram only
supports
limited plotting options, i.e., cex and count.col. I will add the
title in
my to do list.
Best regards,
Julie
On 9/13/12 9:03 AM, "Ant?nio Miguel de Jesus Domingues"
<amjdomingues@gmail.com<mailto:amjdomingues@gmail.com>> wrote:
Hi Bioconducters,
I am comparing 2 list of ChIP peaks obtained with different peak
callers
and would like to change some of the graphical options of the plot
created
by makeVennDiagram. Crucially I would like to add a title to the plot.
Is there any way to change the graphical options of makeVennDiagram?
Cheers,
Ant?nio
[[alternative HTML version deleted]]
------------------------------
Message: 7
Date: Thu, 13 Sep 2012 11:29:06 -0400
From: Juliet Hannah
<juliet.hannah@gmail.com<mailto:juliet.hannah@gmail.com>>
To: bioconductor@r-project.org<mailto:bioconductor@r-project.org>
Subject: [BioC] creating GSEA files using biomart
Message-ID:
<calzuzrtgppvifd4+n43bg7efgvrl8uq0nghxe4wd+gouow3fug@mail.gmai l.com<mailto:calzuzrtgppvifd4+n43bg7efgvrl8uq0nghxe4wd+gouow3fug@mail.="" gmail.com="">>
Content-Type: text/plain; charset=ISO-8859-1
All,
I am trying to create the GSEA chip file. This example uses Affy data,
and the chip file is already available. I'm
doing this as an exercise in preparation for other platforms.
The chip file should look like:
Probe Set ID Gene Symbol Gene Title
244901_at ORF25 hypothetical protein
244902_at NAD4L NADH dehydrogenase subunit 4L
244912_at CCB382 cytochrome c biogenesis orf382
244919_at CCB203 cytochrome c biogenesis orf203
244925_at NAD7 NADH dehydrogenase subunit 7
How can I obtain the third column from biomart. I tried searching the
attributes, but couldn't find the right name. Is it a matter of trial
and error to find the correct attribute, or
are there systematic ways to find it. Here is what I have so far:
library("biomaRt")
probeSets <- c("219666_at", "220547_s_at", "218034_at")
ensembl = useMart("ensembl")
ensembl = useDataset("hsapiens_gene_ensembl",mart=ensembl)
idens <- getBM(attributes = c("affy_hg_u133a","hgnc_symbol"), filters
= "affy_hg_u133a",values = probeSets, mart = ensembl)
Also, does anyone have any suggestions regarding how to handle the
duplicates (seen in this example) with respect to GSEA.
Thanks,
Juliet Hannah
------------------------------
Message: 8
Date: Thu, 13 Sep 2012 08:42:45 -0700
From: Steffen Durinck
<durinck.steffen@gene.com<mailto:durinck.steffen@gene.com>>
To: Juliet Hannah
<juliet.hannah@gmail.com<mailto:juliet.hannah@gmail.com>>
Cc: bioconductor@r-project.org<mailto:bioconductor@r-project.org>
Subject: Re: [BioC] creating GSEA files using biomart
Message-ID:
<capb0=vm67-3=paowcbk18etqjuukbre5+zbnonmqoqcbsyfpvq@mail.gmai l.com<mailto:capb0="Vm67-3=PaOWcBK18EtQJuUKBRE5+zBNONMQoqCbSYfPvQ@mail." gmail.com="">>
Content-Type: text/plain
Hi Juliet,
The third attribute you're looking for is 'description':
idens <- getBM(attributes =
c("affy_hg_u133a","hgnc_symbol","description"), filters
="affy_hg_u133a",values = probeSets, mart = ensembl)
Gives:
affy_hg_u133a hgnc_symbol
description
1 219666_at MS4A6A membrane-spanning 4-domains,
subfamily A, member 6A [Source:HGNC Symbol;Acc:13375]
2 220547_s_at FAM35B family with sequence
similarity 35, member B [Source:HGNC Symbol;Acc:31425]
3 218034_at FIS1 fission 1 (mitochondrial outer membrane)
homolog (S. cerevisiae) [Source:HGNC Symbol;Acc:21689]
4 220547_s_at FAM35B2 family with sequence similarity 35,
member B2 (pseudogene) [Source:HGNC Symbol;Acc:34038]
5 220547_s_at FAM35A family with sequence
similarity 35, member A [Source:HGNC Symbol;Acc:28773]
There is no systematic way to figure out with attribute name you need
to
use all you have is the attribute name and a description of the
attribute.
The more you get used to looking at those, the easier it gets to
figure
out which one you need and once you know the attributes you need,
often
you'll be using a similar set of attributes most of the time
It is interesting to see in your example that one probeset maps to
three
different but closely related genes. In the past I thought Ensembl
would
remove such unambiguous mappers. I think the best to do in this case
is
to remove all probes that map to multiple genes as there is no way to
tell
which gene you'll be measuring. I'll report this example to the
Ensembll
team as they used to do this for us.
Cheers,
Steffen
On Thu, Sep 13, 2012 at 8:29 AM, Juliet Hannah
<juliet.hannah@gmail.com<mailto:juliet.hannah@gmail.com>>wrote:
All,
I am trying to create the GSEA chip file. This example uses Affy data,
and the chip file is already available. I'm
doing this as an exercise in preparation for other platforms.
The chip file should look like:
Probe Set ID Gene Symbol Gene Title
244901_at ORF25 hypothetical protein
244902_at NAD4L NADH dehydrogenase subunit 4L
244912_at CCB382 cytochrome c biogenesis orf382
244919_at CCB203 cytochrome c biogenesis orf203
244925_at NAD7 NADH dehydrogenase subunit 7
How can I obtain the third column from biomart. I tried searching the
attributes, but couldn't find the right name. Is it a matter of trial
and error to find the correct attribute, or
are there systematic ways to find it. Here is what I have so far:
library("biomaRt")
probeSets <- c("219666_at", "220547_s_at", "218034_at")
ensembl = useMart("ensembl")
ensembl = useDataset("hsapiens_gene_ensembl",mart=ensembl)
idens <- getBM(attributes = c("affy_hg_u133a","hgnc_symbol"), filters
= "affy_hg_u133a",values = probeSets, mart = ensembl)
Also, does anyone have any suggestions regarding how to handle the
duplicates (seen in this example) with respect to GSEA.
Thanks,
Juliet Hannah
_______________________________________________
Bioconductor mailing list
Bioconductor@r-project.org<mailto:bioconductor@r-project.org>
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
[[alternative HTML version deleted]]
------------------------------
Message: 9
Date: Thu, 13 Sep 2012 11:53:28 -0400
From: Samuel Younkin <syounkin@jhsph.edu<mailto:syounkin@jhsph.edu>>
To: "bioconductor@r-project.org<mailto:bioconductor@r-project.org>"
<bioconductor@r-project.org<mailto:bioconductor@r-project.org>>
Subject: Re: [BioC] GWASTools mendelList Constructor
Message-ID:
<50520178.7020205@jhsph.edu<mailto:50520178.7020205@jhsph.edu>>
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Thanks Stephanie, your response was very helpful.
One thing I found was that the offspring, father and mother vectors
must
be character vectors, not factor vectors. If they are factors
mendelList() will not work. One must be careful not to let them be
coerced into factors, say during construction of the data frame. The
inclusion of stringsAsFactors=FALSE is necessary in the below code.
Thanks,
Sam
On 09/12/2012 11:37 AM, Stephanie M. Gogarten wrote:
Hi Sam,
Sorry that the documentation for this function is not very clear - I
will work on revising it for the next release at the beginning of
October.
The offspring, mother, and father vectors should be subject IDs, which
are not required to be integers. The scanID vector should be the scan
ID of the offspring, which should be integers. I suspect your problem
might be that all subjects with genotypes must have an entry in all
the
vector arguments to mendelList - think of the arguments as being
columns
of a data frame that contains all your subjects. This is because the
mendelList function needs to know the scan IDs of the father and
mother
as well, and all the vector arguments must have the same length.
Here is a toy example of some IDs that produce a valid mendelList
object:
> dat <- data.frame(family=c(1,1,1), offspring=c("a","b","c"),
father=c("b",0,0), mother=c("c",0,0), sex=c("M","M","F"), scanID=1:3,
stringsAsFactors=FALSE)
> dat
family offspring father mother sex scanID
1 1 a b c M 1
2 1 b 0 0 M 2
3 1 c 0 0 F 3
> mendelList(dat$family, dat$offspring, dat$father, dat$mother,
dat$sex, dat$scanID)
$`1`
$`1`$a
offspring father mother
1 1 2 3
attr(,"class")
[1] "mendelList"
If you only include the first row, without separate entries for the
father and mother, you get a NULL:
> dat <- dat[1,]
> dat
family offspring father mother sex scanID
1 1 a b c M 1
> mendelList(dat$family, dat$offspring, dat$father, dat$mother,
dat$sex, dat$scanID)
NULL
If this doesn't solve your problem, please send a small example of
your
code that gives a NULL result.
Stephanie
-------- Original Message --------
Subject: [BioC] GWASTools mendelList Constructor
Date: Wed, 12 Sep 2012 10:36:58 -0400
From: Samuel Younkin <syounkin@jhsph.edu<mailto:syounkin@jhsph.edu>>
To: bioconductor@r-project.org<mailto:bioconductor@r-project.org>
<bioconductor@r-project.org<mailto:bioconductor@r-project.org>>
BioC,
I am trying to use the Mendelian error checking feature of the
Bioconductor package GWASTools. When I try to create a mendelList
object using the constructor function mendelList(), it consistently
returns NULL with no warning or error message.
I have tried changing the subject ids to character, or integer, or
numeric, but I continue to get NULL. Without any error message to
work
with I am at a loss. Any advice?
Should the offspring, mother and father vectors be scan ids or
scan/subject names? My scan names are not numbers and so these
vectors
would not satisfy the requirement in the manual of "a vector of
offspring/father/mother ID numbers."
Should the scanID vector be the scan id of the offspring?
Any advice would be greatly appreciated.
Thanks,
Sam
> sessionInfo()
R version 2.15.1 Patched (2012-07-01 r59713)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.iso885915 LC_NUMERIC=C
[3] LC_TIME=en_US.iso885915 LC_COLLATE=en_US.iso885915
[5] LC_MONETARY=en_US.iso885915 LC_MESSAGES=en_US.iso885915
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] MinimumDistance_1.1.20 CleftCNVAssoc_0.0.1
GenomicRanges_1.9.63
[4] IRanges_1.15.43 GWASTools_1.2.1 sandwich_2.2-9
[7] zoo_1.7-7 GWASExactHW_1.0 ncdf_1.6.6
[10] Biobase_2.17.6 BiocGenerics_0.3.1 BiocInstaller_1.4.7
loaded via a namespace (and not attached):
[1] affyio_1.25.0 annotate_1.35.3
AnnotationDbi_1.19.35
[4] Biostrings_2.25.8 bit_1.1-8 codetools_0.2-8
[7] crlmm_1.15.18 DBI_0.2-5 DNAcopy_1.31.1
[10] ellipse_0.3-7 ff_2.2-7 foreach_1.4.0
[13] genefilter_1.39.0 grid_2.15.1 iterators_1.0.6
[16] lattice_0.20-6 lmtest_0.9-30 msm_1.1.1
[19] mvtnorm_0.9-9992 oligoClasses_1.19.41 preprocessCore_1.19.0
[22] RSQLite_0.11.1 SNPchip_2.3.13 splines_2.15.1
[25] stats4_2.15.1 survival_2.36-14 tools_2.15.1
[28] VanillaICE_1.19.23 XML_3.9-4 xtable_1.7-0
[31] zlibbioc_1.3.0
>
_______________________________________________
Bioconductor mailing list
Bioconductor@r-project.org<mailto:bioconductor@r-project.org>
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
------------------------------
Message: 10
Date: Thu, 13 Sep 2012 18:06:05 +0200
From: Ant?nio Miguel de Jesus Domingues
<amjdomingues@gmail.com<mailto:amjdomingues@gmail.com>>
To: bioconductor@r-project.org<mailto:bioconductor@r-project.org>
Subject: [BioC] DiffBind - sample sheet for multiple replicates and
peak callers
Message-ID:
<capacvobu89nekmnu- 2jojczybqwdhd="ut=b5O4hL10k+axXgNQ@mail.gmail.com<mailto" :capacvobu89nekmnu-2jojczybqwdhd="ut=b5O4hL10k+axXgNQ@mail.gmail.com">>
Content-Type: text/plain
Hi all,
I am trying to use DiffBind to compare peaks called in control vs
condition. I have 2 replicates for each and I've also called peaks
using 2
different peak callers (to wi, MACS and QuEST). I've also prepared a
sample
data sheet that looks like this:
SampleID Tissue Factor Condition Replicate Peak.caller
bamReads
bamControl Peaks
control Hela TF wt 1
MACS path path path
control Hela TF wt 1
QuEST path path path
control2 Hela TF wt 2
MACS path path path
control 2 Hela TF wt 2
QuEST path path path
(and the same for the conditions)
My plan was to load all the data and then using diffbind selecte a set
of
common peaks for the peak callers before proceeding with the analysis.
However, when I load the data (data =
dba(sampleSheet="samplesheet.csv"))
the peaks for each caller are not recognized as a different variable.
How
I can do that and is this silly?
I could also derive a set of common peaks independently but it would
be
neat to do it all with the same package and that seems to be possible
but I
could not find how to do it in the documentation.
Thanks,
Ant?nio
--
--
Ant?nio Miguel de Jesus Domingues, PhD
Neugebauer group
Max Planck Institute of Molecular Cell Biology and Genetics, Dresden
Pfotenhauerstrasse 108
01307 Dresden
Germany
e-mail: domingue@mpi-cbg.de<mailto:domingue@mpi-cbg.de>
tel. +49 351 210 2481
The Unbearable Lightness of Molecular Biology
[[alternative HTML version deleted]]
------------------------------
Message: 11
Date: Thu, 13 Sep 2012 12:08:32 -0400
From: Juliet Hannah
<juliet.hannah@gmail.com<mailto:juliet.hannah@gmail.com>>
To: Steffen Durinck
<durinck.steffen@gene.com<mailto:durinck.steffen@gene.com>>
Cc: bioconductor@r-project.org<mailto:bioconductor@r-project.org>
Subject: Re: [BioC] creating GSEA files using biomart
Message-ID:
<calzuzrqhues-gbza- q1jth0pgm6ohggbha_dqvwuug3edv1wbg@mail.gmail.com<mailto:calzuzrqhues-="" gbza-q1jth0pgm6ohggbha_dqvwuug3edv1wbg@mail.gmail.com="">>
Content-Type: text/plain; charset=ISO-8859-1
Thanks Steffen for the helpful answers. "description", how
embarrassing!
On Thu, Sep 13, 2012 at 11:42 AM, Steffen Durinck
<durinck.steffen@gene.com<mailto:durinck.steffen@gene.com>> wrote:
Hi Juliet,
The third attribute you're looking for is 'description':
idens <- getBM(attributes =
c("affy_hg_u133a","hgnc_symbol","description"),
filters ="affy_hg_u133a",values = probeSets, mart = ensembl)
Gives:
affy_hg_u133a hgnc_symbol
description
1 219666_at MS4A6A membrane-spanning 4-domains,
subfamily A, member 6A [Source:HGNC Symbol;Acc:13375]
2 220547_s_at FAM35B family with sequence
similarity 35, member B [Source:HGNC Symbol;Acc:31425]
3 218034_at FIS1 fission 1 (mitochondrial outer membrane)
homolog
(S. cerevisiae) [Source:HGNC Symbol;Acc:21689]
4 220547_s_at FAM35B2 family with sequence similarity 35,
member
B2 (pseudogene) [Source:HGNC Symbol;Acc:34038]
5 220547_s_at FAM35A family with sequence
similarity 35, member A [Source:HGNC Symbol;Acc:28773]
There is no systematic way to figure out with attribute name you need
to use
all you have is the attribute name and a description of the attribute.
The
more you get used to looking at those, the easier it gets to figure
out
which one you need and once you know the attributes you need, often
you'll
be using a similar set of attributes most of the time
It is interesting to see in your example that one probeset maps to
three
different but closely related genes. In the past I thought Ensembl
would
remove such unambiguous mappers. I think the best to do in this case
is to
remove all probes that map to multiple genes as there is no way to
tell
which gene you'll be measuring. I'll report this example to the
Ensembll
team as they used to do this for us.
Cheers,
Steffen
On Thu, Sep 13, 2012 at 8:29 AM, Juliet Hannah
<juliet.hannah@gmail.com<mailto:juliet.hannah@gmail.com>>
wrote:
All,
I am trying to create the GSEA chip file. This example uses Affy data,
and the chip file is already available. I'm
doing this as an exercise in preparation for other platforms.
The chip file should look like:
Probe Set ID Gene Symbol Gene Title
244901_at ORF25 hypothetical protein
244902_at NAD4L NADH dehydrogenase subunit 4L
244912_at CCB382 cytochrome c biogenesis orf382
244919_at CCB203 cytochrome c biogenesis orf203
244925_at NAD7 NADH dehydrogenase subunit 7
How can I obtain the third column from biomart. I tried searching the
attributes, but couldn't find the right name. Is it a matter of trial
and error to find the correct attribute, or
are there systematic ways to find it. Here is what I have so far:
library("biomaRt")
probeSets <- c("219666_at", "220547_s_at", "218034_at")
ensembl = useMart("ensembl")
ensembl = useDataset("hsapiens_gene_ensembl",mart=ensembl)
idens <- getBM(attributes = c("affy_hg_u133a","hgnc_symbol"), filters
= "affy_hg_u133a",values = probeSets, mart = ensembl)
Also, does anyone have any suggestions regarding how to handle the
duplicates (seen in this example) with respect to GSEA.
Thanks,
Juliet Hannah
_______________________________________________
Bioconductor mailing list
Bioconductor@r-project.org<mailto:bioconductor@r-project.org>
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
------------------------------
Message: 12
Date: Thu, 13 Sep 2012 13:34:24 -0400
From: "James W. MacDonald" <jmacdon@uw.edu<mailto:jmacdon@uw.edu>>
To: "Frank Schwach [guest]"
<guest@bioconductor.org<mailto:guest@bioconductor.org>>
Cc: bioconductor@r-project.org<mailto:bioconductor@r-project.org>
Subject: Re: [BioC] topGO: full-length descriptions in graph output
Message-ID: <50521920.5000104@uw.edu<mailto:50521920.5000104@uw.edu>>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hi Frank,
On 9/13/2012 4:29 AM, Frank Schwach [guest] wrote:
Hi,
does anybody know if there is any way in the topGO package to force
full-length descriptions of the GO terms to be printed in the graph?
By default, only about 20 characters of the descriptions are printed.
If you look at the help page for showSigOfNodes(), there is an
argument
.NO.CHAR that is listed as an internal function. This controls the
number of characters that are shown in the graph, and defaults to 20.
You can bump that up to whatever value you see fit, but note that it
*is* listed as an internal function, so you are stepping into the deep
end of the pool - if things don't work the way you want, it will
likely
be up to you to figure out why and fix it.
Best,
Jim
Thanks for your help!
Frank
-- output of sessionInfo():
R version 2.11.1 (2010-05-31)
x86_64-unknown-linux-gnu
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=C
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] topGO_1.16.2 SparseM_0.85 GO.db_2.4.1
[4] RSQLite_0.9-1 DBI_0.2-5 AnnotationDbi_1.10.1
[7] Biobase_2.8.0 graph_1.26.0
loaded via a namespace (and not attached):
[1] grid_2.11.1 lattice_0.18-8 tools_2.11.1
--
Sent via the guest posting facility at bioconductor.org.
_______________________________________________
Bioconductor mailing list
Bioconductor@r-project.org<mailto:bioconductor@r-project.org>
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
------------------------------
Message: 13
Date: Thu, 13 Sep 2012 14:29:12 -0400
From: Sermsawat Tunlaya-Anukit
<stunlay@ncsu.edu<mailto:stunlay@ncsu.edu>>
To: bioconductor@r-project.org<mailto:bioconductor@r-project.org>
Subject: [BioC] How to design matrix that account to effect of
transgenic event in edgeR ?
Message-ID:
<cahvcje69smyfrui_r1cqd33fe_ygm7drut7dx468wjae6vgueq@mail.gmai l.com<mailto:cahvcje69smyfrui_r1cqd33fe_ygm7drut7dx468wjae6vgueq@mail.="" gmail.com="">>
Content-Type: text/plain
I would like to ask about design matrix in GLM model in edgeR. I have
group
of transgenic down regulate gene. We have 3 lines of transgnics per
down
regulate gene which different level (event) of down regulate gene and
in
each line we have 3 replicate. This means we have 9 sample per 1 gene
downregulate. This experiment also have double down regulate (2 genes
down
regulate at the same time). We would like to find gene that change
when we
down regulate target gene. I create event1-9 to seperate the
transgenic
event. I try design matrix as show below and can't perform GLM. It
show "
Error in solve.default(R, t(beta)) :
system is computationally singular: reciprocal condition number =
3.13057e-17
If i change the design metric to column1-3 (effect of mean,geneA and
geneB), this new design matric work well. How can i desing matric that
account for effect of transgnic event?
mean geneA geneB event1 event2 event3 event4 event5 event6 event7
event8
event9
wt 1 0 0 0 0 0 0 0 0 0 0 0
wt 1 0 0 0 0 0 0 0 0 0 0 0
wt 1 0 0 0 0 0 0 0 0 0 0 0
down_a 1 1 0 1 0 0 0 0 0 0 0 0
down_a 1 1 0 1 0 0 0 0 0 0 0 0
down_a 1 1 0 1 0 0 0 0 0 0 0 0
down_a 1 1 0 0 1 0 0 0 0 0 0 0
down_a 1 1 0 0 1 0 0 0 0 0 0 0
down_a 1 1 0 0 1 0 0 0 0 0 0 0
down_a 1 1 0 0 0 1 0 0 0 0 0 0
down_a 1 1 0 0 0 1 0 0 0 0 0 0
down_a 1 1 0 0 0 1 0 0 0 0 0 0
down_b 1 0 1 0 0 0 1 0 0 0 0 0
down_b 1 0 1 0 0 0 1 0 0 0 0 0
down_b 1 0 1 0 0 0 1 0 0 0 0 0
down_b 1 0 1 0 0 0 0 1 0 0 0 0
down_b 1 0 1 0 0 0 0 1 0 0 0 0
down_b 1 0 1 0 0 0 0 1 0 0 0 0
down_b 1 0 1 0 0 0 0 0 1 0 0 0
down_b 1 0 1 0 0 0 0 0 1 0 0 0
down_b 1 0 1 0 0 0 0 0 1 0 0 0
down_ab 1 1 1 0 0 0 0 0 0 1 0 0
down_ab 1 1 1 0 0 0 0 0 0 1 0 0
down_ab 1 1 1 0 0 0 0 0 0 1 0 0
down_ab 1 1 1 0 0 0 0 0 0 0 1 0
down_ab 1 1 1 0 0 0 0 0 0 0 1 0
down_ab 1 1 1 0 0 0 0 0 0 0 1 0
down_ab 1 1 1 0 0 0 0 0 0 0 0 1
down_ab 1 1 1 0 0 0 0 0 0 0 0 1
down_ab 1 1 1 0 0 0 0 0 0 0 0 1
Thank you very much
Sermsawat T.
[[alternative HTML version deleted]]
------------------------------
Message: 14
Date: Fri, 14 Sep 2012 08:57:36 +1000 (AUS Eastern Standard Time)
From: Gordon K Smyth <smyth@wehi.edu.au<mailto:smyth@wehi.edu.au>>
To: "Sanogo, Yibayiri O"
<sanogo@illinois.edu<mailto:sanogo@illinois.edu>>
Cc: Bioconductor mailing list
<bioconductor@r-project.org<mailto:bioconductor@r-project.org>>,
Osee
Sanogo
<sanogo@life.illinois.edu<mailto:sanogo@life.illinois.edu>>
Subject: Re: [BioC] limma separate channel analysis of your data
Message-ID: <pine.wnt.4.64.1209140856110.6004@pc765.wehi.edu.au<mailto :pine.wnt.4.64.1209140856110.6004@pc765.wehi.edu.au="">>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Dear Osee,
Yes, your code is correct. I forgot to add the intraspotCorrelation
estimation.
Best wishes
Gordon
On Thu, 13 Sep 2012, Sanogo, Yibayiri O wrote:
Dear Gordon,
In the code below,
design <- model.matrix(~Dye+Fish+Brain+Brain:Treat)
fit <- lmscFit(MA,design)
fit <- eBayes(fit, trend=TRUE)
shouldn't we include the intracorrelation coefficient in the single
channel model?
Is the following code correct?
corfit <- intraspotCorrelation(MA, design)
fit <- lmscFit(MA,design, correlation=corfit$consensus)
Please let me know if this is the right way of doing it.
Thank you.
Osee
________________________________________
From: Gordon K Smyth [smyth@wehi.EDU.AU<mailto:smyth@wehi.edu.au>]
Sent: Tuesday, September 11, 2012 8:13 PM
To: Sanogo, Yibayiri O
Cc: Osee Sanogo; Bioconductor mailing list
Subject: limma separate channel analysis of your data
Dear Osee,
Thank you for attaching the revised design of your experiment. I found
it
somewhat tricky to analyse, so my recommendations are more involved
than I
usually give. I think that you will need to perform a separate channel
analysis, otherwise it will not be possible to make baseline
comparisons
between the brain parts.
I also think that you may need to re-normalize your data using the
limma
package. You say that the data has been "loess/scale normalized into
an
expression set", but I don't think that scale normalization should be
necessary for Agilent data, and an expression set is not fully
suitable
for two colour data. The following code assumes that you have
normalized
the data into an MAList object using the limma package. I will call it
'MA'. I strongly recommend that you use normexp background correction
and
loess normalization as described in the limma User's Guide for GenePix
data.
The first step is to put your targets data into separate channel
format:
targets <- read.delim("Design_Brain.txt")
Treat <- as.vector(t(targets[,c("Cy3","Cy5")]))
Treat <- ifelse(Treat>0,"Exp","Ctrl")
Treat <- factor(Treat)
Brain <- rep(targets$Brain.Part,each=2,len=48)
Fish <- factor(rep(targets$Fish.Number,each=2,len=48))
Dye <- rep(0:1,len=48)
data.frame(Fish,Brain,Treat,Dye)
Now you can fit a linear model, correcting for probe-specific dye-
bias,
correcting for any differences between the fish, and computing
Treatment
vs Control effects for each brain part:
design <- model.matrix(~Dye+Fish+Brain+Brain:Treat)
fit <- lmscFit(MA,design)
fit <- eBayes(fit, trend=TRUE)
Now you can extract genes that have a treatment effect. To find genes
that have a treatment vs control effect in telencephalon, you would
use:
topTable(fit, coef="BrainT:TreatExp")
For genes DE for treatment vs control in brain stem, it would be
topTable(fit, coef="BrainBS:TreatExp")
And so on.
Best wishes
Gordon
---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
http://www.statsci.org/smyth
On Tue, 11 Sep 2012, Sanogo, Yibayiri O wrote:
Dear Gordon,
Attached please find the revised design of my experiment.
Thank you for your willingness to help me analyze these data.
Best regards,
Osee
________________________________________
From: Sanogo, Yibayiri O
Sent: Monday, September 10, 2012 9:51 PM
To: Gordon K Smyth; Osee Sanogo
Cc: Bioconductor mailing list
Subject: RE: Advice with RemoveBatchEffect and Rank Product package
Dear Gordon,
Please ignore the last two columns (weight and length) of the design
for
now: I think they are screwed up due to some kind of sorting. I will
have
to double check them (look in the notebook which is not here now).
Only
different brain regions (T, D, C, BS) belonging to the same fish
should
have same weight, length.
The other columns are:
Fish.Number= corresponds to fish ID, the individual fish (biological
replicates) Slide= the Agilent 4X44k slides on each the arrays were
located Brain.Part= the different brain regions that were dissected
and
hybridized (within region). They were T (telencephalon), D
(diencephalon),
C (cerebellum), BS (brain stem). in the cy3, c5 columns, "-1" is
control
and "1" is experimental.
FileName Cy3 Cy5 Fish Number Slide Brain Part
1T.gpr 1 -1 1 2 T
2T.gpr -1 1 2 1 T
3T.gpr 1 -1 3 4 T
4T.gpr -1 1 4 3 T
5T.gpr 1 -1 5 6 T
6T.gpr -1 1 6 5 T
1D.gpr -1 1 1 5 D
2D.gpr 1 -1 2 4 D
3D.gpr -1 1 3 1 D
4D.gpr 1 -1 4 6 D
5D.gpr -1 1 5 3 D
6D.gpr 1 -1 6 2 D
1C.gpr 1 -1 1 4 C
2C.gpr -1 1 2 3 C
3C.gpr 1 -1 3 6 C
4C.gpr -1 1 4 5 C
5C.gpr 1 -1 5 2 C
6C.gpr -1 1 6 1 C
1BS.gpr -1 1 1 1 BS
2BS.gpr 1 -1 2 2 BS
3BS.gpr -1 1 3 3 BS
4BS.gpr 1 -1 4 4 BS
6BS.gpr 1 -1 6 6 BS
Thank you for your willingness to help. I will stay up a bit so that I
can
respond faster to your questions given the time difference between us.
Thank you so much.
Osee
________________________________________
From: Gordon K Smyth [smyth@wehi.EDU.AU<mailto:smyth@wehi.edu.au>]
Sent: Monday, September 10, 2012 6:34 PM
To: Osee Sanogo
Cc: Sanogo, Yibayiri O; Bioconductor mailing list
Subject: Re: Advice with RemoveBatchEffect and Rank Product package
Dear Osee,
Can you explain what the columns Fish.Number, Slide, Brain.Part,
Weight
and Length mean in your experiment? If the first six arrays are from
biologically independent samples, why do they have the same weight and
length?
Best wishes
Gordon
On Mon, 10 Sep 2012, Osee Sanogo wrote:
Dear Gordon,
>From what you said, it seems that I am oversimplifying my experiment
by
attempting to analyze it with RankProd, which doesn't offer the option
for
complex modeling.
Could you please explain to me how I could analyze the experiment
using
Limma?
Please let me know if you'd like me to provide further details of the
experiment.
Thank you so much.
Osee
On 9/10/12 2:04 AM, "Gordon K Smyth"
<smyth@wehi.edu.au<mailto:smyth@wehi.edu.au>> wrote:
Dear Osee,
No, you can't use removeBatchEffect to control for dye bias.
Can you ignore the dye effect? Not in general, but who knows?
Your experiment seems too complex to be properly analysed using
RankProd.
For one thing, it seems clear that you have obtained multiple parts of
the
brain from the same biological replicates, meaning that your samples
are
paired by fish number.
I could explain how to analyse this experiment using limma. However,
if
you are determined that you will use RankProd, it might be best to
email
the authors of that package for advice.
Best wishes
Gordon
---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
http://www.statsci.org/smyth
On Sun, 9 Sep 2012, Osee Sanogo wrote:
Dear Gordon,
Thank you for getting back to me about my questions.
My experiment is trying to identify differentially expressed genes in
four
regions of the brain in response to a stressor. I have 6 biol.
replicates in
each brain region for the control and experimental groups in each
region,
and the comparison is being done within brain region (i.e., T control
vs T
exp, D ctrl vs D exp, C ctrl vs C exp, BS ctrl vs BS exp). The sample
were
run in two-color Agilent Array.
You're right that the design I sent was from the separate channel
analysis,
in which I attempted to account for array and dye effect, and then run
the
data in RankProd. Now I know that this is not right. Ok, I will use
the
single channel analysis in Limma.
I still would like to run the two-channel data (ratios) in RankProd,
as my
previous experience found this useful for my dara (low replicate
numbers).
My questions: 1) Could I use RemoveBatchEffect to control for dye bias
before running the two-channel data in RankProd? If yes, how should I
do
this using the RemoveBatch Effect function?
2) I found that about 3% of my probes have dye effect. Can I
omit controlling for dye effect without compromising the result of my
analysis?
The data were loess/scale normalized into an expression set (Data_RP).
Here is the design of the experiment
FileName Cy3 Cy5 Fish.Number Slide Brain.Part Weight Length
1 1T.gpr 1 -1 1 2 T 39 0.63
2 2T.gpr -1 1 2 1 T 39 0.63
3 3T.gpr 1 -1 3 4 T 39 0.63
4 4T.gpr -1 1 4 3 T 39 0.63
5 5T.gpr 1 -1 5 6 T 39 0.63
6 6T.gpr -1 1 6 5 T NA NA
7 1D.gpr -1 1 1 5 D 47 1.21
8 2D.gpr 1 -1 2 4 D 47 1.21
9 3D.gpr -1 1 3 1 D 47 1.21
10 4D.gpr 1 -1 4 6 D 47 1.21
11 5D.gpr -1 1 5 3 D 47 1.21
12 6D.gpr 1 -1 6 2 D NA NA
13 1C.gpr 1 -1 1 4 C 47 1.31
14 2C.gpr -1 1 2 3 C 47 1.31
15 3C.gpr 1 -1 3 6 C 47 1.31
16 4C.gpr -1 1 4 5 C 47 1.31
17 5C.gpr 1 -1 5 2 C 47 1.31
18 6C.gpr -1 1 6 1 C NA NA
19 1BS.gpr -1 1 1 1 BS 89 1.44
20 2BS.gpr 1 -1 2 2 BS 89 1.44
21 3BS.gpr -1 1 3 3 BS 89 1.44
22 4BS.gpr 1 -1 4 4 BS NA NA
23 5BS.gpr -1 1 5 5 BS NA NA
24 6BS.gpr 1 -1 6 6 BS NA NA
Thank you for your help and please let me know if you need further
explanation of the experiment.
Best regards,
Osee
On 9/9/12 7:24 PM, "Gordon K Smyth"
<smyth@wehi.edu.au<mailto:smyth@wehi.edu.au>> wrote:
Dear Osee,
You are attempting to do a number of things that don't seem correct to
me.
First, you seem to attempting a separate channel analysis of two color
microarray data, but ignoring the pairing of the red and green
channels.
It isn't correct to do this. I don't see any way to use RankProd, or
any
other package designed for independent samples, correctly in this
context.
If you must do a separate channel analysis, you would be better off
using
the separate channel analysis facilities of the limma package.
Second, when you set batch=rep(1,24), you are defining a batch that
consists of every array in your data set. Obviously it doesn't make
sense
to remove batch effects unless there are at least two batches.
Third, I don't follow your design matrix.
It would be better to go back to the start, and describe in more basic
terms what is the nature of your data and what comparison you are
trying
to make.
Best wishes
Gordon
Date: Sat, 8 Sep 2012 11:40:45 +0000
From: "Sanogo, Yibayiri O"
<sanogo@illinois.edu<mailto:sanogo@illinois.edu>>
To: "bioconductor@r-project.org<mailto:bioconductor@r-project.org>"
<bioconductor@r-project.org<mailto:bioconductor@r-project.org>>
Subject: [BioC] Advice with RemoveBatchEffect and Rank Product package
Dear Members of the list,
(I apologize for posting this again -I sent it earlier to the list but
from another account and I was listed me as non-Member -and Member I
am
since 2008:-)).
I have been using Rank Prod to analyze Agilent two-color data.
However, I
would like to remove the dye effect prior to analysis. I read on the
forum
that RemoveBatchEffect should be used in the Limma linear model, a
type of
modeling that is not in Rank Product.
I have two questions:
1) Would it be appropriate to use RemoveBatchEffect to correct for dye
effect prior to running the expression data using Rank Prod?
2) a) If no, what other function could I use to do this?
b) If yes, I would like a help with the correct design and how to
properly indicate the batch.
Here is my design indicating the two dyes (cy3=-1, cy5=1; T, D, C, BS
=are
different areas of the brain):
design1
BS C D T
1 0 0 0 1
2 0 0 0 -1
3 0 0 0 1
4 0 0 0 -1
5 0 0 0 1
6 0 0 0 -1
7 0 0 -1 0
8 0 0 1 0
9 0 0 -1 0
10 0 0 1 0
11 0 0 -1 0
12 0 0 1 0
13 0 1 0 0
14 0 -1 0 0
15 0 1 0 0
16 0 -1 0 0
17 0 1 0 0
18 0 -1 0 0
19 -1 0 0 0
20 1 0 0 0
21 -1 0 0 0
22 1 0 0 0
23 -1 0 0 0
24 1 0 0 0
attr(,"assign")
[1] 1 1 1 1
I've tried this (Data_RP are my data, the M values of the expression
set):
DYE_RP<-removeBatchEffect(Data_RP, batch=rep(1,24), batch2=NULL,
design=design1)
but it is returning an error message
" Error in contr.sum(levels(batch)) :
not enough degrees of freedom to define contrasts"
Please help me correct this code.
Thank you so much for your help.
Osee
-- -- --
Y. Osee Sanogo
Integrative Biology
Institute for Genomic Biology
University of Illinois at Urbana
505 S. Goodwin Ave
Urbana, IL-61801
Tel: 217-333 2308 (Office)
217-417 9593 (Cell)
______________________________________________________________________
The information in this email is confidential and
inte...{{dropped:10}}
------------------------------
Message: 15
Date: Fri, 14 Sep 2012 09:09:55 +1000 (AUS Eastern Standard Time)
From: Gordon K Smyth <smyth@wehi.edu.au<mailto:smyth@wehi.edu.au>>
To: Natasha Sahgal
<nsahgal@well.ox.ac.uk<mailto:nsahgal@well.ox.ac.uk>>
Cc: Bioconductor mailing list
<bioconductor@r-project.org<mailto:bioconductor@r-project.org>>
Subject: [BioC] edgeR: confusing BCV plot
Message-ID: <pine.wnt.4.64.1209140905340.6004@pc765.wehi.edu.au<mailto :pine.wnt.4.64.1209140905340.6004@pc765.wehi.edu.au="">>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Date: Wed, 12 Sep 2012 13:43:25 +0000
From: Natasha Sahgal
<nsahgal@well.ox.ac.uk<mailto:nsahgal@well.ox.ac.uk>>
To: "James W. MacDonald" <jmacdon@uw.edu<mailto:jmacdon@uw.edu>>
Cc: "'bioconductor@r-project.org<mailto:'bioconductor@r-project.org>'"
<bioconductor@r-project.org<mailto:bioconductor@r-project.org>>
Subject: Re: [BioC] edgeR: confusing BCV plot
Dear Jim,
Regarding the BCV plots, what I did not understand was the strange
profile (at least strange to me), and the low coefficients of BV.
Based on some figures from the user guide, it appeared to be very
different - an increase towards the higher logCPM.
1) I'm not sure how to interpret these and if it is a good thing or
not? (perhaps I have misunderstood the concept of the BCV)
Suggests to me that there is something unusual about your data.
Especially the J shape at very low counts, which I have not seen
before
for RNA-seq data.
2) How does this affect the differential expression, if at all?
Genes that have larger estimated dispersions are less likely to be
judged
to be differentially expressed.
Gordon
Re the filtering, for some reason I was under the impression
increasing the counts per million would reduce (if not remove) zero
counts in all samples. I agree with what you say about half the
samples being unconstrained.
I had 3 filters here, just to see what the difference would be. I
still need to figure out the best or optimal one to use.
Many Thanks and Best Wishes,
Natasha
______________________________________________________________________
The information in this email is confidential and
intend...{{dropped:4}}
------------------------------
Message: 16
Date: Fri, 14 Sep 2012 09:15:03 +1000 (AUS Eastern Standard Time)
From: Gordon K Smyth <smyth@wehi.edu.au<mailto:smyth@wehi.edu.au>>
To: Javier Sim?n-S?nchez
<simonsanchezj@gmail.com<mailto:simonsanchezj@gmail.com>>
Cc: Bioconductor mailing list
<bioconductor@r-project.org<mailto:bioconductor@r-project.org>>
Subject: [BioC] Install devel version of edgeR
Message-ID: <pine.wnt.4.64.1209140912460.6004@pc765.wehi.edu.au<mailto :pine.wnt.4.64.1209140912460.6004@pc765.wehi.edu.au="">>
Content-Type: text/plain; charset="x-unknown"; Format="flowed"
Dear Javier,
This is a question for the Bioc managers, but the following page might
answer your question:
http://www.bioconductor.org/developers/useDevel/
Best wishes
Gordon
On Thu, 13 Sep 2012, Javier Sim?n-S?nchez wrote:
Hello Gordon,
Thanks a lot for your reply.
How can I upgrade to the latest version of edge R? I have checked on
the
list and it looks like i need version 2.99.3. However, the one you
provide
at
http://www.bioconductor.org/packages/2.10/bioc/html/edgeR.html is
edgeR_2.6.12. How can I get the needed version?
Thanks a lot for your help.
On Thu, Sep 13, 2012 at 2:43 AM, Gordon K Smyth
<smyth@wehi.edu.au<mailto:smyth@wehi.edu.au>> wrote:
Dear Javier,
This error has been discused a number of times on this list. The
solution
is to upgrade edgeR to the current devel version.
Also please see the Bioconductor posting guide:
http://www.bioconductor.org/**help/mailing-
list/posting-**guide/<http: www.bioconductor.org="" help="" mailing-list="" posting-guide=""/>
Best wishes
Gordon
Date: Tue, 11 Sep 2012 13:04:55 +0200
From: Javier Sim?n-S?nchez
<simonsanchezj@gmail.com<mailto:simonsanchezj@gmail.com>>
To: bioconductor@r-project.org<mailto:bioconductor@r-project.org>
Subject: [BioC] edgeR GLM error
Hello,
My name is Javier Sim?n Seanchez and I'm a post-doc at the VUmc in
Amsterdam.
The reason of this e-mail is that im running edgeR in an expression
dataset
and getting the following error when calculating the GLM common
dispersion:
*Error in beta[k, ] <- betaj[decr, ] :
NAs are not allowed in subscripted assignments
*
Im running cases versus controls and I want to modulate for different
tissues. How can I overcome this error?
Thanks a lot in advance
--
Javier Simon-Sanchez
______________________________**______________________________**______
____
The information in this email is confidential and intended solely for
the
addressee.
You must not disclose, forward, print or use it without the permission
of
the sender.
______________________________**______________________________**______
____
--
Javier Simon-Sanchez
______________________________________________________________________
The information in this email is confidential and
intend...{{dropped:5}}
------------------------------
Message: 17
Date: Fri, 14 Sep 2012 00:26:10 +0100
From: "Heidi Dvinge" <heidi@ebi.ac.uk<mailto:heidi@ebi.ac.uk>>
To: "Juan Fern?ndez Tajes"
<jfernandezt@udc.es<mailto:jfernandezt@udc.es>>
Cc: "bioconductor@r-project.org<mailto:bioconductor@r-project.org>"
<bioconductor@r-project.org<mailto:bioconductor@r-project.org>>
Subject: Re: [BioC] HTqPCR error
Message-ID:
<dab804a28f53356c21982e871dcfe0d7.squirrel@webmail.ebi.ac.uk<m ailto:dab804a28f53356c21982e871dcfe0d7.squirrel@webmail.ebi.ac.uk="">>
Content-Type: text/plain;charset=utf-8
Dear Heidi,
After trying the two combinations that you talked about, just the
second
one worked, i.e:
raw1 <- readCtData(files="Prueba_Blanco001.txt", n.features=96,
n.data=96,
column.info=list(flag=9, feature=5, type=6, Ct=7, position=1),
skip=12,
sep="\t")
the first one, gave the same previous error.
temp <- readCtData(files="Prueba_Blanco001.txt", n.features=96*96,
+ format="BioMark")
Error en `$<-.data.frame`(`*tmp*`, "Call", value = character(0)) :
replacement has 0 rows, data has 9216
Okay, so it looks like the order and/or column names in your file is
different from the Fluidigm files the readCtData is based on, and the
n.features/n.data parameters are working as they should (phew). Could
you
perhaps send me the header of your file, so I can see what the
different
is?
\Heidi
Thanks
Juan
---------------------------------------------------------------
Juan Fernandez Tajes, ph. D
Grupo XENOMAR
Departamento de Biolog?a Celular y Molecular
Facultad de Ciencias-Universidade da Coru?a
Tlf. +34 981 167000 ext 2030
e-mail: jfernandezt@udc.es<mailto:jfernandezt@udc.es>
----------------------------------------------------------------
De: "Heidi Dvinge" <heidi@ebi.ac.uk<mailto:heidi@ebi.ac.uk>>
Para: "\"Juan Fern?ndez Tajes\""
<jfernandezt@udc.es<mailto:jfernandezt@udc.es>>
Enviados: Jueves, 13 de Septiembre 2012 2:56:43
Asunto: Re: HTqPCR error
Hi Juan,
Dear Heidi,
My name is Juan Fernandez-Tajes and I?m using HTqPCR for analyzing
some
Biomark 96*96 data. When I tried the following command I got an error:
raw1 <- readCtData(files="Prueba_Blanco001.txt", format="BioMark",
n.features=96, n.data=96)
Error en `$<-.data.frame`(`*tmp*`, "Call", value = character(0)) :
replacement has 0 rows, data has 9216
However when I import the data with the following command it works:
temp <- readCtData(files="Prueba_Blanco001.txt", n.features=96*96,
column.info=list(flag=9, feature=5, type=6, Ct=7, position=1),
skip=12,
sep="\t")
raw <- changeCtLayout(temp, sample.order=rep(1:96, each=96))
Any ideas?
Not at the top of my head. However, you actually change two things
here,
namely both the specification of n.features/n.data and using 'format'
vs
'column.info'. To help me figure out which of these two things causes
the
error, can you try running the remaining 2 combinations? I.e.:
temp <- readCtData(files="Prueba_Blanco001.txt", n.features=96*96,
format="BioMark")
raw1 <- readCtData(files="Prueba_Blanco001.txt", n.features=96,
n.data=96,
column.info=list(flag=9, feature=5, type=6, Ct=7, position=1),
skip=12,
sep="\t")
Thanks
\Heidi
Many thanks in advance
Juan
---------------------------------------------------------------
Juan Fernandez Tajes, ph. D
Grupo XENOMAR
Departamento de Biolog?a Celular y Molecular
Facultad de Ciencias-Universidade da Coru?a
Tlf. +34 981 167000 ext 2030
e-mail: jfernandezt@udc.es<mailto:jfernandezt@udc.es>
----------------------------------------------------------------
------------------------------
Message: 18
Date: Fri, 14 Sep 2012 09:50:35 +1000 (AUS Eastern Standard Time)
From: Gordon K Smyth <smyth@wehi.edu.au<mailto:smyth@wehi.edu.au>>
To: Javier Sim?n-S?nchez
<simonsanchezj@gmail.com<mailto:simonsanchezj@gmail.com>>
Cc: Bioconductor mailing list
<bioconductor@r-project.org<mailto:bioconductor@r-project.org>>
Subject: Re: [BioC] Install devel version of edgeR
Message-ID: <pine.wnt.4.64.1209140949370.6004@pc765.wehi.edu.au<mailto :pine.wnt.4.64.1209140949370.6004@pc765.wehi.edu.au="">>
Content-Type: text/plain; charset="x-unknown"; Format="flowed"
Alternatively you can download from:
http://bioconductor.org/packages/2.11/bioc/html/edgeR.html
Gordon
On Fri, 14 Sep 2012, Gordon K Smyth wrote:
Dear Javier,
This is a question for the Bioc managers, but the following page might
answer
your question:
http://www.bioconductor.org/developers/useDevel/
Best wishes
Gordon
On Thu, 13 Sep 2012, Javier Sim?n-S?nchez wrote:
Hello Gordon,
Thanks a lot for your reply.
How can I upgrade to the latest version of edge R? I have checked on
the
list and it looks like i need version 2.99.3. However, the one you
provide
at
http://www.bioconductor.org/packages/2.10/bioc/html/edgeR.html is
edgeR_2.6.12. How can I get the needed version?
Thanks a lot for your help.
On Thu, Sep 13, 2012 at 2:43 AM, Gordon K Smyth
<smyth@wehi.edu.au<mailto:smyth@wehi.edu.au>> wrote:
Dear Javier,
This error has been discused a number of times on this list. The
solution
is to upgrade edgeR to the current devel version.
Also please see the Bioconductor posting guide:
http://www.bioconductor.org/**help/mailing-
list/posting-**guide/<http: www.bioconductor.org="" help="" mailing-list="" posting-guide=""/>
Best wishes
Gordon
Date: Tue, 11 Sep 2012 13:04:55 +0200
From: Javier Sim?n-S?nchez
<simonsanchezj@gmail.com<mailto:simonsanchezj@gmail.com>>
To: bioconductor@r-project.org<mailto:bioconductor@r-project.org>
Subject: [BioC] edgeR GLM error
Hello,
My name is Javier Sim?n Seanchez and I'm a post-doc at the VUmc in
Amsterdam.
The reason of this e-mail is that im running edgeR in an expression
dataset
and getting the following error when calculating the GLM common
dispersion:
*Error in beta[k, ] <- betaj[decr, ] :
NAs are not allowed in subscripted assignments
*
Im running cases versus controls and I want to modulate for different
tissues. How can I overcome this error?
Thanks a lot in advance
--
Javier Simon-Sanchez
______________________________**______________________________**______
____
The information in this email is confidential and intended solely for
the
addressee.
You must not disclose, forward, print or use it without the permission
of
the sender.
______________________________**______________________________**______
____
--
Javier Simon-Sanchez
______________________________________________________________________
The information in this email is confidential and
intend...{{dropped:5}}
------------------------------
Message: 19
Date: Fri, 14 Sep 2012 11:08:11 +0200
From: Paolo Kunderfranco
<paolo.kunderfranco@gmail.com<mailto:paolo.kunderfranco@gmail.com>>
To: Rory Stark
<rory.stark@cancer.org.uk<mailto:rory.stark@cancer.org.uk>>
Cc: "bioconductor@r-project.org<mailto:bioconductor@r-project.org>"
<bioconductor@r-project.org<mailto:bioconductor@r-project.org>>
Subject: Re: [BioC] DiffBind - overlap between different peak callers
Message-ID:
<cagxwfc9ztp+3qqyw7ds4+qcyufnbmij+7hzlcnrzohzamhzirg@mail.gmai l.com<mailto:cagxwfc9ztp+3qqyw7ds4+qcyufnbmij+7hzlcnrzohzamhzirg@mail.="" gmail.com="">>
Content-Type: text/plain; charset=ISO-8859-1
Dear Rory,
I finally analyze my data with DiffBind and it seems working well,
I was wondering whether is possible or it will be possible in the next
release to export in a bed file (wig file) consensus peaks overlapping
between two conditions, in my example consensus peaks called by two
different peak callers.
Cheers,
Paolo
------------------------------
Message: 20
Date: Fri, 14 Sep 2012 11:23:26 +0200
From: Paolo Kunderfranco
<paolo.kunderfranco@gmail.com<mailto:paolo.kunderfranco@gmail.com>>
To: bioconductor@r-project.org<mailto:bioconductor@r-project.org>
Subject: Re: [BioC] DiffBind - sample sheet for multiple replicates
and peak
Message-ID:
<cagxwfc- bywww9mjmr7nadcofnlj="ZyM3sssbc2mYNyDhE5+haQ@mail.gmail.com<mailto" :cagxwfc-bywww9mjmr7nadcofnlj="ZyM3sssbc2mYNyDhE5+haQ@mail.gmail.com">>
Content-Type: text/plain; charset=ISO-8859-1
Dear Antonio
I think that this problem was resolved in previous messagges, have a
look to:
https://stat.ethz.ch/pipermail/bioconductor/2012-August/047351.html?
For the complete code you could browse August mailing list.
SampleID should be unique for each sample, and moreover also bam file
file should be unique, you should make a copy of all your bamReads and
bamControl
In my example, I compared 2 different peak callers in 3 cell lines:
SampleID,Tissue,Factor,Condition,Replicate,bamReads,bamControl,Peaks
mES_H3K27me3_m,ES,H3K27,mES_H3K27me3,1,reads/H3K27me3/ES_H3K27me3_m.be
d,reads/H3K27me3/ES_input_m.bed,peaks/H3K27me3_ES_M.bed
CMp_H3K27me3_m,CMN,H3K27,CMp_H3K27me3,1,reads/H3K27me3/CMN_H3K27me3_m.
bed,reads/H3K27me3/CMN_input_m.bed,peaks/H3K27me3_CMN_M.bed
CMa_H3K27me3_m,CMA,H3K27,CMa_H3K27me3,1,reads/H3K27me3/CMA_H3K27me3_m.
bed,reads/H3K27me3/CMA_input_m.bed,peaks/H3K27me3_CMA_M.bed
mES_H3K27me3_s,ES,H3K27,mES_H3K27me3,2,reads/H3K27me3/ES_H3K27me3_s.be
d,reads/H3K27me3/ES_input_s.bed,peaks/H3K27me3_ES_S.bed
CMp_H3K27me3_s,CMN,H3K27,CMp_H3K27me3,2,reads/H3K27me3/CMN_H3K27me3_s.
bed,reads/H3K27me3/CMN_input_s.bed,peaks/H3K27me3_CMN_S.bed
CMa_H3K27me3_s,CMA,H3K27,CMa_H3K27me3,2,reads/H3K27me3/CMA_H3K27me3_s.
bed,reads/H3K27me3/CMA_input_s.bed,peaks/H3K27me3_CMA_S.bed
Like this it should work,
Cheers,
Paolo
------------------------------
Message: 21
Date: Fri, 14 Sep 2012 11:58:09 +0200
From: Ant?nio Miguel de Jesus Domingues
<amjdomingues@gmail.com<mailto:amjdomingues@gmail.com>>
To: bioconductor@r-project.org<mailto:bioconductor@r-project.org>
Subject: [BioC] DiffBind - error in dba.count
Message-ID:
<capacvoc+9hgaxzatgj7j3guktemaummn6sjhoxm1b4ygwvr4rq@mail.gmai l.com<mailto:capacvoc+9hgaxzatgj7j3guktemaummn6sjhoxm1b4ygwvr4rq@mail.="" gmail.com="">>
Content-Type: text/plain
Hi again,
I am trying DiffBind and loaded my data that looks like this:
H3K4m3
4 Samples, 13203 sites in matrix (13792 total):
ID Tissue Factor Condition Peak.caller Replicate Intervals
1 wt1 Hela H3K4me3 control1 raw 1 14111
2 wt2 Hela H3K4me3 control2 raw 2 13771
3 treat1 Hela H3K4me3 condition1 raw 1 14865
4 treat2 Hela H3K4me3 condition2 raw 2 13393
But I ran into problems trying to calculate the affinity scores with
dba.count:
H3K4m3 = dba.count(H3K4m3)
Error in cond$counts : $ operator is invalid for atomic vectors
In addition: Warning message:
In mclapply(arglist, fn, ..., mc.preschedule = FALSE) :
6 function calls resulted in an error
The peaks are in bed files (chr, start, end, score) and the reads are
in
SAM format.
Can anyone help me with this?
Cheers.
Ant?nio
sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] C
attached base packages:
[1] parallel stats graphics grDevices utils datasets
methods
[8] base
other attached packages:
[1] DiffBind_1.0.9 Biobase_2.14.0
loaded via a namespace (and not attached):
[1] IRanges_1.12.6 RColorBrewer_1.0-5 amap_0.8-7
edgeR_2.4.6
[5] gdata_2.11.0 gplots_2.11.0 gtools_2.7.0
limma_3.10.3
[9] zlibbioc_1.0.1
On 13 September 2012 18:06, Ant?nio Miguel de Jesus Domingues <
amjdomingues@gmail.com<mailto:amjdomingues@gmail.com>> wrote:
Hi all,
I am trying to use DiffBind to compare peaks called in control vs
condition. I have 2 replicates for each and I've also called peaks
using 2
different peak callers (to wi, MACS and QuEST). I've also prepared a
sample
data sheet that looks like this:
SampleID Tissue Factor Condition Replicate Peak.caller
bamReads
bamControl Peaks
control Hela TF wt 1
MACS path path path
control Hela TF wt 1
QuEST path path path
control2 Hela TF wt 2
MACS path path path
control 2 Hela TF wt 2
QuEST path path path
(and the same for the conditions)
My plan was to load all the data and then using diffbind selecte a set
of
common peaks for the peak callers before proceeding with the analysis.
However, when I load the data (data =
dba(sampleSheet="samplesheet.csv"))
the peaks for each caller are not recognized as a different variable.
How
I can do that and is this silly?
I could also derive a set of common peaks independently but it would
be
neat to do it all with the same package and that seems to be possible
but I
could not find how to do it in the documentation.
Thanks,
Ant?nio
--
--
Ant?nio Miguel de Jesus Domingues, PhD
Neugebauer group
Max Planck Institute of Molecular Cell Biology and Genetics, Dresden
Pfotenhauerstrasse 108
01307 Dresden
Germany
e-mail: domingue@mpi-cbg.de<mailto:domingue@mpi-cbg.de>
tel. +49 351 210 2481
The Unbearable Lightness of Molecular Biology
--
--
Ant?nio Miguel de Jesus Domingues, PhD
Neugebauer group
Max Planck Institute of Molecular Cell Biology and Genetics, Dresden
Pfotenhauerstrasse 108
01307 Dresden
Germany
e-mail: domingue@mpi-cbg.de<mailto:domingue@mpi-cbg.de>
tel. +49 351 210 2481
The Unbearable Lightness of Molecular Biology
[[alternative HTML version deleted]]
------------------------------
_______________________________________________
Bioconductor mailing list
Bioconductor@r-project.org<mailto:bioconductor@r-project.org>
https://stat.ethz.ch/mailman/listinfo/bioconductor
End of Bioconductor Digest, Vol 115, Issue 14
*********************************************
NOTICE AND DISCLAIMER
This e-mail (including any attachments) is intended for the above-
named person(s). If you are not the intended recipient, notify the
sender immediately, delete this email from your system and do not
disclose or use for any purpose.
We may monitor all incoming and outgoing emails in line with current
legislation. We have taken steps to ensure that this email and
attachments are free from any virus, but it remains your
responsibility to ensure that viruses do not adversely affect you.
Cancer Research UK
Registered charity in England and Wales (1089464), Scotland (SC041666)
and the Isle of Man (1103)
A company limited by guarantee. Registered company in England and
Wales (4325234) and the Isle of Man (5713F).
Registered Office Address: Angel Building, 407 St John Street, London
EC1V 4AD.
[[alternative HTML version deleted]]