Dear List Patrons,
I am processing some microarray data obtained with Affymetrix Mouse
Gene 2.0 ST chips. Using the xps package, I have performed step-wise
RMA processing - Background correction, Quantiles Normalization, and
Median Polish summarization. The output of xps is a tab-delimited text
file with 33,960 non-log base 2 transformed expression measures, the
UnitName (an 8 digit number starting at 17550607) and a Unit ID
(starts at 0 but isn't continuous). What I would like to do is import
this data into the limma package, from my output text file. I know
that I can make an expressionset and use this expressionset directly
after making a design matrix, via the instructions here:
http://permalink.gmane.org/gmane.science.biology.informatics.conductor
/50591
I would like to add the gene names into the expressionset so that I
can make volcano plots and graphs with limma, in accord with the limma
users guide.
I know how to access the gene names and relate them to the UnitName
code copied from here:
(
http://biowhat.ucsd.edu/homer/basicTutorial/affymetrix.html)
library(mogene20sttranscriptcluster.db)
librarypd.mogene.2.0.st)
mat1 <- as.matrix(exprs_mp)
post_rma <- ExpressionSet(assayData=mat1, annotation="mogene.2.0.st")
my_frame <- data.frame(exprs(post_rma))
Annot <-
data.frame(ACCNUM=sapply(contents(mogene20sttranscriptclusterACCNUM),
paste, collapse=", "),
SYMBOL=sapply(contents(mogene20sttranscriptclusterSYMBOL), paste,
collapse=", "),
DESC=sapply(contents(mogene20sttranscriptclusterGENENAME), paste,
collapse=", "))
all <- merge(Annot, my_frame, by.x=0, by.y=0, all=T)
write.table(all,file="21Dec13_data.ann.txt",sep="\t")
Is there a way to modify the above code so that I can add the gene
names and gene information to my expresionset, in a way that makes
them AS accessible to the limma package as they would be if I had used
the affy package to process the data initially?
Also, I would like to compare the output of VSN with RMA using eBayes
in limma. I don't wish to import the raw intensities into limma, not
because I couldn't use the same methods (vsn is in limma), only
because I would have to start my data processing over. Any help or
comments is greatly appreciated.
Thank you.
Sincerely,
Matt
Matthew E. Thornton
Research Lab Specialist
Saban Research Institute
USC/Children?s Hospital Los Angeles
513X, Mail Stop 35
4661 W. Sunset Blvd.
Los Angeles, CA 90027-6020
matthew.thornton at med.usc.edu
________________________________________
From: bioconductor-bounces@r-project.org [bioconductor-
bounces@r-project.org] on behalf of bioconductor-request@r-project.org
[bioconductor-request@r-project.org]
Sent: Monday, December 23, 2013 3:00 AM
To: bioconductor at r-project.org
Subject: Bioconductor Digest, Vol 130, Issue 25
Send Bioconductor mailing list submissions to
bioconductor at r-project.org
To subscribe or unsubscribe via the World Wide Web, visit
https://stat.ethz.ch/mailman/listinfo/bioconductor
or, via email, send a message with subject or body 'help' to
bioconductor-request at r-project.org
You can reach the person managing the list at
bioconductor-owner at r-project.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Bioconductor digest..."
Today's Topics:
1. Re: Complete variant toolbox:
gmapR/VariantTools/VariantAnnotation (Thomas Girke)
2. DESeq / DESeq2 for translation efficiency in polysome
profiling or ribsome profiling experiments (Gilgi [guest])
3. forgeSeqFiles (Melo [guest])
4. Multiple contrasts in DESeq2 (Hugo Varet)
----------------------------------------------------------------------
Message: 1
Date: Sun, 22 Dec 2013 09:36:40 -0800
From: Thomas Girke <thomas.girke@ucr.edu>
To: Valerie Obenchain <vobencha at="" fhcrc.org="">
Cc: Bioconductor mailing list <bioconductor at="" stat.math.ethz.ch="">
Subject: Re: [BioC] Complete variant toolbox:
gmapR/VariantTools/VariantAnnotation
Message-ID: <20131222173640.GA4345 at Thomas-Girkes-MacBook-Pro.local>
Content-Type: text/plain; charset="us-ascii"
Sorry for not responding sooner. - I agree, providing the variant
mapping
position relative to transcripts will be the most useful one, at least
for features that are part of transcripts. For others that are not,
e.g.
intron and intergenic features, one could report the variant mappings
relative to the start position of these features.
Thomas
On Thu, Dec 19, 2013 at 03:08:19AM +0000, Valerie Obenchain wrote:
> Hi,
>
> On 12/17/2013 09:40 AM, Robert Castelo wrote:
> > hi Valerie cc Thomas,
> >
> > sorry for hijacking the thread, regarding the request made below..
> >
> > On 12/09/2013 09:07 PM, Valerie Obenchain wrote:
> > [...]
> >> I could add a 'REFLOC' column to the otuput of locateVariants()
that
> >> would essentially be the "equivalent" to 'CDSLOC' from
predictCoding().
> >
> > for the purpose of ordering cDNA primers flanking variants which
one may
> > want to validate through sanger sequencing, it is useful to have
at hand
> > the position of the variant with respect to the beginning of the
> > transcript (cDNA) where it has been observed, thus not just from
the
> > beginning of the CDS but from the beginning of the transcript.
> >
> > is this newer 'REFLOC' going to contain this position? if not,
would it
> > be possible to get also a column for that from the
locateVariants()
> > call? (e.g., TXLOC)
>
> Yes, I think it makes sense to have 'REFLOC' be the position in the
> reference starting from the beginning of the transcript. Unless
others
> have different thoughts this is what I'll go ahead with.
>
> Valerie
>
>
>
>
>
> >
> >
> > thanks!!
> > robert.
>
>
> --
> Valerie Obenchain
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B155
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: vobencha at fhcrc.org
> Phone: (206) 667-3158
> Fax: (206) 667-1319
------------------------------
Message: 2
Date: Sun, 22 Dec 2013 23:37:06 -0800 (PST)
From: "Gilgi [guest]" <guest@bioconductor.org>
To: bioconductor at r-project.org, gilgi.friedlander at weizmann.ac.il
Cc: DESeq Maintainer <sanders at="" fs.tum.de="">
Subject: [BioC] DESeq / DESeq2 for translation efficiency in polysome
profiling or ribsome profiling experiments
Message-ID: <20131223073706.DA826143685 at mamba.fhcrc.org>
Hello,
I wanted to ask whether it is possible to use DESeq for the analysis
of translation efficiency in polysome profiling or ribsome profiling.
In polysome profiling - polysomes are isolated (e.g. by sucrose
gradients), and mRNA that were associated with the polysomes are then
isolated and sequenced.
In ribosome profiling, mRNA fragments that are protected by the
ribosome are sequenced.
When such experiments are conducted in parallel to total RNA-seq, one
can ask questions regarding the translation efficiency.
For example: if we have 4 samples: ribosome sequencing (without
treatment), total sequencing (without treatment), sequencing (with
treatment), total sequencing (with treatment), one can ask how does
the treatment affects the protein synthesis rate (which is the ratio
ribosome/total).
I wanted to ask whether it is possible to define such a model in DESeq
for such analysis, in order to answer how is this ratio affected by
the treatment.
Thanks a lot
-- output of sessionInfo():
none
--
Sent via the guest posting facility at bioconductor.org.
------------------------------
Message: 3
Date: Sun, 22 Dec 2013 23:47:30 -0800 (PST)
From: "Melo [guest]" <guest@bioconductor.org>
To: bioconductor at r-project.org, memoly101 at yahoo.com
Subject: [BioC] forgeSeqFiles
Message-ID: <20131223074730.1C4F514369A at mamba.fhcrc.org>
Hello everyone,
I am so new to R as well as Bioconductor but found them very helpful
so I'm trying to use.
Now I need to make a BSgenome pckg for my own organism. Therefore I
have made a folder named seqs_srcdir which contains 17,000 files (one
gene_sequence per file), but when I downloaded, and even unzipped
BSgenome I kept getting the Error: could not find function
"forgeSeqFiles".
I know I need to make the seed files so I gave the command line:
forgeSeqlengthsFile(seqnames, prefix="pi1>", suffix=".fa",
seqs_srcdir="/Users/Me/Documents/Microarray",
seqs_destdir="/Users/Me/Documents/Microarray/Seeds", verbose=TRUE)
Error: could not find function "forgeSeqFiles".
I would appreciate it if you could please advice,
Melo
-- output of sessionInfo():
forgeSeqlengthsFile(seqnames, prefix="pi1>", suffix=".fa",
seqs_srcdir="/Users/Me/Documents/Microarray",
seqs_destdir="/Users/Me/Documents/Microarray/Seeds", verbose=TRUE)
Error: could not find function "forgeSeqFiles".
--
Sent via the guest posting facility at bioconductor.org.
------------------------------
Message: 4
Date: Mon, 23 Dec 2013 11:10:40 +0100
From: Hugo Varet <hugo.varet@pasteur.fr>
To: bioconductor at r-project.org
Subject: [BioC] Multiple contrasts in DESeq2
Message-ID: <52B80C20.5020009 at pasteur.fr>
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Dear Michael Love and list members,
I'm used to analyze RNA-seq data with your nice DESeq2 package (1.2.0)
and I sometimes have to test complex null hypotheses using contrasts.
For one of my project, I would like using multiple contrasts, i.e.
testing for example (H0: beta1-beta2 = beta3-beta4 = 0) against (H1:
beta1-beta2 != 0 and/or beta3-beta4 != 0), where the beta's are the
regression coefficients of the models. To do so, I would like using a
numeric matrix for the contrast argument in results(), but it only
accepts a numeric vector.
Hence, I was wondering if you planned to extend DESeq2 to the use of
multiple contrasts?
Best regards,
Hugo
--
Hugo Varet
Plate-Forme Transcriptome et Epig?nome
Institut Pasteur
------------------------------
_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
End of Bioconductor Digest, Vol 130, Issue 25
*********************************************