Hello,
I have a gene expression data set built up from rows of gene expression
values, as follows:
GeneID    GeneName  Sample1  ..........  SampleN
Category            +1       ..........  -1
1         gene1     0.5      ..........  0.67
2         gene2     0.34     ..........  0.78
How could I use Bioconductor to analyze this data set and get the most
informative genes, classification, clustering, etc.?
Malik
On Apr 20, 2005, at 1:56 AM, Malik Yousef wrote:
> Hello,
>
> I have a gene expression data set built up from rows of gene expression
> values, as follows:
>
> GeneID    GeneName  Sample1  ..........  SampleN
> Category            +1       ..........  -1
> 1         gene1     0.5      ..........  0.67
> 2         gene2     0.34     ..........  0.78
>
> How could I use Bioconductor to analyze this data set and get the most
> informative genes, classification, clustering, etc.?
>
Malik,
You will have to decide what specific questions you want to answer using
your data. To get a sense of what Bioconductor has to offer, try looking
here:
http://www.bioconductor.org/faq.html#What%20documentation%20exists%20for%20Bioconductor
The vignettes give a lot of detail about how to use different packages.
The Bioconductor Short Courses are very helpful as a starting place.
When you run into specific problems, ask here. If you want more help
here, you will probably have to be more specific about your data, what
you have tried, and what hasn't worked. Single channel or two-color?
Patient samples or cell lines or something else? Expression or CGH?
How many classes of sample? What are the research questions/hypotheses?
Sean
Hello,
My data have been preprocessed to give an expression value for each gene:
19200 genes are involved in the experiments and I have 186 samples. The
samples define 32 phenotypes (classes). I would like to find the
significant genes among 10 different combinations of classes and then
find the intersection between those lists of significant genes.
My problem is how to read this simple data into any Bioconductor package,
since the input formats Bioconductor expects seem to be image-analysis
output (or I am missing something here). I want to read the input file in
a way that keeps track of the gene ID and the gene name.
So please just give me a simple example of reading this input format into
any basic Bioconductor package. For simplicity, consider a table as
follows:
GenId  GeneName  Sample1  Sample2  Sample3  Sample4  Sample5  ......  SampleN
Class            C1       C1       C2       C3       C4               C1
1      gene1     0.04     0.05     0.06     0.7      0.8      ......  0.9
The second row holds the class labels, and from the third row on we have
the gene expression values (just numbers!).
So I want to read this format into a specific Bioconductor package (say
limma?) and start applying different functions.
So again, I want to know how to read this file into the package.
-----Original Message-----
From: Sean Davis [mailto:sdavis2@mail.nih.gov]
Sent: Wednesday, April 20, 2005 2:56 AM
To: yousef@wistar.org
Cc: bioconductor@stat.math.ethz.ch
Subject: Re: [BioC] where to start?
On Apr 20, 2005, at 12:06 PM, Malik Yousef wrote:
> Hello,
> My data have been preprocessed to give an expression value for each
> gene: 19200 genes are involved in the experiments and I have 186
> samples. The samples define 32 phenotypes (classes). I would like to
> find the significant genes among 10 different combinations of classes
> and then find the intersection between those lists of significant genes.
>
> My problem is how to read this simple data into any Bioconductor
> package, since the input formats Bioconductor expects seem to be
> image-analysis output (or I am missing something here). I want to read
> the input file in a way that keeps track of the gene ID and the gene
> name.
> So please just give me a simple example of reading this input format
> into any basic Bioconductor package. For simplicity, consider a table
> as follows:
> GenId  GeneName  Sample1  Sample2  Sample3  Sample4  Sample5  ......  SampleN
> Class            C1       C1       C2       C3       C4               C1
> 1      gene1     0.04     0.05     0.06     0.7      0.8      ......  0.9
>
You can look at read.table for reading your file. You may need to make
some adjustments to your file, though, as read.table expects plain
delimited text (whitespace-separated by default; use sep = "\t" for
tab-delimited files). As Michael pointed out in another message earlier,
you will really benefit from using the original image files if you have
them.
You will probably benefit from reading:
http://cran.r-project.org/doc/manuals/R-data.html#Spreadsheet_002dlike-data
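One way to avoid editing the file by hand is sketched below. It is only an illustration under assumptions: the toy file contents, the tab delimiter, and the object names (toy_file, top, dat, classes) are all made up for the example. The two label rows are read as text, and the numeric body is read separately with skip = 2.

```r
# Hypothetical toy file in the layout described above: a header row,
# a Class row, then one row per gene. Tab-delimited by assumption.
toy_file <- tempfile(fileext = ".txt")
writeLines(c(
  "GenId\tGeneName\tSample1\tSample2\tSample3",
  "Class\t\tC1\tC1\tC2",
  "1\tgene1\t0.04\t0.05\t0.06",
  "2\tgene2\t0.34\t0.78\t0.12"
), toy_file)

# Read only the two label rows, keeping everything as text.
top <- read.table(toy_file, sep = "\t", nrows = 2, header = FALSE,
                  colClasses = "character", stringsAsFactors = FALSE)

# Read the numeric body, skipping the header and Class rows.
dat <- read.table(toy_file, sep = "\t", skip = 2, header = FALSE,
                  stringsAsFactors = FALSE)
names(dat) <- unname(unlist(top[1, ]))

# Class labels belong to the sample columns only (column 3 onwards).
classes <- factor(unname(unlist(top[2, -(1:2)])))
names(classes) <- names(dat)[-(1:2)]

str(dat)      # GenId, GeneName, then numeric sample columns
classes       # one class label per sample
```

Keeping the class labels in their own named factor means the expression columns stay purely numeric, which is what most downstream functions expect.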
> The second row holds the class labels, and from the third row on we
> have the gene expression values (just numbers!).
>
> So I want to read this format into a specific Bioconductor package (say
> limma?) and start applying different functions.
Have you read the limma user guide? You will probably need to, so that
you can work out how to get your data into a format that limma
understands.
>
> So again, I want to know how to read this file into the package.
>
For specific help on a function, type something like:
?read.table
For searching for help on a specific topic, you can try:
help.search('topic')
Unfortunately, a big part of using R and Bioconductor is data
manipulation (moving from one format to another), which will require
learning some basic skills in R. The "An Introduction to R" manual is
quite helpful. Reading at least part of it and going through the examples
it provides is pretty much necessary.
Hope this helps a bit.
Sean
Hi Malik
What do you want to do? Bioconductor has many packages which you could
use. My preference when it comes to detecting differential expression
is for limma, but there are many others such as siggenes, genefilter,
multtest, etc.
For multtest, read the multtest.pdf after installing that library. You
will want to read your data into an R data frame using read.table().
For limma, again you will probably need to read your data in using
read.table(). Then you can either create an exprSet class (type
?exprSet) or an MAList (documented in the limma help). The limma
package is easier to use if you have the original data outputs from your
image analysis software.
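For a first look at the task of finding significant genes per class comparison and then intersecting the lists, a base-R sketch follows. It uses ordinary Welch t-tests with Benjamini-Hochberg adjustment via p.adjust() as a simple stand-in; it is not limma's moderated statistics and not multtest's procedures. The toy data, the helper sig_genes(), the list of comparisons and the 0.05 cutoff are all assumptions for illustration.

```r
set.seed(1)
# Hypothetical toy data: 50 genes x 12 samples, three classes of four samples.
classes <- factor(rep(c("C1", "C2", "C3"), each = 4))
expr <- matrix(rnorm(50 * 12), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("Sample", 1:12)))
# Shift the first five genes in class C1 so there is something to detect.
expr[1:5, classes == "C1"] <- expr[1:5, classes == "C1"] + 4

# Welch t-test per gene for one pair of classes; returns the names of
# genes whose BH-adjusted p-value falls below alpha.
sig_genes <- function(expr, classes, a, b, alpha = 0.05) {
  p <- apply(expr, 1, function(x) {
    t.test(x[classes == a], x[classes == b])$p.value
  })
  names(which(p.adjust(p, method = "BH") < alpha))
}

# Each comparison is a pair of class labels; extend this list as needed.
comparisons <- list(c("C1", "C2"), c("C1", "C3"))
sig_lists <- lapply(comparisons, function(pr) {
  sig_genes(expr, classes, pr[1], pr[2])
})

# Genes called significant in every comparison.
common <- Reduce(intersect, sig_lists)
common
```

With many classes and few samples per class, per-gene t-tests have little power, which is one reason a package such as limma is usually preferred for the real analysis.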
For clustering, there are plenty of mails on this list that have dealt
with this, but the functions you may want to start with are hclust(),
dist() and cor().
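A minimal sketch of how hclust(), dist() and cor() fit together, assuming a genes-by-samples numeric matrix. The toy matrix, the average linkage and the choice of three gene groups are illustrative assumptions, not recommendations.

```r
set.seed(2)
# Hypothetical toy matrix: 20 genes (rows) x 6 samples (columns).
expr <- matrix(rnorm(20 * 6), nrow = 20,
               dimnames = list(paste0("gene", 1:20), paste0("Sample", 1:6)))

# Cluster samples on Euclidean distance. dist() compares rows, hence t().
hc_samples <- hclust(dist(t(expr)), method = "average")

# Cluster genes on correlation distance. cor() compares columns, hence t().
gene_dist <- as.dist(1 - cor(t(expr)))
hc_genes <- hclust(gene_dist, method = "average")

# Cut the gene tree into three groups and count the genes in each.
gene_groups <- cutree(hc_genes, k = 3)
table(gene_groups)

# plot(hc_samples) would draw the sample dendrogram.
```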
But really, that's all just for starters! What do you want to do??
:-)
Mick
_______________________________________________
Bioconductor mailing list
Bioconductor@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
OK, off the top of my head, *if* I wanted to use limma to apply some
linear models across my data set (which would, I think, tell me which
genes were changing significantly between two or more phenotypes):
1) Edit the data in Excel and get rid of the Class row
2) Save the data as text (tab or space delimited, it doesn't matter)
3) Read the data into R using read.table (which by default splits data
into columns based on white-space)
4) Manipulate the data - what we are trying to get is a matrix of data
in R, where the rownames() of the matrix are the gene names, the
colnames() of the matrix are the experiment names and the values are the
expression values. What you get back from read.table() will be like
this, but not quite. Install the Biobase library, load it and type
?exprSet. Read that entire help file, execute the example, and pay
particular attention to the format of the geneData matrix
5) After reading the above help file you will be able to create an
exprSet object, which can be used by limma.
6) Read the limma manual. Some of the bits of the manual which refer to
affy data include examples of using limma with exprSet objects.
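Step 4 above can be sketched roughly as follows, starting from a data frame shaped like the one read.table() would return once the Class row is gone. The toy values, the object names and the single C2-versus-C1 contrast are assumptions. Note also that in current Biobase the exprSet class has been replaced by ExpressionSet; the sketch sidesteps that by passing a plain numeric matrix, which lmFit() accepts. The limma step is guarded so the rest still runs where limma is not installed.

```r
set.seed(3)
# Hypothetical stand-in for the read.table() result (Class row removed).
vals <- matrix(rnorm(100 * 6), nrow = 100,
               dimnames = list(NULL, paste0("Sample", 1:6)))
dat <- data.frame(GeneID = 1:100, GeneName = paste0("gene", 1:100),
                  vals, stringsAsFactors = FALSE)
# Class labels kept separately, one per sample column, in column order.
classes <- factor(rep(c("C1", "C2"), each = 3))

# Numeric matrix: rows are genes, columns are samples.
expr <- as.matrix(dat[, -(1:2)])
rownames(expr) <- dat$GeneName
# Keep the annotation so GeneID can be looked up by gene name later.
annot <- dat[, c("GeneID", "GeneName")]

# Fit one linear model per gene and test C2 against C1 (needs limma).
if (requireNamespace("limma", quietly = TRUE)) {
  design <- model.matrix(~ 0 + classes)
  colnames(design) <- levels(classes)
  fit <- limma::lmFit(expr, design)
  contrast_matrix <- limma::makeContrasts(C2vsC1 = C2 - C1, levels = design)
  fit2 <- limma::eBayes(limma::contrasts.fit(fit, contrast_matrix))
  print(limma::topTable(fit2, coef = "C2vsC1", number = 5))
}
```

For several class comparisons, more contrasts can be added to the makeContrasts() call and each one inspected with its own coef value.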
:-) Most of all have fun. R can be a tough learning curve, but it's
worth it. Let me know how you get on.
Mick
"Malik Yousef" <yousef@wistar.org> writes:
> any basic package of bioconductor. For simplicity, consider a table
> as follows:
> GenId  GeneName  Sample1  Sample2  Sample3  Sample4  Sample5  ......  SampleN
> Class            C1       C1       C2       C3       C4               C1
> 1      gene1     0.04     0.05     0.06     0.7      0.8      ......  0.9
I would remove the Class line and put it in another file. Then
reading your data into R is easily done with read.table (which can use
any delimiter you specify, BTW). Then you can add the class
information.
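A minimal sketch of this two-file approach, assuming tab-delimited files. The file contents and the column names Sample and Class in the class file are hypothetical. The class labels are matched to the sample columns by name, so the order of rows in the class file does not matter.

```r
# Hypothetical class file: one row per sample.
class_file <- tempfile(fileext = ".txt")
writeLines(c("Sample\tClass",
             "Sample3\tC2", "Sample1\tC1", "Sample4\tC3", "Sample2\tC1"),
           class_file)

# Hypothetical expression file with the Class line already removed.
expr_file <- tempfile(fileext = ".txt")
writeLines(c("GeneID\tGeneName\tSample1\tSample2\tSample3\tSample4",
             "1\tgene1\t0.04\t0.05\t0.06\t0.7",
             "2\tgene2\t0.34\t0.78\t0.12\t0.5"),
           expr_file)

dat <- read.table(expr_file, header = TRUE, sep = "\t",
                  stringsAsFactors = FALSE)
cls <- read.table(class_file, header = TRUE, sep = "\t",
                  stringsAsFactors = FALSE)

# Line the class labels up with the sample columns by name, not position.
sample_cols <- setdiff(names(dat), c("GeneID", "GeneName"))
classes <- factor(cls$Class[match(sample_cols, cls$Sample)])
names(classes) <- sample_cols

# Every sample column must have found a class label.
stopifnot(!anyNA(classes))
table(classes)
```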
If the above raises more questions for you than it answers, then I'm
afraid you will need to consult some of the R documentation which will
explain how to do these sorts of things.
Hope that helps a bit.
+ seth