where to start?


Malik Yousef ▴ 40

@malik-yousef-1211


Hello,

I have a gene expression data set built up from rows of gene expression values, as follows:

GeneID    GeneName  Sample1  ...  SampleN
Category            +1       ...  -1
1         gene1     0.5      ...  0.67
2         gene2     0.34     ...  0.78

How could I use Bioconductor to analyze this data set and get the most informative genes, classification, clustering, etc.?

Malik

Classification Clustering Category • 1.2k views

updated 19.0 years ago by Seth Falcon ★ 7.4k • written 19.0 years ago by Malik Yousef ▴ 40


Sean Davis 21k

@sean-davis-490


On Apr 20, 2005, at 1:56 AM, Malik Yousef wrote:

> How could I use Bioconductor to analyze this data set and get the most
> informative genes, classification, clustering, etc.?

Malik,

You will have to decide what specific questions you want to answer using your data. To get a sense of what Bioconductor has to offer, try looking here:

http://www.bioconductor.org/faq.html#What%20documentation%20exists%20for%20Bioconductor

The vignettes give a lot of detail about how to use different packages. The Bioconductor Short Courses are very helpful as a starting place. When you run into specific problems, ask here.

If you want more help here, you will probably have to be more specific about your data, what you have tried, and what hasn't worked. Single channel or two-color? Patient samples or cell lines or something else? Expression or CGH? How many classes of sample? What are the research questions/hypotheses?

Sean

written 19.0 years ago by Sean Davis 21k


Hello,

I have data that have been preprocessed to give the expression value for each gene. There are 19200 genes involved in the experiments and 186 samples. The samples define 32 phenotypes (classes). I would like to find the significant genes among 10 different combinations of classes and then find the intersection between those lists of significant genes.

My problem is how to read this simple data into any Bioconductor package, since it looked to me as if Bioconductor expects the image format as input (or I'm missing something here). I want to read the input file and keep track of the gene ID and the gene name. So please just provide me with a simple example of reading this input format into any basic Bioconductor package. For simplicity, consider that we have a table as follows:

GenId  GeneName  Sample1  Sample2  Sample3  Sample4  Sample5  ...  SampleN
Class            C1       C1       C2       C3       C4       ...  C1
1      gene1     0.04     0.05     0.06     0.7      0.8      ...  0.9

The second row has the class labels, and from the third row on we have the gene expression values (just numbers!!).

So I want to read this format into a specific Bioconductor package (say limma?) and start applying different functions. So again, I want to know how to read this file into the package.

written 19.0 years ago by Malik Yousef ▴ 40


On Apr 20, 2005, at 12:06 PM, Malik Yousef wrote:

> My problem is how to read this simple data into any Bioconductor
> package. For simplicity, consider that we have a table as follows:
> GenId GeneName Sample1 Sample2 Sample3 Sample4 Sample5 ... SampleN
> Class C1 C1 C2 C3 C4 C1
> 1 gene1 0.04 0.05 0.06 0.7 0.8 ... 0.9

You can look at read.table for reading your file. You may need to make some adjustments to your file, though, as read.table expects a regular delimited text table (whitespace-separated by default; pass sep = "\t" for tab-delimited files). As Michael pointed out in another message earlier, you will really benefit from using the original image files if you have them. You will probably benefit from reading:

http://cran.r-project.org/doc/manuals/R-data.html#Spreadsheet_002dlike-data

> So I want to read this format into a specific Bioconductor package
> (say limma?) and start applying different functions.

Have you read the limma user guide? You will probably need to, so that you can determine how to manipulate your data into a format that limma understands.

> So again, I want to know how to read this file into the package.

For specific help on a function, type something like:

?read.table

For searching for help on a specific topic, you can try:

help.search('topic')

Unfortunately, a big part of using R and Bioconductor is data manipulation (moving from one format to another), which will require learning some basic skills in R. The Introduction to R manual is quite helpful. Reading at least part of it and going through the examples it provides is pretty much necessary.

Hope this helps a bit.

Sean
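As a rough illustration of the read.table route for the layout in the question, here is a minimal sketch. The file contents, the tab delimiter, and every object name (path, col_names, classes, dat, expr) are assumptions made up for the example; only read.table, readLines and strsplit are standard R. It leaves the Class row in the file and reads it separately from the numeric rows.

```r
# Hypothetical miniature of the layout in the question, written to a temp file.
# The real file, its delimiter and its column names may differ.
lines <- c(
  "GenId\tGeneName\tSample1\tSample2\tSample3\tSample4",
  "Class\t\tC1\tC1\tC2\tC2",
  "1\tgene1\t0.04\t0.05\t0.06\t0.70",
  "2\tgene2\t0.34\t0.41\t0.78\t0.81"
)
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

# Read the first two lines on their own: column names and class labels.
top <- strsplit(readLines(path, n = 2), "\t", fixed = TRUE)
col_names <- top[[1]]
classes <- factor(top[[2]][-(1:2)])  # drop the two annotation columns

# Read the numeric rows, skipping the header row and the Class row.
dat <- read.table(path, sep = "\t", skip = 2, header = FALSE,
                  col.names = col_names, stringsAsFactors = FALSE)

# Expression matrix: genes in rows, samples in columns.
expr <- as.matrix(dat[, -(1:2)])
rownames(expr) <- dat$GeneName
names(classes) <- colnames(expr)

# Gene ID and gene name stay available as a separate annotation table.
annotation <- dat[, 1:2]
```

Keeping the annotation columns in their own data frame means the gene ID and gene name can be joined back onto any result table by row name later.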

written 19.0 years ago by Sean Davis 21k


michael watson IAH-C ★ 3.4k

@michael-watson-iah-c-378


Hi Malik

What do you want to do? Bioconductor has many packages which you could use. My preference when it comes to detecting differential expression is for limma, but there are many others such as siggenes, genefilter, multtest, etc.

For multtest, read the multtest.pdf after installing that library. You will want to read your data into an R data frame using read.table().

For limma, again you will probably need to read your data in using read.table(). Then you can either create an exprSet class (type ?exprSet) or an MAList (documented in the limma help). The limma package is easier to use if you have the original data outputs from your image analysis software.

For clustering, there are plenty of mails on this list that have dealt with this, but the functions you may want to start with are hclust(), dist() and cor().

But really, that's all just for starters! What do you want to do?? :-)

Mick
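To make the clustering starting points concrete, here is a minimal sketch using hclust(), dist() and cor() from base R. The expression matrix is simulated purely for illustration (two artificial sample groups), and the object names and the choice of average linkage are assumptions, not a recommendation from this thread.

```r
set.seed(1)

# Simulated expression matrix: 20 genes in rows, 6 samples in columns.
expr <- matrix(rnorm(20 * 6), nrow = 20,
               dimnames = list(paste0("gene", 1:20), paste0("Sample", 1:6)))

# Build two artificial groups: genes 1-10 are high in samples 4-6,
# genes 11-20 are high in samples 1-3.
expr[1:10, 4:6] <- expr[1:10, 4:6] + 3
expr[11:20, 1:3] <- expr[11:20, 1:3] + 3

# Cluster samples on a correlation-based distance (cor works on columns).
sample_dist <- as.dist(1 - cor(expr))
sample_tree <- hclust(sample_dist, method = "average")
groups <- cutree(sample_tree, k = 2)

# Cluster genes on Euclidean distance (dist works on rows).
gene_tree <- hclust(dist(expr))

# plot(sample_tree) draws the dendrogram in an interactive session.
```

Whether to use correlation or Euclidean distance depends on whether overall expression level or expression pattern should drive the grouping.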

written 19.0 years ago by michael watson IAH-C ★ 3.4k


michael watson IAH-C ★ 3.4k

@michael-watson-iah-c-378


OK, off the top of my head, *if* I wanted to use limma to apply some linear models across my data set (which would, I think, tell me which genes were changing significantly between one or more phenotypes):

1) Edit the data in Excel and get rid of the Class row.
2) Save the data as text (tab or space delimited, it doesn't matter).
3) Read the data into R using read.table (which by default splits data into columns based on white-space).
4) Manipulate the data. What we are trying to get is a matrix of data in R, where the rownames() of the matrix are the gene names, the colnames() of the matrix are the experiment names and the values are the expression values. What you get back from read.table() will be like this, but not quite. Install the Biobase library, load it and type ?exprSet. Read that entire help file, execute the example, and pay particular attention to the format of the geneData matrix.
5) After reading the above help file you will have an exprSet object, which can be used by limma.
6) Read the limma manual. Some of the bits of the manual which refer to affy data include examples of using limma with exprSet objects. :-)

Most of all have fun. R can be a tough learning curve, but it's worth it. Let me know how you get on.

Mick
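The steps above can be sketched roughly as follows. This is an illustration under several assumptions, not the method from the limma manual: the data are simulated, the class labels and contrast names are invented, limma is assumed to be installed (the limma calls are skipped otherwise), and a plain numeric matrix is passed to lmFit instead of an exprSet, a class that Biobase has since replaced with ExpressionSet.

```r
set.seed(42)

# Simulated stand-in for the matrix described in step 4:
# 100 genes in rows, 12 samples in columns, 3 classes of 4 samples each.
classes <- factor(rep(c("C1", "C2", "C3"), each = 4))
expr <- matrix(rnorm(100 * 12), nrow = 100,
               dimnames = list(paste0("gene", 1:100), paste0("Sample", 1:12)))
# Make the first five genes clearly higher in class C2.
expr[1:5, classes == "C2"] <- expr[1:5, classes == "C2"] + 4

# Design matrix with one column per class.
design <- model.matrix(~ 0 + classes)
colnames(design) <- levels(classes)

if (requireNamespace("limma", quietly = TRUE)) {
  fit <- limma::lmFit(expr, design)

  # Two hypothetical class comparisons; the question mentions ten.
  contrasts <- limma::makeContrasts(C2vsC1 = C2 - C1,
                                    C2vsC3 = C2 - C3,
                                    levels = design)
  fit2 <- limma::eBayes(limma::contrasts.fit(fit, contrasts))

  # One list of significant gene names per comparison (adjusted p <= 0.05).
  sig <- lapply(colnames(contrasts), function(cf) {
    rownames(limma::topTable(fit2, coef = cf, number = Inf, p.value = 0.05))
  })
  names(sig) <- colnames(contrasts)

  # Genes significant in every comparison.
  common <- Reduce(intersect, sig)
}
```

Reduce(intersect, sig) extends to any number of comparisons, which fits the stated goal of intersecting several lists of significant genes.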

written 19.0 years ago by michael watson IAH-C ★ 3.4k


Seth Falcon ★ 7.4k

@seth-falcon-992


"Malik Yousef" <yousef@wistar.org> writes:

> any basic package of bioconductor. For simplicity, consider that we have a
> table as follows:
> GenId GeneName Sample1 Sample2 Sample3 Sample4 Sample5 ... SampleN
> Class C1 C1 C2 C3 C4 C1
> 1 gene1 0.04 0.05 0.06 0.7 0.8 ... 0.9

I would remove the Class line and put it in another file. Then reading your data into R is easily done with read.table (which can use any delimiter you specify, BTW). Then you can add the class information.

If the above raises more questions for you than it answers, then I'm afraid you will need to consult some of the R documentation which will explain how to do these sorts of things.

Hope that helps a bit.

+ seth
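A minimal sketch of that two-file approach might look like the following. The file contents, the comma delimiter, and the Sample/Class column names in the second file are all hypothetical, chosen only to show read.table with an explicit sep and the class labels being matched back to the sample columns.

```r
# Hypothetical files: expression values in one, class labels in another.
expr_path <- tempfile(fileext = ".txt")
class_path <- tempfile(fileext = ".txt")
writeLines(c(
  "GenId,GeneName,Sample1,Sample2,Sample3",
  "1,gene1,0.04,0.05,0.06",
  "2,gene2,0.34,0.41,0.78"
), expr_path)
writeLines(c(
  "Sample,Class",
  "Sample1,C1",
  "Sample2,C1",
  "Sample3,C2"
), class_path)

# read.table accepts any single-character delimiter through sep.
dat <- read.table(expr_path, header = TRUE, sep = ",", stringsAsFactors = FALSE)
pheno <- read.table(class_path, header = TRUE, sep = ",", stringsAsFactors = FALSE)

# Add the class information: align labels with the sample columns by name,
# so the order of rows in the class file does not matter.
sample_cols <- setdiff(names(dat), c("GenId", "GeneName"))
stopifnot(setequal(sample_cols, pheno$Sample))
classes <- factor(pheno$Class[match(sample_cols, pheno$Sample)])
names(classes) <- sample_cols
```

Matching by sample name instead of by position guards against the two files drifting out of order.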

written 19.0 years ago by Seth Falcon ★ 7.4k
