How to read in log2 ratio data


Peter Davidsen ▴ 210

@peter-davidsen-4584

Last seen 10.4 years ago

Hi all,

I would like to conduct a time-course analysis with the limma package on my chip data (run as dual-color). I have two classes/groups with 8 subjects in each, and each 'experimental unit' has been measured at three different time points.

However, I already have all the data as lowess-normalized log-ratios, log2(Hy3/Hy5). How do I read my txt file of log2 ratio data into R? And how do I define a vector/data frame?

I have arranged the data so that the probe IDs are in the first column (rows 2 to 200) and the individual slide data are in the following columns (slide 1 in column 2, slide 2 in column 3, and so on). I have 48 slides in total.

The main question I want to answer is which genes are differentially expressed between the two groups of subjects at time points 1, 2, and 3, respectively.

Cheers,
Peter

probe limma • 2.6k views

ADD COMMENT • link 14.8 years ago Peter Davidsen ▴ 210


Peter Davidsen ▴ 210

@peter-davidsen-4584

Last seen 10.4 years ago

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: https://stat.ethz.ch/pipermail/bioconductor/attachments/20110406/3216b806/attachment-0001.pl

ADD COMMENT • link 14.8 years ago Peter Davidsen ▴ 210


Hi Peter,

On 4/6/2011 4:25 AM, Peter Davidsen wrote:

> However, I already have all the data as lowess normalized log-ratios => log2(Hy3/Hy5). How do I read in my txt-file with my log2 ratio data into R? And how do I define a vector/data frame?

This isn't a Bioconductor question. In fact, it isn't really an R-help question. I think you would be either ignored or eviscerated if you asked that over there.

If you don't know how to read data in, you are a fair bit from being able to do a time course analysis. I would highly suggest you take the time to read 'An Introduction to R', so you can get past the data-manipulation steps.

http://cran.r-project.org/doc/manuals/R-intro.html

Best,
Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan Department of Human Genetics
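For readers arriving with the same starting point, here is a minimal sketch of the data-reading step in base R. It assumes the file is tab-delimited with a header row, probe IDs in the first column, and one column of log2(Hy3/Hy5) per slide. The file contents and the names `infile`, `dat`, and `logratios` are made up for illustration; a tiny file is written to a temporary location so the example runs on its own.

```r
# Hypothetical stand-in for the real file: header row, probe IDs in
# column 1, one column of log2 ratios per slide (3 slides here, not 48).
infile <- tempfile(fileext = ".txt")
writeLines(c("ProbeID\tSlide1\tSlide2\tSlide3",
             "probe_1\t0.12\t-0.40\t0.05",
             "probe_2\t1.30\tNA\t0.95",
             "probe_3\t-0.75\t-0.60\t-0.82"),
           infile)

# row.names = 1 moves the probe ID column into the row names, so only
# numeric slide columns remain in the data frame.
dat <- read.table(infile, header = TRUE, sep = "\t", row.names = 1,
                  na.strings = c("NA", ""), check.names = FALSE)

# A plain numeric matrix (probes in rows, slides in columns) is the
# usual input shape for downstream model fitting.
logratios <- as.matrix(dat)

dim(logratios)   # probes x slides
str(dat)         # confirm every slide column was read as numeric
```

If a slide column comes back as character rather than numeric, the usual culprits are a different separator, a decimal comma, or a missing-value code that is not listed in `na.strings`.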

ADD REPLY • link 14.8 years ago James W. MacDonald 68k



Hi Peter,

Jim (as always) made a great point regarding what sort of a question you are asking. In any case, here are three thoughts.

First, there are a zillion ways to read data into R. This simple step can be unbelievably easy (or not). Look at the R function read.table.

Second, the business of determining what is differentially expressed largely depends on how you want to define "differentially expressed". Back in the day, people often chose to define this as different by two or more fold, and in practice much more sophisticated definitions still rely on fold change as part of their definition, because no one really cares about minute changes, no matter how significant they are. On the other hand, no one cares about mean differences that appear to be driven by a single oddball observation. Last but not least, there is the business of adjusting p values for multiple hypothesis tests. Limma handles all these considerations well, but it is still up to you to determine how you want to define differential expression.

Third, if these were my data, I would first like to establish that subjects in the same experimental group were more similar to each other than subjects from different groups. One way to do this is to identify genes that show high variability across samples and then see whether samples from subjects in the same group cluster together based on these genes. If they don't, one generally adopts a more skeptical point of view about the experiment: biological variability may overwhelm whatever treatment effect you have set out to find.

Good luck,
Tom
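The group-similarity check described in the third point could be sketched roughly as follows, using base R only. The data here are simulated noise with the dimensions from the original post (199 probes, 48 slides). The cutoff of 50 probes, the average-linkage method, and the assumption that the first 24 columns belong to one group are illustrative choices, not something stated in the thread.

```r
set.seed(1)

# Simulated stand-in for the real log-ratio matrix: probes x slides.
n_probes <- 199
n_slides <- 48
logratios <- matrix(rnorm(n_probes * n_slides, sd = 0.3),
                    nrow = n_probes,
                    dimnames = list(paste0("probe_", seq_len(n_probes)),
                                    paste0("Slide", seq_len(n_slides))))

# Assumed column layout: slides 1-24 from group A, slides 25-48 from group B.
group <- factor(rep(c("A", "B"), each = 24))

# Rank probes by variance across slides and keep the most variable ones.
probe_var <- apply(logratios, 1, var, na.rm = TRUE)
top <- order(probe_var, decreasing = TRUE)[1:50]

# dist() works on rows, so transpose to cluster slides rather than probes.
d <- dist(t(logratios[top, ]))
hc <- hclust(d, method = "average")

# Cut the tree into two clusters and cross-tabulate against the known groups.
clusters <- cutree(hc, k = 2)
table(group, clusters)

# plot(hc, labels = as.character(group))  # dendrogram labelled by group
```

With real data, a table in which each group falls mostly into its own cluster is reassuring. With pure noise, as simulated here, no such separation is expected.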

ADD REPLY • link 14.8 years ago Thomas Hampton ▴ 750
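For the main question (group differences at each time point), one possible limma setup is sketched below. It rests on several assumptions that the thread does not confirm: that Hy5 is a common reference so each log-ratio can be treated like a single-channel value, that the slide columns are ordered subject by subject with time points 1 to 3 within each subject, and that the first 8 subjects form group A. The data are simulated, and every object name is made up for illustration. Repeated measurements on the same subject are handled with `duplicateCorrelation`, which needs the statmod package installed.

```r
library(limma)

set.seed(1)

# Assumed sample layout: 16 subjects x 3 time points = 48 slides.
subject_id <- rep(1:16, each = 3)
subject <- factor(subject_id)
group <- factor(rep(c("A", "B"), each = 24))
time <- factor(rep(paste0("T", 1:3), times = 16))

# Simulated log-ratios with a per-subject effect, so that slides from the
# same subject are positively correlated (as repeated measures usually are).
n_probes <- 199
subj_eff <- matrix(rnorm(n_probes * 16, sd = 0.3), nrow = n_probes)
logratios <- subj_eff[, subject_id] +
  matrix(rnorm(n_probes * 48, sd = 0.3), nrow = n_probes)
dimnames(logratios) <- list(paste0("probe_", seq_len(n_probes)),
                            paste0("Slide", 1:48))

# One coefficient per group-by-time combination.
combo <- factor(paste(group, time, sep = "."),
                levels = c("A.T1", "A.T2", "A.T3", "B.T1", "B.T2", "B.T3"))
design <- model.matrix(~ 0 + combo)
colnames(design) <- levels(combo)

# Estimate the within-subject correlation and use it in the linear model.
corfit <- duplicateCorrelation(logratios, design, block = subject)
fit <- lmFit(logratios, design, block = subject,
             correlation = corfit$consensus.correlation)

# Group B versus group A at each time point.
contrasts <- makeContrasts(T1 = B.T1 - A.T1,
                           T2 = B.T2 - A.T2,
                           T3 = B.T3 - A.T3,
                           levels = design)
fit2 <- eBayes(contrasts.fit(fit, contrasts))

# Top probes at time point 1, with Benjamini-Hochberg adjusted p-values.
topTable(fit2, coef = "T1", number = 10, adjust.method = "BH")
```

The `logFC` and `adj.P.Val` columns of the result correspond to the fold-change and multiple-testing considerations raised above; which thresholds count as "differentially expressed" remains the analyst's decision.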
