Affy PhenoData


hsharm03@students.poly.edu ▴ 180


Dear all,

I have 4 samples from an HT 430mgpm array plate. Three are wild-type replicates and one is a tumor condition with no replicates. I have no other information about how the experiment was conducted, and I cannot figure out how to create the phenoData for it.

When I create the expression set using the rma function of the affy library, I can see the sample names as they were, along with the sample numbers 1, 2, 3, 4. As I understand it, this treats all four samples as different. When I do the differential expression analysis with limma, I try to create the model matrix with the following command:

    design <- model.matrix(~sample, pData(eset))

But as I understand it, this treats each of the 4 samples as its own condition. Am I understanding this correctly? If so, what should I do to get the differential expression of genes in the tumor compared to the 3 wild-type replicates?

I am very new to this field, so I am not sure how to proceed. Any help will be much appreciated.

Thanks,
Himanshu Sharma
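A minimal sketch of the situation described in the question, using base R only. The data frame `pd` is a stand-in for `pData(eset)`; the file names, the `group` column, and the labels `WT` and `Tumor` are assumptions added for illustration and are not part of the original data.

```r
# Stand-in for pData(eset), assumed to hold only a running sample number,
# as described in the post. The row names are hypothetical file names.
pd <- data.frame(sample = 1:4,
                 row.names = c("wt1.CEL", "wt2.CEL", "wt3.CEL", "tumor.CEL"))

# If 'sample' is stored as a number, the formula fits an intercept plus a
# linear trend over the sample number, which is not a group comparison.
design_numeric <- model.matrix(~sample, pd)

# Treating 'sample' as a factor gives every array its own level:
# four coefficients for four arrays, leaving no residual degrees of freedom.
design_each <- model.matrix(~factor(sample), pd)

# Hypothetical group column: three wild types and one tumor.
pd$group <- factor(c("WT", "WT", "WT", "Tumor"), levels = c("WT", "Tumor"))
design_group <- model.matrix(~group, pd)

print(design_group)
```

Only the last design encodes "tumor versus wild type": its second column is 1 for the tumor array and 0 for the three wild types.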

affy limma • 1.4k views

updated 13.8 years ago by James W. MacDonald 68k • written 13.8 years ago by hsharm03@students.poly.edu ▴ 180


James W. MacDonald 68k


Hi Himanshu,

On 4/18/2012 12:27 PM, hsharm03 at students.poly.edu wrote:

> But what I understand is that the sample that are present in the data it is taking 1 condition each of 4 samples. Am I understanding it correctly?

Yes, you are understanding it correctly. But this leads me to a separate point, below.

> If so what should I be doing to get differential expression of genes in tumor as compare to the 3 wild type replicates that I have.

The simple answer is that you should use the correct input to model.matrix(), designed for your experiment. I realize that is a vague and wholly unsatisfying answer, but you have arrived at the point that all long-term R users reach, when they either decide to figure stuff out themselves or become disillusioned and give up.

It would be simple for me to tell you exactly what you need to do for this step in your analysis. And then answer the next question, and the one after that. But that helps no one.

If you are really going to analyze your own data (not recommended IMO, but that's the beauty of Open Source software: we get both the rope and the tree, and are free to hang ourselves), you will have to learn how to figure out both what you should be doing and how to do it.

So the best advice I can offer is to recommend that you find a local statistician to help with your analysis. Barring that, you should closely read the 'limma User's Guide', probably the 'Introduction to R', and certainly look at ?formula, ?model.matrix, and ?factor. The Bioconductor Case Studies contain many useful examples. There are also any number of presentations given over the years that you could find via the BioC website, or with good Google skills.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
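For readers following up on ?factor and ?model.matrix, here is a small base-R sketch of how the two interact. The labels `WT` and `Tumor` are hypothetical stand-ins for a three-versus-one layout and are not taken from any real phenoData.

```r
# Hypothetical labels for four arrays: three wild types, one tumor.
labels <- c("WT", "WT", "WT", "Tumor")

# By default, factor() sorts levels alphabetically, so "Tumor" would be the
# reference level; setting 'levels' explicitly makes "WT" the reference.
group_default <- factor(labels)
group <- factor(labels, levels = c("WT", "Tumor"))

# Treatment-contrast parametrization: the intercept is the WT level and
# the second column is the Tumor-minus-WT difference.
design_ref <- model.matrix(~group)

# Group-means parametrization: one column per group, no intercept.
design_means <- model.matrix(~0 + group)
colnames(design_means) <- levels(group)

print(levels(group_default))
print(design_ref)
print(design_means)
```

Both parametrizations describe the same model. They differ only in what each coefficient means, which is why the level order of the factor matters when reading the results.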

13.8 years ago James W. MacDonald 68k


Dear James,

I completely understand the importance of learning on my own. But this data is not from my own experiment. I have been given these 4 files as a test for a job interview, and I have no other details about them. I just have these 4 CEL files. I know the first 3 are wild types (controls) and the other one is a tumor. I have to find the differential expression of the tumor compared to the controls.

I have been reading the case studies as well as the limma user manual. I have understood that my case is the "Two Groups: Affymetrix" case in section 8.5 of the limma user manual. But when I try to create my design matrix as described in their manual, I still cannot get it right. I also tried a contrast matrix, but I am still not getting it. I thought a lot about it, and now I have used the following commands:

    > design <- model.matrix(~factor(rep(1:2, c(3,1))))
    > fit <- lmFit(eset, design)
    > fit2 <- eBayes(fit)
    > topTable(fit2, 1)
                        ID    logFC  AveExpr        t      P.Value    adj.P.Val        B
    973    1416642_PM_a_at 13.64906 13.60996 208.6925 6.608690e-09 1.436053e-06 9.389064
    8941     1424635_PM_at 13.55811 13.52669 206.8301 6.839348e-09 1.436053e-06 9.383131
    22282  1437976_PM_x_at 13.41281 13.41262 205.0409 7.070582e-09 1.436053e-06 9.377292
    955    1416624_PM_a_at 13.61291 13.61261 204.7329 7.111384e-09 1.436053e-06 9.376273
    32409  1448109_PM_a_at 13.35854 13.39007 203.3663 7.296015e-09 1.436053e-06 9.371700
    548    1416217_PM_a_at 13.31925 13.33522 202.1300 7.468282e-09 1.436053e-06 9.367491
    21301  1436995_PM_a_at 13.33443 13.35885 202.0929 7.473534e-09 1.436053e-06 9.367364
    33736  1449436_PM_s_at 13.36569 13.33920 201.5598 7.549461e-09 1.436053e-06 9.365526
    1748   1417417_PM_a_at 13.36793 13.26406 201.5213 7.554988e-09 1.436053e-06 9.365393
    8069   1423763_PM_x_at 13.20019 13.21718 200.1702 7.752016e-09 1.436053e-06 9.360674
    > topTable(fit2, 2)
                        ID     logFC  AveExpr         t      P.Value adj.P.Val        B
    44020  1459725_PM_s_at  9.002733 5.546952  66.37302 5.294023e-07 0.0064537 5.930831
    21237    1436931_PM_at -7.605935 9.492585 -53.42400 1.213995e-06 0.0064537 5.650912
    5418     1421112_PM_at -6.853626 7.778874 -51.89407 1.356622e-06 0.0064537 5.606440
    39116    1454821_PM_at -6.814400 9.007994 -48.96033 1.694613e-06 0.0064537 5.511995
    8985     1424679_PM_at  6.579054 5.260500  48.16929 1.803474e-06 0.0064537 5.484246
    3964     1419633_PM_at  6.547493 4.696412  48.04883 1.820817e-06 0.0064537 5.479928
    8706   1424400_PM_a_at -8.609042 9.895516 -47.03876 1.974842e-06 0.0064537 5.442737
    7235   1422929_PM_s_at  6.579742 6.075454  46.85107 2.005251e-06 0.0064537 5.435626
    34850    1450555_PM_at  7.541370 4.892047  46.41624 2.077999e-06 0.0064537 5.418903
    1790     1417459_PM_at  6.722704 5.163969  45.95859 2.158198e-06 0.0064537 5.400918

Is this approach correct? I know I am asking a bit too much, but this is the first time I am trying it, so I want to be sure about it. Also, if possible, can you tell me the difference between the two topTables? Which of the two tables is the difference between tumor and control? What I understand is that coef = 2 is the one of interest, because it gives the differential expression in the tumor.

I am really sorry for so many questions, and I hope this clears up the situation a little.

Thanks,
Himanshu Sharma

In reply to: jmacdon@uw.edu, Wed, 18 Apr 2012 14:50:15 -0400, Subject: Re: [BioC] Affy PhenoData
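One way to check what each coefficient of this design means is to fit it to a single made-up probe set with base R. This is only a sketch: the values in `y` are invented, and it assumes an ordinary unweighted least-squares fit per probe set, which is the simplest case of what a per-gene linear model does.

```r
# Same design as in the post: arrays 1-3 in level 1, array 4 in level 2.
design <- model.matrix(~factor(rep(1:2, c(3, 1))))

# Made-up log2 expression values for one probe set (not from the real data).
y <- c(8.0, 8.2, 7.8, 11.0)

# Ordinary least-squares fit of y on the design matrix.
coefs <- unname(lm.fit(design, y)$coefficients)

# Coefficient 1 is the mean of the first group; coefficient 2 is the
# second group minus that mean.
wt_mean <- mean(y[1:3])
print(coefs)
print(c(wt_mean, y[4] - wt_mean))
```

Under these assumptions, the first coefficient is the average wild-type expression level, which would explain why the `logFC` column for coefficient 1 in the post sits so close to `AveExpr`. The second coefficient is the tumor-minus-wild-type difference. With four arrays and two coefficients there are only two residual degrees of freedom, all coming from the wild-type replicates.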

13.8 years ago hsharm03@students.poly.edu ▴ 180
