Affy data analysis

hsharm03@students.poly.edu ▴ 180

@hsharm03studentspolyedu-5225

Last seen 9.6 years ago

Dear all,

I have data from the Affymetrix HT430mgpm array and I need to analyze it for differential expression and pathway analysis. I have 3 wildtype controls (Wt neurospheres 2 and 3) for the control analysis, and two tumors (1509 and 1701). From the CEL files, it doesn't appear that we did replicates for the tumors, just one each; the rationale at the time was that we wanted to first quickly scan the tumors for common signatures. Those genes that are clearly highly expressed should, however, represent additional oncogenic signatures that may stem from the same or related activating pathways. For now, my analysis of the controls should give me accurate expression data for the controls. The tumors will have to be compared across the samples to look for the low-hanging fruit.

I am not sure how to go about doing this, since I have 3 replicates for the control but only 1 each for the different tumors. What strategy should I use for my analysis?

Thanks,
Himanshu Sharma

Pathways GO affy • 1.3k views

ADD COMMENT • link updated 12.0 years ago by James W. MacDonald 65k • written 12.0 years ago by hsharm03@students.poly.edu ▴ 180

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 20 minutes ago

United States

Hi Himanshu Sharma,

On 4/14/2012 6:42 PM, hsharm03 at students.poly.edu wrote:
> I am not sure how do I go about doing this since I have 3 replicates for the control but 1 each for different tumors. What should be the strategy that I should use in order to do my analysis.

You can just analyze your data as indicated in the limma User's Guide. Note that although you only have one sample for each of the tumors, you have three replicates for the control, so with five arrays and three group means you end up with 2 residual degrees of freedom. That is enough to fit a model and compute contrasts. Here is an example using some fake data:

    > library(limma)
    > x <- matrix(rnorm(5e5), ncol = 5)
    > design <- model.matrix(~factor(rep(1:3, c(3,1,1))))
    > fit <- lmFit(x, design)
    > fit2 <- eBayes(fit)
    > topTable(fit2, 2)
              logFC         t      P.Value adj.P.Val         B
    27913 -5.164721 -4.474076 7.678459e-06 0.6669534 -4.402008
    98975  4.907831  4.251539 2.124031e-05 0.6669534 -4.421736
    90287  4.800002  4.158128 3.209996e-05 0.6669534 -4.429717
    41684 -4.754741 -4.118920 3.808058e-05 0.6669534 -4.433015
    43210 -4.711426 -4.081397 4.478309e-05 0.6669534 -4.436141
    46761  4.705393  4.076171 4.580108e-05 0.6669534 -4.436574
    37345 -4.687702 -4.060846 4.891387e-05 0.6669534 -4.437841
    98788  4.633203  4.013635 5.981260e-05 0.6669534 -4.441714
    46584  4.606493  3.990496 6.595873e-05 0.6669534 -4.443596
    72789 -4.603451 -3.987861 6.669534e-05 0.6669534 -4.443809
    > topTable(fit2, 3)
              logFC         t      P.Value adj.P.Val         B
    19401 -5.232576 -4.532857 5.822486e-06 0.5822486 -1.796077
    883    4.813581  4.169892 3.048726e-05 0.8544860 -2.252617
    87408 -4.667879 -4.043673 5.263993e-05 0.8544860 -2.402452
    76730  4.641339  4.020682 5.805112e-05 0.8544860 -2.429249
    50261  4.533133  3.926946 8.605996e-05 0.8544860 -2.536920
    63980  4.502927  3.900780 9.591473e-05 0.8544860 -2.566524
    783   -4.498102 -3.896600 9.758446e-05 0.8544860 -2.571235
    59496 -4.441207 -3.847313 1.194575e-04 0.8544860 -2.626398
    92491  4.427735  3.835642 1.252750e-04 0.8544860 -2.639357
    22351 -4.420041 -3.828977 1.287163e-04 0.8544860 -2.646741

As you can see, limma is happy to run the analysis without any replication for two of the sample types.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
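A minimal sketch of the same idea with named groups, assuming the limma package is installed. The sample names, group labels, and contrast names are made up for illustration, and the data are random noise as in the example above. It checks the 2 residual degrees of freedom directly and sets up each tumor versus the controls as an explicit contrast.

```r
library(limma)

set.seed(1)  # only so the fake data are reproducible

# Fake log2 expression values: 1000 probesets x 5 arrays
# (3 controls plus 2 unreplicated tumors; all names are hypothetical)
x <- matrix(rnorm(5000), ncol = 5,
            dimnames = list(paste0("probe", 1:1000),
                            c("wt1", "wt2", "wt3", "t1509", "t1701")))

group <- factor(c("wt", "wt", "wt", "t1509", "t1701"),
                levels = c("wt", "t1509", "t1701"))

# One column per group mean, so contrasts can be written by name
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Residual degrees of freedom: 5 arrays - 3 estimated group means = 2
res_df <- nrow(design) - qr(design)$rank

fit <- lmFit(x, design)

# Each tumor against the pooled controls
contr <- makeContrasts(t1509_vs_wt = t1509 - wt,
                       t1701_vs_wt = t1701 - wt,
                       levels = design)
fit2 <- eBayes(contrasts.fit(fit, contr))

top1509 <- topTable(fit2, coef = "t1509_vs_wt")
top1701 <- topTable(fit2, coef = "t1701_vs_wt")
```

Because each tumor group has a single array, it contributes nothing to the residual variance. Before eBayes() moderation, the variance for every probeset comes from the three controls alone, so the tumor-versus-control tests rest on the assumption that the tumors are about as variable as the controls.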

ADD COMMENT • link 12.0 years ago James W. MacDonald 65k

Dear James,

I was able to get the topTable using the method you pointed me to in the limma manual. But when I try to annotate the genes using ENTREZ, it requires IDs, and the topTable we got does not have the ID column. What should I be doing?

Thanks,
Himanshu Sharma.

ADD REPLY • link 12.0 years ago hsharm03@students.poly.edu ▴ 180

Hi Himanshu,

On 4/17/2012 2:54 PM, hsharm03 at students.poly.edu wrote:
> I was able to get the topTable using the method you told me to follow
> from the limma manual. But when I try to annotate the genes using
> ENTREZ, it requires id's but the topTable we found does not have the
> Id's tab. So what should I be doing?

If you are using an ExpressionSet when you run lmFit(), then you should automatically get the ID column in your topTable() output. If not, note that the row.names of your topTable() output correspond to the rows of your data, so you can get the probeset IDs that way.

Best,

Jim
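The row-name route can be sketched as follows, assuming a current limma release and an expression matrix whose row names are the probeset IDs. The IDs below are hypothetical placeholders and the data are random.

```r
library(limma)

set.seed(2)  # only so the fake data are reproducible

# Hypothetical probeset IDs, used purely for illustration
ids <- paste0("probeset_", 1:200)
x <- matrix(rnorm(200 * 5), ncol = 5, dimnames = list(ids, NULL))

# Same layout as in the thread: 3 controls, then 2 single tumors
design <- model.matrix(~ factor(rep(1:3, c(3, 1, 1))))

fit2 <- eBayes(lmFit(x, design))
tt <- topTable(fit2, coef = 2)

# The row names of the result are the row names of the input matrix,
# so they can be copied into an explicit ID column for annotation
tt$ID <- rownames(tt)
```

If the matrix has no row names, as in the fake-data example earlier in the thread, the row names of the table are just row indices into the data, and the IDs have to be looked up by position instead.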

ADD REPLY • link 12.0 years ago James W. MacDonald 65k

Dear James,

Thanks a lot for your help. I will try it again.

Thanks,
Himanshu Sharma.

ADD REPLY • link 12.0 years ago hsharm03@students.poly.edu ▴ 180

Dear James,

I was able to get the ENTREZ ID and GENENAME. Then I annotated the genes using GO, and now I have 8 GO IDs for the top genes in my analysis. How do I use the GO IDs for pathway analysis? I checked KEGG, but it seems to take KO or EC format as input. Any help is much appreciated, and I would also like to thank you for your help so far.

Thanks,
Himanshu Sharma.

ADD REPLY • link 12.0 years ago hsharm03@students.poly.edu ▴ 180

http://bioconductor.org/packages/release/bioc/vignettes/GOstats/inst/doc/GOstatsHyperG.pdf

On 4/17/2012 11:41 PM, hsharm03 at students.poly.edu wrote:
> So I was able to get the ENTREZ ID and GENENAME. Then I was able to
> annotate it using GO and now I have 8 GOids of the top genes in my
> analysis. How do I use the GO IDs to find out Pathway analysis. I
> checked KEGG but it seems it takes in KO or EC format as input.
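The vignette linked above covers hypergeometric testing with GOstats. As a rough illustration of the calculation behind such a test for a single GO term or pathway, here is a base R sketch. It is not the GOstats API, and every count in it is made up.

```r
# All counts below are invented for illustration
universe_size <- 20000  # annotated genes measured on the array
term_size     <- 150    # universe genes annotated to one term
selected_size <- 300    # genes called differentially expressed
overlap       <- 12     # selected genes that carry the term

# Number of selected genes expected to carry the term by chance
expected <- selected_size * term_size / universe_size

# Over-representation p-value: P(X >= overlap) when drawing
# selected_size genes without replacement from the universe
p_over <- phyper(overlap - 1,
                 term_size,
                 universe_size - term_size,
                 selected_size,
                 lower.tail = FALSE)
```

The input to a test of this kind is a gene list (for example Entrez Gene IDs) plus a gene universe, not a handful of GO IDs, which is why the starting point is the annotated topTable rather than the GO terms already looked up.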

ADD REPLY • link 12.0 years ago James W. MacDonald 65k

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 20 minutes ago

United States

Hi Himanshu,

Please don't take discussions off list. We like to think of the list archives as a repository of knowledge, and taking conversations off-list eliminates that aspect.

On 4/16/2012 10:35 AM, hsharm03 at students.poly.edu wrote:

> Dear James,
> Thanks a lot for your reply. I will definitely do that. Also, I wanted to ask where I can get the annotations for the HT430mgpm array. Once I get the table of top genes, I would like to annotate them and use them for pathway analysis. Where can I get the annotation.db for the same? Also, which method would you recommend in order to get the eset? I was thinking of using RMA. I should be comparing all of my controls to each tumor separately, is that right? I am sorry to be asking you so many questions, but I am new to this field and have been thinking about this for many days.

The mouse4302.db annotation package should suffice. It is my understanding that the chip you used is the same, without the MM probes. If not, it is easy enough to make your own with the annotation file you can get from Affymetrix and the AnnotationDbi package. See

http://bioconductor.org/packages/release/bioc/vignettes/AnnotationDbi/inst/doc/makeProbePackage.pdf

if you want to make your own.

As for deciding how to analyze your data, that is up to you. I am more than willing to help with questions about how to use BioC packages, but cannot give analysis advice.

Best,

Jim

> Thanks a lot for your help till now. It is really helpful. Hope to hear back from you.
> Thanks,
> Himanshu Sharma.

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
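
For readers following along, the annotation step discussed in this reply can be sketched roughly as below. This is only an illustration, not part of the original exchange: it assumes the Bioconductor packages AnnotationDbi and mouse4302.db are installed, and the three probe-set IDs are placeholders standing in for the row names of a topTable() result.

```r
# Minimal sketch, assuming AnnotationDbi and mouse4302.db are installed
# (for example via BiocManager::install("mouse4302.db")).
library(AnnotationDbi)
library(mouse4302.db)

# Placeholder probe-set IDs; in a real analysis these would come from
# something like rownames(topTable(fit2, coef = 2)).
probe_ids <- c("1415670_at", "1415671_at", "1415672_at")

# Look up gene symbol, Entrez Gene ID and gene name for each probe set.
# A probe set can map to more than one gene, so the result may contain
# more rows than there are input IDs.
annot <- AnnotationDbi::select(mouse4302.db,
                               keys    = probe_ids,
                               columns = c("SYMBOL", "ENTREZID", "GENENAME"),
                               keytype = "PROBEID")
print(annot)
```

The resulting data frame can be merged with the topTable() output on the probe-set ID, and the ENTREZID column is the usual starting point for GO or pathway tools. Whether mouse4302.db covers every probe set on the HT430mgpm array is the assumption stated in the reply above and is worth checking against your own data.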

written 12.0 years ago by James W. MacDonald

Dear James,

I am sorry, I did not realize that I was taking it off the list. It won't happen again. Thanks a lot for your help. The chip I used was without the MM probes. It was really helpful, and I hope I can finish the analysis as soon as possible.

Thanks,
Himanshu Sharma

written 12.0 years ago by hsharm03@students.poly.edu
