which statistical test to perform?
Guest User ★ 13k
@guest-user-4897
Last seen 9.6 years ago
Hi all,

I'm new to microarray data analysis. I have recently been given data from 11 tissues. I have normalized the data and filtered it using MAS5 A/P (absent/present) calls. My question is: which statistical test should I perform to calculate significance values?

Sample data:

           accumbens    amygdala     cerebellum   corpus.collosum  hippocampus  midbrain     p.lobe       putamen      s.nigra      t.lobe       thalamus
1007_s_at  11.93852233  12.21404093  11.46118612  13.41594885      12.42216256  12.89589133  11.58715914  11.85803472  12.79920479  12.07087932  12.55338306
1053_at    7.490706858  7.526181155  7.551069308  7.891002293      7.49104271   7.971097552  8.088918072  7.660258014  7.92423132   7.54689645   7.128753703
117_at     8.486898268  8.773089087  7.642339349  8.560352732      7.676296801  7.865961146  7.250275943  7.929165261  7.874073766  7.940298941  8.10731601

From a web search I gathered that the chi-square test is the most relevant in this case (to compare 3 or more unmatched groups, binomial). Is it correct to choose the chi-square test?

Sorry if my question is too basic. Thanks in advance.

-- output of sessionInfo():

R version 2.13.1 (2011-07-08)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  LC_MONETARY=English_United States.1252  LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] MASS_7.3-13

loaded via a namespace (and not attached):
[1] tools_2.13.1

--
Sent via the guest posting facility at bioconductor.org.
Microarray • 989 views
@sean-davis-490
Last seen 12 weeks ago
United States
On Wed, Nov 23, 2011 at 5:14 AM, anand mt [guest] <guest at bioconductor.org> wrote:
> [...]
> I got some web results, from which i came to know that, chi-square test is of more relevant in this case (to compare 3 or more unmatched groups, binomial). Is it correct to choose chi-square test ??

Hi, Anand.

I'm assuming that for you the biological question that you are asking is obvious, but to me it seems unclear. In particular, what groups above are you trying to compare? It seems you have no replicates?

Sean

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
@sean-davis-490
Last seen 12 weeks ago
United States
Hi, Anand.

Please try to keep the conversations on the list so that you can get the best answers to your questions.

First, you will need to split the data back out to include the values for all your replicates. In other words, do not use the means. Working with means only essentially precludes any statistical testing at all.

Second, I would suggest that you take a look at the limma package and the wonderful limma user guide. The statistical framework used in limma is the linear model and it works well for two-class or multi-class problems.

Sean

On Wed, Nov 23, 2011 at 6:40 AM, anand m t <anandrox05 at gmail.com> wrote:
> Sir,
> I'm trying to compare the data from all the brain tissues. The data which
> I've shown here is the mean value of all 3 biological replicates of each
> tissue.
>
> --
> Anand M.T
> School of Biotechnology (Bio-Informatics),
> International Institute of Information Technology (I2IT),
> P-14, Rajiv Gandhi Infotech park,
> Hinjewadi,
> Pune-411 057.
> INDIA.
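As a rough illustration of why the replicate values are needed rather than the per-tissue means, here is a minimal Python sketch of a classical one-way ANOVA F statistic for a single probe. It is not what limma does (limma fits a linear model per probe and moderates the variances across probes); it only shows that the test needs within-tissue variation, which the means alone cannot supply. The function name `one_way_f` and the replicate values are invented for the example and are not the poster's data.

```python
from statistics import mean


def one_way_f(groups):
    """Classical one-way ANOVA F statistic for one probe.

    groups: list of lists, one inner list of replicate values per tissue.
    Returns (F, df_between, df_within).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2:
        raise ValueError("need at least two groups to compare")
    df_between, df_within = k - 1, n - k
    if df_within <= 0:
        # One value per tissue (e.g. only the mean): nothing is left to
        # estimate the within-tissue variance from.
        raise ValueError("need replicates within groups to estimate variance")
    grand_mean = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented replicate values for a single probe in three tissues
# (illustration only; these are NOT the poster's data).
replicates = {
    "cerebellum": [11.40, 11.52, 11.46],
    "corpus.collosum": [13.35, 13.47, 13.43],
    "thalamus": [12.50, 12.61, 12.55],
}

f_stat, df1, df2 = one_way_f(list(replicates.values()))
print(f"F = {f_stat:.1f} on {df1} and {df2} degrees of freedom")

# With only the per-tissue means there is one value per group, so the
# within-group degrees of freedom are zero and no test is possible.
means_only = [[mean(v)] for v in replicates.values()]
try:
    one_way_f(means_only)
except ValueError as err:
    print("means only:", err)
```

With three replicates per tissue the function returns an F statistic; with the means alone it refuses, which is the situation the table of averaged values is in.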
Dear Sir,

Thank you for your valuable suggestion. I will definitely look into it.

I have one more question, though. If we have only two datasets (say lung and liver), we can calculate the log ratio (lung/liver) and then the fold change (2^log_ratio, assuming a log2 ratio). But in this case, if I want to determine the differentially expressed genes based on fold change, how do I do that? Do I have to take the ratio of the expression value of each tissue to that of every other tissue?

Sorry again if my question doesn't make any sense.

--
Anand M.T

"The secret of success comprised in three words.. Work, Finish & Publish" - Michael Faraday
Dear Anand,

You can calculate the log2 ratio of each pair of groups you wish to compare.

A few more pointers:

1. It is desirable to have at least 3 samples in each group in order to calculate statistical significance.
2. The MAS5 algorithm has been shown to lead to many false positives. The RMA and GCRMA algorithms are more reliable, but then you have to work from the CEL files.
3. There are special statistical problems in analyzing microarray data because of the large number of genes. The best way to address this problem is the limma package.
4. Considerations 2 and 3 are addressable through user-friendly Bioconductor packages called affylmGUI and oneChannelGUI.
5. I can send you my course notes on the theory and workflow of the above approach offline upon request.

Best wishes,
Rich

------------------------------------------------------------
Richard A. Friedman, PhD
Associate Research Scientist,
Biomedical Informatics Shared Resource
Herbert Irving Comprehensive Cancer Center (HICCC)
Lecturer,
Department of Biomedical Informatics (DBMI)
Educational Coordinator,
Center for Computational Biology and Bioinformatics (C2B2)/
National Center for Multiscale Analysis of Genomic Networks (MAGNet)
Room 824
Irving Cancer Research Center
Columbia University
1130 St. Nicholas Ave
New York, NY 10032
(212)851-4765 (voice)
friedman at cancercenter.columbia.edu
http://cancercenter.columbia.edu/~friedman/

I am a Bayesian. When I see a multiple-choice question on a test and I don't know the answer I say "eeney-meaney-miney-moe".
Rose Friedman, Age 14
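The "log2 ratio of each pair of groups" suggestion can be sketched in a few lines of Python. This is only an illustration under one assumption: that the values posted in the question are already on the log2 scale (their range suggests so, but the thread does not state it), in which case the log2 ratio of two tissues is simply the difference of their values and the fold change is 2 raised to that difference. The helper name `pairwise_log2_ratios` is made up for the example; the numbers are the per-tissue means from the question.

```python
from itertools import combinations

TISSUES = [
    "accumbens", "amygdala", "cerebellum", "corpus.collosum", "hippocampus",
    "midbrain", "p.lobe", "putamen", "s.nigra", "t.lobe", "thalamus",
]

# Per-tissue mean values copied from the question
# (ASSUMED to be log2-scale expression values).
MEANS = {
    "1007_s_at": [11.93852233, 12.21404093, 11.46118612, 13.41594885,
                  12.42216256, 12.89589133, 11.58715914, 11.85803472,
                  12.79920479, 12.07087932, 12.55338306],
    "1053_at": [7.490706858, 7.526181155, 7.551069308, 7.891002293,
                7.49104271, 7.971097552, 8.088918072, 7.660258014,
                7.92423132, 7.54689645, 7.128753703],
    "117_at": [8.486898268, 8.773089087, 7.642339349, 8.560352732,
               7.676296801, 7.865961146, 7.250275943, 7.929165261,
               7.874073766, 7.940298941, 8.10731601],
}


def pairwise_log2_ratios(values, names):
    """Map each (tissue_a, tissue_b) pair to log2(a/b) = log2(a) - log2(b)."""
    return {
        (a, b): values[i] - values[j]
        for (i, a), (j, b) in combinations(enumerate(names), 2)
    }


for probe, values in MEANS.items():
    ratios = pairwise_log2_ratios(values, TISSUES)
    # Report the pair of tissues that differ most for this probe.
    (a, b), lr = max(ratios.items(), key=lambda kv: abs(kv[1]))
    print(f"{probe}: largest difference {a} vs {b}, "
          f"log2 ratio {lr:.3f}, fold change {2 ** abs(lr):.2f}")
```

With 11 tissues this gives 55 pairwise comparisons per probe. A fold change computed from means carries no significance value by itself; that still requires the replicate-level data, as noted earlier in the thread.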