Data set for comparing statistical tests
Jorge Miró ▴ 160
@jorge-miro-5469
Last seen 9.6 years ago
Hi James,

thank you. I checked and found what looks like the arrays as rows and the genes on the arrays as columns:

                                           203508_at 204563_at 204513_s_at
  12_13_02_U133A_Mer_Latin_Square_Expt1_R1     0.000     0.000       0.000
  12_13_02_U133A_Mer_Latin_Square_Expt2_R1     0.125     0.125       0.125
  12_13_02_U133A_Mer_Latin_Square_Expt3_R1     0.250     0.250       0.250
  ...
  12_13_02_U133A_Mer_Latin_Square_Expt1_R2     0.000     0.000       0.000
  12_13_02_U133A_Mer_Latin_Square_Expt2_R2     0.125     0.125       0.125
  12_13_02_U133A_Mer_Latin_Square_Expt3_R2     0.250     0.250       0.250
  ...
  12_13_02_U133A_Mer_Latin_Square_Expt1_R3     0.000     0.000       0.000
  12_13_02_U133A_Mer_Latin_Square_Expt2_R3     0.125     0.125       0.125
  12_13_02_U133A_Mer_Latin_Square_Expt3_R3     0.250     0.250       0.250
  ...

What do the numbers in the pData matrix mean? Are they the concentrations?

Is there any paper or lab description with a guide on how to compare statistical tests using spike-in data? I really cannot figure out how I should go on with the comparison. It seems that the genes have the same concentrations across the three groups of arrays (from 0.000 to 512.000), so I guess I should take only some arrays from each group and compare the tests for differentially expressed genes, e.g. four from group R1 (concentrations 0.000 to 0.500), four from group R2 (concentrations 4.000 to 32.000) and four from group R3 (concentrations 64.000 to 512.000).

Am I thinking right?

I also checked the size of the pData:

  > dim(pdata)
  [1] 42 42

Are there really only 42 genes in the SpikeIn133 dataset, or am I missing something here?

Best regards
Jorge

On Fri, Aug 31, 2012 at 9:02 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
> Hi Jorge,
>
> pData(phenoData(SpikeIn133))
>
> Best,
>
> Jim
>
> On 8/31/2012 2:12 PM, Jorge Miró wrote:
>> Hi again,
>>
>> I have been trying to understand how I should go on with the spike-in
>> data, but in vain.
>> Here are the commands I used:
>>
>> ************ Code *************************
>> > library(SpikeIn)
>> > data(SpikeIn133)
>> # Checked phenoData as suggested....
>> > phenoData(SpikeIn133)
>> An object of class "AnnotatedDataFrame"
>>   sampleNames: 12_13_02_U133A_Mer_Latin_Square_Expt1_R1
>>     12_13_02_U133A_Mer_Latin_Square_Expt2_R1 ...
>>     12_13_02_U133A_Mer_Latin_Square_Expt14_R3 (42 total)
>>   varLabels: 203508_at 204563_at ... AFFX-ThrX-3_at (42 total)
>>   varMetadata: labelDescription
>>
>> # ... but I could not see the concentrations for the samples. Is there
>> something else I should do? I tried with pData too and I could not
>> find any information about the sample concentrations.
>> *************************** End of code ******************
>>
>> I guess SpikeIn133 is a file with raw intensities, so I should apply
>> rma to it and then use e.g. limma to test for differential expression of
>> the genes. Am I right?
>>
>> I read the manual for SpikeIn but I can't see anything about the
>> concentrations for each sample in the data set
>>
>> (http://www.bioconductor.org/packages/2.10/data/experiment/manuals/SpikeIn/man/SpikeIn.pdf)
>>
>> Best regards
>> Jorge
>>
>> On Fri, Aug 31, 2012 at 12:01 PM, Benilton Carvalho
>> <beniltoncarvalho at gmail.com> wrote:
>>>
>>> check the SpikeIn package... in particular the phenoData slot for the
>>> datasets available. b
>>>
>>> On 31 August 2012 10:58, Jorge Miró <jorgma86 at gmail.com> wrote:
>>>>
>>>> Hi everybody,
>>>>
>>>> I need to compare Student's t-test and the test implemented in the
>>>> limma package. Does anybody have an idea of how I should do this?
>>>>
>>>> I guess I need a data set with already known differentially expressed
>>>> genes (maybe this can be done by specially designing the probesets in
>>>> the arrays used?) and then compare the results of a t-test and the limma
>>>> test with the expected differentially expressed genes. Where can I get
>>>> such a data set?
>>>>
>>>> Sorry if the question is a bit stupid, but I'm new to microarray
>>>> analysis and statistics... By the way, should this kind of question
>>>> be posted here or should I use another forum?
>>>>
>>>> Best regards
>>>> Jorge
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
GO limma • 687 views
Tim Triche ★ 4.2k
@tim-triche-3561
Last seen 3.5 years ago
United States
1) Almost any paper on preprocessing Affy arrays will describe the Latin Square spike-in experiment. Affy does, too:
http://www.affymetrix.com/support/technical/sample_data/datasets.affx
Terry Speed has an older page with more information:
http://www.stat.berkeley.edu/users/terry/zarray/Affy/affy_index.html

2) pData is sample-specific covariates; there are 3x14 = 42 samples with 42 picomolar oligo concentrations specified, hence 42x42. Due to nonspecific binding, there is hybridization to oligos besides those spiked in, hence the utility of the data. You will probably want to log2 transform the intensities for testing purposes.

There are 22300 probesets on the hgu133a array:

  R> show(SpikeIn133)
  AffyBatch object
  size of arrays=712x712 features (194115 kb)
  cdf=hgu133atag (22300 affyids)
  number of samples=42
  number of genes=22300
  annotation=hgu133atag
  notes=

4) An alternative, fully moderated implementation of the moderated t-test described for limma (in http://www.statsci.org/smyth/pubs/ebayes.pdf) is now available from CRAN via the package 'fmt' (http://cran.r-project.org/web/packages/fmt/index.html). Either or both of these should be helpful in assessing the assumptions made for an empirical Bayes moderated test (cf. the likelihood ratio test for a two-group alternative in a Gaussian model is, of course, Student's t-test; in high-dimensional settings, James-Stein estimators show we can do better. Read the paper).

5) This isn't just an Affy problem; I'm sure you realize this is a general phenomenon. Somewhere between the purely frequentist, maximum-likelihood worldview and the fully Bayesian, subjective worldview lies empirical Bayes theory, where double dipping for data-driven priors is not just accepted but encouraged. Brad Efron taught a course on the topic, with the text placed online:
http://www-stat.stanford.edu/~omkar/329/

--
"A model is a lie that helps you see the truth."
Howard Skipper <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
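To make the comparison discussed in this thread concrete, here is a minimal base-R sketch of an ordinary two-group t-statistic next to a variance-moderated one, scored against a known truth. It is only an illustration under stated assumptions: the data are simulated as a stand-in for a 3-versus-3 spike-in comparison (they are not SpikeIn133), all variable names are made up for the example, and the prior values `d0` and `s02` are chosen by hand. limma's `eBayes()` estimates those hyperparameters from all genes, so this is not limma's implementation, just the shrinkage formula from the ebayes paper linked above with fixed inputs.

```r
# Simulated stand-in for a spike-in design: 2000 genes, 3 vs 3 arrays,
# with the first 100 genes truly changed (the "spiked" set).
set.seed(1)
n_genes     <- 2000
n_per_group <- 3
is_de       <- seq_len(n_genes) <= 100
group       <- rep(c(0, 1), each = n_per_group)

# Gene-specific variances drawn from a scaled inverse chi-square distribution.
sigma2 <- 0.05 * 4 / rchisq(n_genes, df = 4)
x <- matrix(rnorm(n_genes * 2 * n_per_group), nrow = n_genes) * sqrt(sigma2)
x[is_de, group == 1] <- x[is_de, group == 1] + 1   # log2 fold change of 1

# Ordinary pooled-variance t-statistic, computed row by row.
m0 <- rowMeans(x[, group == 0])
m1 <- rowMeans(x[, group == 1])
ss <- rowSums((x[, group == 0] - m0)^2) + rowSums((x[, group == 1] - m1)^2)
d  <- 2 * n_per_group - 2        # residual degrees of freedom per gene
s2 <- ss / d                     # per-gene pooled variance
v  <- 2 / n_per_group            # 1/n + 1/n, variance factor of the difference
t_ord <- (m1 - m0) / sqrt(s2 * v)

# Moderated t: shrink each gene's variance toward a common prior value.
d0  <- 4          # ASSUMPTION: prior degrees of freedom fixed by hand
s02 <- mean(s2)   # ASSUMPTION: crude prior variance, not limma's estimator
s2_mod <- (d0 * s02 + d * s2) / (d0 + d)
t_mod  <- (m1 - m0) / sqrt(s2_mod * v)

# Score each test: how many of the top-k genes by |t| are truly changed?
top_hits <- function(stat, k = 100) {
  sum(is_de[order(-abs(stat))[seq_len(k)]])
}
print(c(ordinary = top_hits(t_ord), moderated = top_hits(t_mod)))
```

With real spike-in data the same scoring idea applies, with the set of spiked probesets (the column names of the pData matrix) playing the role of `is_de`, and with the caveat from point 2 that nonspecific binding can make some non-spiked probesets respond as well.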