----- Original Message -----
From: <bioconductor-request@stat.math.ethz.ch>
To: <bioconductor at stat.math.ethz.ch>
Sent: Monday, January 25, 2010 12:00 PM
Subject: Bioconductor Digest, Vol 83, Issue 23

Today's Topics:

1. Re: Seeking assistance on ROC (Susan Bosco)
2. Re: question about lmFit model (Sunny Srivastava)
3. Agilent G4112A Arrays (Chuming Chen)
4. Re: Agilent G4112A Arrays (Prashantha Hebbar)

----------------------------------------------------------------------

Message: 1
Date: Mon, 25 Jan 2010 09:25:13 +0530 (IST)
From: Susan Bosco <susanbosco86 at yahoo.com>
To: Sean Davis <seandavi at gmail.com>
Cc: prashantha hebbar <prashantha.hebbar at manipal.edu>, bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Seeking assistance on ROC
Message-ID: <818904.90406.qm at web95305.mail.in2.yahoo.com>
Content-Type: text/plain

Dear Sean,

Thanks again.

I corrected the script by generating the 'truth' variable with the rbinom() function. Since my data set is quite large (244K values), I first tried the first 200 values, for which I got a proper ROC curve. However, when I include the complete data, the plot changes: for the whole data set I get a nearly linear graph with small variations.

My script looks like this.

For the first 200 values of the data:

  library(ROC)
  load("RGKma.RData")
  data = RGKma$M[1:200,3]
  # simulated 'truth' labels, one per data value
  state = rbinom(length(data),1,0.33)
  R1 <- rocdemo.sca(truth=state,data,dxrule.sca)
  pdf("ROCk.pdf")
  plot(R1, show.thresh=TRUE,col = "red")
  dev.off()

For the complete data:

  library(ROC)
  load("RGKma.RData")
  data = RGKma$M[,3]
  # simulated 'truth' labels, one per data value
  state = rbinom(length(data),1,0.33)
  R1 <- rocdemo.sca(truth=state,data,dxrule.sca)
  pdf("ROCallk.pdf")
  plot(R1, show.thresh=TRUE,col = "red")
  dev.off()

I would appreciate it if you could help me with this problem, which I encountered with the large data size.

Thanking you sincerely,
Susan.

--- On Wed, 20/1/10, Sean Davis <seandavi at gmail.com> wrote:

From: Sean Davis <seandavi at gmail.com>
Subject: Re: [BioC] Seeking assistance on ROC
To: "Susan Bosco" <susanbosco86 at yahoo.com>
Cc: bioconductor at stat.math.ethz.ch, "prashantha hebbar" <prashantha.hebbar at manipal.edu>
Date: Wednesday, 20 January, 2010, 12:05 PM

On Wed, Jan 20, 2010 at 12:39 AM, Susan Bosco <susanbosco86 at yahoo.com> wrote:

Dear Sean,

Thank you so much for the help.

I tried a range of thresholds from 0 to 0.9. As you had mentioned, the true positive rates no doubt increased with thresholds below 0.9. However, I did get some false positive rates even at a minimum threshold of 0.1. Could you kindly explain the reason?

Is there any method of finding the optimal threshold, maximizing the true positive rates while minimizing the false positives, instead of randomly choosing between 0 and 0.9?

Hi, Susan. The ROC curve IS that method. The ROC curve represents ALL thresholds as applied to the data. If you plot with show.thresh=TRUE, you will see the thresholds that were tried and where they are on the curve.

If the threshold to which you are referring is the one that you used to determine the variable you called "state", then we are talking about two different things. The "truth" variable is meant to be assigned by some source other than the data themselves. If you do not know the true state of your samples and find yourself assigning the state from the data, then ROC curve analysis will not be of any use.

Sean

Thanks in advance,

Susan.
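
Sean's point about the "truth" variable can be illustrated with a small simulation. The sketch below is not the ROC package's implementation; it is a base-R approximation that assumes larger scores indicate the positive class, and every name in it (roc_points, auc, the simulated score) is made up for illustration. When the labels come from rbinom() independently of the data, the expected ROC curve is the diagonal: a small sample can still wander away from it by chance, while a large sample settles onto a nearly straight line. Picking a single "optimal" point needs an extra criterion; Youden's J (TPR minus FPR) is used here as one common choice, not as something proposed in the thread.

```r
# ROC points for every observed threshold (base R only).
# Assumption: a value >= threshold is called "positive".
roc_points <- function(truth, score) {
  thresholds <- sort(unique(score), decreasing = TRUE)
  tpr <- sapply(thresholds, function(t) mean(score[truth == 1] >= t))
  fpr <- sapply(thresholds, function(t) mean(score[truth == 0] >= t))
  data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
}

# Trapezoidal area under the curve, starting from the (0, 0) corner.
auc <- function(roc) {
  x <- c(0, roc$fpr)
  y <- c(0, roc$tpr)
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

set.seed(1)
n <- 5000
score <- rnorm(n)   # simulated stand-in for a column of M values

# Case 1: truth drawn independently of the data, as with rbinom() in the post.
truth_random <- rbinom(n, 1, 0.33)
roc_random <- roc_points(truth_random, score)
auc_random <- auc(roc_random)   # expected to be close to 0.5

# Case 2: simulated truth that really is associated with the score.
truth_real <- rbinom(n, 1, plogis(2 * score - 1))
roc_real <- roc_points(truth_real, score)
auc_real <- auc(roc_real)

# One possible single operating point: the threshold maximising Youden's J.
best <- roc_real[which.max(roc_real$tpr - roc_real$fpr), ]
```

Calling plot(roc_random$fpr, roc_random$tpr, type = "l") should show a curve hugging the diagonal, whereas the same plot for roc_real bows toward the upper left corner.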

[[alternative HTML version deleted]]

------------------------------

Message: 2
Date: Mon, 25 Jan 2010 00:05:18 -0500
From: Sunny Srivastava <research.baba at gmail.com>
To: sabrina s <sabrina.shao at gmail.com>
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] question about lmFit model
Message-ID: <85bae9e21001242105x310c5ab1wc81164170b9afc6b at mail.gmail.com>
Content-Type: text/plain

Dear Sabrina,

Experienced members of the group will have better things to say, but here is my $0.25.

As a statistician, I would prefer Design 1. The reason is that data should never be ignored.

Also, the more data there is, the more limma can take advantage of this information in the empirical Bayes estimation of the S.D. The lower p-values are because of this fact. (Using less data might result in inflated SDs, which in turn can result in higher p-values.)

Comparing differential expression and fold change is like comparing apples and oranges. Differential expression has nothing to do with low fold change. As a statistician, I would always trust differential expression rather than fold change. If you think that fold change is important for you, then you should select the differentially expressed genes ONLY if their log fold change is above, say, 2.

You can do this in limma using topTable and/or decideTests.

Please correct me if I am wrong.

Thx
S.

On Thu, Jan 21, 2010 at 1:32 PM, sabrina s <sabrina.shao at gmail.com> wrote:

> Hi, Jenny:
> Thanks for the quick reply. And thanks for pointing out about posting. I thought maybe my subject was not good enough to be noticed, and that is why I posted again. This is my first post, so long way to go!
> Regarding your second point: I don't think my question is a general one about why ANOVA is better than a series of t-tests.
> I actually did both, but realized that the result from one single model (using all samples) gave me much lower p-values, yet when I looked at the expression values, the fold change was next to nothing, like 0.5. That is why I wonder if the inflated DOF gave me much lower p-values. Any thoughts on that?
>
> Thanks!
>
> Sabrina
>
> On Thu, Jan 21, 2010 at 12:05 PM, Jenny Drnevich <drnevich at illinois.edu> wrote:
>
> > Hi Sabrina,
> >
> > First, a little list etiquette. If you don't get a response to a post within a day, it's not considered polite to just repost the same question verbatim the next day under a different Subject.
> >
> > Second: your question isn't specific to the modeling of lmFit. Instead, it's a general statistical question about why it's better to use one ANOVA model instead of a series of t-tests. I suggest you consult a basic statistical textbook or a local statistician to find the answer.
> >
> > Cheers,
> > Jenny
> >
> >
> > At 10:39 AM 1/21/2010, sabrina s wrote:
> >
> >> Hello, everyone:
> >>
> >> I have a question related to conceptual understanding of lmFit.
> >>
> >> I have the following experiment that I want to conduct, but I am not sure which is the right way to use the design matrix and contrasts. Here is the experiment:
> >>
> >> Say I have 3 different strains that are genetically different, A, B and C, where A is the control. I also have two different treatments, T1 and T2. For each strain, I have 4 arrays for each treatment, so in total I have 24 arrays. What I want to find out is the significantly differentially expressed genes for the following comparisons:
> >> 1) for control strain A: T1 vs T2
> >> 2) under T1, B vs. A (control)
> >> 3) under T1, C vs. A
> >> 4) for B, T1 vs T2
> >> 5) for C, T1 vs T2
> >> 6) interaction term of A and B, T1 and T2
> >> 7) interaction term of A and C, T1 and T2.
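
The seven comparisons listed above can be written as columns of one contrast matrix over the six strain/treatment groups. The sketch below is only an illustration under assumptions: the array order, the contrast names and the simulated expression matrix are hypothetical, the array weights and the proportion argument from the original post are left out, and the fold-change cutoff min_lfc is an arbitrary value. The design and contrasts are built in base R; the limma calls (lmFit, contrasts.fit, eBayes, topTable) run only if limma is installed. It also shows where the difference in degrees of freedom between the two approaches comes from.

```r
# Hypothetical layout: 3 strains x 2 treatments x 4 arrays = 24 arrays.
strain <- rep(c("A", "B", "C"), each = 8)
treatment <- rep(rep(c("T1", "T2"), each = 4), times = 3)
group <- factor(paste(strain, treatment, sep = "_"))

# One column per group: A_T1, A_T2, B_T1, B_T2, C_T1, C_T2.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# One column per comparison; rows follow colnames(design).
cont.matrix <- cbind(
  A.T1vsT2 = c( 1, -1,  0,  0,  0,  0),  # 1) A: T1 vs T2
  T1.BvsA  = c(-1,  0,  1,  0,  0,  0),  # 2) under T1: B vs A
  T1.CvsA  = c(-1,  0,  0,  0,  1,  0),  # 3) under T1: C vs A
  B.T1vsT2 = c( 0,  0,  1, -1,  0,  0),  # 4) B: T1 vs T2
  C.T1vsT2 = c( 0,  0,  0,  0,  1, -1),  # 5) C: T1 vs T2
  Int.AB   = c(-1,  1,  1, -1,  0,  0),  # 6) (B_T1 - B_T2) - (A_T1 - A_T2)
  Int.AC   = c(-1,  1,  0,  0,  1, -1)   # 7) (C_T1 - C_T2) - (A_T1 - A_T2)
)
rownames(cont.matrix) <- colnames(design)

# Residual degrees of freedom: all 24 arrays versus strains A and B only.
df_full <- nrow(design) - ncol(design)
group_ab <- droplevels(group[strain %in% c("A", "B")])
design_ab <- model.matrix(~ 0 + group_ab)
df_pair <- nrow(design_ab) - ncol(design_ab)

if (requireNamespace("limma", quietly = TRUE)) {
  set.seed(1)
  gene <- matrix(rnorm(500 * 24), nrow = 500)  # simulated data, no real signal
  fit <- limma::lmFit(gene, design)
  fit2 <- limma::eBayes(limma::contrasts.fit(fit, cont.matrix))
  min_lfc <- 1  # hypothetical cutoff on the absolute log2 fold change
  hits <- limma::topTable(fit2, coef = "T1.BvsA", number = Inf,
                          p.value = 0.05, lfc = min_lfc)
}
```

With this layout the single model leaves 18 residual degrees of freedom per gene and a fit restricted to two strains leaves 12, so the single model estimates the variance from more arrays while testing the same contrasts.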
> >>
> >> There are two ways I could use lmFit.
> >>
> >> One is:
> >>
> >> For the design matrix, I will include all 3 strains and 2 conditions. I use the following coding:
> >>
> >>            A_T1  A_T2  B_T1  B_T2  C_T1  C_T2
> >> sample1:   1     0     0     0     0     0
> >> sample2:
> >>
> >> Then I make a contrast matrix and follow the code below:
> >>
> >> fitGene <- lmFit(gene, design=design, weights=arrayWt)
> >> fitGene2 <- contrasts.fit(fitGene, cont.matrix)
> >> fitGene2 <- eBayes(fitGene2, proportion=p)
> >>
> >>
> >> Two:
> >> Instead of fitting all samples at one time in a single lmFit call, I use two design matrices: one that involves only A and B with T1 and T2, and a second that involves A and C with T1 and T2. I make a contrast matrix for each and fit them separately. Later on I can compare these two results if I want to.
> >>
> >>
> >> The question I have is: which one is the right one? With the first method I will have a large DOF and much lower p-values, but it is testing the same thing as the second one, so am I creating an artifact? Thanks for your help!
> >>
> >>
> >> Sabrina
> >>
> >> [[alternative HTML version deleted]]
> >>
> >> _______________________________________________
> >> Bioconductor mailing list
> >> Bioconductor at stat.math.ethz.ch
> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >> Search the archives:
> >> http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>
> >
> > Jenny Drnevich, Ph.D.
> >
> > Functional Genomics Bioinformatics Specialist
> > W.M. Keck Center for Comparative and Functional Genomics
> > Roy J. Carver Biotechnology Center
> > University of Illinois, Urbana-Champaign
> >
> > 330 ERML
> > 1201 W. Gregory Dr.
> > Urbana, IL 61801
> > USA
> >
> > ph: 217-244-7355
> > fax: 217-265-5066
> > e-mail: drnevich at illinois.edu
> >
>
> --
> Sabrina

------------------------------

Message: 3
Date: Mon, 25 Jan 2010 01:32:06 -0500
From: Chuming Chen <chumingchen at gmail.com>
To: bioconductor at stat.math.ethz.ch
Subject: [BioC] Agilent G4112A Arrays
Message-ID: <4B5D3AE6.3050507 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Dear All,

I am trying to find the differentially expressed genes in some Agilent Human Whole Genome (G4112A) array data.

I have tried the LIMMA package, but LIMMA gave the error message "no residual degrees of freedom in linear model fits" and stopped. My guess is that my data has no replicates in the experiment.

Are there any other packages I can use to find differentially expressed genes that do not require replicates in the experiment?

Thanks for your help.

Chuming

------------------------------

Message: 4
Date: Sun, 24 Jan 2010 22:40:12 -0800 (PST)
From: Prashantha Hebbar <prashantha.hebbar at yahoo.com>
To: bioconductor at stat.math.ethz.ch, Chuming Chen <chumingchen at gmail.com>
Subject: Re: [BioC] Agilent G4112A Arrays
Message-ID: <410581.88367.qm at web110108.mail.gq1.yahoo.com>
Content-Type: text/plain

Dear Chen,

You need not look for any other packages. Since you do not have any replicates, do not fit a linear model; instead, just normalize within arrays and look at the M (log ratio) values.
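
What that advice amounts to can be sketched in base R, assuming a single two-colour array. The intensities below are simulated, the five "spiked" genes and the dye-bias term are invented for the example, and the loess step is only a rough stand-in for within-array normalization; in limma itself this job is done by normalizeWithinArrays() on an RGList read from the scanner files. Without replicates there is no variance estimate, so the result is a ranking by log ratio rather than a test with p-values.

```r
set.seed(42)
n_genes <- 2000

# Simulated single array: average log2 intensity and a log ratio with an
# invented intensity-dependent dye bias plus noise.
A_sim <- runif(n_genes, 6, 14)
M_sim <- 0.1 * (A_sim - 10) + rnorm(n_genes, sd = 0.3)
spiked <- 1:5                      # hypothetical truly changed genes
M_sim[spiked] <- M_sim[spiked] + 4

# Red and green channel intensities consistent with the values above.
R <- 2^(A_sim + M_sim / 2)
G <- 2^(A_sim - M_sim / 2)

# M (log ratio) and A (average log intensity) as computed from the channels.
M <- log2(R) - log2(G)
A <- (log2(R) + log2(G)) / 2

# Rough within-array normalization: remove the loess trend of M on A.
trend <- loess(M ~ A, span = 0.3)
M_norm <- unname(residuals(trend))

# Rank genes by absolute normalized log ratio; no p-values are available.
ranked <- order(abs(M_norm), decreasing = TRUE)
top <- data.frame(gene = ranked, M = M_norm[ranked], A = A[ranked])[1:10, ]
```

In this simulation the five spiked genes come out at the top of the ranking, but on real data the choice of how many genes to follow up, or which log-ratio cutoff to apply, remains a judgment call.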

Regards,

Prashantha Hebbar Kiradi,
Dept. of Biotechnology,
Manipal Life Sciences Center,
Manipal University,
Manipal, India

--- On Mon, 1/25/10, Chuming Chen <chumingchen at gmail.com> wrote:

From: Chuming Chen <chumingchen at gmail.com>
Subject: [BioC] Agilent G4112A Arrays
To: bioconductor at stat.math.ethz.ch
Date: Monday, January 25, 2010, 6:32 AM

Dear All,

I am trying to find the differentially expressed genes in some Agilent Human Whole Genome (G4112A) array data.

I have tried the LIMMA package, but LIMMA gave the error message "no residual degrees of freedom in linear model fits" and stopped. My guess is that my data has no replicates in the experiment.

Are there any other packages I can use to find differentially expressed genes that do not require replicates in the experiment?

Thanks for your help.

Chuming
