Bioconductor Digest, Vol 83, Issue 23
@m-carmen-ruiz-de-villa-3906
Josep,

Your exercise was uploaded correctly to the UOC; what I had not received were the messages you seemed to have sent. Just in case, for a while I will try to confirm that I have received whatever you send through the UOC, so we can check that there is no problem.

Regards,
M. Carme

----- Original Message -----
From: <bioconductor-request@stat.math.ethz.ch>
To: <bioconductor at stat.math.ethz.ch>
Sent: Monday, January 25, 2010 12:00 PM
Subject: Bioconductor Digest, Vol 83, Issue 23

> Send Bioconductor mailing list submissions to
> bioconductor at stat.math.ethz.ch
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> or, via email, send a message with subject or body 'help' to
> bioconductor-request at stat.math.ethz.ch
>
> You can reach the person managing the list at
> bioconductor-owner at stat.math.ethz.ch
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Bioconductor digest..."
>
>
> Today's Topics:
>
> 1. Re: Seeking assistance on ROC (Susan Bosco)
> 2. Re: question about lmFit model (Sunny Srivastava)
> 3. Agilent G4112A Arrays (Chuming Chen)
> 4. Re: Agilent G4112A Arrays (Prashantha Hebbar)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 25 Jan 2010 09:25:13 +0530 (IST)
> From: Susan Bosco <susanbosco86 at yahoo.com>
> To: Sean Davis <seandavi at gmail.com>
> Cc: prashantha hebbar <prashantha.hebbar at manipal.edu>,
> bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Seeking assistance on ROC
> Message-ID: <818904.90406.qm at web95305.mail.in2.yahoo.com>
> Content-Type: text/plain
>
> Dear Sean,
>
> Thanks again.
>
> I corrected the script, changing the value of the 'truth' variable with the
> rbinom() function. Since my data set is quite large (244K values), I
> tried the first 200, for which I was able to get a proper ROC curve.
> However, when I include the complete data, the plot changes. For the whole
> data, I get a linear graph with small variations.
>
> My code looks like this:
> For the first 200 values of the data:
> library(ROC)
> load("RGKma.RData")
> state = rbinom(length(RGKma$M[1:200, 3]), 1, 0.33)
> data = RGKma$M[1:200, 3]
> R1 <- rocdemo.sca(truth = state, data, dxrule.sca)
> pdf("ROCk.pdf")
> plot(R1, show.thresh = TRUE, col = "red")
> dev.off()
>
> For the complete data:
> library(ROC)
> load("RGKma.RData")
> state = rbinom(length(RGKma$M[, 3]), 1, 0.33)
> data = RGKma$M[, 3]
> R1 <- rocdemo.sca(truth = state, data, dxrule.sca)
> pdf("ROCallk.pdf")
> plot(R1, show.thresh = TRUE, col = "red")
> dev.off()
>
> I would appreciate it if you could help me out with this problem that I
> encountered with a large data size.
>
> Thanking you sincerely,
> Susan.
>
>
> --- On Wed, 20/1/10, Sean Davis <seandavi at gmail.com> wrote:
>
> From: Sean Davis <seandavi at gmail.com>
> Subject: Re: [BioC] Seeking assistance on ROC
> To: "Susan Bosco" <susanbosco86 at yahoo.com>
> Cc: bioconductor at stat.math.ethz.ch, "prashantha hebbar"
> <prashantha.hebbar at manipal.edu>
> Date: Wednesday, 20 January, 2010, 12:05 PM
>
>
> On Wed, Jan 20, 2010 at 12:39 AM, Susan Bosco <susanbosco86 at yahoo.com>
> wrote:
>
>
> Dear Sean,
>
> Thank you so much for the help.
>
> I tried a range of thresholds from 0 to 0.9. As you had mentioned, the
> true positive rates did increase with thresholds below 0.9. However, I
> still got some false positives even at a minimum threshold of 0.1. Could
> you kindly explain the reason?
>
> Is there any method of finding the optimal threshold, maximizing the true
> positive rate while minimizing the false positives, instead of arbitrarily
> choosing between 0 and 0.9?
>
>
> Hi, Susan. The ROC curve IS that method. The ROC curve represents ALL
> thresholds as applied to the data.
> If you plot with show.thresh=TRUE, you
> will see the thresholds that were tried and where they are on the curve.
>
> If the threshold to which you are referring is the one that you used to
> determine the variable you called "state", then we are talking about two
> different things. The "truth" variable is meant to be assigned by some
> source other than the data themselves. If you do not know the true state
> of your samples and find yourself assigning the state from the data, then
> ROC curve analysis will not be of any use.
>
> Sean
>
> Thanks in advance,
>
> Susan.
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 25 Jan 2010 00:05:18 -0500
> From: Sunny Srivastava <research.baba at gmail.com>
> To: sabrina s <sabrina.shao at gmail.com>
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] question about lmFit model
> Message-ID:
> <85bae9e21001242105x310c5ab1wc81164170b9afc6b at mail.gmail.com>
> Content-Type: text/plain
>
> Dear Sabrina,
> Experienced members of the group will have better things to say, but here
> is my $0.25.
> As a statistician, I would prefer Design 1. The reason is that data
> should never be ignored.
>
> Also, the more data there is, the more limma can take advantage of this
> information in the empirical Bayes estimation of the SD. The lower p-values
> are a result of this. (Using less data might result in inflated SDs, which
> would in turn result in higher p-values.)
>
> Comparing differential expression and fold change is like comparing apples
> and oranges: a gene can be significantly differentially expressed even
> with a low fold change.
> As a statistician, I would always trust differential expression more than
> fold change.
> If you think that fold change is important for you, then you should select
> the differentially expressed genes ONLY if their absolute log fold change
> is above, say, 2.
>
> You can do this in limma using topTable and/or decideTests.
>
> Please correct me if I am wrong.
>
> Thanks,
> S.
>
> On Thu, Jan 21, 2010 at 1:32 PM, sabrina s <sabrina.shao at gmail.com> wrote:
>
>> Hi, Jenny:
>> Thanks for the quick reply, and thanks for pointing out the issue with my
>> posting. I thought maybe my subject was not good enough to be noticed, and
>> that is why I posted again. This is my first post, so I have a long way to go!
>> Regarding your second point: I don't think my question is a general one
>> about why ANOVA is better than a series of t-tests. I actually did both,
>> but realized that the result from one single model (using all samples) gave
>> me much lower p-values, yet when I looked at the expression values, the fold
>> change was nothing, like 0.5. That is why I wonder if the inflated DOF
>> gave me much lower p-values. Any thoughts on that?
>>
>> Thanks!
>>
>> Sabrina
>>
>> On Thu, Jan 21, 2010 at 12:05 PM, Jenny Drnevich <drnevich at illinois.edu> wrote:
>>
>> > Hi Sabrina,
>> >
>> > First, a little list etiquette. If you don't get a response to a post
>> > within a day, it's not considered polite to just repost the same question
>> > verbatim the next day under a different Subject.
>> >
>> > Second: your question isn't specific to the modeling of lmFit. Instead,
>> > it's a general statistical question about why it's better to use one ANOVA
>> > model instead of a series of t-tests. I suggest you consult a basic
>> > statistical textbook or a local statistician to find the answer.
>> >
>> > Cheers,
>> > Jenny
>> >
>> >
>> > At 10:39 AM 1/21/2010, sabrina s wrote:
>> >
>> >> Hello, everyone:
>> >>
>> >> I have a question related to the conceptual understanding of lmFit.
>> >>
>> >> I have the following experiment that I want to conduct, but I am not sure
>> >> which is the right way to use the design matrix and contrasts. Here is the
>> >> experiment:
>> >>
>> >> Say I have 3 genetically different strains, A, B and C,
>> >> where A is the control. I also have two different treatments,
>> >> T1 and T2. For each strain, I have 4 arrays for each treatment, so in
>> >> total I have 24 arrays. What I want to find out is the significantly
>> >> differentially expressed genes for the following comparisons:
>> >> 1) for control strain A: T1 vs T2
>> >> 2) under T1, B vs. A (control)
>> >> 3) under T1, C vs. A
>> >> 4) for B, T1 vs T2
>> >> 5) for C, T1 vs T2
>> >> 6) interaction term of A and B, T1 and T2
>> >> 7) interaction term of A and C, T1 and T2.
>> >>
>> >> There are two ways I could use lmFit.
>> >>
>> >> One is:
>> >>
>> >> For the design matrix, I will include all 3 strains and 2 conditions,
>> >> using the following coding:
>> >> A_T1, A_T2, B_T1, B_T2, C_T1, C_T2
>> >> sample1: 1, 0, 0, 0, 0, 0
>> >> sample2:
>> >>
>> >> Then make a contrast matrix and follow the code below:
>> >>
>> >> fitGene <- lmFit(gene, design = design, weights = arrayWt)
>> >> fitGene2 <- contrasts.fit(fitGene, cont.matrix)
>> >> fitGene2 <- eBayes(fitGene2, proportion = p)
>> >>
>> >> Two:
>> >> Instead of using all samples at one time in a single lmFit call, I
>> >> use two design matrices: the first involves only A and B, T1 and T2,
>> >> and the second involves A and C, T1 and T2. I make a contrast
>> >> matrix for each and fit them separately, and later on I can compare
>> >> the two results if I want to.
>> >>
>> >> The question I have is: which one is the right one? For the first method,
>> >> I will have a large DOF and much lower p-values, but it is testing the
>> >> same thing as the second one, so am I creating an artifact? Thanks for
>> >> your help!
>> >>
>> >> Sabrina
>> >>
>> >> _______________________________________________
>> >> Bioconductor mailing list
>> >> Bioconductor at stat.math.ethz.ch
>> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >> Search the archives:
>> >> http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >
>> > Jenny Drnevich, Ph.D.
>> >
>> > Functional Genomics Bioinformatics Specialist
>> > W.M. Keck Center for Comparative and Functional Genomics
>> > Roy J. Carver Biotechnology Center
>> > University of Illinois, Urbana-Champaign
>> >
>> > 330 ERML
>> > 1201 W. Gregory Dr.
>> > Urbana, IL 61801
>> > USA
>> >
>> > ph: 217-244-7355
>> > fax: 217-265-5066
>> > e-mail: drnevich at illinois.edu
>>
>> --
>> Sabrina
>
>
> ------------------------------
>
> Message: 3
> Date: Mon, 25 Jan 2010 01:32:06 -0500
> From: Chuming Chen <chumingchen at gmail.com>
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] Agilent G4112A Arrays
> Message-ID: <4B5D3AE6.3050507 at gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Dear All,
>
> I am trying to find the differentially expressed genes in some
> Agilent Human Whole Genome (G4112A) array data.
>
> I have tried the LIMMA package, but LIMMA gave the error message "no
> residual degrees of freedom in linear model fits" and stopped. My guess
> is that my experiment has no replicates.
>
> Are there any other packages I can use to find differentially expressed
> genes that do not require replicates in the experiment?
>
> Thanks for your help.
>
> Chuming
>
>
> ------------------------------
>
> Message: 4
> Date: Sun, 24 Jan 2010 22:40:12 -0800 (PST)
> From: Prashantha Hebbar <prashantha.hebbar at yahoo.com>
> To: bioconductor at stat.math.ethz.ch, Chuming Chen
> <chumingchen at gmail.com>
> Subject: Re: [BioC] Agilent G4112A Arrays
> Message-ID: <410581.88367.qm at web110108.mail.gq1.yahoo.com>
> Content-Type: text/plain
>
> Dear Chen,
>
> You need not look for any other packages. Since you do not have any
> replicates, do not fit a linear model; instead, just do within-array
> normalization and look at the M (log ratio) values.
>
> Regards,
>
> Prashantha Hebbar Kiradi,
> Dept. of Biotechnology,
> Manipal Life Sciences Center,
> Manipal University,
> Manipal, India
>
>
> ------------------------------
>
> End of Bioconductor Digest, Vol 83, Issue 23
> ********************************************
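Sean's point in Message 1 is that an ROC curve already sweeps every threshold, and that the "truth" labels must come from a source other than the data. The base-R sketch below (hypothetical helpers `roc_points`, `auc` and `youden_cutoff`; simulated scores standing in for `RGKma$M[, 3]`) computes an empirical ROC curve over all cutoffs. With labels drawn by `rbinom()` independently of the scores, as in Susan's script, the curve for a large sample should collapse onto the diagonal (AUC near 0.5), which would be consistent with the "linear graph" she saw; with only 200 points, random fluctuation can make such a curve look like a real one. Youden's J is included only as one common convention for picking a single cutoff, not as something proposed in the thread.

```r
# Base-R sketch of an empirical ROC curve (not the ROC package's code).
# Every distinct score is used as a cutoff; a sample is called positive
# when its score is >= the cutoff.
roc_points <- function(score, truth) {
  stopifnot(length(score) == length(truth), all(truth %in% c(0, 1)))
  ord <- order(score, decreasing = TRUE)
  s <- score[ord]
  y <- truth[ord]
  tp <- cumsum(y)       # true positives called at each cutoff
  fp <- cumsum(1 - y)   # false positives called at each cutoff
  # keep only the last position of each run of tied scores
  last <- c(s[-1] != s[-length(s)], TRUE)
  data.frame(cutoff = c(Inf, s[last]),
             fpr = c(0, fp[last] / sum(y == 0)),
             tpr = c(0, tp[last] / sum(y == 1)))
}

# Area under the curve by the trapezoidal rule
auc <- function(roc) {
  n <- nrow(roc)
  sum(diff(roc$fpr) * (roc$tpr[-1] + roc$tpr[-n]) / 2)
}

# One common convention (an assumption, not from the thread) for picking a
# single cutoff: maximise Youden's J = TPR - FPR
youden_cutoff <- function(roc) roc$cutoff[which.max(roc$tpr - roc$fpr)]

set.seed(1)
score <- rnorm(20000)                                      # stand-in for RGKma$M[, 3]
truth_random <- rbinom(length(score), 1, 0.33)             # labels unrelated to the data
truth_real <- rbinom(length(score), 1, plogis(2 * score))  # labels that depend on it

auc(roc_points(score, truth_random))  # close to 0.5: a diagonal curve
auc(roc_points(score, truth_real))    # clearly above 0.5
```

Only the second kind of label, one that reflects an independent notion of the samples' true state, gives a curve worth interpreting.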
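Sunny's argument in Message 2 about limma's empirical Bayes step can be illustrated with the variance-shrinkage formula used by eBayes: the posterior variance is (d0 * s0^2 + d * s^2) / (d0 + d), and the moderated t-statistic has d0 + d degrees of freedom. In this minimal sketch the prior values d0 and s0^2 and the gene's own s^2 are made up (limma estimates the prior from all genes, and in real data s^2 would also differ between the two fits), and `moderated_t` is a hypothetical helper. It only shows that more residual df lowers the p-value for the same effect, and that an inflated variance raises the p-value rather than lowering it.

```r
# Sketch of limma-style variance moderation for a single gene and contrast.
# d0 and s0sq are fixed by hand here; limma estimates them from all genes.
moderated_t <- function(coef, s2, d, v, d0, s0sq) {
  s2_post <- (d0 * s0sq + d * s2) / (d0 + d)  # shrink the gene's variance towards the prior
  t <- coef / sqrt(s2_post * v)               # v = unscaled variance of the contrast
  df <- d0 + d                                # moderated t has d0 + d degrees of freedom
  list(s2_post = s2_post, t = t, df = df, p = 2 * pt(-abs(t), df))
}

# Hypothetical gene: log-fold change 0.5 between two groups of 4 arrays,
# so the unscaled variance of the difference is 1/4 + 1/4.
v <- 1/4 + 1/4
all24    <- moderated_t(0.5, s2 = 0.04, d = 18, v = v, d0 = 4, s0sq = 0.05)  # 24 arrays, 6 groups
sub16    <- moderated_t(0.5, s2 = 0.04, d = 12, v = v, d0 = 4, s0sq = 0.05)  # 16 arrays, 4 groups
inflated <- moderated_t(0.5, s2 = 0.16, d = 12, v = v, d0 = 4, s0sq = 0.05)  # larger SD

c(all24 = all24$p, sub16 = sub16$p, inflated = inflated$p)
```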
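Sunny also suggests keeping only genes that are both significant and have a large log fold change, which limma's topTable() and decideTests() support through their lfc argument. The base-R sketch below reproduces only the idea, not limma's implementation: `select_de` is a hypothetical helper, the five genes are invented, and the threshold assumes log2 expression values, so lfc = 2 corresponds to a four-fold change. Gene 2 mirrors Sabrina's situation: a tiny p-value with a small fold change.

```r
# Base-R sketch of "significant AND large fold change" filtering
select_de <- function(logFC, p, lfc = 1, alpha = 0.05) {
  adj <- p.adjust(p, method = "BH")       # Benjamini-Hochberg adjusted p-values
  which(adj < alpha & abs(logFC) >= lfc)  # both conditions must hold
}

# Hypothetical results for five genes
logFC <- c(2.5, 0.5, -3.0, 2.2, 0.1)
p     <- c(1e-6, 1e-8, 1e-5, 0.2, 0.9)

select_de(logFC, p, lfc = 2)  # genes 1 and 3
```

Gene 2 is dropped for its small fold change and gene 4 for its non-significant p-value.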
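Sabrina's first option (one model on all 24 arrays) corresponds to a cell-means design with one column per strain/treatment group and a contrast matrix holding her seven comparisons. The base-R sketch below builds both and compares the residual degrees of freedom of the single model (24 arrays minus 6 groups = 18) with a separate A+B model (16 minus 4 = 12); that difference is the "large DOF" she noticed. The array layout, the contrast names and the simulated gene `y` are assumptions for illustration; in limma, `makeContrasts()` could build the same contrast matrix from the design's column names.

```r
# Hypothetical layout matching the description: 3 strains x 2 treatments x 4 arrays
strain    <- rep(c("A", "B", "C"), each = 8)
treatment <- rep(rep(c("T1", "T2"), each = 4), times = 3)
group <- factor(paste(strain, treatment, sep = "_"),
                levels = c("A_T1", "A_T2", "B_T1", "B_T2", "C_T1", "C_T2"))

# Cell-means design: one column per group, no intercept
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Contrast columns for comparisons 1) to 7); rows follow colnames(design)
cont.matrix <- cbind(
  A_T1vsT2 = c( 1, -1,  0,  0,  0,  0),
  T1_BvsA  = c(-1,  0,  1,  0,  0,  0),
  T1_CvsA  = c(-1,  0,  0,  0,  1,  0),
  B_T1vsT2 = c( 0,  0,  1, -1,  0,  0),
  C_T1vsT2 = c( 0,  0,  0,  0,  1, -1),
  Int_BvsA = c(-1,  1,  1, -1,  0,  0),  # (B_T1 - B_T2) - (A_T1 - A_T2)
  Int_CvsA = c(-1,  1,  0,  0,  1, -1)   # (C_T1 - C_T2) - (A_T1 - A_T2)
)
rownames(cont.matrix) <- colnames(design)

# Residual df: one model on all 24 arrays vs a separate A+B model on 16 arrays
resid_df <- function(X) nrow(X) - qr(X)$rank
keep <- strain %in% c("A", "B")
design_AB <- model.matrix(~ 0 + droplevels(group[keep]))
c(all = resid_df(design), A_and_B = resid_df(design_AB))  # 18 and 12

# With one made-up gene, each contrast estimate is a combination of group means
set.seed(2)
y <- rnorm(24)
coef <- lm.fit(design, y)$coefficients
est <- drop(crossprod(cont.matrix, coef))
est
```

Both options estimate the same contrasts; the single model simply pools the residual variance over all six groups, which assumes the variance is similar across strains.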
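The error Chuming reports in Message 3 follows from simple arithmetic: residual degrees of freedom equal the number of arrays minus the rank of the design matrix, and a residual variance needs at least one. The sketch below uses two hypothetical two-colour designs in which a single coefficient (the average log-ratio) is estimated; with one array there is nothing left over, so the residual variance is 0/0 and no standard error or p-value can be formed. `resid_df` and the value 0.8 are illustrative assumptions.

```r
# Residual degrees of freedom = number of arrays - rank of the design matrix
resid_df <- function(X) nrow(X) - qr(X)$rank

# Hypothetical two-colour cases: one coefficient (the average log-ratio M)
one_array  <- matrix(1, nrow = 1, ncol = 1)  # a single array, no replicate
two_arrays <- matrix(1, nrow = 2, ncol = 1)  # two replicate arrays

resid_df(one_array)   # 0
resid_df(two_arrays)  # 1

# With 0 residual df the residual variance is 0/0, so no standard error,
# t-statistic or p-value can be computed for the gene
fit <- lm.fit(one_array, 0.8)
s2 <- sum(fit$residuals^2) / fit$df.residual
s2  # not a finite number
```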
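Prashantha's suggestion in Message 4 (normalize within arrays and look at M) can be sketched in base R on one simulated two-colour array. M = log2(R) - log2(G) and A = (log2(R) + log2(G)) / 2 are the usual limma definitions; the loess step is similar in spirit to limma's normalizeWithinArrays(method = "loess") but is not its implementation. The intensities, the dye-bias slope, the span and the 20 "changed" genes are all invented. Without replicates the result is a ranking by |M|, not a test.

```r
# Simulated single two-colour array (all values hypothetical)
set.seed(3)
n <- 2000
A_true <- runif(n, 6, 14)                # average log2 intensity per spot
true_M <- c(rep(3, 20), rep(0, n - 20))  # first 20 genes truly changed
dye_bias <- 0.3 * (A_true - 10)          # intensity-dependent dye bias
M_obs <- true_M + dye_bias + rnorm(n, sd = 0.2)
R <- 2^(A_true + M_obs / 2)              # red channel intensities
G <- 2^(A_true - M_obs / 2)              # green channel intensities

# Log-ratio M and average log-intensity A
M <- log2(R) - log2(G)
A <- (log2(R) + log2(G)) / 2

# Within-array loess normalisation: remove the trend of M on A.
# family = "symmetric" makes the fit robust to the few changed genes.
trend <- loess(M ~ A, span = 0.3, degree = 1, family = "symmetric")
M_norm <- M - fitted(trend)

# Without replicates there is no variance estimate, so genes can only be
# ranked by the size of their normalised log-ratio
ranked <- order(abs(M_norm), decreasing = TRUE)
head(ranked, 20)
```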