GOstats gene set size selection


alex lam RI ▴ 30

@alex-lam-ri-2752

Last seen 9.7 years ago

Hi Sean and other BioC users,

Thanks for the replies a couple of weeks ago. I am now trying Category as suggested, and I think its underlying principles suit what I want to do better than GOstats, especially because I don't have to apply an arbitrary threshold to my test statistics to select a subset of genes.

I followed the code in the Category vignette up to the point where the matrix Z is divided by sqrt(rowSums). Because what I am doing is an eQTL genome scan, at any one position I have likelihood ratio test statistics for all probesets rather than two-sample t-statistics. The vignette says that X should be approximately normal, so I figure that maybe I should standardize the likelihood ratio statistics to z-scores before multiplying by the adjacency matrix. Is that the correct thing to do?

    for (cM in 1:lengthOfGenome) {
        lrt <- LRT[expressedAffyIds, cM]
        # ... filter out duplicate Entrez genes and create adjacency matrix ...
        z.score <- (lrt - mean(lrt)) / sd(lrt)
        tA <- AmER2 %*% z.score
        tA <- tA / sqrt(rs2)
        names(tA) <- row.names(AmER2)
        qqnorm(tA)
    }

Cheers,
Alex

-----Original Message-----
From: Sean MacEachern [mailto:sean.maceach@gmail.com]
Sent: 17 April 2008 17:07
To: alex lam (RI); bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] GOstats gene set size selection

Hi Alex,

I'm not too sure if this helps with your question, but I'll put my two cents in...

I am working with chickens and trying to create a large list of genes for an eQTL study from an initial simple microarray design that compares resistant vs susceptible birds. Because I found only a small number of differentially expressed genes, I have attempted to increase the size of my list by examining significant GO terms. Most of the GO terms I have pulled out using hyperGTest are not very helpful because they are so broad. I have found the Category package a little more helpful. KEGG pathways are a little more specific, and you can create an adjacency matrix and use rowSums() to filter your dataset. I think you can also treat GO terms as categories if you need to. It might be a little off topic, but it could be worth looking at.

Cheers,
Sean

On 4/17/08 7:28 AM, "alex lam (RI)" <alex.lam at roslin.ed.ac.uk> wrote:

> Dear colleagues,
>
> I have been following the GOstats vignette to test GO term association.
> I would like to know whether it is possible to set limits on the number
> of selected genes in a GO term and on the size of that term on my affy chip.
>
> For example, can I tell hyperGTest to skip testing a GO term if the
> number of significant genes in that term is under, say, 3, or if there
> are more than 400 genes of that GO term on the chip?
>
> Currently I find many of my significant GO terms not very specific.
> As I am trying to incorporate GOstats into an expression QTL (eQTL)
> genome scan, I get a lot of output. Therefore, ideally I would like to
> filter out these terms before testing rather than screening the results
> afterwards. Is there such an option with hyperGTest?
>
> Many thanks for your advice,
> Alex
>
> > sessionInfo()
> R version 2.6.2 Patched (2008-03-24 r44882)
> x86_64-unknown-linux-gnu
>
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>
> attached base packages:
> [1] splines   tools     stats     graphics  grDevices utils     datasets
> [8] methods   base
>
> other attached packages:
>  [1] GOstats_2.4.0       Category_2.4.0      genefilter_1.16.0
>  [4] survival_2.34       RBGL_1.14.0         annotate_1.16.1
>  [7] xtable_1.5-2        GO.db_2.0.2         AnnotationDbi_1.0.6
> [10] RSQLite_0.6-8       DBI_0.2-4           Biobase_1.16.3
> [13] graph_1.16.1
>
> loaded via a namespace (and not attached):
> [1] cluster_1.11.10
>
> --------------------------------------------
> Alex C. Lam
> Roslin Institute (Edinburgh)
> Midlothian
> EH25 9PS
> United Kingdom
> Tel: +44 131 5274471
>
> Former email address: alex.lam at bbsrc.ac.uk
> New email address: alex.lam at roslin.ed.ac.uk
> Both addresses are functional
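
For readers following the thread, here is a minimal, self-contained sketch of the two steps discussed above: filtering categories by size with rowSums() before testing, and computing a per-category statistic from standardized scores. It uses base R only and simulated data. The names `lrt`, `adj`, `minSize` and `maxSize` are illustrative assumptions, not objects from the Category vignette or from the code in the post, and the size limits simply echo the 3 and 400 mentioned in the original question.

```r
# Toy data: every object here is simulated and hypothetical.
set.seed(1)
nGenes <- 200
nCats  <- 10

# Stand-in for likelihood ratio statistics at one genome position
lrt <- rchisq(nGenes, df = 1)
names(lrt) <- paste0("g", seq_len(nGenes))

# 0/1 adjacency matrix: rows are categories, columns are genes
adj <- matrix(rbinom(nCats * nGenes, size = 1, prob = 0.1),
              nrow = nCats,
              dimnames = list(paste0("cat", seq_len(nCats)), names(lrt)))

# Drop categories that are too small or too large before computing anything
minSize <- 3
maxSize <- 400
catSize <- rowSums(adj)
keep    <- catSize >= minSize & catSize <= maxSize
adj     <- adj[keep, , drop = FALSE]
catSize <- catSize[keep]

# Centre and scale the statistics (this changes location and scale only)
z <- (lrt - mean(lrt)) / sd(lrt)

# Per-category statistic: sum of member scores divided by sqrt(category size)
tA <- drop(adj %*% z) / sqrt(catSize)

# Visual check of approximate normality, as in the post
qqnorm(tA)
qqline(tA)
```

One caveat on the design: subtracting the mean and dividing by the standard deviation is a linear transformation, so it does not remove the right skew of chi-square-like likelihood ratio statistics. Whether the resulting category statistics are close enough to normal is worth checking on the real data, for example with the Q-Q plot above.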

Microarray GO GOstats Category • 769 views
