PreProcess and Limma 'done'. What directions now?

0

Entering edit mode

Massimo Pinto ▴ 390

@massimo-pinto-3396

Last seen 11.5 years ago

Greetings BioC list readers, I have been working over the past few weeks on data normalization first (Agi4X44PreProcess) and linear models then (limma). I sense that I may have reached a point where questions may be addressed regarding regulation of individual genes of interest, as well as gene ontology. Would you please so kindly provide advice with regards to my next step? In brief, I would like to 1) Query expression of genes of my interest, which are known to be implicated in the biology that I am working on 2) Get information about gene ontology 3) Ask about involvement of signaling pahways May be that the goProfiles manual is my next trail? Thank you in advance, Massimo -- Massimo Pinto Post Doctoral Research Fellow Enrico Fermi Centre and Italian Public Health Research Institute (ISS), Rome http://claimid.com/massimopinto

goProfiles goProfiles • 2.6k views

ADD COMMENT • link updated 16.7 years ago by Steve Lianoglou ★ 13k • written 16.7 years ago by Massimo Pinto ▴ 390

0

Entering edit mode

Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 8 days ago

United States

Hi Massimo, On Jun 9, 2009, at 9:13 AM, Massimo Pinto wrote: > Greetings BioC list readers, > > I have been working over the past few weeks on data normalization > first (Agi4X44PreProcess) and linear models then (limma). I sense that > I may have reached a point where questions may be addressed regarding > regulation of individual genes of interest, as well as gene ontology. > > Would you please so kindly provide advice with regards to my next > step? In brief, I would like to > > 1) Query expression of genes of my interest, which are known to be > implicated in the biology that I am working on I'm not sure what you're asking to do here, actually, since after doing your normalization/etc. this is the easy part, right? You can just select these rows out of your normalized data matrix, or highlight their probe values on an MA plot, or plot their points on a pdf of your signal intensities. If you're looking for their differential expression calls, you'd just extract their +1/-1 calls post-limma-facto, if you will :-) > 2) Get information about gene ontology There are several package you can check. You've already mentioned goProfiles, but you might also want to consider GOstats as well. There's also topGO. > 3) Ask about involvement of signaling pahways I'm not sure about this one, but searching the bioconductor d/l page for the word "pathway" comes up with several leads, like SPIA and sigPathway. I've never used either of these, but perhaps you can let us know how you like them :-) -- Steve Lianoglou Graduate Student: Physiology, Biophysics and Systems Biology Weill Medical College of Cornell University http://cbio.mskcc.org/~lianos

ADD COMMENT • link 16.7 years ago Steve Lianoglou ★ 13k

0

Entering edit mode

Hi Steve and all, On Tue, Jun 9, 2009 at 5:15 PM, Steve Lianoglou<mailinglist.honeypot at="" gmail.com=""> wrote: > Hi Massimo, > > On Jun 9, 2009, at 9:13 AM, Massimo Pinto wrote: > >> Greetings BioC list readers, >> >> I have been working over the past few weeks on data normalization >> first (Agi4X44PreProcess) and linear models then (limma). I sense that >> I may have reached a point where questions may be addressed regarding >> regulation of individual genes of interest, as well as gene ontology. >> >> Would you please so kindly provide advice with regards to my next >> step? In brief, I would like to >> >> 1) Query expression of genes of my interest, which are known to be >> implicated in the biology that I am working on > > I'm not sure what you're asking to do here, actually, since after doing your > normalization/etc. this is the easy part, right? > > You can just select these rows out of your normalized data matrix, or > highlight their probe values on an MA plot, or plot their points on a pdf of > your signal intensities. If you're looking for their differential expression > calls, you'd just extract their +1/-1 calls post-limma-facto, if you will > :-) I have beem through normalization and basic limma operations. What I would like to do here is to ask questions like "what happened to this and that particular gene"? These would be genes whose expression was already measured via other techiniques (Rt-PCR as well as Western Blotting of the proteins that they encode). Ideally, I would like to produce a mini-list of such genes and have their Gene ID displayed (in tables and any graphs) in place of their Agilent Probe ID. Should be possible, since I have an annotation file. > mappings[1:10,1:4] PROBE ACCNUM SYMBOL ENTREZID 1 A_24_P66027 NM_004900 APOBEC3B 9582 2 A_32_P77178 AA085955 ADAR 103 3 A_23_P212522 NM_014616 ATP11B 23200 4 A_24_P934473 AK092846 LOC100132006 100132006 5 A_24_P9671 NM_001539 DNAJA1 3301 6 A_24_P801451 NM_006709 EHMT2 10919 7 A_32_P30710 NM_000978 RPL23 9349 8 A_24_P704878 NA NA NA 9 A_32_P86028 NM_001017 RPS13 6207 10 A_23_P65830 NM_198527 HDDC3 374659 >> 2) Get information about gene ontology > > There are several package you can check. You've already mentioned > goProfiles, but you might also want to consider GOstats as well. There's > also topGO. thank you. > >> 3) Ask about involvement of signaling pahways > > I'm not sure about this one, but searching the bioconductor d/l page for the > word "pathway" comes up with several leads, like SPIA and sigPathway. > > I've never used either of these, but perhaps you can let us know how you > like them :-) Will do! Thank you. Massimo > -- > Steve Lianoglou > Graduate Student: Physiology, Biophysics and Systems Biology > Weill Medical College of Cornell University > > http://cbio.mskcc.org/~lianos > > > > -- Massimo Pinto Post Doctoral Research Fellow Enrico Fermi Centre and Italian Public Health Research Institute (ISS), Rome http://claimid.com/massimopinto

ADD REPLY • link 16.7 years ago Massimo Pinto ▴ 390

0

Entering edit mode

On Wed, Jun 10, 2009 at 5:44 AM, Massimo Pinto<pintarello at="" gmail.com=""> wrote: > Hi Steve and all, > > On Tue, Jun 9, 2009 at 5:15 PM, Steve > Lianoglou<mailinglist.honeypot at="" gmail.com=""> wrote: >> Hi Massimo, >> >> On Jun 9, 2009, at 9:13 AM, Massimo Pinto wrote: >> >>> Greetings BioC list readers, >>> >>> I have been working over the past few weeks on data normalization >>> first (Agi4X44PreProcess) and linear models then (limma). I sense that >>> I may have reached a point where questions may be addressed regarding >>> regulation of individual genes of interest, as well as gene ontology. >>> >>> Would you please so kindly provide advice with regards to my next >>> step? In brief, I would like to >>> >>> 1) Query expression of genes of my interest, which are known to be >>> implicated in the biology that I am working on >> >> I'm not sure what you're asking to do here, actually, since after doing your >> normalization/etc. this is the easy part, right? >> >> You can just select these rows out of your normalized data matrix, or >> highlight their probe values on an MA plot, or plot their points on a pdf of >> your signal intensities. If you're looking for their differential expression >> calls, you'd just extract their +1/-1 calls post-limma-facto, if you will >> :-) > > I have beem through normalization and basic limma operations. What I > would like to do here is to ask questions like "what happened to this > and that particular gene"? These would be genes whose expression was > already measured via other techiniques (Rt-PCR as well as Western > Blotting of the proteins that they encode). Ideally, I would like to > produce a mini-list of such genes and have their Gene ID displayed (in > tables and any graphs) in place of their Agilent Probe ID. Should be > possible, since I have an annotation file. > >> mappings[1:10,1:4] > > ? ? ? ? ?PROBE ? ?ACCNUM ? ? ? SYMBOL ?ENTREZID > 1 ? A_24_P66027 NM_004900 ? ? APOBEC3B ? ? ?9582 > 2 ? A_32_P77178 ?AA085955 ? ? ? ? ADAR ? ? ? 103 > 3 ?A_23_P212522 NM_014616 ? ? ? ATP11B ? ? 23200 > 4 ?A_24_P934473 ?AK092846 LOC100132006 100132006 > 5 ? ?A_24_P9671 NM_001539 ? ? ? DNAJA1 ? ? ?3301 > 6 ?A_24_P801451 NM_006709 ? ? ? ?EHMT2 ? ? 10919 > 7 ? A_32_P30710 NM_000978 ? ? ? ?RPL23 ? ? ?9349 > 8 ?A_24_P704878 ? ? ? ?NA ? ? ? ? ? NA ? ? ? ?NA > 9 ? A_32_P86028 NM_001017 ? ? ? ?RPS13 ? ? ?6207 > 10 ?A_23_P65830 NM_198527 ? ? ? ?HDDC3 ? ?374659 It seems to me that you have two questions -- one relating to annotation and one relating to statistical reporting on selected genes Apropos annotation: It should not be necessary to use this table. I haven't used Agilent but I see from the Agi4* vignette that -Agi4x44PreProcess employees the corresponding Bioconductor annotation -packages (human: "hgug4112a.db"; mouse: "mgug4122a.db") to assign -to each probe the ACCNUM, SYMBOL, ENTREZID, DESCRIPTION, GO TERMS AND GO IDS. therefore you have access to annotation through queries resolved by AnnotationDbi. there are various ways to do this; i will review three a) direct queries: once you have installed hgug4112a.db, run hgug4112a() to see what mappings are available. To obtain the gene symbol and entrez id corresponding to a given probe, use > library(hgug4112a.db) Loading required package: AnnotationDbi Loading required package: Biobase Welcome to Bioconductor Vignettes contain introductory material. To view, type 'openVignette()'. To cite Bioconductor, see 'citation("Biobase")' and for packages 'citation(pkgname)'. Loading required package: DBI > get("A_24_P66027", hgug4112aENTREZID) [1] "9582" > get("A_24_P66027", hgug4112aSYMBOL) [1] "APOBEC3B" b) the GSEABase infrastructure for gene set management We can use this to do bulk translations > f4 = c("A_24_P66027", "A_32_P77178", "A_23_P212522", "A_24_P934473") > s4 = GeneSet(f4, geneIdType=AnnotationIdentifier("hgug4112a.db")) > s4 setName: NA geneIds: A_24_P66027, A_32_P77178, A_23_P212522, A_24_P934473 (total: 4) geneIdType: Annotation (hgug4112a.db) collectionType: Null details: use 'details(object)' > s4b = s4 > geneIdType(s4b) = SymbolIdentifier() > s4b setName: NA geneIds: APOBEC3B, ADAR, ATP11B, LOC100132006 (total: 4) geneIdType: Symbol (hgug4112a.db) collectionType: Null details: use 'details(object)' c) the unfortunately named annaffy package -- which will create a nice HTML page with annotations. using t4 vector defined above > library(annaffy) Loading required package: GO.db Loading required package: KEGG.db > t4 = aafTableAnn(f4, "hgug4112a.db") > saveHTML(t4, file="t4.html") point your browser to t4.html and you will see a very informative hyperlinked page -- with a misleading default title, but you can change that the second part of your question concerns statistical reporting -- you have used limma, so the topTable output can be used to obtain statistics on selected genes, but you have to choose n suitably so that the genes you are interested in are on the list it is possible to enhance the HTML page generated above with numerical and visual additions that reflect the statistical findings related to genes. these steps are described in the Bioconductor monograph of 2005, chapter by Colin Smith, or the annaffy vignette. > >>> 2) Get information about gene ontology >> >> There are several package you can check. You've already mentioned >> goProfiles, but you might also want to consider GOstats as well. There's >> also topGO. > > thank you. > >> >>> 3) Ask about involvement of signaling pahways >> >> I'm not sure about this one, but searching the bioconductor d/l page for the >> word "pathway" comes up with several leads, like SPIA and sigPathway. >> >> I've never used either of these, but perhaps you can let us know how you >> like them :-) > > Will do! > Thank you. > Massimo > > >> -- >> Steve Lianoglou >> Graduate Student: Physiology, Biophysics and Systems Biology >> Weill Medical College of Cornell University >> >> http://cbio.mskcc.org/~lianos >> >> >> >> > > > > -- > Massimo Pinto > Post Doctoral Research Fellow > Enrico Fermi Centre and Italian Public Health Research Institute (ISS), Rome > http://claimid.com/massimopinto > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Vincent Carey, PhD Biostatistics, Channing Lab 617 525 2265

ADD REPLY • link 16.7 years ago Vincent J. Carey, Jr. 6.7k

0

Entering edit mode

Hi Massimo >I have beem through normalization and basic limma operations. Does this mean you have extracted lists of genes using topTable? You can add your annotation to these lists of genes using the "genelist" parameter of topTable. Something like "genelist=cbind(fit$genes, mappings)". Assuming your mappings are in the same order as your fit data. >What I would like to do here is to ask questions like "what happened to this >and that particular gene"? You can extract any gene you like from a topTable gene list. Along with the expression/p-values etc. You can extract the actual expression values using the Affymetrix probe name or your gene annotation name. > in place of their Agilent Probe ID. Should be > possible, since I have an annotation file. Yes. Regards John

ADD REPLY • link 16.7 years ago john seers IFR ▴ 810

0

Entering edit mode

Hi John, On Wed, Jun 10, 2009 at 12:15 PM, john seers (IFR)<john.seers at="" bbsrc.ac.uk=""> wrote: > > Hi Massimo > >>I have beem through normalization and basic limma operations. > > Does this mean you have extracted lists of genes using topTable? You can > add your annotation to these lists of genes using the "genelist" > parameter of topTable. Something like "genelist=cbind(fit$genes, > mappings)". Assuming your mappings are in the same order as your fit > data. > yes they seem so > fit2.Cn.eb$genes[1:10,1] [1] "A_24_P66027" "A_32_P77178" "A_23_P212522" "A_24_P934473" "A_24_P9671" "A_24_P801451" "A_32_P30710" "A_24_P704878" "A_32_P86028" [10] "A_23_P65830" > mappings[1:10,1] [1] A_24_P66027 A_32_P77178 A_23_P212522 A_24_P934473 A_24_P9671 A_24_P801451 A_32_P30710 A_24_P704878 A_32_P86028 A_23_P65830 25574 Levels: A_23_P100001 A_23_P100011 A_23_P100022 A_23_P100074 A_23_P100092 A_23_P100103 A_23_P100111 A_23_P100127 ... A_32_P99933 therefore using your suggestion, topTable() nicely returns > toptable(fit2.Cn.eb,genelist=cbind(mappings$DESCRIPTION, fit2.Cn.eb$genes), number=5) mappings.DESCRIPTION ID logFC t P.Value adj.P.Val B 14442 fatty acid binding protein 4, adipocyte A_23_P8812 -3.055038 -17.02303 7.819269e-14 1.999700e-09 18.06843 15172 beta-site APP-cleaving enzyme 2 A_24_P14584 -2.287226 -15.25732 6.721762e-13 6.639688e-09 16.72509 8406 hypothetical protein MGC10981 A_32_P178092 -1.899104 -15.14233 7.788795e-13 6.639688e-09 16.62917 3390 beta-site APP-cleaving enzyme 2 A_23_P154875 -2.006084 -14.41987 2.009131e-12 1.284538e-08 16.00055 4688 forkhead box N3 A_23_P140405 -1.175783 -13.22849 1.047477e-11 5.357635e-08 14.85805 >>What I would like to do here is to ask questions like "what happened to > this >>and that particular gene"? > > You can extract any gene you like from a topTable gene list. Along with > the expression/p-values etc. You can extract the actual expression > values using the Affymetrix probe name or your gene annotation name. > to tackle this I will look more closely into the suggestion written by Vincent (whom I would like to thank here, for a start). Thanks to you too. Massimo [...]

ADD REPLY • link 16.7 years ago Massimo Pinto ▴ 390

0

Entering edit mode

Hello John On Wed, Jun 10, 2009 at 12:15 PM, john seers (IFR)<john.seers at="" bbsrc.ac.uk=""> wrote: > > Hi Massimo > >>I have beem through normalization and basic limma operations. > > Does this mean you have extracted lists of genes using topTable? You can > add your annotation to these lists of genes using the "genelist" > parameter of topTable. Something like "genelist=cbind(fit$genes, > mappings)". Assuming your mappings are in the same order as your fit > data. > >>What I would like to do here is to ask questions like "what happened to > this >>and that particular gene"? > > You can extract any gene you like from a topTable gene list. Along with > the expression/p-values etc. You can extract the actual expression > values using the Affymetrix probe name or your gene annotation name. > I have played around with topTable() and topTableF() and I am getting to knowing it better now. However, I am missing the point you are making here above. I cannot see how do you pass topTable() info with regards to which genes or which set of genes you wish to know about. I am assuming that topTable() was designed for this purpose, but since this is coming towards interrogating user-specified sets of genes, I may be wrong. Cheers Massimo

ADD REPLY • link 16.7 years ago Massimo Pinto ▴ 390

0

Entering edit mode

Hi Massimo You are probably reading too much into what I said. I was just trying to say to you that you can combine your annotation information with the Affymetrix probes and the limma output. And Vincent gave you information on how you can search the annotation data. >>What I would like to do here is to ask questions like "what happened to this >>and that particular gene"? Without this question being more definite it is not easy to predict what you need to do. What I was saying is you can search on your expression values by Affymetrix probe (or Annotation id), or on your toptables, to extract any gene or genes of interest. That defines what has happened to this and that particular gene. What you want to do further is an open question. Are you asking how to search for a gene in a toptable? Something like: geneofinterest<-"myaffymetrixprobe" myrows<-which(mytoptable[, "correctcolumnname"] == geneofinterest) mygeneofinterest<-mytoptable[myrows,] Sorry if I do not understand what you need to know. Regards John -----Original Message----- From: Massimo Pinto [mailto:pintarello@gmail.com] Sent: 11 June 2009 11:29 To: john seers (IFR) Cc: bioconductor at stat.math.ethz.ch Subject: Re: [BioC] PreProcess and Limma 'done'. What directions now? Hello John On Wed, Jun 10, 2009 at 12:15 PM, john seers (IFR)<john.seers at="" bbsrc.ac.uk=""> wrote: > > Hi Massimo > >>I have beem through normalization and basic limma operations. > > Does this mean you have extracted lists of genes using topTable? You can > add your annotation to these lists of genes using the "genelist" > parameter of topTable. Something like "genelist=cbind(fit$genes, > mappings)". Assuming your mappings are in the same order as your fit > data. > >>What I would like to do here is to ask questions like "what happened to > this >>and that particular gene"? > > You can extract any gene you like from a topTable gene list. Along with > the expression/p-values etc. You can extract the actual expression > values using the Affymetrix probe name or your gene annotation name. > I have played around with topTable() and topTableF() and I am getting to knowing it better now. However, I am missing the point you are making here above. I cannot see how do you pass topTable() info with regards to which genes or which set of genes you wish to know about. I am assuming that topTable() was designed for this purpose, but since this is coming towards interrogating user-specified sets of genes, I may be wrong. Cheers Massimo

ADD REPLY • link 16.7 years ago john seers IFR ▴ 810

0

Entering edit mode

John, you ended up getting very close to were I needed to be. To compare microarray results to previous data which I had got using other techniques to measure gene expression, I was asked by a colleague to report the micraorray expression levels of selected genes of our interest. much inspired by what you did, I proceed as follows: First I created a vector of probeID: > GOI.redox # vector of genes of interest in the context: redox [1] "A_23_P150281" "A_24_P89457" "A_23_P3038" "A_23_P133474" "A_23_P28075" "A_23_P214544" "A_23_P250671" "A_32_P78121" "A_23_P154840" # "A_23_P105138" then, for a given topTable such as > t0Cn.Vs.6moACn.topTable <- topTable(fit2.Cn.eb, coef=1, genelist=cbind(mappings$DESCRIPTION, fit2.Cn.eb$genes), number=Inf, adjust="BH") I defined a loop to find where all those probes are in the topTable >for (i in 0:length(GOI.redox)) { + myrows[i] <- which(t0Cn.Vs.6moACn.topTable[, "ID"] == GOI.redox[i]) + } > t0Cn.Vs.6moACn.topTable[myrows,] # outputs a specified portion of your topTable with the genes that you wanted. that does it. May be not the most elegant syntax, as I suppose one may well do without a loop, but it does just what I need to show my colleague. Cheers, once again! Massimo On Thu, Jun 11, 2009 at 1:02 PM, john seers (IFR)<john.seers at="" bbsrc.ac.uk=""> wrote: > > Hi Massimo > > You are probably reading too much into what I said. I was just trying to > say to you that you can combine your annotation information with the > Affymetrix probes and the limma output. And Vincent gave you information > on how you can search the annotation data. > >>>What I would like to do here is to ask questions like "what happened > to this >>>and that particular gene"? > > Without this question being more definite it is not easy to predict what > you need to do. What I was saying is you can search on your expression > values by Affymetrix probe (or Annotation id), or on your toptables, to > extract any gene or genes of interest. That defines what has happened to > this and that particular gene. What you want to do further is an open > question. > > Are you asking how to search for a gene in a toptable? Something like: > > geneofinterest<-"myaffymetrixprobe" > myrows<-which(mytoptable[, "correctcolumnname"] == geneofinterest) > mygeneofinterest<-mytoptable[myrows,] > > > Sorry if I do not understand what you need to know. > > Regards > > John > > > > > -----Original Message----- > From: Massimo Pinto [mailto:pintarello at gmail.com] > Sent: 11 June 2009 11:29 > To: john seers (IFR) > Cc: bioconductor at stat.math.ethz.ch > Subject: Re: [BioC] PreProcess and Limma 'done'. What directions now? > > Hello John > > On Wed, Jun 10, 2009 at 12:15 PM, john seers > (IFR)<john.seers at="" bbsrc.ac.uk=""> wrote: >> >> Hi Massimo >> >>>I have beem through normalization and basic limma operations. >> >> Does this mean you have extracted lists of genes using topTable? You > can >> add your annotation to these lists of genes using the "genelist" >> parameter of topTable. Something like "genelist=cbind(fit$genes, >> mappings)". Assuming your mappings are in the same order as your fit >> data. >> >>>What I would like to do here is to ask questions like "what happened > to >> this >>>and that particular gene"? >> >> You can extract any gene you like from a topTable gene list. Along > with >> the expression/p-values etc. You can extract the actual expression >> values using the Affymetrix probe name or your gene annotation name. >> > > I have played around with topTable() and topTableF() and I am getting > to knowing it better now. However, I am missing the point you are > making here above. I cannot see how do you pass topTable() info with > regards to which genes or which set of genes you wish to know about. > > I am assuming that topTable() was designed for this purpose, but since > this is coming towards interrogating user-specified sets of genes, I > may be wrong. > > Cheers > Massimo > -- Massimo Pinto Post Doctoral Research Fellow Enrico Fermi Centre and Italian Public Health Research Institute (ISS), Rome http://claimid.com/massimopinto

ADD REPLY • link 16.7 years ago Massimo Pinto ▴ 390

Login before adding your answer.