Paul Geeleher
Hi Vincent,
Thank you for your response. Just letting you know that your advice
was very useful: with a few minor adjustments I was able to
perform an analysis similar to the one I had performed using the KEGG
pathways.
-Paul.
On Wed, Jan 21, 2009 at 4:00 PM, Vincent Carey
<stvjc at channing.harvard.edu> wrote:
>
> On Wed, Jan 21, 2009 at 6:56 AM, Paul Geeleher <paulgeeleher at gmail.com>
> wrote:
>>
>> Hi All,
>>
>> I've been following the instructions here:
>>
>>
>> http://www.bioconductor.org/workshops/2007/seattle_bioc_intro_nov_07/folder.2007-11-30.5595085375/
>>
>> to find dysregulated KEGG pathways in a dataset. What I'm now
>> wondering is whether I can use the same methodology to find co-regulated
>> genes, or genes with common transcription factors.
>>
>> I'd assume it's simply a matter of redefining the gene set from
>>
>> gsc <- GeneSetCollection(eset, setType = KEGGCollection())
>> to
>> gsc <- GeneSetCollection(eset, setType =
>> CoRegulatedGenesOrSomeFunctionLikeThat())
>>
>>
>> I suppose what I'm asking is whether such a gene set collection exists
>> in Bioconductor. And if not, can this be done somewhere else?
>
> GSEABase has infrastructure to import the Broad MSIGDB from its XML
> serialization; see http://www.broad.mit.edu/gsea/downloads.jsp, where
> you will need to register.
>
> If you use getBroadSets() in GSEABase to import the entire MSIGDB, you
> will have access to 5452 gene sets. Broad categorizes these into five
> groups; group c3 contains the motif gene sets, which include a subclass
> called transcription factor targets.
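A minimal sketch of the import step described above, assuming GSEABase is installed and that the registered download has been saved locally as `msigdb_v2.5.xml` (the file name that appears in the `urls` field of the details output; adjust the path to wherever you saved it). It reads the whole collection and tabulates how many sets fall into each Broad category, which is a quick way to confirm that c3 is present before filtering.

```r
## Sketch only: assumes GSEABase is installed and that the MSigDB XML
## file downloaded from the Broad site is saved as "msigdb_v2.5.xml"
## in the working directory (adjust the path as needed).
library(GSEABase)

## Read every gene set in the XML file into a GeneSetCollection
msig2.5 <- getBroadSets("msigdb_v2.5.xml")

## One Broad category label ("c1" ... "c5") per gene set
cats <- sapply(msig2.5, function(x) bcCategory(collectionType(x)))

## Number of gene sets in each category
table(cats)
```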
>
> Digging through a GSEABase GeneSetCollection can proceed in various
> ways. What I will show is probably not the most elegant approach:
>
> Assume you have imported the whole MSIGDB as msig2.5.
>
>> isC3 = which(sapply(msig2.5, function(x) bcCategory(collectionType(x))) == "c3")
>> C3coll = msig2.5[isC3]
>> C3coll
> GeneSetCollection
> names: RGAGGAARY_V$PU1_Q6, KRCTCNNNNMANAGC_UNKNOWN, ..., GTTATAT,MIR-410 (837 total)
> unique identifiers: PCDHGA5, CTXL, ..., pp9099 (15718 total)
> types in collection:
> geneIdType: SymbolIdentifier (1 total)
> collectionType: BroadCollection (1 total)
>> C3coll[[1]]
> setName: RGAGGAARY_V$PU1_Q6
> geneIds: PCDHGA5, CTXL, ..., HCMOGT-1 (total: 522)
> geneIdType: Symbol
> collectionType: Broad
> bcCategory: c3 (Motif)
> bcSubCategory: NA
> details: use 'details(object)'
>> details(C3coll[[1]])
> setName: RGAGGAARY_V$PU1_Q6
> geneIds: PCDHGA5, CTXL, ..., HCMOGT-1 (total: 522)
> geneIdType: Symbol
> collectionType: Broad
> bcCategory: c3 (Motif)
> bcSubCategory: NA
> setIdentifier: c3:261
> description: Genes with promoter regions [-2kb,2kb] around transcription
> start site containing the motif RGAGGAARY which matches annotation for
> SPI1: spleen focus forming virus (SFFV) proviral integration oncogene spi1
> (longDescription available)
> organism: Human,Mouse,Rat,Dog
> pubMedIds:
> urls: msigdb_v2.5.xml
> contributor: Xiaohui Xie
> setVersion: 0.0.1
> creationDate: Thu Jul 10 16:59:23 2008
>
> Invoking the longDescription method on C3coll[[1]] returns an
> interesting structure that will need to be parsed; it seems to be
> in a marked-up MEDLINE format.
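Because the short `description` of each c3 set already names the factor whose motif it matches (SPI1 for RGAGGAARY_V$PU1_Q6 in the output above), one way to pull out the targets of a particular transcription factor without parsing `longDescription` is a plain text search over the descriptions. This is a minimal sketch, again assuming the file name `msigdb_v2.5.xml`; the factor `SPI1` is only an illustrative choice, and a simple text match may also catch sets whose descriptions merely contain the symbol as part of a longer name.

```r
## Sketch only: assumes GSEABase is installed and the MSigDB XML file is
## saved as "msigdb_v2.5.xml"; "SPI1" is just an illustrative factor.
library(GSEABase)

msig2.5 <- getBroadSets("msigdb_v2.5.xml")

## Keep only the motif (c3) gene sets, as in the transcript above
isC3 <- which(sapply(msig2.5, function(x) bcCategory(collectionType(x))) == "c3")
C3coll <- msig2.5[isC3]

## Short descriptions name the factor the motif is annotated to,
## e.g. "... which matches annotation for SPI1: ..."
descs <- sapply(C3coll, description)

## Select the c3 sets whose description contains the factor's symbol
tf <- "SPI1"
tfSets <- C3coll[grep(tf, descs, fixed = TRUE)]

names(tfSets)    # names of the matching motif gene sets
geneIds(tfSets)  # list of gene symbols, one character vector per set
```

The resulting `tfSets` is itself a GeneSetCollection, so it can be handled with the same tools as the KEGG collection in the original workflow once its symbols are converted to the identifiers used on your array.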
>
> Once you have found the gene sets you are interested in, GSEABase
> contains additional infrastructure to convert the gene identifiers
> used in MSIGDB to array probe set identifiers, Entrez identifiers,
> etc.
>
>
>>
>> Thanks.
>>
>> --
>> Paul Geeleher
>> Department of Mathematics
>> National University of Ireland
>> Galway
>> Ireland
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
--
Paul Geeleher
School of Mathematics, Statistics and Applied Mathematics
National University of Ireland
Galway
Ireland