Dear List
I want to do GO analysis on my microarray results, and have not done
this before. We have a cDNA array for a non-model organism. The
manufacturers of the array have provided annotations, so I have
Accession number, gene description, gene synonyms, EC,
molecular_function, biological_process, cellular_component, InterPro,
KEGG, Pfam, EMBL, Ensembl, UniGene, RefSeq, PROSITE, GeneId, org,
and more in a tab-delimited text file.
So I suppose I have all the information I need. How can I use this
with the Bioconductor packages?
I have looked at the vignette for SQLForge in the AnnotationDbi
package as suggested on this list before, but as it says "At the
present time, it is possible to make annotation packages for the
most common model organisms" I don't know how to proceed.
Best regards
Ingunn
Hi Ingunn
You could check out the Category package, which has tools for detecting
association between gene annotation categories and differential
expression contrasts; see its vignette.
Best wishes
Wolfgang
-------------------------------------------------------
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
-------------------------------------------------------
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
Hi Ingunn,
First you should determine whether or not your organism is one of our
supported organisms. Because you describe it as a non-model organism, I
suspect it is not, but it is still worth checking first. If it is, you
should be able to get an organism-level package from our repository and
use GOstats in the typical manner. Checking should be straightforward:
call the available.dbschemas() function in the AnnotationDbi package to
see whether your organism is supported by a schema. If it is not, we have
a new workaround that will work with the latest versions of the
AnnotationDbi, GSEABase, GOstats and Category packages, which are
presently in our development branch.
Since I suspect you will need the latter strategy, below is an example
of how you should be able to proceed. It is very similar to how you
would use the GOstats package traditionally, and you should probably
read the vignette for that package first for a more detailed
explanation. Please note that in the following example "frameData" is a
data.frame object with 3 columns holding GO IDs, evidence codes and gene
IDs, in that order. This is how you introduce the specific details for
your organism. Also, be careful to ensure that the gene IDs in frameData
are of the same type as the IDs in your 'universeGeneIds' and 'geneIds',
and use a type of ID that is truly unique (I recommend something like
Entrez Gene IDs).
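Before the main example, here is a minimal base-R sketch of how frameData might be assembled from a tab-delimited annotation table like the one described in the question. Everything specific in it is an assumption made for illustration: the column names (GeneId, molecular_function), the ";" separator between GO IDs, the toy IDs, and the placeholder evidence code "IEA", which should be replaced by real evidence codes if the annotation provides them.

```r
# Hypothetical stand-in for the manufacturer's tab-delimited file; the column
# names and the ";"-separated GO IDs are assumptions, not the real layout.
annot_txt <- "GeneId\tmolecular_function
1001\tGO:0003677;GO:0005524
1002\tGO:0005524
1003\t"
annot <- read.delim(text = annot_txt, stringsAsFactors = FALSE,
                    colClasses = "character")

# Split each gene's GO annotation into individual GO IDs
# (genes without annotation yield an empty vector and drop out).
go_lists <- strsplit(annot$molecular_function, ";", fixed = TRUE)

# One row per (GO ID, gene) pair, in the column order described above:
# GO IDs, evidence codes, gene IDs. "IEA" is a placeholder evidence code.
frameData <- data.frame(
  go_id    = unlist(go_lists),
  evidence = "IEA",
  gene_id  = rep(annot$GeneId, lengths(go_lists)),
  stringsAsFactors = FALSE
)

# Gene universe and a hypothetical list of selected genes, kept as the same
# ID type (character) as the gene IDs in frameData.
universe <- unique(frameData$gene_id)
genes <- c("1001")
stopifnot(is.character(genes), all(genes %in% universe))
```

Reading every column as character avoids a silent mismatch between numeric and character gene IDs when the selected genes are later compared against the universe.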
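library("GOstats")
library("GSEABase")
library("AnnotationDbi")

# frameData: data.frame with GO IDs, evidence codes and gene IDs (in that order).
# The organism argument names the species; "Homo sapiens" is just the value
# used in this example.
frame <- GOFrame(frameData, organism = "Homo sapiens")
allFrame <- GOAllFrame(frame)
gsc <- GeneSetCollection(allFrame, setType = GOCollection())

# 'genes' holds the gene IDs of interest; 'universe' is the background set
# of gene IDs. Both must use the same ID type as frameData.
params <- GSEAGOHyperGParams(name = "My Custom GSEA based annot Params",
    geneSetCollection = gsc, geneIds = genes, universeGeneIds = universe,
    ontology = "MF", pvalueCutoff = 0.05, conditional = FALSE,
    testDirection = "over")
Over <- hyperGTest(params)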
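For intuition about what the test above reports, the following base-R sketch computes an over-representation p-value for a single GO term as a hypergeometric tail probability. It is meant as an illustration of the general technique with testDirection = "over" and conditional = FALSE, not as a copy of the GOstats internals, and all counts are hypothetical.

```r
# Hypothetical counts for one GO term (illustrative only).
universe_size <- 1000  # genes in the universe
term_size     <- 40    # universe genes annotated to the term
selected_size <- 50    # selected (e.g. differentially expressed) genes
overlap       <- 6     # selected genes annotated to the term

# P(X >= overlap) when drawing selected_size genes without replacement.
p_over <- phyper(overlap - 1, term_size, universe_size - term_size,
                 selected_size, lower.tail = FALSE)

# The same test expressed as a one-sided Fisher exact test on the 2x2 table.
tab <- matrix(c(overlap, term_size - overlap,
                selected_size - overlap,
                universe_size - term_size - (selected_size - overlap)),
              nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value
```

The two p-values should agree up to numerical tolerance, which is a quick sanity check that the counts have been arranged correctly.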
Please let me know if you have questions or comments. This is a new
capability that we are adding so that we can provide better support for
non-model organisms.
Marc