Question

AnnotationForge/SQLForge: how to add data after creating the new package

Sylvain Foisy ▴ 80

@sylvain-foisy-5539

Last seen 4.6 years ago

Canada

Hi,

We are using a custom microarray, and I am creating a new platform package with AnnotationForge, which is working a-ok for extracting most of the info we need. We also have additional data that we would like to put into the package, such as sequence data and probe position data, but I just can't seem to find how to insert these datasets into the newly created package.

BTW, yes, I have read the vignettes and also found the doc made by Gábor Csárdi, but I was looking for something more elegant: a few convenience methods that would take a matrix of values, keyed on a value from the created database, and automatically create the appropriate table...

Any input gladly accepted ;-)

Sylvain

Sylvain Foisy, Ph. D.

Project Manager Bioinformatics

Montreal Heart Institute

annotation custom platform • 1.5k views

ADD COMMENT • link 8.9 years ago Sylvain Foisy ▴ 80

score 1 · Answer 1 · 2015-06-12

Hi Sylvain,

So you have generated an annotation package and installed it into your library. Why do you now want to insert these other data into the database for your annotation package? Do you really need to do that? Because there are many immediate downsides to that approach, and there are other ways that you can preserve your custom data.

Regarding the data that you mentioned, there already exists a class of packages for storing probe positions, and it seems like the sequence data for the probes could be extracted from a simple GRanges object combined with the appropriate BSgenome object.
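As a rough illustration of the GRanges-plus-BSgenome idea, the sketch below pulls probe sequences out of a reference genome. The probe IDs and coordinates are made up, and the hg19 build (BSgenome.Hsapiens.UCSC.hg19) is only an assumption; use whichever build the probe coordinates actually refer to.

```r
# Sketch only: probe IDs, coordinates and the hg19 build are assumptions.
library(GenomicRanges)
library(BSgenome.Hsapiens.UCSC.hg19)

# Hypothetical probe positions; extra arguments become metadata columns.
probes <- GRanges(
    seqnames = c("chr1", "chr1", "chr2"),
    ranges   = IRanges(start = c(1000001, 1000101, 2000001), width = 25),
    strand   = c("+", "-", "+"),
    probe_id = c("probe_0001", "probe_0002", "probe_0003")
)

# getSeq() returns one sequence per range; ranges on the "-" strand
# come back reverse-complemented.
probe_seqs <- getSeq(BSgenome.Hsapiens.UCSC.hg19, probes)
names(probe_seqs) <- probes$probe_id
probe_seqs
```

Note that this returns the genomic sequence at each position, which may differ from the sequence as synthesized if the probes were designed against transcripts or contain deliberate mismatches.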

I can also tell you that if you really need to use databases to store the data you have in mind, AnnotationForge has tools that will let you (for example) make a custom OrgDb package from data.frames. And OrganismDbi allows you to use several different database objects together as if they were a single object (provided that they all implement a select() method). So if you have several different types of annotation databases, these resources can all work together in a modular fashion, without having to define bimaps or do any of the things that are discussed in that really old document that Gábor created. If you are really interested in exploring this kind of stuff, I would recommend that you look at the following documents (I fear you may have been reading some very old files, based on your mentioning the Gábor document):

For a more current overview of the annotation resources:

http://bioconductor.org/help/workflows/annotation/Annotation_Resources/

For information about OrganismDbi:

http://bioconductor.org/packages/release/bioc/vignettes/OrganismDbi/inst/doc/OrganismDbi.pdf

And from AnnotationForge you should look at the help page for the makeOrgPackage() function.
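For readers who have not met the select() interface mentioned above, here is a minimal sketch of what a query looks like. It uses the existing org.Hs.eg.db package purely as an example; a custom chip package built with AnnotationForge should answer the same calls, though its available keytypes and columns will differ.

```r
# Sketch only: org.Hs.eg.db is used here as a stand-in annotation package.
library(AnnotationDbi)
library(org.Hs.eg.db)

# Discover what the package can be queried by, and what it can return.
keytypes(org.Hs.eg.db)
columns(org.Hs.eg.db)

# Map two Entrez Gene IDs to their gene symbols.
gene_symbols <- AnnotationDbi::select(
    org.Hs.eg.db,
    keys    = c("1", "2"),
    columns = "SYMBOL",
    keytype = "ENTREZID"
)
gene_symbols
```

The result is an ordinary data.frame, so it can be merged with other tables using base R.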

Anyhow, I hope this helps,

Marc

score 0 · Answer 2 · 2015-06-15

Sylvain Foisy ▴ 80

Hi Marc,

Thanks for the input ;-) You might have guessed from my post, but the ways of Bioconductor are rather alien to me... This array is not for a new organism but for exon expression of a subset of H. sapiens genes. I just want to put into the platform package the positions of individual probes and the probes' sequences as synthesized, so that other users even less able than me can get at them in a simple way.

I added the 2 tables that are needed; I simply need to learn how to make the package aware of their existence and their relationships with the other tables.
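For reference, adding such tables and querying them alongside the existing ones can be sketched with DBI and RSQLite. Everything named here is an assumption for illustration: the `probes` table with a `probe_id` column stands in for whatever the generated database actually contains, `add_probe_table()` is a hypothetical helper rather than an AnnotationForge function, and the sequences and coordinates are made up. In practice this would be run against a copy of the package's .sqlite file, not an in-memory database.

```r
# Sketch only: table names, column names and data are assumptions.
library(DBI)
library(RSQLite)

# An in-memory database stands in for a COPY of the package's .sqlite file.
con <- dbConnect(SQLite(), ":memory:")

# Stand-in for the probe table the generated package already contains.
dbWriteTable(con, "probes", data.frame(
    probe_id = c("probe_0001", "probe_0002", "probe_0003"),
    gene_id  = c("1", "2", "2"),
    stringsAsFactors = FALSE
))

# Hypothetical helper: add a table keyed on probe_id, refusing unknown keys.
add_probe_table <- function(con, table_name, values) {
    stopifnot("probe_id" %in% names(values))
    known   <- dbGetQuery(con, "SELECT probe_id FROM probes")$probe_id
    unknown <- setdiff(values$probe_id, known)
    if (length(unknown) > 0) {
        stop("probe IDs not present in 'probes': ",
             paste(unknown, collapse = ", "))
    }
    dbWriteTable(con, table_name, values, overwrite = TRUE)
    invisible(table_name)
}

add_probe_table(con, "probe_sequence", data.frame(
    probe_id = c("probe_0001", "probe_0002"),
    sequence = c("ACGTACGTACGTACGTACGTACGTA", "TTGACCTGAAGCTTGCATGCCTGCA"),
    stringsAsFactors = FALSE
))
add_probe_table(con, "probe_position", data.frame(
    probe_id   = c("probe_0001", "probe_0002"),
    chromosome = c("chr1", "chr1"),
    pos_start  = c(1000001L, 1000101L),
    pos_end    = c(1000025L, 1000125L),
    stringsAsFactors = FALSE
))

# The relationship between the tables is expressed by joining on probe_id.
probe_info <- dbGetQuery(con, "
    SELECT p.probe_id, p.gene_id, s.sequence,
           x.chromosome, x.pos_start, x.pos_end
    FROM probes AS p
    JOIN probe_sequence AS s ON s.probe_id = p.probe_id
    JOIN probe_position AS x ON x.probe_id = p.probe_id
    ORDER BY p.probe_id")

dbDisconnect(con)
probe_info
```

This only covers the database side. Making the installed package expose the new tables through its own objects is a separate step that this sketch does not address.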

Best regards

S

ADD COMMENT • link 8.9 years ago Sylvain Foisy ▴ 80

The annotation packages are intended to give mappings between gene-level things, that are not particular to your samples. For example, mappings between say an Entrez Gene ID and HUGO symbol.

The correct container for what you want is the SummarizedExperiment class, which has slots that contain either GRanges or GRangesLists (to say where in the genome your measurements come from), and a slot that contains the measurements themselves.

Rather than trying to jam data into a format that is not really suited for the purpose, you should look at SummarizedExperiment (now in its own SummarizedExperiment package, previously part of GenomicRanges), which is highly likely to be useful to you, right out of the box.
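To make the suggestion concrete, here is a minimal sketch of a SummarizedExperiment that carries probe positions alongside the measurements. The probe IDs, coordinates, sample names and intensities are all made up, and the `rowRanges` argument assumes a current release of the SummarizedExperiment package.

```r
# Sketch only: all IDs, coordinates and intensity values are made up.
library(SummarizedExperiment)

probe_ids  <- c("probe_0001", "probe_0002", "probe_0003")
sample_ids <- c("sample_A", "sample_B")

# Where in the genome each measurement comes from.
probe_ranges <- GRanges(
    seqnames = c("chr1", "chr1", "chr2"),
    ranges   = IRanges(start = c(1000001, 1000101, 2000001), width = 25),
    strand   = c("+", "-", "+")
)
names(probe_ranges) <- probe_ids

# The measurements themselves: one row per probe, one column per sample.
intensities <- matrix(
    c(7.1, 8.3, 5.9, 7.4, 8.0, 6.2),
    nrow = 3,
    dimnames = list(probe_ids, sample_ids)
)

se <- SummarizedExperiment(
    assays    = list(intensity = intensities),
    rowRanges = probe_ranges,
    colData   = DataFrame(condition = c("control", "treated"),
                          row.names = sample_ids)
)

# Positions and values stay in sync when the object is subset.
rowRanges(se)["probe_0002"]
assay(se, "intensity")["probe_0002", ]
```

Nothing in this construction requires a platform annotation package: the ranges and the matrix are supplied directly.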

ADD REPLY • link 8.9 years ago James W. MacDonald 65k

Hi James,

From my understanding of how all this works, a SummarizedExperiment instance would need both a dataset and a platform package to be of some use? I am trying to build the platform package, which does not exist right now...

Best regards

S

ADD REPLY • link 8.9 years ago Sylvain Foisy ▴ 80

score 0 · Answer 3 · 2015-06-18

Hi,

So far, I have edited my SQLite database with my new information and I am working on editing the zzz.R file created by AnnotationForge. I have an issue: if I add one createSimpleBimap statement (let's say for myPlatformPROBEID2SEQUENCE) it works, but if I try to add a second one in the file (myPlatformPROBEID2POSITIONX), I get something like this:

head(as.list(myPlatform2PROBEID2POSITIONX))
Error in head(as.list(myPlatform2PROBEID2POSITIONX)) :
error in evaluating the argument 'x' in selecting a method for function 'head': Error in as.list(myPlatform2PROBEID2POSITIONX) :
error in evaluating the argument 'x' in selecting a method for function 'as.list': Error: object 'myPlatform2PROBEID2POSITIONX' not found

However, when I go through the R console, it works... Yet again a simple thing that I am missing? Grrrhh...

Best regards

S