Question

How to handle multiple assay versions in a ChipDb annotation package

ahiser

I'm trying to create a ChipDb annotation package for a platform that has a single human assay. Older versions of the assay exist, and data obtained from those older versions are still in use (although no longer being produced). For example, let's say Platform X can run a single assay, Assay Y, and a new version of Assay Y is released every couple of years. The newest version is v5.0, and the previous version is v4.9. Data produced using v4.9 of Assay Y are still being analyzed in various studies, but all new data are being produced with v5.0 of Assay Y, so an annotation package covering both v4.9 and v5.0 would be useful. The primary difference between v4.9 and v5.0 is the addition and/or removal of a few probes. This means that the list of primary keys (probes) is slightly different between assay versions, and it will continue to change in future versions.
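To make the version-to-version difference concrete, here is a minimal Python sketch that compares the probe lists of two assay versions using set operations. The probe IDs are hypothetical placeholders, not real Assay Y probes.

```python
# Hypothetical probe lists for two versions of Assay Y (placeholder IDs).
probes_v4_9 = {"P001", "P002", "P003", "P004"}
probes_v5_0 = {"P001", "P002", "P004", "P005", "P006"}


def diff_probes(old, new):
    """Return (added, removed, shared) probe IDs between two assay versions."""
    added = sorted(new - old)    # present only in the newer version
    removed = sorted(old - new)  # dropped in the newer version
    shared = sorted(old & new)   # present in both versions
    return added, removed, shared


added, removed, shared = diff_probes(probes_v4_9, probes_v5_0)
print("added:", added)      # added: ['P005', 'P006']
print("removed:", removed)  # removed: ['P003']
print("shared:", shared)    # shared: ['P001', 'P002', 'P004']
```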

I'm thinking I will need to do one of two things to account for new assay versions:

1. Create a new, entirely separate ChipDb package for each new assay release. However, I worry that down the line this will mean a lot of work maintaining documentation for each package.
2. Create a single ChipDb package and increment its major version for each assay release; any other changes to the package itself (outside of assay-related changes) would be done via minor or patch revisions.

I'm unable to find information in the AnnotationDbi documentation about how this situation should be handled. Is there any guidance, or are there suggestions, for versioning assays/platforms/chips in annotation packages that use ChipDb objects? Which of the two options above is preferred?

Thanks!

ChipDb AnnotationDbi AnnotationForge

Answer 1 · 2023-01-12

James W. MacDonald

Is this an internal package, or are you submitting it to Bioconductor? If the latter, please use the Bioc-devel listserv (bioc-devel@r-project.org) for package development questions.

If the only changes between platform versions are the addition or removal of a few probes, and the probe IDs aren't changing, you should probably just build a single omnibus package that contains every probe that has ever existed on any version of the array. For a person using version 4.9 of platform Y, it won't matter if the ChipDb package contains, say, five new probes that apply only to version 5.0 of platform Y. Those probes will in effect be invisible to the end user: they don't exist on the platform that user is running, so the user will never query for them.
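The omnibus idea can be sketched in Python with plain dictionaries standing in for probe-to-gene annotation tables. This is not how a ChipDb package is built (AnnotationForge handles that); the probe IDs and gene symbols are hypothetical. The merge step also checks the stated precondition that a probe ID means the same probe in every version, and fails loudly if two versions disagree.

```python
# Hypothetical per-version probe-to-gene annotations (placeholder IDs and symbols).
annot_v4_9 = {"P001": "GENE_A", "P002": "GENE_B", "P003": "GENE_C"}
annot_v5_0 = {"P001": "GENE_A", "P002": "GENE_B", "P004": "GENE_D"}


def build_omnibus(*versions):
    """Merge per-version maps into one map covering every probe ever shipped.

    Assumes a probe ID always refers to the same probe, so entries must agree.
    """
    omnibus = {}
    for annot in versions:
        for probe, gene in annot.items():
            # A missing probe returns `gene` itself, so only real conflicts fail.
            if omnibus.get(probe, gene) != gene:
                raise ValueError(f"{probe} is annotated inconsistently across versions")
            omnibus[probe] = gene
    return omnibus


def lookup(omnibus, probe_ids):
    """Return annotations only for the probes the user asks about."""
    return {p: omnibus[p] for p in probe_ids}


omnibus = build_omnibus(annot_v4_9, annot_v5_0)
# A v4.9 user only ever queries v4.9 probe IDs, so P004 never shows up.
print(lookup(omnibus, annot_v4_9.keys()))
```

The extra v5.0 probe sits in the merged table but is never returned unless someone asks for it, which is the "invisible to the end user" behavior described above.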

If there are other material changes to the platforms then you may need to have version-specific annotation packages, but given what you have provided I don't see a compelling reason for that.

The package is intended to be submitted to Bioconductor. But you're right: I see now that I should have posted this question to the listserv; my apologies. I'll send future package development questions there.

If I were to create an omnibus package, how should I handle probes whose annotations could change between versions? For example, it's possible (and, as I'm now learning, even likely) that the target gene could be updated based on new information, that the corresponding UniProt ID could change, etc. So Probe 1 of Platform Y, version 5.0, could have different annotations from Probe 1 of Platform Y, version 4.9. I would think that adding a "Version" column (containing the assay version) would let users select the set of probes that corresponds to their version of interest. However, this would require multiple rows with the same primary key/probe ID, one for each assay version (assuming the probe in question changed between versions). Do you have any suggestions for how that situation should be handled?
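The "Version" column idea can be illustrated with plain SQLite through Python's built-in `sqlite3` module. This is a generic relational sketch with a hypothetical table layout; it is not the schema AnnotationDbi uses inside ChipDb packages, and it says nothing about whether a ChipDb package can support such a key. It shows why a single-column probe ID key rejects a second row for the same probe, and how a composite `(probe_id, version)` key lets each assay version carry its own annotation.

```python
import sqlite3

# In-memory database; the table layout is illustrative, not the ChipDb schema.
con = sqlite3.connect(":memory:")

# Single-column key: one row per probe ID, so a second version's row collides.
con.execute("CREATE TABLE probes_single (probe_id TEXT PRIMARY KEY, gene TEXT)")
con.execute("INSERT INTO probes_single VALUES ('P001', 'GENE_X')")
try:
    con.execute("INSERT INTO probes_single VALUES ('P001', 'GENE_Y')")
    collided = False
except sqlite3.IntegrityError:
    collided = True  # the primary key on probe_id rejects the duplicate

# Composite key: the same probe ID may appear once per assay version.
con.execute(
    "CREATE TABLE probes_versioned ("
    " probe_id TEXT, version TEXT, gene TEXT,"
    " PRIMARY KEY (probe_id, version))"
)
con.executemany(
    "INSERT INTO probes_versioned VALUES (?, ?, ?)",
    [("P001", "4.9", "GENE_Y"), ("P001", "5.0", "GENE_X"), ("P002", "5.0", "GENE_Z")],
)


def probes_for_version(version):
    """Select the probe annotations belonging to one assay version."""
    rows = con.execute(
        "SELECT probe_id, gene FROM probes_versioned WHERE version = ? ORDER BY probe_id",
        (version,),
    )
    return rows.fetchall()


print(collided)                   # True
print(probes_for_version("4.9"))  # [('P001', 'GENE_Y')]
print(probes_for_version("5.0"))  # [('P001', 'GENE_X'), ('P002', 'GENE_Z')]
```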

ahiser
It depends. If Probe 1 is the same probe across all platform versions, but for some reason what it is thought to measure changes, then I would just update that information in the current version. This is what happens with the existing ChipDb packages. For example, a given probeset on an Affymetrix array might have been thought to interrogate Gene X. Then a new genome build comes out, the manufacturer realizes that the sequence for that probe now matches Gene Y, and they update the annotation file to reflect that.

The previous version of the corresponding ChipDb package is static and will still say the probeset interrogates Gene X, so people who want to keep previous analyses consistent can use the old version of Bioconductor (even if that annotation is arguably wrong, given the manufacturer's update). If they update to the new version of Bioc, they will get the new, 'corrected' annotation. In this instance we are simply passing along information that we get from the manufacturer, without doing anything to validate it, and the changes simply reflect the natural consequences of changes in our understanding of the genome.
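The release-pinning behavior described above can be modeled, very loosely, as a set of frozen annotation snapshots keyed by release. This Python sketch only illustrates the idea; the release labels and the `annotate` helper are hypothetical and do not correspond to any Bioconductor API.

```python
# Hypothetical annotation snapshots keyed by release label.
# By convention a snapshot is never edited after release; corrections go into a new one.
SNAPSHOTS = {
    "release_old": {"PS_1": "GENE_X"},  # annotation as originally supplied
    "release_new": {"PS_1": "GENE_Y"},  # after the manufacturer's update
}


def annotate(probe_ids, release):
    """Annotate probes using the snapshot tied to a specific release."""
    snapshot = SNAPSHOTS[release]
    return [snapshot[p] for p in probe_ids]


# Reproducing an earlier analysis: pin to the release it was run with.
print(annotate(["PS_1"], "release_old"))  # ['GENE_X']
# New analysis: use the current release and get the corrected mapping.
print(annotate(["PS_1"], "release_new"))  # ['GENE_Y']
```

The probe ID stays the same across snapshots; only its annotation differs, which is why a single package updated in place works for this case.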

But if Probe 1 on different versions of Platform Y is meant to measure completely different things, then I think you need platform/version-specific ChipDb packages. In other words, if Probe 1 on version 5.0 is supposed to measure Gene X, but Probe 1 on version 4.9 was meant to measure Gene Y, and the two probes are in fact completely different but share an identical name, then that is a huge problem. You will likely need some sort of error checking to ensure that people don't use the v4.9 ChipDb package on v5.0 data.
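One possible shape for that error checking is sketched below in Python. It assumes, hypothetically, that each version-specific package records the assay version it annotates and that the data carry the assay version that produced them; `PACKAGE_INFO`, `check_compatible`, and `AssayVersionMismatch` are illustrative names, not part of AnnotationDbi. The explicit version comparison matters because, when probes share names across versions, checking probe IDs alone cannot detect the mismatch.

```python
# Hypothetical metadata: a version-specific package records its assay version
# and the probe IDs it annotates (placeholder values).
PACKAGE_INFO = {"assay_version": "4.9", "probe_ids": {"P001", "P002", "P003"}}


class AssayVersionMismatch(ValueError):
    """Raised when data and annotation package come from different assay versions."""


def check_compatible(package_info, data_version, data_probe_ids):
    """Refuse to annotate data produced with a different assay version."""
    # Primary guard: same-named probes may differ, so compare versions explicitly.
    if data_version != package_info["assay_version"]:
        raise AssayVersionMismatch(
            f"data is from assay v{data_version}, but this package annotates "
            f"v{package_info['assay_version']}"
        )
    # Secondary guard: probes this package does not know about.
    unknown = set(data_probe_ids) - package_info["probe_ids"]
    if unknown:
        raise AssayVersionMismatch(f"probes not on this assay version: {sorted(unknown)}")


check_compatible(PACKAGE_INFO, "4.9", ["P001", "P003"])  # passes silently
try:
    check_compatible(PACKAGE_INFO, "5.0", ["P001", "P004"])
except AssayVersionMismatch as err:
    print(err)  # data is from assay v5.0, but this package annotates v4.9
```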

James W. MacDonald