I'm trying to create a ChipDb annotation package for a platform that has a single human assay. Older versions of the assay are no longer being produced, but data obtained from them are still in use. For example, let's say Platform X can run a single assay, Assay Y, and a new version of Assay Y is released every couple of years. The newest version is v5.0, and the previous version is v4.9. Data produced using v4.9 of Assay Y are still being analyzed in various studies, but all new data are produced with v5.0, so an annotation package for both v4.9 and v5.0 would be useful. The primary difference between v4.9 and v5.0 is the addition and/or removal of a few probes. This means that the list of primary keys (probes) differs slightly between assay versions, and will continue to change in future versions.
I'm thinking I will need to do 1 of 2 things to account for the new assay versions:
- Create a new and entirely separate ChipDb package for each new assay release. However, I worry that down the line, this will result in a lot of work to maintain documentation for each package.
- Create 1 ChipDb package and increment the major version for each assay release, and any other changes to the package itself (outside of assay-related changes) will need to be done via minor or patch revisions.
I'm unable to find information in the AnnotationDbi documentation about how this situation should be handled - is there any guidance or suggestions for versioning of assays/platforms/chips for annotation packages using ChipDb objects? Which of the 2 options above is preferred?
Thanks!
The package is intended to be submitted to Bioconductor. But you're right, I see now that I should've submitted this question to the listserv, my apologies. I'll submit future package dev questions there.
If I were to create an omnibus package, how should I handle probes that could change between versions? For example, it's possible (and as I'm now learning, even likely) that the target gene could be updated or modified based on new information, the corresponding UniProt ID could be changed, etc. So Probe 1 of Platform Y, version 5.0, could have different annotations from the previous Probe 1 of Platform Y, version 4.9. I would think that adding a "Version" column (containing the assay version) would enable users to select the set of probes that corresponds to their version of interest. However, this would require multiple probes with the same primary key/probe ID, one for each package version (assuming the probe in question changed between versions). Do you have any suggestions for how that situation should be handled?
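To make the "Version" column idea concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module (ChipDb packages are backed by SQLite files, but this is not the actual ChipDb schema). The table name `probe_annotation`, its columns, the helper functions, and all probe, gene, and UniProt values are hypothetical placeholders. It assumes the probe ID alone can no longer be the primary key, so the key becomes the pair (probe ID, assay version).

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table keyed by (probe_id, assay_version).

    Hypothetical layout for illustration only; not the real ChipDb schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE probe_annotation (
            probe_id      TEXT NOT NULL,
            assay_version TEXT NOT NULL,
            gene_symbol   TEXT,
            uniprot_id    TEXT,
            PRIMARY KEY (probe_id, assay_version)
        )
        """
    )
    # Placeholder rows: probe_1 exists in both versions with different
    # annotations, probe_2 was dropped in 5.0, probe_3 was added in 5.0.
    rows = [
        ("probe_1", "4.9", "GENE_X", "UNIPROT_A"),
        ("probe_1", "5.0", "GENE_Y", "UNIPROT_B"),
        ("probe_2", "4.9", "GENE_Z", "UNIPROT_C"),
        ("probe_3", "5.0", "GENE_W", "UNIPROT_D"),
    ]
    conn.executemany("INSERT INTO probe_annotation VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def annotations_for_version(conn, assay_version):
    """Return {probe_id: (gene_symbol, uniprot_id)} for one assay version."""
    cur = conn.execute(
        "SELECT probe_id, gene_symbol, uniprot_id FROM probe_annotation "
        "WHERE assay_version = ? ORDER BY probe_id",
        (assay_version,),
    )
    return {probe_id: (symbol, uniprot) for probe_id, symbol, uniprot in cur}


conn = build_demo_db()
v49 = annotations_for_version(conn, "4.9")
v50 = annotations_for_version(conn, "5.0")
```

With the composite key, the same probe ID can appear once per assay version, and every lookup has to name the version it wants. Whether the standard ChipDb tooling would accept a table where a probe ID appears more than once is not something this sketch establishes.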
It depends. If Probe 1 across all platforms is the same probe, but for some reason what it is thought to measure changes, then I would just update that information in the current version. This is what happens to the ChipDb packages. As an example, a given probeset on an Affymetrix array might have been thought to interrogate Gene X. But then a new genome build comes out and they realize that the sequence for that probe now matches Gene Y, and they update the annotation file to reflect that.
The previous version of the corresponding ChipDb is static, and will still say the probeset interrogates Gene X, and if people want to keep previous analyses consistent they can use the old version of Bioconductor (even if it is arguably wrong, given the updated annotation from the manufacturer). But if they update to the new version of Bioc, they will get the new 'corrected' annotation. But in this instance we are simply providing information that we get from the manufacturer, without doing anything to validate, and the changes simply reflect the natural consequences of changes in our understanding of the genome.
But if Probe 1 on different versions of Platform Y is meant to measure completely different things, then I think you need platform/version specific ChipDb packages. In other words, if Probe 1 on version 5.0 is supposed to measure Gene X, but Probe 1 on version 4.9 was meant to measure Gene Y, and the differences are that the probes are completely different but have identical names, then that is a huge problem. And you will likely need some sort of error checking to ensure that people don't use the v4.9 ChipDb package on v5.0 data.
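
One possible shape for that error checking, sketched in Python with hypothetical names throughout (`check_compatibility` and its arguments are not part of any existing package). It assumes the data carry an assay version label, and that the probe IDs of the annotation package are available as a plain list (in R these would presumably come from calling `keys()` on the ChipDb object).

```python
def check_compatibility(data_version, data_probe_ids,
                        package_version, package_probe_ids):
    """Raise ValueError if the data and the annotation package disagree.

    Hypothetical helper: compares the assay version labels first, then
    checks that every probe ID in the data is known to the package.
    """
    if data_version != package_version:
        raise ValueError(
            f"Data were produced with assay v{data_version}, but the "
            f"annotation package describes v{package_version}."
        )
    # Probes present in the data but absent from the package.
    missing = sorted(set(data_probe_ids) - set(package_probe_ids))
    if missing:
        raise ValueError(
            f"{len(missing)} probe ID(s) not found in the annotation "
            f"package, e.g. {missing[:5]}"
        )


def is_compatible(*args):
    """Convenience wrapper returning True/False instead of raising."""
    try:
        check_compatibility(*args)
    except ValueError:
        return False
    return True
```

The version comparison is the important part. A probe ID check on its own only catches probes that were added or removed, and would not notice the case described above, where two different probes share an identical name across versions.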