In the current GenomicFeatures package (Bioc 3.2), makeTxDbPackage uses the "Data source" metadata value to set both the Provider and ProviderVersion fields of the txdb template, even though these two should sometimes (always?) differ. Could this be fixed for Bioc 3.3?
Moreover, it would help if makeTxDbFromGFF had arguments allowing the user to specify this provider metadata.
Also in makeTxDbPackage, the SOURCEURL field of the txdb template should be filled from "Resource URL" whenever it exists, whether or not the txdb object was made from UCSC or BioMart.
PS: here is the relevant code from Bioc's GitHub mirror:
.getTxDbVersion <- function(txdb){
    type <- .getMetaDataValue(txdb,'Data source')
    if (type=="UCSC") {
        version <- paste(.getMetaDataValue(txdb,'Genome'),
                         "genome based on the",
                         .getMetaDataValue(txdb,'UCSC Table'), "table")
    } else if (type=="BioMart") {
        version <- .getMetaDataValue(txdb,'BioMart database version')
    } else {
        version <- .getMetaDataValue(txdb,'Data source')
    }
    version
}
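To make the branching concrete, here is a minimal base-R sketch that mimics the function above on a mock metadata table. The helpers mock_metadata(), get_meta() and provider_version() are hypothetical stand-ins written for illustration (they are not part of GenomicFeatures), and the metadata values are made up. It shows that for anything other than UCSC or BioMart, the provider version ends up identical to the 'Data source' string.

```r
# Hypothetical stand-in for a TxDb metadata table (name/value pairs).
mock_metadata <- function(...) {
  vals <- c(...)
  data.frame(name = names(vals), value = unname(vals),
             stringsAsFactors = FALSE)
}

# Hypothetical stand-in for the internal .getMetaDataValue() helper.
get_meta <- function(md, name) md$value[md$name == name]

# Same three-way branching as .getTxDbVersion(), applied to the mock table.
provider_version <- function(md) {
  type <- get_meta(md, "Data source")
  if (type == "UCSC") {
    paste(get_meta(md, "Genome"), "genome based on the",
          get_meta(md, "UCSC Table"), "table")
  } else if (type == "BioMart") {
    get_meta(md, "BioMart database version")
  } else {
    type  # falls back to the 'Data source' string itself
  }
}

# Illustrative values only.
ucsc <- mock_metadata("Data source" = "UCSC",
                      "Genome" = "hg19",
                      "UCSC Table" = "knownGene")
gff  <- mock_metadata("Data source" = "my_annotation.gff3")

provider_version(ucsc)  # "hg19 genome based on the knownGene table"
provider_version(gff)   # same string as the 'Data source' value
```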
PROVIDER is always set from 'Data source' in the metadata, whereas PROVIDERVERSION only falls back to 'Data source' when the PROVIDER is neither 'UCSC' nor 'BioMart'. It sounds like you really want the option to specify these fields yourself. In that case they should be passed to makeTxDbPackage(), not makeTxDbFromGFF(), since the 'provider' and 'provider_version' fields are specific to the package DESCRIPTION and not to the metadata table of a TxDb. I've added 'provider' and 'providerVersion' args to makeTxDbPackage().
I agree SOURCEURL should always come from 'Resource URL' and have made that change.
We've just had the 3.3 release, so changes to that branch are now limited to bug fixes. These modifications have been checked into the devel branch as GenomicFeatures 1.25.3.
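For reference, a call using the new arguments might look like the sketch below. It is untested here and rests on several assumptions: a GenomicFeatures version that exports makeTxDbFromGFF() and makeTxDbPackage() with the 'provider' and 'providerVersion' args (1.25.3 or later), the small example GFF3 file shipped with the package, and placeholder values for the data source, version, maintainer, author and provider strings.

```r
library(GenomicFeatures)

# Example GFF3 file assumed to ship with the installed GenomicFeatures.
gff_file <- system.file("extdata", "GFF3_files", "a.gff3",
                        package = "GenomicFeatures")

# 'dataSource' ends up as the 'Data source' metadata value.
txdb <- makeTxDbFromGFF(gff_file,
                        dataSource = "ExampleGFF",        # placeholder
                        organism = "Solanum lycopersicum")

# Override the DESCRIPTION fields instead of deriving them from 'Data source'.
makeTxDbPackage(txdb,
                version = "0.0.1",                               # placeholder
                maintainer = "Jane Doe <jane.doe@example.org>",  # placeholder
                author = "Jane Doe",                             # placeholder
                destDir = tempdir(),
                provider = "ExampleProvider",                    # placeholder
                providerVersion = "release-1")                   # placeholder
```

When 'provider' and 'providerVersion' are left out, the fields are presumably derived from the metadata as before.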
great, thanks!