I was just wondering: is there a way to access/download the all-time download statistics for R packages, the way you can for the last 12 months via http://bioconductor.org/packages/stats/ ?
At some point we were generating the download stats for the last 24 months. Maybe we should go back to that. Then people can compare the same period of the year between the current and previous year. Right now it's hard to assess the trend for a package because of the seasonal variations. Would that be useful? We could also generate the stats for each past year (starting, say, 5 years ago) and provide permanent links to that, and add a new link for each completed year.
In the past some people also expressed the desire to have access to the download data in a form that is easy to compute on, so they can generate their own stats. What would be a good format for that? Right now we extract the download history from the Apache and Amazon CloudFront logs and import it into an SQLite database. The DB is big (12 GB), so we would need to split it, and we should also anonymize it (i.e. mangle the IPs in some way) if we wanted to make it available for download. Suggestions about the best way to do this are welcome.
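One possible way to mangle the IPs, offered only as a sketch: replace each address with a keyed hash (HMAC-SHA256), so that the same IP always maps to the same token and distinct-IP counts are preserved, while the address itself cannot be recovered without the key. The function name `anonymize_ip`, the key handling, and the token length are assumptions made for illustration, not a description of how the Bioconductor logs are actually processed.

```python
import hashlib
import hmac


def anonymize_ip(ip: str, secret_key: bytes, length: int = 16) -> str:
    """Map an IP address to a stable, non-reversible token.

    secret_key is a hypothetical private key kept by the maintainers;
    without it the tokens cannot be rebuilt by hashing every IPv4 address.
    """
    digest = hmac.new(secret_key, ip.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncate to keep the published file small; 16 hex chars = 64 bits.
    return digest[:length]


if __name__ == "__main__":
    key = b"replace-with-a-private-random-key"  # assumption: never published
    for ip in ["192.0.2.1", "192.0.2.1", "198.51.100.7"]:
        print(ip, "->", anonymize_ip(ip, key))
```

A plain unkeyed hash would not be enough here, because the IPv4 address space is small enough to hash exhaustively; that is the reason for the secret key.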
What I was thinking about would basically be a tab-separated file with 4 columns: year, month, # distinct IPs and # downloads, containing the whole history (one row per month) going all the way back to when the package was first uploaded.
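The four-column file described above could be produced along these lines. This is a minimal sketch: it assumes the download log for one package is available as `(year, month, ip_token)` tuples, one per download, and the function names and column headers are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def monthly_summary(records):
    """Aggregate (year, month, ip_token) download records per month.

    Returns rows of (year, month, distinct_ips, downloads), oldest first.
    """
    ips = defaultdict(set)
    downloads = defaultdict(int)
    for year, month, ip in records:
        ips[(year, month)].add(ip)
        downloads[(year, month)] += 1
    return [(y, m, len(ips[(y, m)]), downloads[(y, m)])
            for (y, m) in sorted(downloads)]


def write_tsv(rows, out):
    """Write the summary rows as a tab-separated table with a header line."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # Hypothetical column names; the real file could use any headers.
    writer.writerow(["Year", "Month", "Nb_of_distinct_IPs", "Nb_of_downloads"])
    writer.writerows(rows)


if __name__ == "__main__":
    sample = [(2014, 1, "a"), (2014, 1, "a"), (2014, 1, "b"), (2013, 12, "a")]
    buf = io.StringIO()
    write_tsv(monthly_summary(sample), buf)
    print(buf.getvalue(), end="")
```

Note that yearly distinct-IP counts cannot be obtained by summing the monthly column, since the same IP may appear in several months; a per-year table would have to be computed from the underlying records.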
Then there could be two graphs generated on the fly: one showing each month of the previous year, and one showing each year the package has existed.
For backward compatibility one could split it into two tables: one that looks like the current one (and keeps the current stable link), and one that has all the data (with another stable link).
Unfortunately I'm not qualified to give input on how best to store the data, but I would guess there are very efficient methods for this specific problem out there.
I agree with this suggestion -- that four column summary table would be great! (And small enough that indefinite history doesn't seem like it would be a problem...)