If you're absolutely sure that your table was generated by starting with a table of integer counts and scaling each column by a single normalization factor, you can sort the values in each column and use the intervals between consecutive unique values to infer the normalization factor that was used. Importantly, this method assumes that each column has a high enough density of counts that consecutive integers appear very often, which should generally be true of real RNA-seq count data. But if, say, all the counts were somehow multiples of 10, the method would mistake the 10-count spacing for a single count and report every count 10-fold too small. If the original counts were non-integers, such as the estimated gene counts produced by RSEM, Kallisto, or Salmon, this method may also fail.
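To make the idea concrete, here's a toy illustration (made-up numbers, not from your data): scale a small vector of integer counts by an arbitrary factor, then look at the gaps between consecutive sorted unique values; the most frequent gap is the scale factor itself.
raw <- c(0, 1, 2, 3, 5, 8, 13, 21)   # made-up integer counts
scaled <- raw * 0.7312               # pretend 0.7312 is the unknown normalization factor
diff(sort(unique(scaled)))
## [1] 0.7312 0.7312 0.7312 1.4624 2.1936 3.6560 5.8496
## The most common gap (0.7312) recovers the factor, because consecutive
## integers (0, 1, 2, 3) occur in the original counts.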
Luckily, your data set seems to satisfy all the above requirements, so here's the code to recover the original counts (probably):
library(assertthat)
library(magrittr)
infer.counts <- function(x, digits=3) {
    assert_that(all(x >= 0))
    assert_that(digits >= 2)
    ## Get all diffs between successive unique values
    diffs <- x %>% sort %>% unique %>% diff
    ## Round to a few significant digits to work around inexact
    ## floating-point representation
    approxdiffs <- signif(diffs, digits)
    ## Find the rounded interval that occurs most often
    approxguess <- approxdiffs %>% table %>% .[which.max(.)] %>% names %>% as.numeric
    ## Find all the intervals that were rounded to the selected one, and take their mean
    unit.guess <- diffs[approxdiffs == approxguess] %>% mean
    message("Guessing 1 count = ", unit.guess)
    ## Divide the original vector by the unit guess and round to a few
    ## decimal places; if the assumptions hold, everything should round
    ## to a whole number.
    round(x / unit.guess, digits)
}
file.url <- "https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE82227&format=file&file=GSE82227%5Fcounts%5FDESeq%2Enormalized%2Ecsv%2Egz"
normcounts <- read.csv(textConnection(readLines(gzcon(url(file.url)))), row.names="id")
counts <- apply(normcounts, 2, infer.counts)
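As an optional sanity check on purely synthetic data (the size factor of 1.37 below is just a made-up value), you can confirm that infer.counts recovers known integer counts from a column you scaled yourself:
set.seed(1)
fake.counts <- rnbinom(5000, mu=100, size=2)   # made-up integer counts
fake.norm <- fake.counts / 1.37                # scale by a made-up size factor
all(infer.counts(fake.norm) == fake.counts)    # should be TRUE if the inference works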
And here are the first few rows and columns of the data, before and after:
> normcounts[1:5,1:5]
data.frame with 5 rows and 5 columns
SR094_01 SR094_02 SR094_03 SR094_05 SR094_06
<numeric> <numeric> <numeric> <numeric> <numeric>
ENSG00000000003 2.80095 1.78999 2.171321 4.691401 2.887229
ENSG00000000419 1125.98170 1024.76918 976.008977 933.588869 1022.078941
ENSG00000000457 81.22754 63.54464 260.558570 302.595387 341.655390
ENSG00000000460 171.79157 191.52891 169.363071 145.433442 110.677098
ENSG00000000938 15206.35491 14216.99427 12689.202370 18185.044490 20616.737360
> counts[1:5,1:5]
SR094_01 SR094_02 SR094_03 SR094_05 SR094_06
ENSG00000000003 3 2 2 4 3
ENSG00000000419 1206 1145 899 796 1062
ENSG00000000457 87 71 240 258 355
ENSG00000000460 184 214 156 124 115
ENSG00000000938 16287 15885 11688 15505 21422
I'll stress again that it doesn't take much to fool this simplistic method. A few non-integer or even just badly-rounded values could cause it to give the wrong answer. So any time you use this, give the inferred counts some thorough scrutiny to make sure they look like real counts.
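One cheap form of scrutiny (just a sketch; the 0.01 tolerance is an arbitrary choice, not a recommendation) is to check how far the inferred counts are from whole numbers:
deviation <- abs(counts - round(counts))   # 'counts' is the matrix from apply() above
max(deviation)                             # should be essentially zero
if (max(deviation) > 0.01) {
    warning("Some inferred counts are far from whole numbers; ",
            "the inferred normalization factors are probably wrong.")
}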
(I'm sure there's a more robust method that would work on data whose original counts were only mostly or approximately integers, such as RSEM output, but the above is quick and dirty and seems to work for this particular data set.)