Dear Bioconductor community,
First of all, thank you for this forum, which has helped me many times during my PhD years. This is the first time I'm posting a question, but I've been following the forum for a long time now.
My problem is the following: I would like to introduce bias coefficients into DESeq2 before the differential expression analysis.
Basically, what I would like to do is to multiply all my raw counts in a given sample by a coefficient k.
So let's say I have two samples A and B, for two genes, a and b. I would like to transform my initial count matrix
     A     B
a    x1    x2
b    x3    x4
into
     A        B
a    x1*k1    x2*k2
b    x3*k1    x4*k2
However, my problem is that this scaling would be treated purely as a difference in sequencing depth and would therefore be normalized away by DESeq2. Is there a way to introduce a bias after normalization? Does it even make sense to use DESeq2 for this?
For those interested in why: the goal is to use DESeq2 to compare absolute RNA production. Say there are two cell-type populations with different abundances; I want to compare not only the relative enrichment of a gene in each cell type, but also the total enrichment of that gene in a mix of these two cell types with known abundances.
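To illustrate why the per-sample multiplication disappears, here is a minimal sketch in plain Python (not DESeq2 itself) of a median-of-ratios size factor calculation. The function names, the toy counts and the coefficients are made up for illustration, and the calculation is a simplified version of what DESeq2 does, assuming all counts are positive.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples matrix (all counts > 0)."""
    n_samples = len(counts[0])
    # Log of the per-gene geometric mean across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in counts]
    factors = []
    for j in range(n_samples):
        # Log ratio of each gene in sample j to that gene's geometric mean.
        log_ratios = [math.log(row[j]) - lgm for row, lgm in zip(counts, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

def normalize(counts, factors):
    """Divide every count by the size factor of its sample."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]

# Hypothetical toy data: 3 genes (rows) x 2 samples A, B (columns).
counts = [[100, 120], [50, 40], [200, 210]]
k = [3.0, 1.0]  # hypothetical coefficients k1, k2
scaled = [[c * kj for c, kj in zip(row, k)] for row in counts]

sf_raw = size_factors(counts)
sf_scaled = size_factors(scaled)
norm_raw = normalize(counts, sf_raw)
norm_scaled = normalize(scaled, sf_scaled)

# The A/B ratio of each gene is the same with and without the coefficients.
for raw_row, scaled_row in zip(norm_raw, norm_scaled):
    print(raw_row[0] / raw_row[1], scaled_row[0] / scaled_row[1])
```

In this sketch the coefficient k1 ends up entirely in the size factor of sample A, so the between-sample ratios of the normalized counts do not change.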
Thank you very much for your help and kind support and excuse me if my question is not perfectly posed (yet!),
David Benacom
Dear Michael,
Thank you for your answer; I had missed that feature while looking through the documentation. Here is the toy example I made using the normFactors.
For that, I calculated the size factors, used them as initial values that I multiplied by the inverse of my coefficients (to scale up), and then normalized the resulting factors again by their geometric mean.
It behaves as I want, i.e., it gives me a statistical value for enrichment that is modulated by my coefficients. It also corrects for library size. Is this approach statistically sound? Thank you,
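The three steps described above can be sketched roughly as follows, in plain Python rather than R. The size factors and coefficients are hypothetical values, and `adjusted_factors` is an illustrative helper, not a DESeq2 function. In DESeq2 the resulting values would presumably be assigned back to the data set, a step that is not shown here.

```python
import math

def geometric_mean(values):
    """Geometric mean of a list of positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def adjusted_factors(size_factors, coefficients):
    """Divide each size factor by its sample coefficient, then rescale
    so that the geometric mean of the result is 1."""
    raw = [s / k for s, k in zip(size_factors, coefficients)]
    g = geometric_mean(raw)
    return [r / g for r in raw]

# Hypothetical values: four samples, the first two weighted 3x.
size_factors = [1.1, 0.9, 1.0, 1.0]
coefficients = [3.0, 3.0, 1.0, 1.0]

factors = adjusted_factors(size_factors, coefficients)
print(factors)
print(geometric_mean(factors))
```

Because counts are divided by these factors, a sample whose coefficient is 3 ends up with normalized counts three times higher, relative to the other samples, than it would with the plain size factors, while library size is still corrected for.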
Did you read the support site posts that use the normMatrix argument of estimateSizeFactors? Here is one:
Proper way to implement DESeq2 when comparing samples with trisomies?
Dear Michael,
This example is really clear. As I understand it, here you effectively normalize for copy number, so that a gene whose counts are inflated by a change in chromosome copy number is normalized for expression and thus does not appear among the DE genes in the "results" part, because the number of DNA copies is effectively four times higher.
For my part, what I want to do is give more 'weight' to some samples, so that their genes effectively appear as DE. For instance, my normMatrix should offset all the genes in sample A so that they appear three times more expressed than they really are. But I cannot just multiply all the counts in my weighted samples by three, because that would look like a library size that is three times bigger, which gets normalized (in the next example).
Similarly, if I just set the norm matrix to 3 for all genes of the first two samples, it also gets normalized.
Does that make any sense?
It sounds from your description like you want the offset to go in the other direction then?
E.g., say you have a count of 100. If you supply normMatrix and that sample gets a value 2x higher than the other samples, then the 100 is effectively halved, so it appears more like a 50 to the rest of the regression formula (in that we model log E(count) = X beta + offset).
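As a small worked illustration of the direction of the offset, here is a sketch in plain Python. It assumes a log link, so the offset is the log of the normalization factor. `effective_count` and `expected_count` are hypothetical helper names, not DESeq2 functions.

```python
import math

def effective_count(count, norm_factor):
    """Level the rest of the model 'sees' once the factor is accounted for."""
    return count / norm_factor

def expected_count(x_beta, norm_factor):
    """Expected count under log E(count) = X beta + offset, offset = log(norm_factor)."""
    return math.exp(x_beta + math.log(norm_factor))

# A factor 2x higher than the other samples halves the apparent level.
halved = effective_count(100, 2.0)

# To make a sample look 3x more expressed, the factor goes the other way.
tripled = effective_count(100, 1.0 / 3.0)

print(halved, tripled)

# Consistency check: a fitted level of log(50) with factor 2 gives back about 100.
print(expected_count(math.log(50), 2.0))
```

So a factor above 1 pulls a sample's apparent expression down, and a factor below 1 pushes it up.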