How to proceed is up to the user. Typically the bias data provided to the nullp function is the gene length, as this is the bias we are most keen to correct for. But sometimes you want to correct for other biases, such as counts bias (the number of counts a gene receives is effectively the product of gene length and gene expression).
One could input the normalised counts (normalised for library size) as the bias data, thereby eliminating the effect of sample bias beforehand, and obtain weights for counts bias. One would then need to use both the sample bias and the counts bias weights. Alternatively, one could use the raw counts as the bias data, which would then effectively include sample bias as well. The weights from the two approaches would differ, by an amount that depends on how large the sample bias is.
Both approaches are reasonable; it is up to you to decide which biases you wish to control for with the pwf, and how. I think raw counts are usually better and more intuitive. My suggestion would be to try both methods, check on the pwf plot which one shows the greater bias, and use that one.
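To make the "try both and compare" suggestion concrete, here is a minimal Python sketch. It is not goseq and does not call nullp; it only approximates what the pwf plot shows by binning genes on a candidate bias covariate and computing the proportion of DE genes per bin. The function names, the toy values and the use of "max minus min" as a crude measure of bias strength are all assumptions made for illustration.

```python
from statistics import mean


def binned_de_proportion(bias, is_de, n_bins=5):
    """Sort genes by the bias covariate, split them into n_bins
    near-equal bins, and return the proportion of DE genes per bin."""
    if len(bias) != len(is_de):
        raise ValueError("bias and is_de must have the same length")
    n = len(bias)
    if n < n_bins:
        raise ValueError("need at least one gene per bin")
    order = sorted(range(n), key=lambda i: bias[i])
    props = []
    for b in range(n_bins):
        # Integer arithmetic keeps the bin boundaries exact.
        start, end = b * n // n_bins, (b + 1) * n // n_bins
        props.append(mean(is_de[i] for i in order[start:end]))
    return props


def bias_strength(bias, is_de, n_bins=5):
    """Crude summary (an assumption, not a goseq statistic):
    spread between the highest and lowest binned DE proportion."""
    props = binned_de_proportion(bias, is_de, n_bins)
    return max(props) - min(props)


# Hypothetical toy data: 10 genes, 1 = called DE, 0 = not DE.
is_de = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
raw_counts = [5, 12, 20, 33, 41, 58, 64, 79, 90, 120]
norm_counts = [30, 8, 45, 15, 60, 22, 70, 10, 55, 40]

raw_strength = bias_strength(raw_counts, is_de)
norm_strength = bias_strength(norm_counts, is_de)
print("raw counts:", raw_strength, "normalised counts:", norm_strength)
```

With real data you would still want to look at the actual pwf plot, since nullp fits a smooth curve rather than taking bin averages, and a single number like this can hide the shape of the trend.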
Thanks a lot,
It's not so clear from the manual, though. If that is the case, then the best option would be to choose whichever gene parameter has the highest correlation with the DE proportion, as long as that parameter is technical (i.e. does not represent a true biological factor).
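As a rough illustration of that idea, the sketch below ranks candidate gene parameters by the absolute Spearman correlation between the parameter and a 0/1 DE indicator. This is plain Python, not part of goseq; the covariate names and toy values are hypothetical, and using Spearman correlation as the selection criterion is an assumption rather than anything the manual prescribes.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def strongest_covariate(covariates, is_de):
    """Return the covariate name with the largest |Spearman| and all scores."""
    scores = {name: abs(spearman(vals, is_de)) for name, vals in covariates.items()}
    return max(scores, key=scores.get), scores


# Hypothetical toy data: 6 genes, two candidate technical covariates.
is_de = [0, 0, 0, 1, 1, 1]
covariates = {
    "length": [1, 2, 3, 4, 5, 6],
    "gc_content": [5, 2, 4, 1, 6, 3],
}
best, scores = strongest_covariate(covariates, is_de)
print(best, scores)
```

The code cannot tell whether a parameter is technical or biological, so that judgement still has to be made before a parameter is put on the candidate list.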
Assaf