Dear List,

I have peak counts from RNA-IP samples and corresponding inputs, for two
different conditions. I would like to find differential binding between
the two IP conditions after removing the differential expression effect.
In a previous post (titled "differential binding question"), Mark
Robinson suggested doing a GLM analysis.

Before doing the DE analysis I have to normalize the data. Using the
DESeq "estimateSizeFactors" function I get the following size factors:
> sizeFactors( cds )
     cond1_IP    cond1_IP.1   cond1_Input cond1_Input.1      cond2_IP
    6.3672619     6.1015548     0.3209480     0.2553967     3.2300114
   cond2_IP.1    cond2_IP.2   cond2_Input cond2_Input.1
    1.7808445     1.7027369     0.2480639     0.2530747

With edgeR, these are the normalization factors I get using the TMM and
RLE methods:
> dTMM$samples
              group lib.size norm.factors
cond1_IP          H  8345160    0.9916792
cond1_IP.1        H  9395446    1.2221615
cond1_Input       H  1126656    0.4489350
cond1_Input.1     H   219823    2.1955057
cond2_IP          S  5707895    0.8339317
cond2_IP.1        S  5914904    0.5014391
cond2_IP.2        S  5602070    0.5043970
cond2_Input       S   223442    1.9909578
cond2_Input.1     S   226840    1.9934207

> dRLE$samples
              group lib.size norm.factors
cond1_IP          H  8345160    1.2656111
cond1_IP.1        H  9395446    1.0772223
cond1_Input       H  1126656    0.4725259
cond1_Input.1     H   219823    1.9271892
cond2_IP          S  5707895    0.9386643
cond2_IP.1        S  5914904    0.4994138
cond2_IP.2        S  5602070    0.5041749
cond2_Input       S   223442    1.8415393
cond2_Input.1     S   226840    1.8505947

The "real" library sizes (the number of reads that were successfully
aligned in each sample) are:

cond1_IP     24055908
cond1_IP     16654296
cond1_Input  12919153
cond1_Input  33778948
cond2_IP     17340233
cond2_IP     29284664
cond2_IP     27788144
cond2_Input  33477921
cond2_Input  33980303

As you can see, DESeq and edgeR are weighting up the Input samples and
weighting down the IP samples. I suppose this is because far fewer Input
reads than IP reads fall in peak regions, which leads DESeq and edgeR to
treat the Input library size as much lower than the IP library size. In
fact, the original library size of the Input samples is in most cases
larger than that of the IP samples.
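
To make this concrete, here is a minimal base-R sketch that divides the
in-peak counts (the lib.size column reported by edgeR) by the aligned
read totals. It assumes the aligned totals are listed in the same sample
order as the edgeR tables; the variable names are only illustrative.

```r
# Reads counted in peak regions (lib.size column of dTMM$samples)
in_peaks <- c(cond1_IP = 8345160, cond1_IP.1 = 9395446,
              cond1_Input = 1126656, cond1_Input.1 = 219823,
              cond2_IP = 5707895, cond2_IP.1 = 5914904,
              cond2_IP.2 = 5602070,
              cond2_Input = 223442, cond2_Input.1 = 226840)

# Successfully aligned reads per sample.
# ASSUMPTION: listed in the same sample order as in_peaks.
aligned <- c(24055908, 16654296, 12919153, 33778948,
             17340233, 29284664, 27788144, 33477921, 33980303)

# Fraction of aligned reads that fall in peak regions
frac_in_peaks <- in_peaks / aligned

# Flag Input samples by name so the two groups can be compared
is_input <- grepl("Input", names(in_peaks))

print(round(frac_in_peaks, 4))
print(range(frac_in_peaks[!is_input]))  # IP samples
print(range(frac_in_peaks[is_input]))   # Input samples
```

Under that ordering assumption, roughly 20% to 56% of the aligned reads
fall in peaks for the IP samples, versus under 9% for every Input sample.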

What do you think: should I use the original library sizes as
normalization factors instead of the calculated ones? I know this is
possible with DESeq, but I couldn't find how to do it with edgeR.
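
For reference, this is a rough base-R sketch of the size factors I would
get from the aligned read totals alone. Scaling them to a geometric mean
of 1 is my own choice for illustration, and the sample names assume the
totals are in the same order as the count columns.

```r
# Successfully aligned reads per sample.
# ASSUMPTION: names follow the column order of the count table.
aligned <- c(cond1_IP = 24055908, cond1_IP.1 = 16654296,
             cond1_Input = 12919153, cond1_Input.1 = 33778948,
             cond2_IP = 17340233, cond2_IP.1 = 29284664,
             cond2_IP.2 = 27788144,
             cond2_Input = 33477921, cond2_Input.1 = 33980303)

# Relative library-size factors, scaled so their geometric mean is 1
# (ASSUMPTION: scaling convention chosen here for illustration only)
sf <- aligned / exp(mean(log(aligned)))

print(round(sf, 3))

# With DESeq, a vector like this could replace the estimated factors:
# sizeFactors(cds) <- sf   # cds is the CountDataSet used above
```
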
Thanks
Mali