I ran into some complications while analyzing/normalizing my data. It would be kind of you if you could help me.
We have ChIP-seq data for 5 histone marks from treated and untreated mice. Each mark has two biological replicates, plus one technical replicate of one of those biological replicates. Before performing any differential testing, I wanted to check whether normalizing with the size factors from DESeq2 improves my data or reveals possible experimental errors. I also intend to use the size-factor-normalized counts outside DESeq2 for differential analysis, and to use VST- and rlog-transformed data as input for PEER.
Raw counts for histone K4me3:
Square root of counts from histone K4me3:
After normalization with the sizeFactors:
library(DESeq2)
# Estimate one size factor per sample, then divide the raw counts by it.
cds <- estimateSizeFactors(dds)
o <- counts(cds, normalized=TRUE)
The first plot shows the normalized counts and the second shows the square root of the normalized counts.
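For readers who want to see what the size-factor step does to the counts, here is a minimal base-R sketch of the median-of-ratios idea that, as far as I understand, underlies the default `estimateSizeFactors()`. The toy matrix and the helper name `median_ratio_size_factors` are made up for illustration and are not part of DESeq2.

```r
# Median-of-ratios size factors written out in base R, for illustration only.
# Toy data (made up): rows are regions/peaks, columns are samples.
counts <- matrix(
  c( 10,  20,  40,
    100, 200, 400,
     50, 100, 200,
      0,   3,   5),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("region", 1:4), c("s1", "s2", "s3"))
)

# Hypothetical helper, not a DESeq2 function.
median_ratio_size_factors <- function(counts) {
  log_counts <- log(counts)
  # Log geometric mean per row; rows containing a zero give -Inf and are dropped.
  log_geo_means <- rowMeans(log_counts)
  keep <- is.finite(log_geo_means)
  # Size factor of a sample = median ratio of its counts to the row geometric means.
  apply(log_counts[keep, , drop = FALSE], 2, function(col) {
    exp(median(col - log_geo_means[keep]))
  })
}

sf <- median_ratio_size_factors(counts)
# Dividing each column by its size factor gives the normalized counts.
normalized <- sweep(counts, 2, sf, "/")
print(sf)
print(round(normalized, 2))
```

Since a single scaling factor is applied per sample, this kind of normalization can only shift a sample's distribution as a whole; it cannot remove curvature between samples.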
VST and rlog transformations:
Something strange seems to be happening there. The same plots for the other data also show strange curves after normalization.
Yeah, to echo Ryan, you should take a look at Aaron and Gordon's workflow which covers normalization extensively:
http://master.bioconductor.org/help/workflows/chipseqDB/
also published on F1000:
http://f1000research.com/articles/4-1080/v1
Just a small question that I am still not sure about: why would I choose VST or log transformations rather than just taking the data normalized with suitable size factors, if my goal is to find DE genes (not using DESeq2)?
In DESeq2 we use the transformations for making EDA plots, such as PCA, hierarchical clustering, etc.
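As a rough illustration of that kind of EDA, here is a base-R sketch on simulated counts. It uses a plain shifted log, log2(count + 1), as a simplified stand-in for VST/rlog; with real data you would instead pass the output of DESeq2's `vst()` or `rlog()` to `plotPCA()`. The toy data, the 4-fold effect, and all variable names are assumptions made for the example.

```r
# Toy counts (made up): 200 regions x 6 samples, 3 untreated and 3 treated.
set.seed(1)
n_regions <- 200
condition <- rep(c("untreated", "treated"), each = 3)

# Expected counts per region, identical across samples (equal sequencing depth).
base_mean <- rexp(n_regions, rate = 1 / 100)
mu <- matrix(base_mean, nrow = n_regions, ncol = length(condition))
# Assumed effect: the first 20 regions are 4-fold higher in treated samples.
mu[1:20, condition == "treated"] <- mu[1:20, condition == "treated"] * 4

counts <- matrix(rpois(length(mu), mu), nrow = n_regions,
                 dimnames = list(NULL, paste0(condition, 1:3)))

# Shifted log as a stand-in for VST/rlog (depth is equal here, so no size factors).
log_counts <- log2(counts + 1)

# PCA and hierarchical clustering on samples (samples must be rows, hence t()).
pca <- prcomp(t(log_counts))
hc <- hclust(dist(t(log_counts)))

# Proportion of variance explained by the first two components.
print(summary(pca)$importance["Proportion of Variance", 1:2])
print(hc$labels[hc$order])
```

The transformation matters here because distance-based methods such as PCA and clustering are dominated by high-count regions on the raw scale; the transformed values are meant for exploration, not as input for count-based testing.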
If you are using some other software, that software will probably have its own internal correction for sequencing depth.
It's up to you to read the manual of that other software or ask the maintainer of that software what is expected as input: raw counts, counts normalized for sequencing depth, or some other expected input.