Incorporating 'offset' into edgeR pipeline
alakatos
United States

Hello All,

I would like to adjust my RNA-seq dataset for GC content.

I used EDASeq to calculate the offset.


data <- newSeqExpressionSet(counts=as.matrix(d$counts), featureData=feature,
                            phenoData=data.frame(pheno, row.names=colnames(d)))
# full-quantile within-lane GC normalization; offset=TRUE returns it as an offset
dataOffset <- withinLaneNormalization(data, "gc", which="full", offset=TRUE)
offset <- offst(dataOffset)

My goal is to incorporate the “GC offset” into my edgeR pipeline (not to override the offset that edgeR calculates).

I found some suggestions online, but I am not sure I am doing it right.

d <- calcNormFactors(d, robust=TRUE, method ="TMM", offset=offset)
d <- estimateDisp(d, design)
fit <- glmFit(d, design)



Would you please advise?

Thanks a lot,


Aaron Lun
The city by the bay

Supplying offset to calcNormFactors will do nothing: offset is not an argument that calcNormFactors uses, so it is simply ignored (see ?calcNormFactors).
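A quick way to check this for yourself: the sketch below simulates a small count matrix (the counts and the `fake.offset` matrix are made up for illustration) and compares the TMM factors computed with and without an `offset=` argument. It assumes that calcNormFactors() quietly absorbs unrecognised arguments through `...`, which is consistent with the pipeline in the question running without an error.

```r
library(edgeR)

set.seed(42)
# Simulated counts: 500 genes x 4 samples (illustrative only)
counts <- matrix(rpois(2000, lambda = 20), nrow = 500, ncol = 4)
# Hypothetical stand-in for an EDASeq GC offset matrix
fake.offset <- matrix(rnorm(2000, sd = 0.2), nrow = 500, ncol = 4)

y <- DGEList(counts = counts)
nf.plain <- calcNormFactors(y, method = "TMM")$samples$norm.factors
# 'offset' is not used by calcNormFactors(), so the factors should not change
nf.offset <- calcNormFactors(y, method = "TMM", offset = fake.offset)$samples$norm.factors

print(rbind(nf.plain, nf.offset))
```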

If you need to supply offsets to edgeR, use the scaleOffset function to store the offset matrix inside the DGEList. This will ensure that the offsets are interpretable as log-library sizes (natural log), which is necessary for sensible calculation of the average log-CPM.
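A minimal sketch of the scaleOffset() route, again on simulated data. The matrix `gc.off` is a hypothetical stand-in for the EDASeq offset. With real data it would come from `offst(dataOffset)`; as far as I recall, the EDASeq vignette negates that matrix (`-offst(dataOffset)`) before passing it to edgeR, because EDASeq defines its offset in the opposite direction, so please check that sign convention against your EDASeq version. My understanding is that scaleOffset() only shifts each gene's row of offsets by a constant, so that the row mean equals the mean log library size; the printed values are there to let you verify that.

```r
library(edgeR)

set.seed(1)
n.genes <- 1000
n.samples <- 6
counts <- matrix(rpois(n.genes * n.samples, lambda = 50), nrow = n.genes, ncol = n.samples)

# Hypothetical stand-in for the GC offset; with real data this would be
# something like gc.off <- -offst(dataOffset)  (sign as in the EDASeq vignette)
gc.off <- matrix(rnorm(n.genes * n.samples, sd = 0.1), nrow = n.genes, ncol = n.samples)

y <- DGEList(counts = counts)
# Store the offsets in the DGEList, rescaled to the natural-log library-size scale
y <- scaleOffset(y, gc.off)

dim(y$offset)                  # one offset per gene and sample
head(rowMeans(y$offset))       # per-gene mean of the stored offsets
mean(log(y$samples$lib.size))  # mean log library size, for comparison
```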

However, any specification of the offset matrix will override the use of normalization factors in downstream calculations. There is no straightforward way to "combine" offset matrices with the TMM normalization factors. edgeR can't possibly know what biases are removed by your offset matrix; it just assumes that you know what you're doing, and that the offset matrix gets rid of all biases that might be relevant to your contrasts of interest.
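To see the override in action, the sketch below uses simulated counts, a hypothetical `gc.off` offset matrix and arbitrary made-up normalization factors (`made.up.nf`, standing in for real TMM output). It inspects edgeR's getOffset(), which, as far as I know, is how the DGEList methods pick their offsets: without a stored offset matrix the normalization factors change the offsets, but once an offset matrix is stored they no longer have any effect.

```r
library(edgeR)

set.seed(2)
counts <- matrix(rpois(3000, lambda = 30), nrow = 500, ncol = 6)
gc.off <- matrix(rnorm(3000, sd = 0.1), nrow = 500, ncol = 6)  # hypothetical GC offset
made.up.nf <- c(0.8, 1.25, 0.9, 1 / 0.9, 1.1, 1 / 1.1)       # arbitrary stand-in for TMM factors

# Without an offset matrix, offsets are built from lib.size * norm.factors
y0 <- DGEList(counts = counts)
y0.nf <- y0
y0.nf$samples$norm.factors <- made.up.nf
changed.without <- !isTRUE(all.equal(getOffset(y0), getOffset(y0.nf)))

# With a stored offset matrix, the norm.factors no longer enter the offsets
y1 <- scaleOffset(DGEList(counts = counts), gc.off)
y1.nf <- y1
y1.nf$samples$norm.factors <- made.up.nf
changed.with <- !isTRUE(all.equal(getOffset(y1), getOffset(y1.nf)))

c(without.offset.matrix = changed.without, with.offset.matrix = changed.with)
```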

In short, supplying the EDASeq offset matrix will mean that only the EDASeq normalization will be performed. The TMM normalization factors will be mostly ignored for the purposes of DE testing.
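Putting the answer together, a revised version of the pipeline from the question might look like the sketch below. It uses simulated counts, a made-up two-group design and a hypothetical `gc.off` matrix in place of the real EDASeq offset, and it leaves out calcNormFactors(), since the stored offsets would take precedence over the normalization factors anyway. I am assuming here that estimateDisp() and glmFit() both take their offsets from the DGEList when `y$offset` is set.

```r
library(edgeR)

set.seed(3)
n.genes <- 1000
group <- factor(rep(c("A", "B"), each = 3))
counts <- matrix(rpois(n.genes * 6, lambda = 40), nrow = n.genes, ncol = 6)
gc.off <- matrix(rnorm(n.genes * 6, sd = 0.1), nrow = n.genes, ncol = 6)  # hypothetical GC offset

y <- DGEList(counts = counts, group = group)
# No calcNormFactors(): an offset matrix would override the norm factors anyway
y <- scaleOffset(y, gc.off)

design <- model.matrix(~group)
y <- estimateDisp(y, design)   # dispersion estimation using the stored offsets
fit <- glmFit(y, design)       # GLM fit using y$offset rather than the norm factors
lrt <- glmLRT(fit, coef = 2)   # test the group B vs A coefficient
topTags(lrt)
```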


