I'm working with the GSE234297 dataset, which provides a Salmon counts file and a Salmon offsets file (GSE234297_gene_raw_counts.txt.gz and GSE234297_gene_offset_matrix.txt.gz). The paper's authors use them like this for differential expression analysis:
# create DGEList
y <- DGEList(counts = counts, samples = samples, genes = genes, group = samples$Disease_status)
# add salmon offsets
y <- scaleOffset(y, offset = as.matrix(salmon_offset))
# filter low count genes
keep <- filterByExpr(y, group = y$samples$group)
table(keep)
y <- y[keep, , keep.lib.sizes=FALSE]
# set design matrix
design <- model.matrix(~ Sex + GC_content + group, data = y$samples)
design
# estimate dispersion parameters using edgeR robust method
y <- estimateGLMRobustDisp(y, design)
My question: if I want to use the Salmon offsets with a Wilcoxon test (which can't use offsets directly), could I simply do:
y <- DGEList(counts)
y$offset <- offsets
cpms <- edgeR::cpm(y, offset = y$offset, log = FALSE)
Would that give properly normalized values that account for the Salmon bias corrections?
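For reference, here is a minimal sketch of the idea on toy data, not the paper's method. It assumes the raw offsets are first passed through scaleOffset(), as in the authors' code, so that exp(offset) can be read as a gene- and sample-specific effective library size. The toy counts, offsets, group labels and the manual CPM formula are all illustrative assumptions; whether cpm() picks up y$offset by itself may depend on the edgeR version, so the normalization is written out explicitly here.

```r
library(edgeR)

# Toy data: every name and value below is made up for illustration
set.seed(1)
counts <- matrix(rnbinom(60, mu = 50, size = 5), nrow = 10,
                 dimnames = list(paste0("gene", 1:10), paste0("s", 1:6)))
offsets <- matrix(rnorm(60, sd = 0.2), nrow = 10, dimnames = dimnames(counts))
group <- factor(rep(c("control", "case"), each = 3))

y <- DGEList(counts = counts, group = group)
# scaleOffset() rescales the offsets to be consistent with the library sizes
y <- scaleOffset(y, offset = offsets)

# Assumption: treat exp(offset) as an effective library size per gene and sample
eff_lib <- exp(as.matrix(y$offset))
cpms <- y$counts / eff_lib * 1e6

# Per-gene Wilcoxon rank-sum test on the offset-adjusted values
pvals <- apply(cpms, 1, function(x) wilcox.test(x ~ group, exact = FALSE)$p.value)
```

With real data, `counts` and `offsets` would be the two GEO matrices, with genes and samples in the same order.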
Thank you. What is the difference between setting robust=TRUE in glmQLFit and using estimateGLMRobustDisp? The paper I mentioned in the post used both: