Hi everyone!
I'm trying to read the code of the functions used in the edgeR package during RNA-seq normalization and DE analysis. My background in statistics isn't very strong, and I like reading the code so that I understand what is being done while I reproduce it step by step. However, as I went into the details of each function, I got stuck at the C++ function .cxx_ave_log_cpm, and I don't know how to access the code inside it. The R function that calls it is the following:
aveLogCPM.default
function (y, lib.size = NULL, offset = NULL, prior.count = 2,
    dispersion = NULL, weights = NULL, ...)
{
    y <- as.matrix(y)
    if (nrow(y) == 0L)
        return(numeric(0))
    if (.isAllZero(y)) {
        if ((is.null(lib.size) || max(lib.size) == 0) && (is.null(offset) ||
            max(offset) == -Inf)) {
            abundance <- rep(-log(nrow(y)), nrow(y))
            return((abundance + log(1e+06))/log(2))
        }
    }
    if (is.null(dispersion))
        dispersion <- 0.05
    isna <- is.na(dispersion)
    if (all(isna))
        dispersion <- 0.05
    if (any(isna))
        dispersion[isna] <- mean(dispersion, na.rm = TRUE)
    dispersion <- .compressDispersions(y, dispersion)
    weights <- .compressWeights(y, weights)
    offset <- .compressOffsets(y, lib.size = lib.size, offset = offset)
    prior.count <- .compressPrior(y, prior.count)
    maxit <- formals(mglmOneGroup)$maxit
    tol <- formals(mglmOneGroup)$tol
    ab <- .Call(.cxx_ave_log_cpm, y, offset, prior.count, dispersion, ## <----- Here is where the C++ function is being called
        weights, maxit, tol)
    return(ab)
}
The C++ function .cxx_ave_log_cpm is called inside aveLogCPM.default, which is itself called from the aveLogCPM.DGEList function.
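As an aside, the one part of aveLogCPM.default that never reaches the C++ code is the early return taken when all counts are zero and no usable library sizes or offsets are supplied. That branch can be reproduced by hand in base R. The snippet below is only a sketch of that single branch, not of what .cxx_ave_log_cpm does, and n_genes is a made-up name standing in for nrow(y):

```r
# Sketch of the all-zero early-return branch only (NOT the C++ routine).
# `n_genes` is a hypothetical stand-in for nrow(y).
n_genes <- 4L

# Natural-log abundance given to every gene in this branch.
abundance <- rep(-log(n_genes), n_genes)

# Same conversion as the return statement in aveLogCPM.default:
# add log(1e6) and divide by log(2) to get log2 counts per million.
ave_log_cpm <- (abundance + log(1e+06)) / log(2)

# Algebraically this equals log2(1e6 / n_genes) for every gene.
stopifnot(isTRUE(all.equal(ave_log_cpm, rep(log2(1e6 / n_genes), n_genes))))
print(ave_log_cpm)
```

So in that degenerate case every gene gets the same value, as if one million counts were split evenly across all rows.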
I'm using:
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.1
edgeR_3.42.4
Thanks for all your help! This is my first post in the Bioconductor support group, so please forgive me if I forgot to include any information that might be needed.
Best,
Josue
Exactly what I was looking for. Thanks so much, Mike Smith.