Hello everyone,
I have two batches of RNA-Seq data, and I want to make expression levels comparable both across different genes within the same sample and for the same gene across different samples, so that I can run ssGSEA, WGCNA, and machine-learning analyses on the expression matrix.
My idea is to use EDASeq to adjust for GC content, gene length, and library size, and then use RUVSeq (RUVs, with replicate groups) to correct for batch effects.
Here is an example of my code:
library(EDASeq)
library(RUVSeq)

# Build the SeqExpressionSet from raw counts plus sample and gene annotation
data <- newSeqExpressionSet(
  counts = as.matrix(Total_express),
  phenoData = sample_Info,
  featureData = gene_Info
)

# Within-lane: adjust for GC content, then gene length
# ("GC" and "exon_length" are columns of featureData)
gc_norm <- withinLaneNormalization(data, "GC", which = "upper")
gl_norm <- withinLaneNormalization(gc_norm, "exon_length", which = "upper")

# Between-lane: full-quantile normalization for library size
lib_norm <- betweenLaneNormalization(gl_norm, which = "full")

# RUVs: estimate unwanted variation from replicate groups,
# using all genes as negative controls (the cIdx argument is required)
Controls <- makeGroups(factor(pData(lib_norm)[, 1]))
Total_adj <- RUVs(x = lib_norm, cIdx = rownames(lib_norm), k = 1, scIdx = Controls)

# RUVs on a SeqExpressionSet returns a SeqExpressionSet;
# the adjusted counts are retrieved with normCounts()
Total_adj <- normCounts(Total_adj)
My questions are:
According to the RUVSeq user manual, for differential expression analysis the estimated factors of unwanted variation (W) should be included as covariates in the model, rather than using the normalized (pseudo-)counts directly. I understand this is because the correction disrupts the negative binomial distribution of the count matrix, which DESeq2 and edgeR rely on. However, since ssGSEA and WGCNA do not depend on the negative binomial distribution, do you think using a log-transformed pseudo-count matrix is acceptable?
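To make the question concrete, this is the transformation I have in mind before feeding the matrix into ssGSEA/WGCNA (a minimal sketch in base R; the small `norm_counts` matrix is a made-up stand-in for the RUVs-adjusted counts from the pipeline above):

```r
# Hypothetical example matrix standing in for the RUVs-normalized counts
norm_counts <- matrix(
  c(0, 10, 100, 1000, 5, 50, 500, 5000),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), c("s1", "s2"))
)

# log2 with a pseudo-count of 1; this log-expression matrix is what
# I would pass to ssGSEA and WGCNA
log_expr <- log2(norm_counts + 1)
```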
Thank you in advance for your suggestions.
Best regards
