Questions about pseudo counts from RUVSeq
Yuxiang • 0

Hello everyone,

I have two batches of RNA-Seq data, and I want gene expression levels to be comparable both between different genes within the same sample and for the same gene across different samples, so that I can run ssGSEA, WGCNA, and machine learning analyses on the expression matrix.

My idea is to use EDASeq to adjust for GC content, gene length, and library size, and then to use RUVSeq (RUVs in the code below) to correct for batch effects.

Here is an example of my code:

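library(EDASeq)   # newSeqExpressionSet, within/betweenLaneNormalization, normCounts
library(RUVSeq)   # makeGroups, RUVs

data <- newSeqExpressionSet(
  counts = as.matrix(Total_express),
  phenoData = sample_Info,
  featureData = gene_Info
)

# Within-sample normalization for GC content, then for gene length
gc_norm <- withinLaneNormalization(data, "GC", which = "upper")
gl_norm <- withinLaneNormalization(gc_norm, "exon_length", which = "upper")
# Between-sample normalization for library size
lib_norm <- betweenLaneNormalization(gl_norm, which = "full")

# Replicate groups taken from the first column of the sample annotation
Controls <- makeGroups(factor(pData(lib_norm)[, 1]))
# RUVs requires cIdx; using all genes as controls here is an assumption
Total_adj <- RUVs(x = lib_norm, cIdx = rownames(lib_norm), k = 1, scIdx = Controls)
# RUVs returns a SeqExpressionSet, so the corrected counts come from normCounts()
Total_adj <- normCounts(Total_adj)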

My question is:

According to the RUVSeq user manual, differential expression analysis should correct for batch effects by including the factors of unwanted variation in the model, not by using the pseudo counts. I understand this is because the correction disrupts the negative binomial distribution of the count matrix, which DESeq2 and edgeR rely on. However, since ssGSEA and WGCNA do not depend on the negative binomial distribution, would a log-transformed pseudo count matrix be acceptable as input for them?
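To make the question concrete, here is a minimal sketch of the matrix I have in mind. It uses a small simulated matrix as a stand-in for the normalized counts, and the names `norm_counts`, `log_expr`, and `wgcna_input` are hypothetical. The pseudocount of 1 in the log transform is an assumption, not a recommendation from the RUVSeq manual.

```r
# Toy stand-in for the RUV-normalized counts: genes in rows, samples in columns.
set.seed(1)
norm_counts <- matrix(
  rpois(20, lambda = 50),
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("sample", 1:4))
)

# Log-transform with a pseudocount of 1 (assumed) so that zeros stay finite.
log_expr <- log2(norm_counts + 1)

# ssGSEA-style tools usually take genes in rows (log_expr as is);
# WGCNA expects samples in rows and genes in columns, so transpose for it.
wgcna_input <- t(log_expr)
```

With the real data, `norm_counts` would be replaced by the corrected count matrix obtained from RUVs.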

Thank you in advance for your suggestions.

Best Regards

RUVSeq • 190 views