voom(calcNormFactors()) for normalization of RNAseq counts for WGCNA
1
0
Entering edit mode
Gimena • 0
@ef404090
Last seen 7 weeks ago
United States

Hi, I am trying to do a WGCNA analysis on my bulk RNAseq data.

I have 3 cell lines, and conditions (for each cell line): control, ExposureLow, ExposureHigh; and I am also doing a surrogate variable analysis to consider as non interesting covariates.

i am interested in the Exposure effect.

A1) regarding normalization: I am trying to use voom and calcNormFactors to normalize and prepare the data for WGCNA; and I am also regressing out the effect of Line+ the SVs by calculating this beta coeficient and substracting it from the normalizes counts. I do get a very nice PCA clustering by Exposure condition. But I wonder if this preprocessing of the data is correct.

A2) Regarding WGCNA, the free scale topology suggest me a power of 28 for > 0.8< and with blockwiseModules() the tree is awful, very noisy (see image). I can still play with the parameters and get some modules, even significant for my conditions, but is this ok?

B1) I also tried normalization with DEseq2 but the PCA clusters are by line and not by exposure. I wonder if I can correct for the line and svs as with voom... Or if I am missing some other transformation.

B2) the scale free topology is much better, I could pick a power of 7. and blockwiseModules() are very different and the modules are then no significant for my conditions.

C) I also have TPM already from RSEM... in case is better than starting from read counts, should I still do A or B or just log transform?

Are both approaches correctly done? Which one do you think is best given my data and results?

Why removing genes with low counts in approach B but not in A? Should I nut be using same criteria?

Thank you in advance!

#A1  norm by voom
design=model.matrix(~Exposure + Line +sv1+ sv2+ sv3+sv4+ sv5,
  data = GenX_meta)

dge.voom = voom(calcNormFactors(DGEList(counts = counts),
                                method = 'TMM'), design, plot = T)
Y = as.matrix(dge.voom$E)
X = as.matrix(design)
beta = (solve(t(X)%*%X)%*%t(X))%*%t(Y)
regressed_dataX = Y - t(X[,c(4:ncol(X))] %*% beta[c(4:nrow(beta)),])

##A2

powers = c(seq(1,9,by=1),seq(10,30,by=2))
sft_cor = pickSoftThreshold(data= t(regressed_dataX), 
                              networkType = "signed", corFnc="cor",verbose=2,
                              powerVector=powers,blockSize = 20000)

plot(sft_cor$fitIndices[,1], 
     -sign(sft_cor$fitIndices[,3])*sft_cor$fitIndices[,2],
     xlab="Soft Thresh Power", ylab="Scale free R^2",type="n")
text(sft_cor$fitIndices[,1], 
     -sign(sft_cor$fitIndices[,3])*sft_cor$fitIndices[,2], 
     labels = powers, cex = 0.7, col="red",  xlab="Soft Thresh Power", 
     ylab="Scale free R^2")
abline(h=0.8, col="black")
plot(sft_cor$fitIndices[,1], sft_cor$fitIndices[,5], 
     xlab = "Soft threshold power", ylab = "Mean connectivity", type = "n")
text(sft_cor$fitIndices[,1], sft_cor$fitIndices[,5], 
     labels = powers, cex = 0.7, col="black")

net = blockwiseModules(datExpr=t(regressed_dataX), maxBlockSize=15000,networkType="signed",TOMtype="signed", corType="bicor",  
                       power = 28, mergeCutHeight= 0.25, minModuleSize= 40, pamStage=TRUE, reassignThreshold=1e-6, 
                       saveTOMFileBase="WGCNA_signed_cor_counts_regress_28_0.25_5", saveTOMs=TRUE, 
                       verbose = 5, deepSplit=2)

#2A

# create dds
dds <- DESeqDataSetFromMatrix(countData = round(data),
                              colData = colData,
                              design = ~ Line + Exposure)



## remove all genes with counts < 10 in more than 75% of samples (16*0.75)

dds75 <- dds[rowSums(counts(dds) >= 10) >= (16*0.75),]
nrow(dds75) # 13525 genes

# perform variance stabilization
dds_norm <- vst(dds75)

# get normalized counts
norm.counts <- assay(dds_norm) %>% 
  t()

#2B

net = blockwiseModules(datExpr=norm.counts, maxBlockSize=15000,networkType="signed",corType="bicor",  
                       power = 7, mergeCutHeight= 0.25, minModuleSize= 40, pamStage=TRUE, reassignThreshold=1e-6, 
                       saveTOMFileBase="WGCNA_DEseq_signed_cor_counts_regress_7", saveTOMs=TRUE, 
                       verbose = Inf, deepSplit=2)

Example of Dendrogram with voomExample of Dendrogram with Deseq approach

Voom Limma WGCNA • 463 views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 1 day ago
United States

WGCNA is a CRAN package, not Bioconductor. Plus you are asking for analysis advice rather than technical questions about Bioc packages, so you are way off-topic. You should probably ask over on biostars.org instead.

0
Entering edit mode

HI James,

I am sorry, my concerns were actually mainly related to normalization and matrix designs with limma and DEseq2 packages... WGCNA plots just point out to me that I might not be normalizing it correctly (rather than trying to ask about blockwisemodules arguments or issues, I believe the previous steps are the incorrect ones).

thank you for the suggestion on biostar.org. I will try that.

Gimena

ADD REPLY

Login before adding your answer.

Traffic: 335 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6