Question: RNA-seq data for WGCNA
joseph wrote, 4 months ago:

I'm running into some problems applying WGCNA to RNA-seq data. Here's the situation:

I have raw count data, which looks like this:


                P6_A1 P6_A2 P6_A3 P6_A4 P6_A5 P6_A6 P6_A7 P6_A8 P6_A9 P6_A10 P6_A11 P6_A12 P6_B1 P6_B2 P6_B3 P6_B4 P6_B5 P6_B6 P6_B7 P6_B8

ENSG00000000419     0   375   527   337     0   607   595   149     0      0      0      0     0     0   567     0   322     1     0    42

ENSG00000000457     0     0     0     0     0     0     0     0     0      0      0      0     0     0     0     0     1     0   283     0

Then I apply the variance-stabilizing transformation directly to the filtered count data, vsd = varianceStabilizingTransformation(RawCount_all_Filter), and the output looks fine.

I then build Expr (the transposed expression matrix) and Bench = t(Phenotype_all).

Then I check the data and cluster the samples as usual.

The heights in the sample dendrogram don't look like what I usually see with microarray data, and I don't know whether that's correct.

But the scale-free topology fit figure looks fine:

I select power = 10.
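(For reference, this power choice is usually made with WGCNA's pickSoftThreshold(); a minimal sketch, assuming Expr is the samples-by-genes matrix described above:)

```r
library(WGCNA)

# Candidate soft-thresholding powers; for signed networks it is common
# to scan somewhat higher powers than for unsigned networks
powers = c(1:10, seq(12, 20, by = 2))

# Evaluate the scale-free topology fit for each candidate power
sft = pickSoftThreshold(Expr, powerVector = powers, networkType = "signed")

# Inspect the fit table and pick the lowest power where the
# scale-free fit R^2 plateaus (commonly above ~0.8)
sft$fitIndices[, c("Power", "SFT.R.sq", "mean.k.")]
```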

Then I calculate the co-expression similarity, the adjacency matrix, and the TOM:

softPower = 10

adjacency = adjacency(Expr, power = softPower, type = "signed")

TOM = TOMsimilarity(adjacency, TOMType = "signed")

dissTOM = 1 - TOM
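(The module-detection step that presumably produced dynamicMods is not shown; the standard WGCNA recipe at this point is average-linkage clustering of dissTOM followed by a dynamic tree cut. A sketch with the tutorial default parameters, not necessarily the ones used here:)

```r
library(WGCNA)  # also loads dynamicTreeCut

# Cluster genes on the TOM-based dissimilarity
geneTree = hclust(as.dist(dissTOM), method = "average")

# Cut the gene dendrogram into modules; deepSplit and minClusterSize
# control how aggressively the tree is split into smaller modules
dynamicMods = cutreeDynamic(dendro = geneTree, distM = dissTOM,
                            deepSplit = 2, pamRespectsDendro = FALSE,
                            minClusterSize = 30)

table(dynamicMods)  # label 0 means "unassigned", not a real module
```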


I get modules like this:

> table(dynamicMods)
   0    1    2 
  14 8041  305 


I'm pretty shocked by these results, and I don't know how to interpret them or how to modify the analysis.

Further along, the module-trait relationships also look poor.


Could you please point out what's going on with my data, and how to improve it?


Thank you so much!


modified 3 months ago by Peter Langfelder • written 4 months ago by joseph

I don't see the problem. What would you like to improve? Would you like to have more modules?

written 4 months ago by Lluís R

Thanks. I'm not sure whether the few modules I get look "normal". Yes, I would like to have more modules. Meanwhile, the module-trait correlations are pretty low, and I'd like to figure out whether that's a data-processing problem or a real feature of the data.

written 3 months ago by joseph

I think you can set a limit on your module size, or, given the TOM distance, you could build different sets of modules by cutting the dendrogram at different heights. Maybe there simply isn't any correlation between the modules and the traits you are testing. You could also check whether building other types of networks, adjacency(..., type = "unsigned") or "signed hybrid", changes the result, but that depends on what you are looking for in the network.
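(A hedged sketch of these suggestions, reusing only the object names from the question, dissTOM, Expr, and softPower; the exact parameter values are illustrative:)

```r
library(WGCNA)

# Recompute the gene dendrogram from the TOM dissimilarity
geneTree = hclust(as.dist(dissTOM), method = "average")

# More aggressive splitting and a smaller minimum module size
# typically yield more (and smaller) modules
for (ds in 0:3) {
  mods = cutreeDynamic(dendro = geneTree, distM = dissTOM,
                       deepSplit = ds, minClusterSize = 20)
  cat("deepSplit =", ds, ":", length(unique(mods)), "clusters\n")
}

# Alternatively, rebuild the adjacency with a different network type
adj_unsigned = adjacency(Expr, power = softPower, type = "unsigned")
adj_hybrid   = adjacency(Expr, power = softPower, type = "signed hybrid")
```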

written 3 months ago by Lluís R
Peter Langfelder wrote, 3 months ago (United States):

It is impossible to diagnose issues with your data or analysis without seeing the data (the code looks reasonable but it obviously isn't complete). One usually sees at least a few distinct branches in WGCNA analysis of expression data, but again, without knowing what your data represent and seeing the actual data it is impossible to say anything of substance.

written 3 months ago by Peter Langfelder

Here's the data link:

load("Filtered and annotated Count data.Rdata")
# objects loaded: RawCount_all_new, gene_annotation_new, Phenotype_all

# Remove genes that are lowly expressed:
# keep genes with CPM > 5 in at least 10 samples (cpm() is from edgeR)
keep.exprs = rowSums(cpm(RawCount_all_new) > 5) >= 10

# Filter the counts
RawCount_all_Filter = RawCount_all_new[keep.exprs, ]

# Normalise gene expression distributions
# Variance stabilization (DESeq2)
vsd = varianceStabilizingTransformation(RawCount_all_Filter)

# Important: transpose the expression data so rows are samples and columns are genes
Expr <-

Bench <- t(Phenotype_all)

written 3 months ago by joseph

The link to the data is invalid (Box reports the file has been removed or is unavailable).

Plot a sample tree from the Expr data; you may have a monstrous batch effect.

I hope the raw counts are really raw counts, not some transformed version. Your filter seems too stringent to me; I would filter on raw counts being greater than 5, not CPM (CPM values are typically much smaller, since each sample contains many millions of reads).
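(A sketch of both suggestions, assuming the object names from the earlier posts, RawCount_all_new and a samples-by-genes Expr matrix:)

```r
# Filter on raw counts instead of CPM: keep genes with a raw count
# above 5 in at least 10 samples
keep = rowSums(RawCount_all_new > 5) >= 10
RawCount_filtered = RawCount_all_new[keep, ]

# Sample tree on the transformed expression data; long isolated branches
# or two big clusters often indicate outliers or a batch effect
sampleTree = hclust(dist(Expr), method = "average")
plot(sampleTree, main = "Sample clustering", sub = "", xlab = "")
```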

written 3 months ago by Peter Langfelder

