Search
Question: RNseq data for WGCNA
0
13 months ago by
joseph40
joseph40 wrote:

I face some problem when I apply WGCNA in the RNseq data. Here’s the thing,

I have the raw court data, which looks like as:

P6_A1 P6_A2 P6_A3 P6_A4 P6_A5 P6_A6 P6_A7 P6_A8 P6_A9 P6_A10 P6_A11 P6_A12 P6_B1 P6_B2 P6_B3 P6_B4 P6_B5 P6_B6 P6_B7 P6_B8

ENSG00000000419     0   375   527   337     0   607   595   149     0      0      0      0     0     0   567     0   322     1     0    42

ENSG00000000457     0     0     0     0     0     0     0     0     0      0      0      0     0     0     0     0     1     0   283     0

Then, I directly apply the count data to vsd = varianceStabilizingTransformation(RawCount_all_Filter), which looks fine.

I got the Expr = as.data.frame(t(vsd)) and Bench = t(Phenotype_all).

Then, checking the data, and cluster the sample as usual.

https://bri.box.com/s/solx4maup2g46jbcg1qe14s9vnszzsok

The height looks not like as usual in the microarray data, which I don’t if that’s correct.

But the scale free topology figure looks fine:

https://bri.box.com/s/9tn9frq2asajqgk48haf806av4j0vkk6

I select power = 10.

Then calculate the Co-expression similarity and adjacency matrix and TOM,

softPower = beta1

dissTOM = 1-TOM

I get modules like this:

https://bri.box.com/s/35br575uafusinti45a0e9nuywghn8lk

> table(dynamicMods)
dynamicMods
   0    1    2
  14 8041  305

I pretty shock about the results, and I don’t know how to interrupt or modify the analysis.

Later around, the module-trait relationships look awful,

https://bri.box.com/s/ofawxu03hlibrcik3totja66qn6s40z1

Could you please point out what’s going on my data? And how to improve it?

Thank you so much!

modified 13 months ago by Peter Langfelder1.6k • written 13 months ago by joseph40

I don't find the problem. What do you want to improve? You would like to have more modules?

Thanks. I'm not sure if few modules that I get looks "normal". Yes, I would like to have more modules. Meanwhile, the module-trait correlations are pretty low, I wonder to figure it out whether my data processing problem or it's the real of the data.

I think you can add a limit on your module size, or given the TOM distance you could build different networks by cutting at different heights. Maybe there isn't correlation between the group and the traits you are testing. You could also check if building other type of networks adjacency(..., type = "unsigned") or "signed hybrid" would change that, but that depends on what are you looking for in the network.

0
13 months ago by
United States
Peter Langfelder1.6k wrote:

It is impossible to diagnose issues with your data or analysis without seeing the data (the code looks reasonable but it obviously isn't complete). One usually sees at least a few distinct branches in WGCNA analysis of expression data, but again, without knowing what your data represent and seeing the actual data it is impossible to say anything of substance.

https://bri.box.com/s/412z4zvjerypzffk51ptv368u6l5pxlf

RawCount_all_new, gene_annotation_new, Phenotype_all

#------------------------------------------------------------------------------------------------#
# Removing genes that are lowly expressed
#------------------------------------------------------------------------------------------------#
# Keep samples with cpm > 5,and more than 10 samples
keep.exprs = rowSums(cpm(RawCount_all_new) > 5) >= 10

# Filter the Count
RawCount_all_Filter = RawCount_all_new[keep.exprs,]
dim(RawCount_all_Filter)

#------------------------------------------------------------------------------------------------#
# Normalising gene expression distributions
#------------------------------------------------------------------------------------------------#
# Varicance Stabilization
vsd = varianceStabilizingTransformation (RawCount_all_Filter)

# Important: transpose the expression data to rows for samples and columns for genes
Expr <- as.data.frame(t(vsd))

Bench <- t(Phenotype_all)

The link to data is invalid (box reports the file has been removed or is unavailable).

Plot a sample tree from the "Expr" data; you may have a monstrous batch effect.

I hope the raw counts are really raw counts, not some transformed version. Your filter seems too stringent to me, I would filter on raw counts being greater than 5, not CPM (which are typically much less since each sample has many millions of reads).