WGCNA: cleaning for sample outliers


Hi there,

I'm working with 363 samples and about 10K genes. My workflow is: load the data, transpose it, set the gene names, and cluster the samples with hclust. I plot the dendrogram once to see where the cut line should go, then redraw it with an eyeballed abline.

I'm lost on how to choose cutHeight and minSize when removing outlier samples. My code and questions are below:

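library(WGCNA)  # goodSamplesGenes, cutreeStatic, printFlush
library(Cairo)  # CairoJPEG

exprs_data <- read.table("complete_genes_mapped", header = TRUE)
# transpose so that rows are samples and columns are genes; drop the gene column
data_exprs.cleaned <- as.data.frame(t(exprs_data[, -1]))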

#add row names, and col names
names(data_exprs.cleaned) = exprs_data$gene
rownames(data_exprs.cleaned) = names(exprs_data)[-c(1)]

#check data for excessive missing values and identification of outlier microarray samples
gsg = goodSamplesGenes(data_exprs.cleaned, verbose = 3);

#--everything OK with mapped genes
if (!gsg$allOK)
{
  # Optionally, print the gene and sample names that were removed:
  if (sum(!gsg$goodGenes) > 0)
    printFlush(paste("Removing genes:", paste(names(data_exprs.cleaned)[!gsg$goodGenes], collapse = ", ")));
  if (sum(!gsg$goodSamples) > 0)
    printFlush(paste("Removing samples:", paste(rownames(data_exprs.cleaned)[!gsg$goodSamples], collapse = ", ")));
  # Remove the offending genes and samples from the data:
  data_exprs.cleaned = data_exprs.cleaned[gsg$goodSamples, gsg$goodGenes]
}

#Check outliers
sampleTree = hclust(dist(data_exprs.cleaned ), method = "average"); #do clustering 

# Plot the sample tree: 
# The user should change the dimensions if the window is too large or too small.

CairoJPEG("sample_outliers_tree.jpeg",width=1200,height=900)
par(cex = 0.6);
par(mar = c(0,4,2,0))
plot(sampleTree, main = "Sample clustering to detect outliers", sub="", xlab="", cex.lab = 1.5,cex.axis = 1.5, cex.main = 2)
abline(h=90, col = "red")
dev.off()
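
One way to make the choice of cut height less of an eyeball exercise is to look at the merge heights stored in the hclust object. The sketch below uses made-up toy data (22 samples, two of them shifted to act as outliers), not the data from this post, and the midpoint rule for `candidateCut` is only an illustrative assumption, not a WGCNA recommendation.

```r
# Toy data (an assumption for illustration): 20 similar samples plus 2 shifted ones.
set.seed(1)
toy <- matrix(rnorm(22 * 50), nrow = 22)
toy[21:22, ] <- toy[21:22, ] + 5          # make the last two samples outliers
rownames(toy) <- paste0("S", 1:22)

toyTree <- hclust(dist(toy), method = "average")

# hclust stores merge heights in increasing order; a large jump near the top
# hints at a height that separates outliers from the main group.
topHeights <- tail(toyTree$height, 5)
print(round(topHeights, 2))

# Candidate cut: midway between the last two merges (hypothetical rule of thumb).
candidateCut <- mean(tail(toyTree$height, 2))
groups <- cutree(toyTree, h = candidateCut)
print(table(groups))
```

On real data the jump is rarely this clean, so the printed heights are best read alongside the dendrogram rather than instead of it.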

But now comes the foggy part:

labels_min10 = cutreeStatic(sampleTree, cutHeight = 90,minSize=10)
table(labels_min10)

labels
  0   1   2   3
  3 298  34  28

labels_def = cutreeStatic(sampleTree, cutHeight = 90) # minSize defaults to 50
table(labels_def)
labels
  0   1
 65 298

With cutHeight = 90 and the default minSize of 50, I lose 65 samples (those labelled 0), which is about 18% of the 363 input samples. I don't know what to do.

I also can't work out how the minimum cluster size works here. My questions are:

1. Does it mean that I drop samples belonging to clusters smaller than N (50 or 10) below cutHeight?
2. What do labels 2 and 3 mean in labels_min10?
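
To see how a minimum size interacts with a fixed cut height, the behaviour can be imitated in base R. The helper `staticCutSketch` below is hypothetical and is not WGCNA's implementation; it only mimics the documented outcome of `cutreeStatic` (branches below the cut with fewer than `minSize` members get label 0, the rest are numbered from the largest down). The toy data and the cut height of 15 are made up.

```r
# Hypothetical helper emulating a constant-height cut with a minimum branch size.
staticCutSketch <- function(tree, cutHeight, minSize) {
  raw <- cutree(tree, h = cutHeight)            # every branch below the cut
  sizes <- sort(table(raw), decreasing = TRUE)  # branch sizes, largest first
  kept <- names(sizes)[sizes >= minSize]        # branches large enough to keep
  labels <- match(as.character(raw), kept)      # 1 = largest kept branch, 2 = next, ...
  labels[is.na(labels)] <- 0                    # too-small branches become label 0
  labels
}

# Toy data (assumption): a main group of 30, a second group of 12, and 2 stragglers.
set.seed(42)
toy <- matrix(rnorm(44 * 50), nrow = 44)
toy[31:42, ] <- toy[31:42, ] + 3
toy[43:44, ] <- toy[43:44, ] - 3
toyTree <- hclust(dist(toy), method = "average")

labelsMin10 <- staticCutSketch(toyTree, cutHeight = 15, minSize = 10)
labelsMin20 <- staticCutSketch(toyTree, cutHeight = 15, minSize = 20)

print(table(labelsMin10))   # small minSize: the 12-sample group keeps its own label
print(table(labelsMin20))   # larger minSize: the 12-sample group is pushed to label 0
```

The same tree and the same cut height give different label-0 counts purely because of `minSize`, which mirrors the 3 versus 65 unassigned samples reported above.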

Tutorial link: https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/FemaleLiver-01-dataInput.pdf

microarray gene network WGCNA • 2.9k views

Written 6.1 years ago by GENOMIC_region • updated 6.1 years ago by Bioconductor Community

1. Yes, the threshold is there to remove "outlier" samples: those in groups too small for their shared pattern to be considered reliable.

2. Each label is a group of samples, so besides group 1 there are two other groups that behave differently.
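
As a rough illustration of what the labels mean for filtering, the sketch below rebuilds label vectors from the counts reported in the question (the order of samples is made up) and compares two ways of choosing which samples to keep. The linked tutorial keeps only cluster 1; keeping every non-zero label is shown as an alternative and is an assumption, not advice from this thread.

```r
# Label vectors rebuilt from the reported counts; sample order is hypothetical.
labels_min10 <- rep(c(0, 1, 2, 3), times = c(3, 298, 34, 28))
labels_def   <- rep(c(0, 1), times = c(65, 298))

keepMainOnly    <- labels_min10 == 1   # tutorial-style: keep the main cluster only
keepAllClusters <- labels_min10 != 0   # alternative: drop only unassigned samples

sum(keepMainOnly)                         # 298
sum(keepAllClusters)                      # 360
round(100 * mean(labels_min10 == 0), 1)   # 0.8  (% unassigned with minSize = 10)
round(100 * mean(labels_def == 0), 1)     # 17.9 (% unassigned with minSize = 50)

# With the real data the filter would then be applied to the rows (samples), e.g.
# data_exprs.filtered <- data_exprs.cleaned[keepAllClusters, ]
```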

Reply by Lluís Revilla Sancho • 6.1 years ago

Hi Lluis,

Thank you very much. That helps. :)

Reply by GENOMIC_region • 6.1 years ago

Is it advisable to keep samples that fall into clusters other than cluster 1?

Reply by GENOMIC_region • 5.0 years ago
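
One check that may help with that decision is to cross-tabulate the cluster labels against a known sample annotation, to see whether the extra clusters line up with something like batch or tissue. The sketch below uses made-up labels and a hypothetical `batch` variable; nothing in it comes from the poster's data.

```r
# Made-up labels and a made-up annotation column, for illustration only.
clusterLabel <- rep(c(1, 2, 3), times = c(6, 3, 3))
batch <- rep(c("A", "B", "C"), times = c(6, 3, 3))   # hypothetical batch variable

# Rows are cluster labels, columns are annotation levels.
crossTab <- table(cluster = clusterLabel, batch = batch)
print(crossTab)
```

If a cluster maps almost entirely onto one annotation level, as in this contrived example, the grouping probably reflects that variable rather than random outliers, which is worth knowing before deciding whether to drop those samples.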
