I have a RNA seq data which I am trying to identify DEGs. Dataset contains Untreated - 2 replicates (time point 0hr) Treated - 4 replicates (2 replicates at 6hr and 2 replicates at 12 hr) I merged all the samples and tried to identify DEGs untreated vs treated as well as I compared the samples using LRT to find the changes at any time point.
After doing the LRT test, to identify the significant genes the cutoff used was adjP<0.05. Approximately 10000 genes were obtained after cutoff. Can you use rlog transformation as normalization on these 10000 genes for gene co-expression network construction. Does doing LRT test will make any change in construction of a co-expression network
Thanks Peter.
I also prefer the vst() in DESeq2 now to rlog(), mostly because I have optimized it for speed, but also it can be more robust than rlog(), when the data contain outliers.
Thank you for helping me
Dr. Langfelder,
I have been doing WGCNA getting an assistance from a specialist. I have a transcriptomics data on drought tolerance in rice at reproductive-stage. I have 2 genotypes (tolerant and sensitive) and 2 conditions (drought and non-stress) with 4 replications each. basically a 2x 2 factorial experiment with 4 replications resulting in 16 samples.
I used DESEq2 for all exploratory and differential expression analysis. After the minimum pre-filtering suggested in DESEq2, I used the vst to normalize all the genes. I did, however, pass all these genes for a differential expression test (FDR<0.05), and extracted all the pairwise comparisons of interest. We used a oneway ANOVA test, which looks for changes anywhere across the several groups, so we had lots of different expression patterns across the groups.
Question, you mentioned that "Filtering genes by differential expression will lead to a set of correlated genes that will essentially form a single (or a few highly correlated) modules. It also completely invalidates the scale-free topology assumption, so choosing soft thresholding power by scale-free topology fit will fail." I did not get just one or few genes and the plot to choose the soft thresholding power looked fine IIRC (increases sharply then plateaus at a low power value).
The problems they cite did not occur in your specific data set, thus it is fine to filter?
Below are some codes we used for the analysis
##Calculate the power needed for this data set for a signed analysis:
powers = c(c(1:10), seq(from = 12, to=30, by=2))
system.time( #This just outputs how long it took
sft <- pickSoftThreshold(datExpr, powerVector = powers, verbose = 5,
corFnc = cor, networkType = "signed hybrid")
)
#read in custom function to plot the sft values
source("plotSFT.R")
#x11(width = 9, height = 5)
jpeg("sft_plot_Vst.jpeg", width = 10, height = 5, units = "in", res = 300, type="cairo", quality = 100)
plotSFT(sft, lheight = 0.8)
dev.off()
## Use power = 6
##Run the entire WGCNA in one function:
system.time( #This just outputs how long it took
net <- blockwiseModules(datExpr,
power = 6,
maxBlockSize = 20000, #should be larger than num. of genes
networkType = "signed hybrid",
TOMType = "signed",
loadTOM = FALSE, #nothing to load the first time
saveTOMs = TRUE, #save the TOMS to re-load at a later time
saveTOMFileBase = "TOM_power6_default_flag-leaf", #base name for saved TOMs
corType = "pearson", #could try "bicor", but see help
deepSplit = 2, #intermediate level
minModuleSize = 20, #a module has to have this many genes
mergeCutHeight = 0.2, #level for merging; suggest 0.2 to start
numericLabels = T,
verbose = 3 )
)
save.image("WGCNA.RData")
I have two tissues (flag leaf and panciles) with 16 samples each. For each tissues we have at least 10 merged modules and at least 30 unmerged modules.
Please advise.
Sincerely,
Asher
Thanks for sharing this post