Question

DESeq LRT test and construction of Co-expression network

0

Entering edit mode

aishu.jp ▴ 30

@aishujp-15144

Last seen 5.4 years ago

I have a RNA seq data which I am trying to identify DEGs. Dataset contains Untreated - 2 replicates (time point 0hr) Treated - 4 replicates (2 replicates at 6hr and 2 replicates at 12 hr) I merged all the samples and tried to identify DEGs untreated vs treated as well as I compared the samples using LRT to find the changes at any time point.

After doing the LRT test, to identify the significant genes the cutoff used was adjP<0.05. Approximately 10000 genes were obtained after cutoff. Can you use rlog transformation as normalization on these 10000 genes for gene co-expression network construction. Does doing LRT test will make any change in construction of a co-expression network

LRT co-expression network deseq2 time course experiment network analysis • 2.6k views

ADD COMMENT • link updated 2.6 years ago by Jennifer Dehner • 0 • written 7.8 years ago by aishu.jp ▴ 30

score 0 · Answer 1 · 2018-03-20

0

Entering edit mode

Peter Langfelder ★ 3.0k

@peter-langfelder-4469

Last seen 14 months ago

United States

If you intend to WGCNA to construct the co-expression network, please read the WGCNA FAQ (https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/faq.html). The biggest issue is your low sample number, 6 samples is too few to run WGCNA on. We don't recommend filtering genes by differential expression. The rlog transformation should work fine although I haven't tried it (I use variance stabilization).

ADD COMMENT • link 7.8 years ago Peter Langfelder ★ 3.0k

0

Entering edit mode

Thanks Peter.

I also prefer the vst() in DESeq2 now to rlog(), mostly because I have optimized it for speed, but also it can be more robust than rlog(), when the data contain outliers.

ADD REPLY • link 7.8 years ago Michael Love 43k

0

Entering edit mode

Thank you for helping me

ADD REPLY • link 7.8 years ago aishu.jp ▴ 30

0

Entering edit mode

Dr. Langfelder,

I have been doing WGCNA getting an assistance from a specialist. I have a transcriptomics data on drought tolerance in rice at reproductive-stage. I have 2 genotypes (tolerant and sensitive) and 2 conditions (drought and non-stress) with 4 replications each. basically a 2x 2 factorial experiment with 4 replications resulting in 16 samples.

I used DESEq2 for all exploratory and differential expression analysis. After the minimum pre-filtering suggested in DESEq2, I used the vst to normalize all the genes. I did, however, pass all these genes for a differential expression test (FDR<0.05), and extracted all the pairwise comparisons of interest. We used a oneway ANOVA test, which looks for changes anywhere across the several groups, so we had lots of different expression patterns across the groups.

Question, you mentioned that "Filtering genes by differential expression will lead to a set of correlated genes that will essentially form a single (or a few highly correlated) modules. It also completely invalidates the scale-free topology assumption, so choosing soft thresholding power by scale-free topology fit will fail." I did not get just one or few genes and the plot to choose the soft thresholding power looked fine IIRC (increases sharply then plateaus at a low power value).

The problems they cite did not occur in your specific data set, thus it is fine to filter?

Below are some codes we used for the analysis

##Calculate the power needed for this data set for a signed analysis:

powers = c(c(1:10), seq(from = 12, to=30, by=2))

system.time( #This just outputs how long it took
sft <- pickSoftThreshold(datExpr, powerVector = powers, verbose = 5,
corFnc = cor, networkType = "signed hybrid")
)

#read in custom function to plot the sft values

source("plotSFT.R")

#x11(width = 9, height = 5)
jpeg("sft_plot_Vst.jpeg", width = 10, height = 5, units = "in", res = 300, type="cairo", quality = 100)
plotSFT(sft, lheight = 0.8)
dev.off()

## Use power = 6

##Run the entire WGCNA in one function:

system.time( #This just outputs how long it took
net <- blockwiseModules(datExpr,
power = 6,
maxBlockSize = 20000, #should be larger than num. of genes
networkType = "signed hybrid",
TOMType = "signed",
loadTOM = FALSE, #nothing to load the first time
saveTOMs = TRUE, #save the TOMS to re-load at a later time
saveTOMFileBase = "TOM_power6_default_flag-leaf", #base name for saved TOMs
corType = "pearson", #could try "bicor", but see help
deepSplit = 2, #intermediate level
minModuleSize = 20, #a module has to have this many genes
mergeCutHeight = 0.2, #level for merging; suggest 0.2 to start
numericLabels = T,
verbose = 3 )
)

save.image("WGCNA.RData")

I have two tissues (flag leaf and panciles) with 16 samples each. For each tissues we have at least 10 merged modules and at least 30 unmerged modules.

Please advise.

Sincerely,

Asher

ADD REPLY • link 7.8 years ago tarun2 • 0

0

Entering edit mode

Thanks for sharing this post

ADD REPLY • link 2.6 years ago Jennifer Dehner • 0