Question: DESeq LRT test and construction of Co-expression network
0
gravatar for aishu.jp
13 months ago by
aishu.jp10
aishu.jp10 wrote:

I have a RNA seq data which I am trying to identify DEGs. Dataset contains Untreated - 2 replicates (time point 0hr) Treated - 4 replicates (2 replicates at 6hr and 2 replicates at 12 hr) I merged all the samples and tried to identify DEGs untreated vs treated as well as I compared the samples using LRT to find the changes at any time point.

After doing the LRT test, to identify the significant genes the cutoff used was adjP<0.05.  Approximately 10000 genes were obtained after cutoff. Can you use rlog transformation as normalization on these 10000 genes for gene co-expression network construction. Does doing LRT test will make any change in construction of a co-expression network

ADD COMMENTlink modified 13 months ago by Peter Langfelder1.8k • written 13 months ago by aishu.jp10
Answer: DESeq LRT test and construction of Co-expression network
0
gravatar for Peter Langfelder
13 months ago by
United States
Peter Langfelder1.8k wrote:

If you intend to WGCNA to construct the co-expression network, please read the WGCNA FAQ (https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/faq.html). The biggest issue is your low sample number, 6 samples is too few to run WGCNA on. We don't recommend filtering genes by differential expression. The rlog transformation should work fine although I haven't tried it (I use variance stabilization).

ADD COMMENTlink written 13 months ago by Peter Langfelder1.8k

Thanks Peter.

I also prefer the vst() in DESeq2 now to rlog(), mostly because I have optimized it for speed, but also it can be more robust than rlog(), when the data contain outliers.

ADD REPLYlink written 13 months ago by Michael Love23k

Thank you for helping me

ADD REPLYlink written 13 months ago by aishu.jp10

Dr. Langfelder,

I have been doing WGCNA getting an assistance from a specialist. I have a transcriptomics data on drought tolerance in rice at reproductive-stage. I have 2 genotypes (tolerant and sensitive) and 2 conditions (drought and non-stress) with 4 replications each. basically a 2x 2 factorial experiment with 4 replications resulting in 16 samples.

I used DESEq2 for all exploratory and differential expression analysis. After the minimum pre-filtering suggested in DESEq2, I used the vst to normalize all the genes. I did, however, pass all these genes for a differential expression test (FDR<0.05), and extracted all the pairwise comparisons of interest. We used a oneway ANOVA test, which looks for changes anywhere across the several groups, so we had lots of different expression patterns across the groups.

Question, you mentioned that "Filtering genes by differential expression will lead to a set of correlated genes that will essentially form a single (or a few highly correlated) modules. It also completely invalidates the scale-free topology assumption, so choosing soft thresholding power by scale-free topology fit will fail." I did not get just one or few genes and the plot to choose the soft thresholding power looked fine IIRC (increases sharply then plateaus at a low power value). 

The problems they cite did not occur in your specific data set, thus it is fine to filter?

Below are some codes we used for the analysis

##Calculate the power needed for this data set for a signed analysis:

powers = c(c(1:10), seq(from = 12, to=30, by=2))

system.time( #This just outputs how long it took
  sft <- pickSoftThreshold(datExpr, powerVector = powers, verbose = 5,
                           corFnc = cor, networkType = "signed hybrid")
)

#read in custom function to plot the sft values

source("plotSFT.R")

#x11(width = 9, height = 5)
jpeg("sft_plot_Vst.jpeg", width = 10, height = 5, units = "in", res = 300, type="cairo", quality = 100)
plotSFT(sft, lheight = 0.8)
dev.off()

## Use power = 6

##Run the entire WGCNA in one function:

system.time( #This just outputs how long it took
  net <- blockwiseModules(datExpr,
                          power = 6,
                          maxBlockSize = 20000, #should be larger than num. of genes
                          networkType = "signed hybrid",
                          TOMType = "signed",
                          loadTOM = FALSE, #nothing to load the first time
                          saveTOMs = TRUE, #save the TOMS to re-load at a later time
                          saveTOMFileBase = "TOM_power6_default_flag-leaf", #base name for saved TOMs
                          corType = "pearson", #could try "bicor", but see help
                          deepSplit = 2, #intermediate level
                          minModuleSize = 20, #a module has to have this many genes
                          mergeCutHeight = 0.2, #level for merging; suggest 0.2 to start
                          numericLabels = T,
                          verbose = 3 )
)

save.image("WGCNA.RData")

I have two tissues (flag leaf and panciles) with 16 samples each. For each tissues we have at least 10 merged modules and at least 30 unmerged modules. 

Please advise.

Sincerely,

Asher 

ADD REPLYlink written 12 months ago by tarun20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 153 users visited in the last hour