Hello,
I am performing RNA-Seq data analysis to find differentially expressed genes in my data. The data is from inoculated leaves of Barley. The data has 5 mutants and 6 time-points. Each of the mutants and each time-point has 3 technical replicates brining up the total number of samples to 90. There are roughly 40,000 genes in Barley that I wish to probe for differential expression. I have obtained the gene read counts using RSEM. I have some questions about the analysis.
1) My goal is to carry out differential gene analysis, gene expression clustering and construct gene regulatory network from the expression data. For the DGE analysis, I want to find genes which are differentially expressed within each mutant across different time-points. Also I wish to find genes which are differentially expressed among different mutants at a given time-point. I have used the function rsem-generate-data-matrix to combine the counts from the 90 samples. I have read this count data in R. But I cannot understand how I will obtain the desired differential expression p-values and log2FC. Here is the code I have written.
samples <- read.csv("counts.csv", header=TRUE,row.names=1) # Selecting genes by counts counts = subset(samples, rowSums(samples > 1) >= 3) time=c(0, 16, 20, 24, 32, 48) gen=c("bln1", "dm", "mla6", "rar3", "wt") cols=data.frame(genotype=rep(gen, each=3*6), timepoint=rep(rep(time, each=3),5), replication=rep(c(1,2,3),30)) cols$timepoint = as.factor(cols$timepoint) cols$replication = as.factor(cols$replication) rownames(cols) = colnames(counts) dds=DESeqDataSetFromMatrix(countData = counts, colData = cols, design = ~ genotype * timepoint * replication) dds=DESeq(dds) results(dds) # Extracting and writing normalized counts temp = counts(dds, normalized=TRUE) norm_counts = data.frame(gene = rownames(temp), temp, row.names = NULL)
I think I have to alter the design for each kind of analysis, but I am unsure how to do so. Does the DESeq() function automatically perform normalization? In the code above, I have created the variable "dds" using the "counts" which is not normalized. Do I have to perform normalization and then create the "dds" variable? Here is the link for the counts file.
2) For gene expression clustering and gene regulatory network should we perform rst or vst? Also should the transformation be performed on the raw count or on the normalized counts?
3) I am also not quite sure what conclusion can be drawn from the MA plot.
4) What is the need for shrinkage?
Thank you.
For many of your questions, the answer can be found in the vignette. Please give details n what you have already read, and wherev you had difficulty following, so that we don't have to start at zero with explanations.