Hi All,
I was wondering if you could look over my R code for differential gene expression using EdgeR. I am looking to determine differential gene expression between wild type (WT) cells and knockout cells (KO). Three biological replicates were grown for each cell line and RNA was harvested. The paired end reads were mapped using STAR. Exon counts were obtained using feature counts. The exon counts were then used for the R code. Would this be sufficient to determine differential gene expression between WT and KO? Am I doing anything invalid with this code?
library("edgeR") library("gdata") library("heatmaply") library("ggplot2") library("genefilter") library("methylumi") setwd("/Users/jwlandry2/Desktop/RNA-Seq\ ESC\ Data\ Analysis/R_Studio_Data_Sets") counts=read.delim("BPTFKD_ESC_RNA_Seq_Counts_Final2.txt", header=T, row.names="Geneid") head(counts) group <- factor(c("WT","WT","WT","KO","KO","KO")) dge = DGEList(counts=counts,group=group) keep <- rowSums(cpm(dge)>2) >= 3 dge <- dge[keep, , keep.lib.sizes=FALSE] #Normalization y <- calcNormFactors(dge) design <- model.matrix(~group) y <- estimateDisp(y,design) y$common.dispersion plotMDS(y) plotBCV(y) #quasi-likelihood F-test fit <- glmQLFit(y, design) qlf <- glmQLFTest(fit, coef=2) topTags(qlf) #output to text file df <- qlf$table write.csv(df, 'qlf.csv') #Get normalizaed CPMs mtx <- cpm(y, log = TRUE, normalized.lib.sizes = TRUE) mtx_to_plot <- varFilter(mtx, var.cutoff= 0.95) #Correlation matrix IAC <- mtx_to_plot %>% cor(. , use = "pairwise.complete.obs", method = "pearson") heatmaply(IAC) %>% layout(margin=list(l=200,b=150)) #Dendrogram plot(hclust(as.dist(1-IAC), method="ward"))