Finding DEGs by using edgeR
Entering edit mode
prity6459 • 0
Last seen 29 days ago


Time to tackle an amateur's problem! Ha ha! I have 6 samples/ groups with 3 replicates (In total 17; one sample contains 2 replicates). the sample names are Mf, Bf, Ef, Mr, Br, and Er. I want to get DEGs (differentially expressed genes) between Mr-Mf, Br-Bf, Er-Ef.

For doing that, I subsetted my data keeping either Mr-Mf, Br-Bf, or Er-Ef. And then I got DEGs for three different contrasts. But subsetting the data has an impact on dispersion estimation.

Now I would like not to subset my data (keeping all my data for dispersion estimation). And after estimating dispersion, I want to have DEGs for those three different contrasts. The order I want follow is - after importing my data > dispersion estimation > getting DEGs for three different contrasts in three different file. But I can not do that.

Could you please help me modifying my script so that I can follow my oder of analysis? Note: I do not want to use normalization steps from edgeR, since I already normalized my data.

Thank you in advance for dedicating your time to resolve me problem!

Code should be placed in three backticks as shown below


data <- read.csv("normalized_file_for_mesophyll_&_epidermis.csv", header = TRUE, row.names = 1)

# Since the data is already normalized, there is no specific step for normalization

# Extract Mf and Mr samples. 
Mf_samples <- colnames(data)[grep("^Mf", colnames(data))]
Mr_samples <- colnames(data)[grep("^Mr", colnames(data))]

# Subset data to keep only Mf and Mr samples. But now I do not want to subset my data. I want to have all the data.
subset_data <- data[, c(Mf_samples, Mr_samples)]

# Create DGEList object
dge_data <- DGEList(counts = subset_data)

# Assign group information to samples
sample_groups <- factor(c(rep("Mf", length(Mf_samples)), rep("Mr", length(Mr_samples))))

# Perform normalization
# dge_data <- calcNormFactors(dge_data)

# Perform differential expression analysis
design_matrix <- model.matrix(~0 + sample_groups) #to make a design matrix first
colnames(design_matrix) <- gsub("^sample_groups", "", colnames(design_matrix)) 

# To creates a contrast matrix (cm) comparing the expression levels between Mr and Mf
cm <- makeContrasts(Mr-Mf, levels=design_matrix) 

dge_data <- estimateDisp(dge_data, design = design_matrix)
dge_data <- estimateDisp(dge_data, design_matrix, robust=T)

# To create biological coefficient of variation
fit.dge <- glmFit(dge_data, design_matrix)

# perform gene selection
max.qval <- 0.05 
min.FC <- log2(2) 
min.count <- 96 
comparison <- "Mr - Mf" 

lrt <- glmLRT(fit.dge, contrast=cm[,comparison])
sel.r <- topTags(lrt, n=nrow(dge_data$counts))
med.expr <- apply(dge_data$counts[rownames(sel.r),],1,median) 
sel.r <- sel.r[sel.r$table$FDR<max.qval & abs(sel.r$table$logFC)>min.FC & med.expr>min.count,] 
sel.rows <- rownames(sel.r) #to get name of the genes that pass the criteria

# To get the FDR, logFC, logCPM of DEGs
DEGs <- sel.r$table[sel.rows, ]

# Save the results to a file
write.csv(DEGs, "DEGs_of_mesophyll.csv")
subsettingdata edgeR designmatrix • 220 views
Entering edit mode
Last seen 9 hours ago
WEHI, Melbourne, Australia

I am not clear what you mean that you have already normalized your data. How exactly did you normalize the data?

The best way to do an edgeR analysis is to follow the examples in the workflows and User Guide case studies.

Entering edit mode

Hello Gordon,

Thanks for your response. By normalization, I meant that I have already removed lowly expressed genes from my count matrix. I was trying to follow one of the user guide case studies, but I was confused about how can I make my design matrix and DGEList while not using a subset of my data.

Entering edit mode

Filtering (removing lowly expressed genes) is separate to normalization.

It isn't clear why not subsetting would cause any difficulty. It would be best for you to try to follow the documentation.


Login before adding your answer.

Traffic: 531 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6