Question

Finding DEGs by using edgeR

0

Entering edit mode

prity6459 • 0

@84d63c4c

Last seen 3 months ago

France

Hi,

Time to tackle an amateur's problem! Ha ha! I have 6 samples/ groups with 3 replicates (In total 17; one sample contains 2 replicates). the sample names are Mf, Bf, Ef, Mr, Br, and Er. I want to get DEGs (differentially expressed genes) between Mr-Mf, Br-Bf, Er-Ef.

For doing that, I subsetted my data keeping either Mr-Mf, Br-Bf, or Er-Ef. And then I got DEGs for three different contrasts. But subsetting the data has an impact on dispersion estimation.

Now I would like not to subset my data (keeping all my data for dispersion estimation). And after estimating dispersion, I want to have DEGs for those three different contrasts. The order I want follow is - after importing my data > dispersion estimation > getting DEGs for three different contrasts in three different file. But I can not do that.

Could you please help me modifying my script so that I can follow my oder of analysis? Note: I do not want to use normalization steps from edgeR, since I already normalized my data.

Thank you in advance for dedicating your time to resolve me problem!

Code should be placed in three backticks as shown below

library(limma)

data <- read.csv("normalized_file_for_mesophyll_&_epidermis.csv", header = TRUE, row.names = 1)

# Since the data is already normalized, there is no specific step for normalization

# Extract Mf and Mr samples. 
Mf_samples <- colnames(data)[grep("^Mf", colnames(data))]
Mr_samples <- colnames(data)[grep("^Mr", colnames(data))]

# Subset data to keep only Mf and Mr samples. But now I do not want to subset my data. I want to have all the data.
subset_data <- data[, c(Mf_samples, Mr_samples)]

# Create DGEList object
dge_data <- DGEList(counts = subset_data)

# Assign group information to samples
sample_groups <- factor(c(rep("Mf", length(Mf_samples)), rep("Mr", length(Mr_samples))))

# Perform normalization
# dge_data <- calcNormFactors(dge_data)

# Perform differential expression analysis
design_matrix <- model.matrix(~0 + sample_groups) #to make a design matrix first
colnames(design_matrix) <- gsub("^sample_groups", "", colnames(design_matrix)) 

# To creates a contrast matrix (cm) comparing the expression levels between Mr and Mf
cm <- makeContrasts(Mr-Mf, levels=design_matrix) 

dge_data <- estimateDisp(dge_data, design = design_matrix)
dge_data <- estimateDisp(dge_data, design_matrix, robust=T)

# To create biological coefficient of variation
plotBCV(dge_data)
fit.dge <- glmFit(dge_data, design_matrix)

# perform gene selection
max.qval <- 0.05 
min.FC <- log2(2) 
min.count <- 96 
comparison <- "Mr - Mf" 

lrt <- glmLRT(fit.dge, contrast=cm[,comparison])
sel.r <- topTags(lrt, n=nrow(dge_data$counts))
med.expr <- apply(dge_data$counts[rownames(sel.r),],1,median) 
sel.r <- sel.r[sel.r$table$FDR<max.qval & abs(sel.r$table$logFC)>min.FC & med.expr>min.count,] 
sel.rows <- rownames(sel.r) #to get name of the genes that pass the criteria


# To get the FDR, logFC, logCPM of DEGs
DEGs <- sel.r$table[sel.rows, ]

# Save the results to a file
write.csv(DEGs, "DEGs_of_mesophyll.csv")

subsettingdata edgeR designmatrix • 339 views

ADD COMMENT • link updated 3 months ago by Gordon Smyth 51k • written 3 months ago by prity6459 • 0

score 0 · Answer 1 · 2024-04-19

0

Entering edit mode

Gordon Smyth 51k

@gordon-smyth

Last seen 46 minutes ago

WEHI, Melbourne, Australia

I am not clear what you mean that you have already normalized your data. How exactly did you normalize the data?

The best way to do an edgeR analysis is to follow the examples in the workflows and User Guide case studies.

ADD COMMENT • link 3 months ago Gordon Smyth 51k

0

Entering edit mode

Hello Gordon,

Thanks for your response. By normalization, I meant that I have already removed lowly expressed genes from my count matrix. I was trying to follow one of the user guide case studies, but I was confused about how can I make my design matrix and DGEList while not using a subset of my data.

ADD REPLY • link 3 months ago prity6459 • 0

0

Entering edit mode

Filtering (removing lowly expressed genes) is separate to normalization.

It isn't clear why not subsetting would cause any difficulty. It would be best for you to try to follow the documentation.

ADD REPLY • link 3 months ago Gordon Smyth 51k