Differential gene expression using edgeR and using a generalised linear model (GLM)

Hi, I am new to edgeR, and I don't have much background in R programming since I used to work with Python for my analysis. The counts I obtained from HTSeq for the different samples are stored in separate files in different folders; the files are all named "counts.txt" and the folders are named after the samples. Each counts.txt file contains two columns, one for gene names and one for counts. As I learned from the documentation, I have to use readDGE to read the counts.txt file in each folder. I want to do differential gene expression analysis using a GLM, and I have to identify DE genes using log2 fold change and the likelihood ratio (LR) test in edgeR. I wrote the code below as a start:

library(edgeR)
directory <- "/home/ali/Desktop/SAMPLES1/"
files <- grep("counts.txt", list.files(directory), value = TRUE)


The second point is that I don't need to normalize my data. So, can anyone guide me on which steps I have to do? Thanks in advance.

Aaron Lun ★ 27k

> I don't need to normalize my data

That's a very bold claim. I don't know where you're getting this from.

> So, can anyone guide me on which steps I have to do?

I assume you have read the user's guide?

library(edgeR)
edgeRUsersGuide()


Section 2.10 should get you started.


Thanks for your comment. My supervisor told me that there is no need to normalize the data. I am trying to reproduce the analysis from one article (description: "Differential gene expression was done using edgeR and using a generalized linear model (GLM). DE genes were identified using a log2 fold change and likelihood ratio (LR) test in edgeR; significantly expressed genes had an FDR-adjusted P value of < 5%"). The user's guide I read is the same one you referred to. As written in the user's guide, I first have to read the counts and make a DGEList. What I don't know is how to do that and read all 50 counts.txt files that I have.
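For reference, the workflow described in that article excerpt (GLM fit, likelihood ratio test, FDR below 5%) could be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the count matrix is simulated toy data, the group labels `pre` and `post` are hypothetical, and the filtering and TMM normalisation steps are the usual edgeR defaults rather than anything confirmed by the article.

```r
library(edgeR)

# Toy data standing in for real counts (hypothetical): 200 genes x 6 samples.
set.seed(1)
counts <- matrix(rnbinom(200 * 6, mu = 50, size = 10), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("S", 1:6)))
group <- factor(rep(c("pre", "post"), each = 3), levels = c("pre", "post"))

y <- DGEList(counts = counts, group = group)   # rows = genes, columns = samples
keep <- filterByExpr(y)                        # flag genes with enough counts
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)                        # TMM normalisation factors (assumed step)

design <- model.matrix(~ group)                # intercept + post-vs-pre coefficient
y <- estimateDisp(y, design)                   # negative binomial dispersions
fit <- glmFit(y, design)                       # fit the GLM per gene
lrt <- glmLRT(fit, coef = 2)                   # likelihood ratio test for post vs pre

res <- topTags(lrt, n = Inf)$table             # columns: logFC, logCPM, LR, PValue, FDR
sig <- res[res$FDR < 0.05, ]                   # genes passing the 5% FDR cut-off
```

With real data, `counts` and `group` would be replaced by the actual count matrix and sample grouping; everything else is a starting point to adapt.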


Did your supervisor elaborate more on the lack of need to normalise your data? Somehow I suspect that your supervisor actually meant that you do not need to transform your count data to something else such as with the voom transformation from limma.

Anyway, a DGEList object can be created like this:

counts <- read.table("counts.txt", sep = "\t") # use this command to read your tab-separated files
# if your file is a csv, then you can change the sep argument to "," for example.

# then create your DGEList object
d <- DGEList(
  counts = your_count_matrix, # your count data matrix. Rows -> genes, Columns -> samples
  samples = sample_metadata,  # metadata of your count matrix's columns, e.g. patient ID, treatment group
  genes = genes_metadata      # metadata of your count matrix's rows, e.g. gene length, gene identifiers
)


Regarding edgeR's LRT, I think a lot of people now recommend the quasi-likelihood F-test instead. But if your aim is to replicate a study, then the LRT is the one to use.
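For comparison, a quasi-likelihood analysis swaps `glmFit`/`glmLRT` for `glmQLFit`/`glmQLFTest`. The sketch below again uses simulated toy counts and hypothetical group labels `A` and `B`, so it only illustrates the calls, not any real result.

```r
library(edgeR)

# Toy data (hypothetical): 100 genes x 6 samples, two groups of three.
set.seed(2)
counts <- matrix(rnbinom(100 * 6, mu = 40, size = 5), nrow = 100,
                 dimnames = list(paste0("gene", 1:100), paste0("S", 1:6)))
group <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)
y <- estimateDisp(y, design)

qlfit <- glmQLFit(y, design)             # quasi-likelihood GLM fit
qlf <- glmQLFTest(qlfit, coef = 2)       # QL F-test for B vs A
ql_res <- topTags(qlf, n = Inf)$table    # columns: logFC, logCPM, F, PValue, FDR
```

The result table has the same layout as the LRT one, except that the test statistic column is `F` instead of `LR`.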


@Mikhael.manurung thanks for your advice and reply, and sorry for my late answer. I didn't see your response, so I tried some steps myself, but they were not very efficient! I renamed the counts.txt files by sample number (1.txt, 2.txt, ..., 50.txt) and moved them all into one folder. Then I used this script:

group <- factor(c(rep("pro-vaccine", 30), rep("pre_vaccine", 20)))
library(edgeR)
getwd()
files <- list.files(getwd())
x <- readDGE(files, path = NULL, columns = c(1, 2), header = FALSE)
x$counts <- x$counts[1:(nrow(x$counts) - 5), ] # I try to delete the last 5 rows, which are unnecessary


Note that list.files will only, well, list all the files that you have within your working directory. It DOES NOT import the data into R. That is why the first thing that you should do is to properly import those data into R. For your data, you can easily use read.table.

By the way, it seems like your count data are stored separately per sample. Do you have one matrix that contains all the counts from all of your samples? Of course, you can also build one yourself, but it requires several lines of code.
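Building that matrix from one-file-per-sample output might look something like the sketch below, in base R. It assumes the layout described in the question (one folder per sample, each holding a headerless two-column `counts.txt`), and it creates a small toy copy of that layout in a temporary directory so it can run on its own. The folder names, gene names and the `read_one` helper are hypothetical. It also assumes the HTSeq summary rows start with `__`, which is not true of every HTSeq version.

```r
# Create a toy layout (hypothetical): <root>/<sample>/counts.txt
root <- file.path(tempdir(), "SAMPLES_demo")
samples <- c("sampleA", "sampleB", "sampleC")
genes <- c("gene1", "gene2", "gene3", "__no_feature", "__ambiguous")
for (i in seq_along(samples)) {
  dir.create(file.path(root, samples[i]), recursive = TRUE, showWarnings = FALSE)
  write.table(data.frame(genes, i * c(10, 0, 5, 100, 7)),
              file.path(root, samples[i], "counts.txt"),
              sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
}

# Find every counts.txt below the root; the folder name is the sample name.
files <- list.files(root, pattern = "^counts\\.txt$", recursive = TRUE,
                    full.names = TRUE)
names(files) <- basename(dirname(files))

# Read one file into a named vector of counts (names = gene identifiers).
read_one <- function(f) {
  tab <- read.table(f, sep = "\t", header = FALSE, stringsAsFactors = FALSE,
                    col.names = c("gene", "count"))
  setNames(tab$count, tab$gene)
}
per_sample <- lapply(files, read_one)

# All files must list the same genes in the same order before binding.
same_genes <- vapply(per_sample,
                     function(v) identical(names(v), names(per_sample[[1]])),
                     logical(1))
stopifnot(all(same_genes))

# One column per sample, one row per gene.
count_matrix <- do.call(cbind, per_sample)

# Drop the summary rows (assumed here to be prefixed with "__").
count_matrix <- count_matrix[!grepl("^__", rownames(count_matrix)), , drop = FALSE]
```

The resulting `count_matrix` has genes in rows and samples in columns, which is the orientation `DGEList` expects for its `counts` argument.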


Thanks, dear Mikhael. I built the table (not the matrix) with bash code, added a header row of IDs (the names of my samples) in Excel, and saved it as a CSV file. I will then do what you recommended. I hope it works so that I can proceed to the next steps.


If it is a csv file then you can use read.csv. Good luck!
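As a rough illustration, a CSV with gene identifiers in the first column and one column per sample can be turned into a count matrix as shown below. The file contents here are made up and written to a temporary file so the snippet runs on its own; the column names `S1` to `S3` are hypothetical.

```r
# Write a tiny example CSV (hypothetical contents) to a temporary file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("gene,S1,S2,S3",
             "geneA,10,12,9",
             "geneB,0,3,1"), csv_path)

# row.names = 1 uses the first column (gene IDs) as row names;
# check.names = FALSE keeps the sample names exactly as written in the header.
tab <- read.csv(csv_path, row.names = 1, check.names = FALSE)

# DGEList works with a numeric matrix: rows = genes, columns = samples.
count_matrix <- as.matrix(tab)
```

From there, `count_matrix` can be passed as the `counts` argument of `DGEList`.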


Yes, I get it. Thank you so much!