Question

Differential gene expression using edgeR with a generalised linear model (GLM)

alihakimzadeh73

Hi, I am new to edgeR, and I don't have much background in R programming since I used to work with Python for my analyses. The counts obtained from HTSeq for the different samples are stored in separate files in different folders: the files are all named "counts.txt" and the folders are named after the samples. Each counts.txt file contains two columns, one for gene names and one for counts. As I learned from the documentation, I have to use readDGE to read the counts.txt file in each folder. I want to do differential gene expression analysis using a GLM, and I have to identify DE genes using the log2 fold change and a likelihood ratio (LR) test in edgeR. I wrote the following code to begin with:

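library(edgeR)
directory <- "/home/ali/Desktop/SAMPLES1/"
# counts.txt sits inside one subfolder per sample, so search recursively and
# keep full paths so that readDGE can find the files
files <- list.files(directory, pattern = "^counts\\.txt$",
                    recursive = TRUE, full.names = TRUE)
# every file is called counts.txt, so label each sample with its folder name
x <- readDGE(files, columns = c(1, 2), labels = basename(dirname(files)), header = FALSE)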

Also, I don't need to normalize my data. Can anyone guide me through the steps I have to do? Thanks in advance.

Tags: edger, GLM

Question written 4.9 years ago by alihakimzadeh73; updated 4.9 years ago by Aaron Lun.

score 1 · Answer 1 · 2019-05-21

Aaron Lun

I don't need to normalize my data

That's a very bold claim. I don't know where you're getting this from.

so is there anyone can guide me to which steps that I have to do?

I assume you have read the user's guide?

library(edgeR)
edgeRUsersGuide()

Section 2.10 should get you started.

Answer by Aaron Lun, 4.9 years ago

Thanks for your comment. My supervisor told me that there is no need to normalize the data. I am trying to reproduce the analysis done in an article (description: "Differential gene expression was done using edgeR and using a generalized linear model (GLM). DE genes were identified using a log2 fold change and likelihood ratio (LR) test in edgeR; significantly expressed genes had an FDR-adjusted P value of < 5%"). The user's guide I read is the same one you wrote. As written there, I first have to read the counts and make a DGEList. That is what I don't know how to do: read all 50 counts.txt files that I have.

Reply by alihakimzadeh73, 4.9 years ago

Did your supervisor explain why you don't need to normalise your data? I suspect that your supervisor actually meant that you do not need to transform your count data into something else, such as with the voom transformation from limma.
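A small illustration of that distinction, as a sketch on simulated data (the counts, sample names, and the 50 "shifted" genes are made up; edgeR from Bioconductor is assumed to be installed): calcNormFactors() computes TMM normalization factors and stores them in `$samples`, but leaves the count matrix itself untouched. The factors are then used inside the GLM, so normalizing in edgeR does not mean transforming the counts.

```r
library(edgeR)

set.seed(1)
# Hypothetical toy data: 1000 genes x 6 samples of negative binomial counts
counts <- matrix(rnbinom(6000, mu = 50, size = 10), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("S", 1:6)))
# Give samples S4-S6 a compositional bias: 50 genes soak up extra reads
counts[1:50, 4:6] <- counts[1:50, 4:6] * 10

y <- DGEList(counts = counts)   # norm.factors start at 1
y_norm <- calcNormFactors(y)    # TMM by default

# The counts are unchanged; only the per-sample normalization factors differ
identical(y$counts, y_norm$counts)
y_norm$samples
```

Because the extra reads in S4-S6 come from only a few genes, TMM gives those samples smaller factors than S1-S3, which corrects the effective library sizes for the remaining genes.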

Anyway, a DGEList object can be made like this:

library(edgeR)
counts <- read.table("counts.txt", sep = "\t") # use this command to read your tab-separated files
# if your file is a CSV, you can change the sep argument to "," for example.

# then create your DGEList object
d <- DGEList(
  counts = your_count_matrix, # your count data matrix. Rows -> genes, Columns -> samples
  samples = sample_metadata, # metadata for the matrix columns (one row per sample), e.g. patient ID, treatment group
  genes = genes_metadata # metadata for the matrix rows (one row per gene), e.g. gene length, gene identifiers
)

Regarding edgeR's LRT, I think many people now recommend the quasi-likelihood F-test instead of the LRT. But if your aim is to replicate a study, then yeah, stick with the LRT.
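For reference, the GLM workflow described in the article (likelihood ratio test, FDR < 5%) can be sketched as follows, with the quasi-likelihood alternative alongside. This is a minimal sketch on simulated data: the group labels, sample sizes, and effect sizes are hypothetical, and the filtering and TMM steps follow common edgeR practice rather than anything stated in the article. It assumes a recent edgeR from Bioconductor.

```r
library(edgeR)

set.seed(42)
# Hypothetical toy data: 2000 genes, 3 "pre" and 3 "post" samples;
# the first 100 genes are up 8-fold in "post"
group <- factor(rep(c("pre", "post"), each = 3), levels = c("pre", "post"))
mu <- matrix(20, nrow = 2000, ncol = 6)
mu[1:100, group == "post"] <- 160
counts <- matrix(rnbinom(length(mu), mu = mu, size = 10), nrow = 2000,
                 dimnames = list(paste0("gene", 1:2000), paste0("S", 1:6)))

y <- DGEList(counts = counts, group = group)
keep <- filterByExpr(y)                  # drop lowly expressed genes
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)                  # TMM normalization factors

design <- model.matrix(~ group)          # coefficient 2 = post vs pre
y <- estimateDisp(y, design)

# GLM likelihood ratio test, as in the article
fit <- glmFit(y, design)
lrt <- glmLRT(fit, coef = 2)
res_lrt <- topTags(lrt, n = Inf)$table   # sorted, with BH-adjusted FDR
de_lrt <- res_lrt[res_lrt$FDR < 0.05, ]

# Quasi-likelihood F-test, the commonly recommended alternative
qfit <- glmQLFit(y, design)
qlf <- glmQLFTest(qfit, coef = 2)
res_qlf <- topTags(qlf, n = Inf)$table

head(de_lrt[, c("logFC", "LR", "PValue", "FDR")])
```

The logFC column is already on the log2 scale, so a fold-change cutoff, if the article applied one, can be added as an extra filter on `abs(de_lrt$logFC)`.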

Reply by mikhael.manurung, 4.9 years ago

@Mikhael.manurung thanks for your advice and reply, and sorry for my late answer. I didn't see your response, so I tried some steps myself, but they were not very efficient. I renamed the counts.txt files with sample numbers (1.txt, 2.txt, ..., 50.txt), moved them all into one folder, and used this script:

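library(edgeR)
# assumes 1.txt-30.txt are "pro-vaccine" and 31.txt-50.txt are "pre_vaccine",
# all in the working directory (see getwd())
group <- factor(c(rep("pro-vaccine", 30), rep("pre_vaccine", 20)))
# list.files() sorts alphabetically ("1.txt", "10.txt", "11.txt", ...), which would
# not match the order of 'group', so list the files in numeric order explicitly
files <- paste0(1:50, ".txt")
x <- readDGE(files, columns = c(1, 2), group = group, header = FALSE)
# drop htseq-count's summary rows (names start with "__") by name; the comma in
# x[rows, ] keeps x a DGEList, and library sizes are recomputed without those rows
x <- x[!startsWith(rownames(x$counts), "__"), , keep.lib.sizes = FALSE]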

Here are my folder structure (for 30 of the samples and for the other 20), my counts.txt format, and the changes I made, with the renamed files moved into another folder as I mentioned.

Reply by alihakimzadeh73, 4.9 years ago

Note that list.files() will only, well, list the files within your working directory. It DOES NOT import the data into R. That is why the first thing you should do is properly import those data into R. For your data, you can easily use read.table.

By the way, it seems like your count data are stored separately per sample. Do you have one matrix that contains the counts from all of your samples? Of course, you can also build one yourself, but it requires several lines of code.
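One way to build that matrix in base R, as an alternative to readDGE(): a minimal sketch assuming each counts.txt has two tab-separated columns (gene ID, count) without a header, one folder per sample, and the same genes in the same order in every file. The function name read_count_files and the demo folder are hypothetical.

```r
# Build one genes x samples count matrix from per-sample htseq-count files.
# Assumes each file is tab-separated with two columns (gene ID, count), no header,
# and that all files list the same genes in the same order.
read_count_files <- function(files, sample_names = basename(dirname(files))) {
  tables <- lapply(files, read.table, sep = "\t", header = FALSE,
                   col.names = c("gene", "count"), stringsAsFactors = FALSE)
  genes <- tables[[1]]$gene
  for (tab in tables) stopifnot(identical(tab$gene, genes))
  counts <- do.call(cbind, lapply(tables, function(tab) tab$count))
  dimnames(counts) <- list(genes, sample_names)
  counts
}

# Hypothetical example: one folder per sample, each containing counts.txt
root <- file.path(tempdir(), "samples_demo")
demo <- list(sampleA = c(5, 0, 12), sampleB = c(7, 3, 9))
for (s in names(demo)) {
  dir.create(file.path(root, s), recursive = TRUE, showWarnings = FALSE)
  write.table(data.frame(gene = c("g1", "g2", "g3"), count = demo[[s]]),
              file.path(root, s, "counts.txt"), sep = "\t",
              quote = FALSE, row.names = FALSE, col.names = FALSE)
}
files <- list.files(root, pattern = "^counts\\.txt$", recursive = TRUE,
                    full.names = TRUE)
counts <- read_count_files(files)
counts
```

The result has genes as rows and samples as columns, so it can be passed directly as `counts` to DGEList().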

Reply by mikhael.manurung, 4.9 years ago

Thanks, dear Mikhael. I built the table (not a matrix) with a bash script, added a header row with my sample names (Id) in Excel, and saved it as a CSV file. Next I will do what you recommended. I hope it works and I can proceed to the next steps.

Reply by alihakimzadeh73, 4.9 years ago

If it is a CSV file, then you can use read.csv. Good luck!
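A minimal sketch of that step, assuming the CSV has gene IDs in the first column and one column per sample, with a header row of sample names; read_count_csv and the example file are hypothetical.

```r
# Read a genes x samples CSV (first column = gene IDs, header row = sample names)
# into a numeric matrix that DGEList() can take as 'counts'.
read_count_csv <- function(path) {
  df <- read.csv(path, row.names = 1, check.names = FALSE)
  counts <- as.matrix(df)
  # sanity checks: numeric, non-negative, whole numbers
  stopifnot(is.numeric(counts), all(counts >= 0), all(counts == round(counts)))
  counts
}

# Hypothetical file mimicking the layout described above
path <- tempfile(fileext = ".csv")
writeLines(c("Id,1,2", "geneA,10,8", "geneB,0,2", "geneC,5,7"), path)
counts <- read_count_csv(path)
counts
```

`check.names = FALSE` matters here because read.csv() would otherwise rename numeric sample IDs such as 1 and 2 to X1 and X2. The matrix can then go into `DGEList(counts = counts, group = group)`.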

Reply by mikhael.manurung, 4.9 years ago

Yes, I get it. Thank you so much!

Reply by alihakimzadeh73, 4.9 years ago