Question: Differential gene expression using edgeR and using a generalised linear model (GLM)
5 months ago, alihakimzadeh73 wrote:

Hi, I am new to edgeR and have little background in R programming, since I used to do my analyses in Python. The counts I obtained from HTSeq for the different samples are stored in separate files in different folders: every file is named "counts.txt" and each folder is named after its sample. Each counts.txt file contains two columns, one for gene names and one for counts. As I learned from the documentation, I have to use readDGE to read the counts.txt file in each folder. I want to do differential gene expression analysis using a GLM, and I have to identify DE genes using the log2 fold change and the likelihood ratio (LR) test in edgeR. To begin, I wrote the code below:

library(edgeR)
directory="/home/ali/Desktop/SAMPLES1/"
files <-grep("counts.txt",list.files(directory),value = TRUE)
x <-readDGE(files,columns = c(1,2),header=FALSE)

And the second fact is that I don't need to normalize my data. so is there anyone can guide me to which steps that I have to do? thanks in advance

edger glm • 227 views
Answer: Differential gene expression using edgeR and using a generalised linear model (
5 months ago, Aaron Lun (Cambridge, United Kingdom) wrote:

> I don't need to normalize my data

That's a very bold claim. I don't know where you're getting this from.

> so is there anyone can guide me to which steps that I have to do?

I assume you have read the user's guide?

library(edgeR)
edgeRUsersGuide()

Section 2.10 should get you started.


Thanks for your comment. It was my supervisor who told me that there is no need to normalize the data. I am trying to reproduce the analysis done in one article (description: Differential gene expression was done using edgeR and using a generalized linear model (GLM). DE genes were identified using a log2 fold change and likelihood ratios (LR) test in edgeR; significantly expressed genes had an FDR adjusted P value of < 5%). The user guide I read is the same one that you referred to. As written in the user guide, I first have to read the counts and make a DGEList. That is the part I don't know how to do: reading all 50 of the counts.txt files that I have.

written 5 months ago by alihakimzadeh73 (modified 5 months ago)

Did your supervisor elaborate on why you would not need to normalise your data? I suspect that your supervisor actually meant that you do not need to transform your count data into something else, for example with the voom transformation from limma.
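To illustrate the distinction between normalising and transforming, here is a minimal sketch assuming simulated counts (the matrix, gene names and sample names are made up for illustration). In edgeR, calcNormFactors() only estimates one scaling factor per sample; the raw counts themselves are left as they are.

```r
library(edgeR)

# Simulated counts (an assumption for illustration): 200 genes x 6 samples.
set.seed(1)
counts <- matrix(rnbinom(200 * 6, mu = 50, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))

y <- DGEList(counts = counts)
y <- calcNormFactors(y)  # TMM normalisation is the default method

# The counts are not transformed; only y$samples$norm.factors is updated.
counts_unchanged <- all(y$counts == counts)
y$samples[, c("lib.size", "norm.factors")]
```

The normalisation factors are picked up later by the dispersion estimation and GLM fitting functions, so skipping this step is a modelling decision and not a formatting one.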

Anyway, a DGEList object can be made like this:

library(edgeR)

counts <- read.table("counts.txt", sep = "\t", row.names = 1) # use this command to read your tab-separated files; row.names = 1 takes the gene IDs from the first column
# if your matrix is a csv, then you can change the sep argument to "," for example.

# then create your DGEList object
# (sample_metadata and genes_metadata are placeholders for your own data frames)
d <- DGEList(
  counts = as.matrix(counts), # your count data matrix. Rows -> genes, Columns -> samples
  samples = sample_metadata, # metadata of your count matrix's columns, e.g. patient ID, treatment group
  genes = genes_metadata # metadata of your count matrix's rows, e.g. gene length, gene identifiers
)

Regarding edgeR's LRT, I think a lot of people now recommend the quasi-likelihood F-test instead. But if your aim is to replicate a study that used the LRT, then the LRT is the test to use.
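For reference, the two tests can be run side by side. This is only a sketch under stated assumptions: the counts are simulated, the group labels "pre" and "post" and the 4 + 4 design are hypothetical, and the 5% FDR cutoff is the one quoted from the article above.

```r
library(edgeR)

# Simulated data (an assumption): 500 genes, 4 "pre" and 4 "post" samples.
set.seed(42)
ngenes <- 500
group <- factor(rep(c("pre", "post"), each = 4), levels = c("pre", "post"))
counts <- matrix(rnbinom(ngenes * 8, mu = 100, size = 10), nrow = ngenes,
                 dimnames = list(paste0("gene", seq_len(ngenes)),
                                 paste0("s", 1:8)))

y <- DGEList(counts = counts, group = group)
keep <- filterByExpr(y)                  # drop genes with too few counts
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)

design <- model.matrix(~ group)          # coefficient 2 = post vs pre
y <- estimateDisp(y, design)

# Likelihood ratio test
fit <- glmFit(y, design)
lrt <- glmLRT(fit, coef = 2)
res_lrt <- topTags(lrt, n = Inf)$table   # logFC, logCPM, LR, PValue, FDR
de_lrt <- res_lrt[res_lrt$FDR < 0.05, ]  # genes passing the 5% FDR cutoff

# Quasi-likelihood F-test
qfit <- glmQLFit(y, design)
qlf <- glmQLFTest(qfit, coef = 2)
res_ql <- topTags(qlf, n = Inf)$table    # logFC, logCPM, F, PValue, FDR
```

Both tables report log2 fold changes in the logFC column, so the downstream filtering is the same whichever test is used.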

written 5 months ago by mikhael.manurung

@mikhael.manurung Thanks for your advice and reply, and sorry for my late answer. I didn't see your response, so I tried some steps myself, but they were not very efficient! I renamed the counts.txt files after their sample numbers (1.txt, 2.txt, ..., 50.txt), moved them all into one folder, and used this script:

    library(edgeR)
    group <- factor(c(rep("pro-vaccine", 30), rep("pre_vaccine", 20)))
    getwd()
    files <- list.files(getwd())
    x <- readDGE(files, path = NULL, columns = c(1, 2), header = FALSE)
    # I try to delete the last 5 rows, which are unnecessary
    # (the comma keeps x$counts a matrix while dropping rows)
    x$counts <- x$counts[1:(nrow(x$counts) - 5), ]

Here is my folder structure for 30 of them and for the other 20, my counts.txt format, and the changes I made, which I put in another folder with the changed names as I mentioned.

written 4 months ago by alihakimzadeh73 (modified 4 months ago)

Note that list.files will only, well, list all the files that you have within your working directory. It DOES NOT import the data into R. That is why the first thing that you should do is to properly import those data into R. For your data, you can easily use read.table.

By the way, it seems like your count data are stored separately per sample. Do you have one matrix that contains the counts from all of your samples? Of course, you can also build one yourself, but it requires several lines of code.
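Those several lines could look roughly like the sketch below. It is a minimal base R example under stated assumptions: the toy folder layout (one folder per sample, each holding a two-column counts.txt without a header) is created in a temporary directory to mimic the layout described in the question, and the sample and gene names are made up.

```r
# Toy layout (an assumption): <root>/<sample>/counts.txt, two columns, no header.
root <- file.path(tempdir(), "SAMPLES_demo")
for (s in c("sampleA", "sampleB")) {
  dir.create(file.path(root, s), recursive = TRUE, showWarnings = FALSE)
  toy <- data.frame(gene = c("geneX", "geneY", "__no_feature"),
                    count = if (s == "sampleA") c(10, 0, 7) else c(3, 5, 2))
  write.table(toy, file.path(root, s, "counts.txt"), sep = "\t",
              quote = FALSE, row.names = FALSE, col.names = FALSE)
}

# list.files() only returns the paths; read.delim() does the actual import.
files <- list.files(root, pattern = "^counts\\.txt$",
                    recursive = TRUE, full.names = TRUE)
names(files) <- basename(dirname(files))  # folder name = sample name

per_sample <- lapply(files, function(f) {
  tab <- read.delim(f, header = FALSE, col.names = c("gene", "count"),
                    stringsAsFactors = FALSE)
  setNames(tab$count, tab$gene)
})

# Assumes every file lists the same genes in the same order.
same_genes <- sapply(per_sample,
                     function(v) identical(names(v), names(per_sample[[1]])))
stopifnot(all(same_genes))

count_matrix <- do.call(cbind, per_sample)  # genes in rows, samples in columns

# Drop summary rows whose names start with "__" (e.g. "__no_feature").
count_matrix <- count_matrix[!grepl("^__", rownames(count_matrix)), ,
                             drop = FALSE]
```

Removing the summary rows by name is safer than removing "the last 5 rows" by position. Also note that if the files are named 1.txt to 50.txt, list.files() returns them in text order (1.txt, 10.txt, 11.txt, ...), so group labels should be matched to samples by name, not by position.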

written 4 months ago by mikhael.manurung

Thanks, dear Mikhael. I built the table (not the matrix) with bash code, added an Id header row holding the names of my samples in Excel, and saved it as a CSV file. Then I will do what you recommended. I hope it works so that I can proceed to the next steps.

written 4 months ago by alihakimzadeh73 (modified 4 months ago)

If it is a csv file then you can use read.csv. Good luck!
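A minimal sketch of that import, assuming a CSV whose first column holds the gene IDs and whose header row holds the sample names (the toy file and its contents are made up for illustration):

```r
# Toy CSV (an assumption): first column = gene IDs, header = sample names.
csv_file <- tempfile(fileext = ".csv")
writeLines(c("Id,1,2,3",
             "geneX,10,3,8",
             "geneY,0,5,2"), csv_file)

# row.names = 1 turns the gene IDs into row names;
# check.names = FALSE stops R from renaming headers such as "1" to "X1".
tab <- read.csv(csv_file, row.names = 1, check.names = FALSE)
count_matrix <- as.matrix(tab)  # numeric matrix: genes in rows, samples in columns
```

The resulting count_matrix can be passed to DGEList(counts = count_matrix). It may be worth checking the gene names after a round trip through Excel, since it is known to convert some gene symbols into dates.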

written 4 months ago by mikhael.manurung

Yes, I get it, thank you so much!

written 4 months ago by alihakimzadeh73