Hi, I am new to edgeR, and I don't have much of a background in R programming since I used to work with Python for my analysis. The counts that I obtained from HTSeq for the different samples are stored in separate files in different folders; the files are all named "counts.txt" and the folders are named after the samples. Each counts.txt file contains two columns, one for gene names and one for counts. As I learned from the documentation, I have to use readDGE to read the counts.txt file in each folder. I want to do a differential gene expression analysis using a GLM, and I have to identify DE genes using the log2 fold change and the likelihood ratio (LR) test in edgeR. I wrote the code below to begin with:
library(edgeR)
directory <- "/home/ali/Desktop/SAMPLES1/"
files <- grep("counts.txt", list.files(directory), value = TRUE)
x <- readDGE(files, columns = c(1, 2), header = FALSE)
Secondly, I don't need to normalize my data. Can anyone guide me through the steps that I have to follow? Thanks in advance.
Thanks for your comment. That is what my supervisor told me: there is no need to normalize the data. I am trying to reproduce the process used in one article (description: Differential gene expression was done using edgeR and using a generalized linear model (GLM). DE genes were identified using a log2 fold change and likelihood ratio (LR) test in edgeR; significantly expressed genes had an FDR-adjusted P value of < 5%). The user guide I read is the same one that you have written. As written in the user guide, I first have to read the counts and make a DGEList. That is the part I don't know how to do: reading all 50 of the counts.txt files that I have.
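For the folder-per-sample layout described in the question, one possible way to get all files into `readDGE` is to list them recursively and label each sample by its folder name. This is only a sketch: the demo folder, sample names, and counts below are made up so the example runs on its own, and `directory` would need to point at the real top-level folder.

```r
library(edgeR)

# Hypothetical demo layout: one folder per sample, each holding a counts.txt.
# Replace `directory` with the real top-level folder.
directory <- file.path(tempdir(), "SAMPLES_demo")
for (s in c("sampleA", "sampleB", "sampleC")) {
  dir.create(file.path(directory, s), recursive = TRUE, showWarnings = FALSE)
  demo <- data.frame(gene = paste0("gene", 1:5), count = rpois(5, lambda = 20))
  write.table(demo, file.path(directory, s, "counts.txt"), sep = "\t",
              quote = FALSE, row.names = FALSE, col.names = FALSE)
}

# recursive = TRUE descends into the sample folders; the returned paths are
# relative to `directory`, e.g. "sampleA/counts.txt".
files <- list.files(directory, pattern = "^counts\\.txt$", recursive = TRUE)

# Every file has the same name, so use the folder names as sample labels.
x <- readDGE(files, path = directory, columns = c(1, 2),
             labels = dirname(files), header = FALSE)
dim(x)
```

The non-recursive `list.files(directory)` call in the question only sees the sample folders themselves, not the counts.txt files inside them, which is why `recursive = TRUE` and `path = directory` are used here.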
Did your supervisor elaborate more on the lack of need to normalise your data? Somehow I suspect that your supervisor actually meant that you do not need to transform your count data to something else, such as with the `voom` transformation from `limma`.
Anyway, a `DGEList` object can be made as such:
Regarding `edgeR`'s LRT, I think a lot of people now recommend using the quasi-likelihood F-test instead of the LRT. But if your aim is to replicate a study, then yeah.
@Mikhael.manurung thanks for your advice and reply, and sorry for my late answer. I didn't see your response, so I did some steps myself, but they were not that efficient! I changed the names of the counts.txt files, moved them all into one folder, and named them by sample number: 1.txt, 2.txt, ... 50.txt. And I used these scripts:
Here is my folder structure for 30 of them and for the other 20, my counts.txt format, and the changes I made: I put the files in another folder with changed names, as I mentioned.
Note that `list.files` will only, well, list the files in the directory you point it at (your working directory by default). It DOES NOT import the data into R. That is why the first thing that you should do is to properly import those data into R. For your data, you can easily use `read.table`.
By the way, it seems like your count data are stored separately per sample. Do you have one matrix that contains the counts from all of your samples? Of course, you can also build one yourself, but it requires several lines of code.
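As a rough illustration of those "several lines of code", the sketch below reads every per-sample file with `read.table` and binds the count columns into one matrix. The demo folder and its contents are invented stand-ins for the renamed 1.txt, 2.txt, ... files, and the code assumes every file lists the same genes in the same order.

```r
# Hypothetical demo files standing in for 1.txt, 2.txt, ... in one folder.
count_dir <- file.path(tempdir(), "counts_demo")
dir.create(count_dir, showWarnings = FALSE)
genes <- paste0("gene", 1:4)
for (i in 1:3) {
  write.table(data.frame(genes, i * (1:4)),
              file.path(count_dir, paste0(i, ".txt")),
              sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
}

# list.files() only returns file names; read.table() does the actual import.
paths <- list.files(count_dir, pattern = "\\.txt$", full.names = TRUE)
tables <- lapply(paths, read.table, header = FALSE, sep = "\t",
                 col.names = c("gene", "count"), stringsAsFactors = FALSE)

# Assumption: identical gene order in every file; stop if that does not hold.
same_genes <- sapply(tables, function(tab) identical(tab$gene, tables[[1]]$gene))
stopifnot(all(same_genes))

# One row per gene, one column per sample (named after the file).
counts <- sapply(tables, function(tab) tab$count)
rownames(counts) <- tables[[1]]$gene
colnames(counts) <- sub("\\.txt$", "", basename(paths))
counts
```

Taking the column names from the file names keeps each column tied to its sample even though `list.files` sorts names alphabetically (so "10.txt" comes before "2.txt").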
Thanks, dear Mikhael. I built the table (not the matrix) with a bash script, added the ID header, which holds my sample names, on top of it in Excel, and saved it as a CSV file. I will then do what you recommended. I hope it works and I can proceed to the next steps.
If it is a CSV file then you can use `read.csv`. Good luck!
Yes, I get it, thank you so much!
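Once the counts are in a single matrix, the GLM and LR test workflow quoted from the article could look roughly like the sketch below. The counts are simulated, and the two-group layout, sample names, and the `counts.csv` file name in the comment are assumptions for illustration only; the real grouping has to come from the experiment. No normalization step is included, following what was said earlier in the thread.

```r
library(edgeR)

# Simulated stand-in for the CSV. With a real file one might use something like
# counts <- as.matrix(read.csv("counts.csv", row.names = 1))  # hypothetical name
set.seed(1)
counts <- matrix(rnbinom(200 * 6, mu = 50, size = 10), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))

# Hypothetical two-group design with three samples per group.
group <- factor(rep(c("control", "treated"), each = 3))
y <- DGEList(counts = counts, group = group)

design <- model.matrix(~ group)
y <- estimateDisp(y, design)      # estimate negative binomial dispersions

fit <- glmFit(y, design)          # fit a GLM for each gene
lrt <- glmLRT(fit, coef = 2)      # likelihood ratio test, treated vs control

# Full results table with log2 fold changes and Benjamini-Hochberg FDR;
# FDR < 0.05 corresponds to the 5% cut-off quoted from the article.
res <- topTags(lrt, n = Inf)$table
de <- res[res$FDR < 0.05, ]
head(res)
```

For the quasi-likelihood F-test mentioned above, `glmQLFit` and `glmQLFTest` would take the place of `glmFit` and `glmLRT`.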