I am using a for loop to compute RPKM values for 11 samples, but I only get the gene IDs from the last sample. What am I missing in the code below? If somebody knows a more efficient way to do this task, please tell me.
Thanks
Humberto
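library(edgeR) # provides DGEList() and rpkm()
setwd("/Users/hmunozbarona/Documents/Normalization-R")
rm(list=ls(all=TRUE)) # remove all variables
files <- dir(pattern="\\.csv$") # regular expression: file names ending in .csv
RPKM <- vector("list", length(files)) # one list element per sample
names(RPKM) <- files
for (i in seq_along(files))
{
  data <- read.csv(files[i])
  id <- data$GeneID
  cnts <- data$ReadCount
  lens <- data$Length
  y <- DGEList(genes=data.frame(gene=id, Length=lens), counts=cnts)
  # [[i]] stores the whole data frame; RPKM[i] would keep only its first column
  RPKM[[i]] <- data.frame(gene=id, counts=cnts, rpkm=as.vector(rpkm(y, gene.length=lens)))
  print(RPKM[[i]])
}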
I wonder why all your samples are in separate files in the first place. Tools like featureCounts or htseq-count can collate the read counts into a matrix for you, so reading separate files of counts is generally unnecessary.
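If the counts do end up in one matrix, RPKM for all samples can be computed in a single step. Here is a minimal sketch in base R, assuming every sample lists the same genes in the same order. The two small tables are made-up stand-ins for the CSV files, using the column names from the question.

```r
# Hypothetical stand-ins for the per-sample tables read from the CSV files;
# the column names (GeneID, ReadCount, Length) follow the question.
samples <- list(
  s1 = data.frame(GeneID = c("g1", "g2", "g3"),
                  ReadCount = c(10, 20, 70),
                  Length = c(1000, 2000, 500)),
  s2 = data.frame(GeneID = c("g1", "g2", "g3"),
                  ReadCount = c(50, 0, 150),
                  Length = c(1000, 2000, 500))
)

# Assumption: every table lists the same genes in the same order.
same_genes <- sapply(samples, function(d) identical(d$GeneID, samples[[1]]$GeneID))
stopifnot(all(same_genes))

# Collate into a matrix: one row per gene, one column per sample.
counts <- sapply(samples, function(d) d$ReadCount)
rownames(counts) <- samples[[1]]$GeneID
lens <- samples[[1]]$Length

# RPKM = reads per kilobase of gene length per million reads in the library.
lib_size <- colSums(counts)
rpkm_mat <- t(t(counts) / lib_size) * 1e6 / (lens / 1e3)
print(rpkm_mat)
```

With edgeR loaded, rpkm(counts, gene.length = lens) should give the same matrix as long as the library sizes are left at the column sums and no normalization factors have been applied.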