Dear all, and Aaron,
I am using the following R code for the paired design analysis of a 6*2 RNA-seq samples : please could you let me know if there are any changes/modifications you would suggest . thanks !
library("limma")
library("edgeR")
### reading the expression dataset
eset <- read.delim("mm10.strand.both.count.exons.condense.genes.NOADJ",row.names="Symbol")
### reading the library sizes
libsize <- c(28182007, 28599704, 26386209, 28069058, 26361374, 27108450, 27097918, 12155292, 27124214, 26774125, 26241068, 26296757)
### setting up the groups and the subjects
group <- factor(c("csc","csc","csc","csc","csc","csc","non","non","non","non","non","non"))
subject <- factor(c(1,2,3,4,5,6,1,2,3,4,5,6))
### setting up the design and the contrast matrix
design <- model.matrix(~0+group+subject)
contrast.matrix <- makeContrasts(groupcsc-groupnon, levels=design)
### filtering the genes based on CPM :
y <- DGEList(counts=eset,group=group)
keep <- rowSums(cpm(y, lib.size=libsize)>1) >= 3
y <- y[keep,]
y$samples$lib.size <- colSums(y$counts)
### computing the normalization factors :
y <- calcNormFactors(y)
### using the VOOM transformation :
v <- voom(y,design,plot=TRUE)
### doing the LINEAR FIT in LIMMA :
fit <- lmFit(v, design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)
### obtaining and writing the results :
results <- topTable(fit2, coef=1, adjust="fdr", number=24073)
write.table(results, file="results", sep="\t", eol="\n", row.names=TRUE, col.names=TRUE)
Can I ask why you are inputting the libsizes separately? Are these not the same as the column sums of eset?
Hi Gordon, thanks for your email. The reason for inputting the libsizes separately is the following :
the list of genes for DGE analysis contain only protein-coding and long non-coding RNAs as annotated in RefSeq, but not the eRNAs (enhancer RNAs) that could be also captured by RNA-seq.
So I though to keep the total library sizes, instead of using the library sizes only based on protein-coding and long non-coding RNAs. Although of course, the second option is a good choice too.
Shall you have any comments about how to improve or modify the code, please let me know. Thanks ;) !