limma code for RNA-seq analysis (paired design)
3
0
Entering edit mode
Bogdan ▴ 670
@bogdan-2367
Last seen 6 months ago
Palo Alto, CA, USA

Dear all,

we have 12 RNA-seq samples of pairs "CSC" - "noncsc" ("csc" stands from cancer stem cells; and each pair CSC-noncsc comes from a different mouse). I write the following code for computing the differential expression with limma on the paired samples. Please could you let me know if it is correct.  Thank you !

eset <- read.delim("mm10.genes.FPKM.for.Erica.and.Ariel.integrated.with.edgeR.version8feb2016.for-PCA.renaming",row.names="Symbol")

group <- factor(c("csc","csc","csc","csc","csc","csc","non","non","non","non","non","non"))
subject <- factor(c(1,2,3,4,5,6,1,2,3,4,5,6))
design <- model.matrix(~0+group+subject)

contrast.matrix <- makeContrasts(groupcsc-groupnon, levels=design)

fit <- lmFit(eset, design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)

results <- topTable(fit, coef=1, adjust="fdr", number=24073)
write.table(results, file="results", sep="\t", eol="\n", row.names=TRUE, col.names=TRUE)

sessionInfo()
R version 3.2.4 Revised (2016-03-16 r70336)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.4 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] limma_3.26.9

loaded via a namespace (and not attached):
[1] tools_3.2.4

 

limma rnaseq • 1.1k views
ADD COMMENT
2
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 41 minutes ago
The city by the bay

While there's nothing wrong with the limma code, it seems you're applying it directly on FPKMs. You should be using voom on the raw counts instead, if you want to use limma on RNA-seq data.

ADD COMMENT
0
Entering edit mode
Bogdan ▴ 670
@bogdan-2367
Last seen 6 months ago
Palo Alto, CA, USA

Hi Aaron, thanks for the very prompt reply ! you are right - mistakenly, I've imported the file with the FPKM values ;)

will post the code with voom in 5-10 minutes, and I may ask you to please take a look ... thanks again and happy weekend ! 

ADD COMMENT
0
Entering edit mode
Bogdan ▴ 670
@bogdan-2367
Last seen 6 months ago
Palo Alto, CA, USA

Dear Aaron,

here is an updated code that included the "voom" transformation : when you have a minute, please could you take a look and let me know if I shall make any changes. thank you again !

eset <- read.delim("raw-counts",row.names="Symbol")

group <- factor(c("csc","csc","csc","csc","csc","csc","non","non","non","non","non","non"))
subject <- factor(c(1,2,3,4,5,6,1,2,3,4,5,6))

design <- model.matrix(~0+group+subject)
contrast.matrix <- makeContrasts(groupcsc-groupnon, levels=design)

dge <- DGEList(counts=eset, group=group)
dge <- calcNormFactors(dge)
v <- voom(dge,design,plot=TRUE)

fit <- lmFit(v, design)

fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)

results <- topTable(fit2, coef=1, adjust="fdr", number=24073)

write.table(results, file="results", sep="\t", eol="\n", row.names=TRUE, col.names=TRUE)

ADD COMMENT
1
Entering edit mode

Looks mostly fine to me. However, you might want to do some filtering to remove low-abundance genes at the start (i.e., before running calcNormFactors, and definitely before running voom, otherwise you'll end up with a strange mean-variance trend). Also, for future reference, modify your original question rather than posting a new answer.

ADD REPLY
0
Entering edit mode

thanks, Aaron. will open a new question about edgeR soon. On the same topic of limma though, a little additional question please (on a tiny detail): when I set up the design, it looks that the word "group" is added automatically to the  names of column1 and column2 : is it supposed to be this way ?

> group <- factor(c("csc","csc","csc","csc","csc","csc","non","non","non","non","non","non"))
> subject <- factor(c(1,2,3,4,5,6,1,2,3,4,5,6))
> design <- model.matrix(~0+group+subject)
> design
   groupcsc groupnon subject2 subject3 subject4 subject5 subject6
1         1        0        0        0        0        0        0
2         1        0        1        0        0        0        0
3         1        0        0        1        0        0        0
4         1        0        0        0        1        0        0
5         1        0        0        0        0        1        0
6         1        0        0        0        0        0        1
7         0        1        0        0        0        0        0
8         0        1        1        0        0        0        0
9         0        1        0        1        0        0        0
10        0        1        0        0        1        0        0
11        0        1        0        0        0        1        0
12        0        1        0        0        0        0        1

and we do :

contrast.matrix <- makeContrasts(groupcsc-groupnon, levels=design)

dge <- DGEList(counts=eset,group=group)
dge <- calcNormFactors(dge)
v <- voom(dge,design,plot=TRUE)

fit <- lmFit(v, design)

fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)

ADD REPLY
2
Entering edit mode

Yes, that's meant to happen. You could imagine if you had two factors, both of which have the level "X"; if you didn't put the factor name in front, you wouldn't be able to figure out which factor a column named "X" would be referring to.

ADD REPLY

Login before adding your answer.

Traffic: 489 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6