Question

limma code for RNA-seq analysis (paired design)

0

Entering edit mode

Bogdan ▴ 670

@bogdan-2367

Last seen 15 months ago

Palo Alto, CA, USA

Dear all,

we have 12 RNA-seq samples of pairs "CSC" - "noncsc" ("csc" stands from cancer stem cells; and each pair CSC-noncsc comes from a different mouse). I write the following code for computing the differential expression with limma on the paired samples. Please could you let me know if it is correct. Thank you !

eset <- read.delim("mm10.genes.FPKM.for.Erica.and.Ariel.integrated.with.edgeR.version8feb2016.for-PCA.renaming",row.names="Symbol")

group <- factor(c("csc","csc","csc","csc","csc","csc","non","non","non","non","non","non"))
subject <- factor(c(1,2,3,4,5,6,1,2,3,4,5,6))
design <- model.matrix(~0+group+subject)

contrast.matrix <- makeContrasts(groupcsc-groupnon, levels=design)

fit <- lmFit(eset, design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)

results <- topTable(fit, coef=1, adjust="fdr", number=24073)
write.table(results, file="results", sep="\t", eol="\n", row.names=TRUE, col.names=TRUE)

sessionInfo()
R version 3.2.4 Revised (2016-03-16 r70336)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.4 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] limma_3.26.9

loaded via a namespace (and not attached):
[1] tools_3.2.4

limma rnaseq • 1.3k views

ADD COMMENT • link updated 8.6 years ago by Steve Lianoglou ★ 13k • written 8.6 years ago by Bogdan ▴ 670

score 2 · Answer 1 · 2016-06-10

2

Entering edit mode

Aaron Lun ★ 28k

@alun

Last seen 16 hours ago

The city by the bay

While there's nothing wrong with the limma code, it seems you're applying it directly on FPKMs. You should be using voom on the raw counts instead, if you want to use limma on RNA-seq data.

ADD COMMENT • link 8.6 years ago Aaron Lun ★ 28k

score 0 · Answer 2 · 2016-06-10

0

Entering edit mode

Bogdan ▴ 670

@bogdan-2367

Last seen 15 months ago

Palo Alto, CA, USA

Hi Aaron, thanks for the very prompt reply ! you are right - mistakenly, I've imported the file with the FPKM values ;)

will post the code with voom in 5-10 minutes, and I may ask you to please take a look ... thanks again and happy weekend !

ADD COMMENT • link 8.6 years ago Bogdan ▴ 670

score 0 · Answer 3 · 2016-06-10

0

Entering edit mode

Bogdan ▴ 670

@bogdan-2367

Last seen 15 months ago

Palo Alto, CA, USA

Dear Aaron,

here is an updated code that included the "voom" transformation : when you have a minute, please could you take a look and let me know if I shall make any changes. thank you again !

eset <- read.delim("raw-counts",row.names="Symbol")

group <- factor(c("csc","csc","csc","csc","csc","csc","non","non","non","non","non","non"))
subject <- factor(c(1,2,3,4,5,6,1,2,3,4,5,6))

design <- model.matrix(~0+group+subject)
contrast.matrix <- makeContrasts(groupcsc-groupnon, levels=design)

dge <- DGEList(counts=eset, group=group)
dge <- calcNormFactors(dge)
v <- voom(dge,design,plot=TRUE)

fit <- lmFit(v, design)

fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)

results <- topTable(fit2, coef=1, adjust="fdr", number=24073)

write.table(results, file="results", sep="\t", eol="\n", row.names=TRUE, col.names=TRUE)

ADD COMMENT • link 8.6 years ago Bogdan ▴ 670

1

Entering edit mode

Looks mostly fine to me. However, you might want to do some filtering to remove low-abundance genes at the start (i.e., before running calcNormFactors, and definitely before running voom, otherwise you'll end up with a strange mean-variance trend). Also, for future reference, modify your original question rather than posting a new answer.

ADD REPLY • link 8.6 years ago Aaron Lun ★ 28k

0

Entering edit mode

thanks, Aaron. will open a new question about edgeR soon. On the same topic of limma though, a little additional question please (on a tiny detail): when I set up the design, it looks that the word "group" is added automatically to the names of column1 and column2 : is it supposed to be this way ?

> group <- factor(c("csc","csc","csc","csc","csc","csc","non","non","non","non","non","non"))
> subject <- factor(c(1,2,3,4,5,6,1,2,3,4,5,6))
> design <- model.matrix(~0+group+subject)
> design
   groupcsc groupnon subject2 subject3 subject4 subject5 subject6
1         1        0        0        0        0        0        0
2         1        0        1        0        0        0        0
3         1        0        0        1        0        0        0
4         1        0        0        0        1        0        0
5         1        0        0        0        0        1        0
6         1        0        0        0        0        0        1
7         0        1        0        0        0        0        0
8         0        1        1        0        0        0        0
9         0        1        0        1        0        0        0
10        0        1        0        0        1        0        0
11        0        1        0        0        0        1        0
12        0        1        0        0        0        0        1

and we do :

contrast.matrix <- makeContrasts(groupcsc-groupnon, levels=design)

dge <- DGEList(counts=eset,group=group)
dge <- calcNormFactors(dge)
v <- voom(dge,design,plot=TRUE)

fit <- lmFit(v, design)

fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)

ADD REPLY • link 8.6 years ago Bogdan ▴ 670

2

Entering edit mode

Yes, that's meant to happen. You could imagine if you had two factors, both of which have the level "X"; if you didn't put the factor name in front, you wouldn't be able to figure out which factor a column named "X" would be referring to.

ADD REPLY • link 8.6 years ago Aaron Lun ★ 28k