Question

limma multiple rows per gene in output (write.fit)

0

Entering edit mode

Anne • 0

@anne-8108

Last seen 8.9 years ago

Netherlands

Hi all,

I'm working with a RNA-seq (tags) dataset, having some trouble getting my limma results to a file. I use write.fit for this. Seemed to be doing its job, but after I changed the contrasts slightly, I now get a warning. (changing to my old script also doesn't help). The analysis seems to be running (example part of the dataset below, with script). However, when i try to output the results to a .tsv file with write.fit, a problem occurs. First I get this warning:

> write.fit(fit1,results=P1,file="results.tsv",digits=3,adjust="fdr",method="global",sep="\t")
Warning message:
In data.frame(A = c(5.56, 7.45, 2.74, 2.67, 5.17, 6.51, 2.77, 3.6, :
row names were found from a short variable and have been discarded

Which seems to suggest to me that something is wrong with the row names (duplicated)/combining the different coefficients. When I look at the .tsv ouput I get a table with each gene row "multiplied" by the number of contrasts. Example (part of the table):

Genes.Gene.name	A	Coef.group	Coef.groupxtime	Coef.group2	Coef.group2xtime
Cops5	5.56	-0.688	-0.216	0.539	-0.106
Cops5	5.56	-0.688	-0.216	0.539	-0.106
Cops5	5.56	-0.688	-0.216	0.539	-0.106
Cops5	5.56	-0.688	-0.216	0.539	-0.106
Arfgef1	7.45	0.545	-0.012	-0.233	-0.026
Arfgef1	7.45	0.545	-0.012	-0.233	-0.026
Arfgef1	7.45	0.545	-0.012	-0.233	-0.026
Arfgef1	7.45	0.545	-0.012	-0.233	-0.026

I tried running everything again, also the script that previously worked, thinking something had gone wrong in my tables (they all look ok when I check this though). I also updated my packages. Using topTable to access the results to write a table for each coeffient does work, but as I'm doing a lot of comparisons (the above is a simplified example, but also doesn't work) I'd rather have them in the one table using write.fit.

I really can't figure out why I'm having this problem (sorry if this is a stupid question, as a biologist my understanding of R is very basic). So I could do with some pointers on how to solve this problem. Thanks in advance!

Anne

R code:

> head(dataset[,1:8]) # table with annotated counts per sample

S1 S2 S3 S4 S6 S7 S8 S9
TC_chr1_-_10037945_Cops5 807 733 255 397 591 727 611 888
TC_chr1_-_10232984_Arfgef1 1855 1775 668 1088 1263 1630 1477 1664
TC_chr1_-_105356670_Rnf152 117 52 27 35 72 64 40 79
TC_chr1_-_106714001_Bcl2 74 72 19 32 34 34 42 48
TC_chr1_-_106759669_Kdsr 475 550 206 352 370 461 318 334
TC_chr1_-_106796644_Vps4b 1138 1162 430 679 848 1181 788 943

> names(genes)
[1] "TCID" "gene_region" "CHROMOSOME" "START" "END" "STRAND"
[7] "START.ENS" "END.ENS" "ENSEMBL.TRANSCRIPTID" "STRAND." "GENE.ENSEMBLE.ID" "Source"
[13] "GENE.TYPE" "Status" "Gene.name" "ENTREZGENEID"

> group <- factor(c("D1","A1","B2","C2","B2","D2","B1","D2","A2",
+ "C1","A2","D1","B1","B2","C1","B1","C1",
+ "C2","C2","D1","A1","A2","B1","B2","C2","D2",
+ "C1","A2","D2","B2","B1","B2","C1","C2","A1",
+ "D1","D1","A2","B1","A2"))
> design = model.matrix(~0+group)
> g.name <- c("A1","A2","B1","B2","C1","C2","D1","D2")
> colnames(design) <- g.name
>
> # DGEList
> d <- DGEList(counts=dataset, group=group, genes=genes)
>
> # filter for low counts and reset lib.sizes
> fcpm <- rowSums(cpm(d) > 1) >= 20
> d <- d[fcpm,]
> d$samples$lib.size <- colSums(d$counts)
> dim(d)
[1] 16314 40
> # library scale normalisation factors
> d <- calcNormFactors(d, method="TMM")
> nf <- d$samples$norm.factors
>
> # normalise, voom with quality weights
> y <- voomWithQualityWeights(d, design=design, lib.size=(d$samples$lib.size*nf), normalization="none", plot=TRUE)
>
> fit <- lmFit(y,design)
>
> my.contrast = makeContrasts(group = (B1+B2)-(A1+A2),
+ groupxtime = (B1-B2)-(A1-A2),
+ group2 = (D1+D2)-(C1+C2),
+ group2xtime = (D1-D2)-(C1-C2),levels=design)
> fit1 <- contrasts.fit(fit,my.contrast)
> fit1 <- eBayes(fit1)
>
> P1 <- decideTests(fit1, adjust="none", p.value=0.005)
> FDR1 <- decideTests(fit1,adjust="fdr",method="global",p.value=0.1)
> DEG_group <- rbind(summary(P1),summary(FDR1))
> print(DEG_group)
group groupxtime group2 group2xtime
-1 479 33 42 33
0 15576 16265 16266 16263
1 259 16 6 18
-1 0 0 0 0
0 16314 16314 16314 16314
1 0 0 0 0

> write.fit(fit1,results=P1,file="results.tsv",digits=3,adjust="fdr",method="global",sep="\t")
Warning message:
In data.frame(A = c(5.56, 7.45, 2.74, 2.67, 5.17, 6.51, 2.77, 3.6, :
row names were found from a short variable and have been discarded

> sessionInfo()

R version 3.2.0 (2015-04-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] edgeR_3.10.2 limma_3.24.8 sparcl_1.0.3 calibrate_1.7.2 MASS_7.3-40 plyr_1.8.2

loaded via a namespace (and not attached):
[1] tools_3.2.0 Rcpp_0.11.6

limma results • 3.5k views

ADD COMMENT • link 8.9 years ago Anne • 0

2

Entering edit mode

Aaron Lun ★ 28k

@alun

Last seen 11 hours ago

The city by the bay

This appears to be an issue with write.fit. When the BH adjustment is applied with method="global", the dimension attributes are lost when the p-value matrix is processed with p.adjust. This means that the adjusted p-values are stored as a vector instead of a matrix, such that the output table is expanded row-wise by four-fold to accommodate it (as each entry of the vector is given a new row, and there are four times as many entries in the vector as there are rows in the original matrix when you have four contrasts). I'm pretty sure this is not intentional; however, until the patch comes through for limma, just set method="separate" to get sensible results.

ADD COMMENT • link 8.9 years ago Aaron Lun ★ 28k

0

Entering edit mode

Thank you for the fast response! Reassuring to know that it is not a problem with my code, just the method that I'm trying with the writing out results. I've now tried Gordon's workaround, that seems to do the job :)

ADD REPLY • link 8.9 years ago Anne • 0

score 3 · Accepted Answer · 2015-06-09

3

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 39 minutes ago

WEHI, Melbourne, Australia

Aaron's diagnosis of the problem is correct. Amazing we haven't tripped this bug before in the 12 years that write.fit() has existed.

I have committed a bug fix to Bioconductor. It will be available for installation using biocLite() in a couple of days. In the meantime, here is a workaround:

FDR <- fit1$p.value
FDR[] <- p.adjust(FDR, method="BH")
fit1$genes <- data.frame(fit1$genes, FDR)
write.fit(fit1,results=P1,file="results.tsv",digits=3,sep="\t")

ADD COMMENT • link 8.9 years ago Gordon Smyth 50k

0

Entering edit mode

Thank you for the workaround! I have my file - one happy biologist! Will keep an eye out for the update when it arrives, too.

ADD REPLY • link 8.9 years ago Anne • 0