Warning messages from diffSplice in Limma
2
1
Entering edit mode
willj ▴ 30
@willj-8763
Last seen 7.2 years ago
France

I have data from Affy Human exon arrays 1.0 ST, and I'm using the diffSplice function in Limma to detect differential splicing. I'm getting some warnings (see below) when I call diffSplice and I'd like to know if I should be concerned about them (results from topSplice and plotSplice look fine as far as I can tell).

The experiment has a Healthy group and 3 disease groups, with a design as follows:

> design
   Dis1 Healthy Dis2 Dis3
1    1       0       0   0
2    1       0       0   0
3    1       0       0   0
4    1       0       0   0
5    1       0       0   0
6    1       0       0   0
7    0       1       0   0
8    0       1       0   0
9    0       1       0   0
10   0       1       0   0
11   0       1       0   0
12   0       1       0   0
13   0       0       1   0
14   0       0       1   0
15   0       0       1   0
16   0       0       1   0
17   0       0       1   0
18   0       0       1   0
19   0       0       0   1
20   0       0       0   1
21   0       0       0   1
22   0       0       0   1
23   0       0       0   1
24   0       0       0   1
attr(,"assign")
[1] 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$Conds
[1] "contr.treatment"

When I run diffSplice it gives some warning messages:

> fit <- lmFit(rma.probesets.filt, design)
> dSplice <- diffSplice(fit[,"Dis1"], geneid="geneNames", exonid="start")
Total number of exons:  787188 
Total number of genes:  31123 
Number of genes with 1 exon:  4769 
Mean number of exons in a gene:  25 
Max number of exons in a gene:  504965 
Warning messages:
1: In rowsum.default(exon.stat, geneid, reorder = FALSE) :
  missing values for 'group'
2: In rowsum.default(u2, geneid, reorder = FALSE) :
  missing values for 'group'
3: In rowsum.default(exon.coefficients * u2, geneid, reorder = FALSE) :
  missing values for 'group'
4: In rowsum.default(exon.t^2, geneid, reorder = FALSE) :
  missing values for 'group'

Should I be worried about these warning messages?

Also (perhaps related?) should I be worried that my geneid column has lots of unlabelled probesets that are all being considered by diffSplice as one giant gene with 504,965 exons?

Many thanks for any help/advice.

 

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] genefilter_1.56.0        pd.huex.1.0.st.v2_3.14.1 DBI_0.5-1                RSQLite_1.1-2           
 [5] oligo_1.38.0             Biostrings_2.42.1        XVector_0.14.0           IRanges_2.8.1           
 [9] S4Vectors_0.12.1         oligoClasses_1.36.0      limma_3.30.8             Biobase_2.34.0          
[13] BiocGenerics_0.20.0      BiocInstaller_1.24.0    

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.9                GenomeInfoDb_1.10.2        bitops_1.0-6               iterators_1.0.8
 [5] tools_3.3.2                zlibbioc_1.20.0            digest_0.6.11              statmod_1.4.27
 [9] bit_1.1-12                 annotate_1.52.1            memoise_1.0.0              preprocessCore_1.36.0
[13] lattice_0.20-34            ff_2.2-13                  Matrix_1.2-7.1             foreach_1.4.3
[17] affxparser_1.46.0          grid_3.3.2                 AnnotationDbi_1.36.2       survival_2.40-1
[21] XML_3.98-1.5               codetools_0.2-15           GenomicRanges_1.26.2       splines_3.3.2
[25] SummarizedExperiment_1.4.0 xtable_1.8-2               RCurl_1.95-4.8             affyio_1.44.0
limma diffSplice • 1.3k views
ADD COMMENT
2
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 1 hour ago
The city by the bay

As you may have already realized, the geneNames column of fit$genes contains missing values. This causes the warnings and can be fixed by removing the rows corresponding to the NA values. Otherwise, the function will consider all NA exons as being part of the same gene, which would be obviously silly.

Also, use of fit[,"Dis1"] doesn't make much sense for a design matrix without an intercept. The "exon-specific log-fold changes" in this setting would just be the log-expression of each exon in the Dis1 group, rather than the log-fold change between two groups (e.g., Dis1 and Healthy). Use contrasts.fit prior to running diffSplice if you want to compare exon usage between groups.

ADD COMMENT
2
Entering edit mode
@gordon-smyth
Last seen 2 hours ago
WEHI, Melbourne, Australia

Yes, you should be worried by the unlabelled probesets. diffSplice() assumes that you know which gene each exon belongs to. You have 787k exons in total but 505k of these have no gene identifier. So you do need to fix this basic data problem before proceeding. Is there no annotation as to which exons belong to the same gene? As Aaron has advised, you will have to remove the exons that have no annotation.

I am going to edit diffSplice() so that it treats all NA values for geneid as different genes, each with a single exon.

Also, you would find it easier to define the design matrix like this:

Conds <- relevel(Conds, ref="Healthy")
design <- model.matrix(~Conds)

That way, each disease will be compared back to healthy. Then

ds <- diffSplice(fit, ...)

will do the differential splicing analysis for all the diseases at once.

ADD COMMENT

Login before adding your answer.

Traffic: 592 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6