Search
Question: Warning messages from diffSplice in Limma
1
gravatar for willj
3 months ago by
willj30
France
willj30 wrote:

I have data from Affy Human exon arrays 1.0 ST, and I'm using the diffSplice function in Limma to detect differential splicing. I'm getting some warnings (see below) when I call diffSplice and I'd like to know if I should be concerned about them (results from topSplice and plotSplice look fine as far as I can tell).

The experiment has a Healthy group and 3 disease groups, with a design as follows:

> design
   Dis1 Healthy Dis2 Dis3
1    1       0       0   0
2    1       0       0   0
3    1       0       0   0
4    1       0       0   0
5    1       0       0   0
6    1       0       0   0
7    0       1       0   0
8    0       1       0   0
9    0       1       0   0
10   0       1       0   0
11   0       1       0   0
12   0       1       0   0
13   0       0       1   0
14   0       0       1   0
15   0       0       1   0
16   0       0       1   0
17   0       0       1   0
18   0       0       1   0
19   0       0       0   1
20   0       0       0   1
21   0       0       0   1
22   0       0       0   1
23   0       0       0   1
24   0       0       0   1
attr(,"assign")
[1] 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$Conds
[1] "contr.treatment"

When I run diffSplice it gives some warning messages:

> fit <- lmFit(rma.probesets.filt, design)
> dSplice <- diffSplice(fit[,"Dis1"], geneid="geneNames", exonid="start")
Total number of exons:  787188 
Total number of genes:  31123 
Number of genes with 1 exon:  4769 
Mean number of exons in a gene:  25 
Max number of exons in a gene:  504965 
Warning messages:
1: In rowsum.default(exon.stat, geneid, reorder = FALSE) :
  missing values for 'group'
2: In rowsum.default(u2, geneid, reorder = FALSE) :
  missing values for 'group'
3: In rowsum.default(exon.coefficients * u2, geneid, reorder = FALSE) :
  missing values for 'group'
4: In rowsum.default(exon.t^2, geneid, reorder = FALSE) :
  missing values for 'group'

Should I be worried about these warning messages?

Also (perhaps related?) should I be worried that my geneid column has lots of unlabelled probesets that are all being considered by diffSplice as one giant gene with 504,965 exons?

Many thanks for any help/advice.

 

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] genefilter_1.56.0        pd.huex.1.0.st.v2_3.14.1 DBI_0.5-1                RSQLite_1.1-2           
 [5] oligo_1.38.0             Biostrings_2.42.1        XVector_0.14.0           IRanges_2.8.1           
 [9] S4Vectors_0.12.1         oligoClasses_1.36.0      limma_3.30.8             Biobase_2.34.0          
[13] BiocGenerics_0.20.0      BiocInstaller_1.24.0    

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.9                GenomeInfoDb_1.10.2        bitops_1.0-6               iterators_1.0.8
 [5] tools_3.3.2                zlibbioc_1.20.0            digest_0.6.11              statmod_1.4.27
 [9] bit_1.1-12                 annotate_1.52.1            memoise_1.0.0              preprocessCore_1.36.0
[13] lattice_0.20-34            ff_2.2-13                  Matrix_1.2-7.1             foreach_1.4.3
[17] affxparser_1.46.0          grid_3.3.2                 AnnotationDbi_1.36.2       survival_2.40-1
[21] XML_3.98-1.5               codetools_0.2-15           GenomicRanges_1.26.2       splines_3.3.2
[25] SummarizedExperiment_1.4.0 xtable_1.8-2               RCurl_1.95-4.8             affyio_1.44.0
ADD COMMENTlink modified 3 months ago by Gordon Smyth30k • written 3 months ago by willj30
2
gravatar for Aaron Lun
3 months ago by
Aaron Lun15k
Cambridge, United Kingdom
Aaron Lun15k wrote:

As you may have already realized, the geneNames column of fit$genes contains missing values. This causes the warnings and can be fixed by removing the rows corresponding to the NA values. Otherwise, the function will consider all NA exons as being part of the same gene, which would be obviously silly.

Also, use of fit[,"Dis1"] doesn't make much sense for a design matrix without an intercept. The "exon-specific log-fold changes" in this setting would just be the log-expression of each exon in the Dis1 group, rather than the log-fold change between two groups (e.g., Dis1 and Healthy). Use contrasts.fit prior to running diffSplice if you want to compare exon usage between groups.

ADD COMMENTlink modified 3 months ago • written 3 months ago by Aaron Lun15k
2
gravatar for Gordon Smyth
3 months ago by
Gordon Smyth30k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth30k wrote:

Yes, you should be worried by the unlabelled probesets. diffSplice() assumes that you know which gene each exon belongs to. You have 787k exons in total but 505k of these have no gene identifier. So you do need to fix this basic data problem before proceeding. Is there no annotation as to which exons belong to the same gene? As Aaron has advised, you will have to remove the exons that have no annotation.

I am going to edit diffSplice() so that it treats all NA values for geneid as different genes, each with a single exon.

Also, you would find it easier to define the design matrix like this:

Conds <- relevel(Conds, ref="Healthy")
design <- model.matrix(~Conds)

That way, each disease will be compared back to healthy. Then

ds <- diffSplice(fit, ...)

will do the differential splicing analysis for all the diseases at once.

ADD COMMENTlink modified 3 months ago • written 3 months ago by Gordon Smyth30k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 304 users visited in the last hour