Question

Ballgown differential expression by exon problem

0

Entering edit mode

new_user • 0

@new_user-17097

Last seen 5.6 years ago

I am getting an error when running the following code using the Ballgown package:

results_exons <- stattest(bg_data_filt_e, feature='exon', covariate='Condition' , meas='mcov')

The error is:

Error in solve.default(t(mod) %*% mod) :
system is computationally singular: reciprocal condition number = 1.31705e-16

From https://github.com/alyssafrazee/ballgown/issues/67 it Alyssa wrote:

'The "system is computationally singular" usually means that all samples have the same value (usually 0, or something very close to zero) for at least one transcript/gene. You can usually fix this by filtering by expression level (see ?exprfilter).'

In this case I am looking at differential exon usage and I have already filtered (or subset-ed) using:

bg_data_filt <- ballgown::subset(bg_data, "rowVars(texpr(bg_data)) > 1", genomesubset=TRUE)

bg_data_filt_e <- ballgown::subset(bg_data_filt, "rowVars(eexpr(bg_data_filt)) > 1", genomesubset=TRUE)

So I am already filtering out low variance transcripts and exons already. This approach works fine for the transcripts, genes and introns but not the exons...

I have also looked at filtering the initial ballgown object using an FPKM cutoff of 1 (using exprfilter) and tried changing the meas used for exon expression to mrcount, rcount or ucount. No luck.

I'm pretty new to this so can anyone advise on why stattest might be failing, or give me a few pointers?

ballgown • 1.5k views

ADD COMMENT • link 5.7 years ago • updated 5.6 years ago new_user • 0

0

Entering edit mode

So after some digging it looks like the filter "rowVars(eexpr(bg_data_filt))>1" is not working.

Even if I go back to the original unfiltered ballgown object (not filtering for low abundance transcripts first), and use:

bg_data_filt_e <- ballgown::subset(bg_data, "rowVars(eexpr(bg_data, meas='mcov')) > 1", genomesubset=TRUE)

And then print to the screen using:

eexpr(bg_data_filt_e, meas="mcov")

I get values for mcov which are 0 for all samples....

Why isn't this filter working? It works for transcripts (texpr) and introns (iexpr)?

EDIT: it doesn't appear to be working for introns either although stattest doesn't result in an error for my data.

From ?subset:

'The subset expression can either involve column names of texpr(x, "all") (if genomesubset is TRUE) or of pData(x) (if genomesubset is FALSE). '

So perhaps I cannot subset based on exon or intron expression?

But if stattest is designed to work on exons and introns (as well as genes and transcripts), how do I filter out exons (or introns) with zero expression?

ADD REPLY • link 5.6 years ago new_user • 0

0

Entering edit mode

Hello! You are correct that the subset command does not work for exons or introns. It only works for transcripts, which is why you can only use the column names of the transcript expression matrix (and not the exon or intron matrix).

To use stattest with a filtered exon or intron table, I suggest manually subsetting the table (e.g., creating a new table subsetting by the row variances), and using the resulting table as the "gowntable" argument to stattest. Please see ?stattest for more information on gowntable.

Hope this helps!

ADD REPLY • link 5.6 years ago Alyssa Frazee ▴ 210

0

Entering edit mode

Hi Alyssa. Thanks for the reply.

I exported the mcov data using eexpr(bg_data,"mcov").

I have a data matrix with an additional column (rowVars of the proceeding columns containing the mcov data).

I then filtered on row variance > 1 and performed the stattest as follows:

numcols <- ncol(e_mcov_matrix)

results_exons <- ballgown::stattest(gown = NULL, gowntable = e_mcov_matrix[,1:(numcols-1)],pData =pheno_data, mod = NULL, mod0 = NULL, feature = "exon", meas = "mcov", timecourse = FALSE, covariate = "Condition", adjustvars = NULL, gexpr = NULL, df = 4, getFC = FALSE)

This hasn't resolved the issue with the result being computationally singular however. Instead I am going to work on error handling using trycatch() and not bother with the exon level DE analysis.

For analysis of a subset of conditions (comparing 2 conditions) using the gowntable method I realised that I need to use the droplevels() function on the subsetted/filtered data.frame(pData) to remove unused factors prior to running the stattest.

I solved this by first looking at the code for the stattest function from:

https://github.com/alyssafrazee/ballgown/blob/master/R/stattest.R

(see line 205-219)

When I was subsetting my pData (phenotype data) for two conditions the data.frame created still included unused factors. These were being evaluated in the code (lines 205-219) as 0 (and therefore less than 2 replicates per condition).

Although this resolved the error message relating to the lack of the replicates, the result was still computationally singular.

ADD REPLY • link 5.6 years ago new_user • 0