Question: DEXSeq - Error in estimateSizeFactorsForMatrix(featureCounts(object)
gravatar for komal.rathi
2.2 years ago by
United States
komal.rathi80 wrote:

Hi everyone,

I have exonic counts for 344 samples for 15 genes that include some house-keeping genes like GAPDH and Actin genes using dexseq_count. I am trying to normalize them like this:

dxd = DEXSeqDataSetFromHTSeq(
  sampleData = sampleTable,
  design= ~ sample + exon + disease_type:exon,
  flattenedfile = flattenedFile)

              sample_id                                     disease_type
7316-1057-T 7316-1057-T High-grade glioma/astrocytoma (WHO grade III/IV)
7316-1480-T 7316-1480-T                                       Ependymoma
7316-775-T   7316-775-T                                       Ependymoma
7316-2009-T 7316-2009-T                                  Medulloblastoma
7316-1473-T 7316-1473-T               Supratentorial or Spinal Cord PNET
7316-302-T   7316-302-T                                  Medulloblastoma

> dxd
class: DEXSeqDataSet 
dim: 441 688 
metadata(1): version
assays(1): counts
rownames(441): ENSG00000075624.13:E001 ENSG00000075624.13:E002 ... ENSG00000188676.13:E017
rowData names(5): featureID groupID exonBaseMean exonBaseVar transcripts
colnames: NULL
colData names(4): sample sample_id disease_type exon

Why is it showing 688 samples instead of 344? 

Also, I am getting the following error when I try to normalize:

dxd = estimateSizeFactors(dxd)
Error in estimateSizeFactorsForMatrix(featureCounts(object), locfunc,  : 
  every gene contains at least one zero, cannot compute log geometric means

I checked my counts file but I can't seem to find any sample that has all 0 values:

> dim(cts)
[1] 441 688

> summary(colSums(cts))
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
     485    15420    42410  2255000   599800 67470000 

> summary(rowSums(cts))
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
     989     2973   187300  3518000  8387000 11690000 

Please let me know what am I missing here. Thanks

ADD COMMENTlink modified 2.2 years ago • written 2.2 years ago by komal.rathi80

Hi Komal

Can you please post

  • (a public URL to) the data objects countFiles, flattenedFile, sampleTable
  • the code that produces them
  • the output of devtools::session_info()

This should make it easier to troubleshoot what is going on. (Check out the advice in the Bioconductor posting guide, which is intended to make it easier to pin down the issue more quickly.)

It's always a good idea to get to the bottom of troubling observations such as "Why is it showing 688 samples instead of 344?" before continuing with the workflow. "Garbage in, garbage out."

One more thing: DEXSeq was designed to work with transcriptome-wide (i.e. RNA-Seq) data. Applying it to a dataset with 15 genes will be a bit of a stretch. It's quite possible that it works (especially if your treatment effects are mild), but I'd use extra caution and visualization in assessing the results. You might have to modify the size factor estimation to rely more on the housekeeping genes rather than all genes; or, if you do use estimateSizeFactors, look at the housekeeping genes whether the result is satisfactory.

Hope this helps


ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by Wolfgang Huber13k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 148 users visited in the last hour