Question

DEXSeq - Error in estimateSizeFactorsForMatrix(featureCounts(object)

0

Entering edit mode

komal.rathi ▴ 120

@komalrathi-9163

Last seen 2.4 years ago

United States

Hi everyone,

I have exonic counts for 344 samples for 15 genes that include some house-keeping genes like GAPDH and Actin genes using dexseq_count. I am trying to normalize them like this:

dxd = DEXSeqDataSetFromHTSeq(
  countFiles,
  sampleData = sampleTable,
  design= ~ sample + exon + disease_type:exon,
  flattenedfile = flattenedFile)

head(sampleTable)
              sample_id                                     disease_type
7316-1057-T 7316-1057-T High-grade glioma/astrocytoma (WHO grade III/IV)
7316-1480-T 7316-1480-T                                       Ependymoma
7316-775-T   7316-775-T                                       Ependymoma
7316-2009-T 7316-2009-T                                  Medulloblastoma
7316-1473-T 7316-1473-T               Supratentorial or Spinal Cord PNET
7316-302-T   7316-302-T                                  Medulloblastoma

> dxd
class: DEXSeqDataSet 
dim: 441 688 
metadata(1): version
assays(1): counts
rownames(441): ENSG00000075624.13:E001 ENSG00000075624.13:E002 ... ENSG00000188676.13:E017
  ENSG00000188676.13:E018
rowData names(5): featureID groupID exonBaseMean exonBaseVar transcripts
colnames: NULL
colData names(4): sample sample_id disease_type exon

Why is it showing 688 samples instead of 344?

Also, I am getting the following error when I try to normalize:

dxd = estimateSizeFactors(dxd)
Error in estimateSizeFactorsForMatrix(featureCounts(object), locfunc,  : 
  every gene contains at least one zero, cannot compute log geometric means

I checked my counts file but I can't seem to find any sample that has all 0 values:

> dim(cts)
[1] 441 688

> summary(colSums(cts))
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
     485    15420    42410  2255000   599800 67470000 

> summary(rowSums(cts))
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
     989     2973   187300  3518000  8387000 11690000

Please let me know what am I missing here. Thanks

dexseqdatasetfromhtseq dexseq_count dexseq • 1.2k views

ADD COMMENT • link 8.4 years ago komal.rathi ▴ 120

0

Entering edit mode

Hi Komal

Can you please post

(a public URL to) the data objects countFiles, flattenedFile, sampleTable
the code that produces them
the output of devtools::session_info()

This should make it easier to troubleshoot what is going on. (Check out the advice in the Bioconductor posting guide, which is intended to make it easier to pin down the issue more quickly.)

It's always a good idea to get to the bottom of troubling observations such as "Why is it showing 688 samples instead of 344?" before continuing with the workflow. "Garbage in, garbage out."

One more thing: DEXSeq was designed to work with transcriptome-wide (i.e. RNA-Seq) data. Applying it to a dataset with 15 genes will be a bit of a stretch. It's quite possible that it works (especially if your treatment effects are mild), but I'd use extra caution and visualization in assessing the results. You might have to modify the size factor estimation to rely more on the housekeeping genes rather than all genes; or, if you do use estimateSizeFactors, look at the housekeeping genes whether the result is satisfactory.

Hope this helps

Wolfgang

ADD REPLY • link 8.4 years ago Wolfgang Huber ★ 13k