Search
Question: DEXSeq - Error in estimateSizeFactorsForMatrix(featureCounts(object)
0
8 months ago by
komal.rathi50
United States
komal.rathi50 wrote:

Hi everyone,

I have exonic counts for 344 samples for 15 genes that include some house-keeping genes like GAPDH and Actin genes using dexseq_count. I am trying to normalize them like this:

dxd = DEXSeqDataSetFromHTSeq(
countFiles,
sampleData = sampleTable,
design= ~ sample + exon + disease_type:exon,
flattenedfile = flattenedFile)

sample_id                                     disease_type
7316-1480-T 7316-1480-T                                       Ependymoma
7316-775-T   7316-775-T                                       Ependymoma
7316-2009-T 7316-2009-T                                  Medulloblastoma
7316-1473-T 7316-1473-T               Supratentorial or Spinal Cord PNET
7316-302-T   7316-302-T                                  Medulloblastoma

> dxd
class: DEXSeqDataSet
dim: 441 688
assays(1): counts
rownames(441): ENSG00000075624.13:E001 ENSG00000075624.13:E002 ... ENSG00000188676.13:E017
ENSG00000188676.13:E018
rowData names(5): featureID groupID exonBaseMean exonBaseVar transcripts
colnames: NULL
colData names(4): sample sample_id disease_type exon


Why is it showing 688 samples instead of 344?

Also, I am getting the following error when I try to normalize:

dxd = estimateSizeFactors(dxd)
Error in estimateSizeFactorsForMatrix(featureCounts(object), locfunc,  :
every gene contains at least one zero, cannot compute log geometric means

I checked my counts file but I can't seem to find any sample that has all 0 values:

> dim(cts)
[1] 441 688

> summary(colSums(cts))
Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
485    15420    42410  2255000   599800 67470000

> summary(rowSums(cts))
Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
989     2973   187300  3518000  8387000 11690000 

Please let me know what am I missing here. Thanks

modified 8 months ago • written 8 months ago by komal.rathi50

Hi Komal

• (a public URL to) the data objects countFiles, flattenedFile, sampleTable
• the code that produces them
• the output of devtools::session_info()

This should make it easier to troubleshoot what is going on. (Check out the advice in the Bioconductor posting guide, which is intended to make it easier to pin down the issue more quickly.)

It's always a good idea to get to the bottom of troubling observations such as "Why is it showing 688 samples instead of 344?" before continuing with the workflow. "Garbage in, garbage out."

One more thing: DEXSeq was designed to work with transcriptome-wide (i.e. RNA-Seq) data. Applying it to a dataset with 15 genes will be a bit of a stretch. It's quite possible that it works (especially if your treatment effects are mild), but I'd use extra caution and visualization in assessing the results. You might have to modify the size factor estimation to rely more on the housekeeping genes rather than all genes; or, if you do use estimateSizeFactors, look at the housekeeping genes whether the result is satisfactory.

Hope this helps

Wolfgang