Question

Coseq error with DESeq2 object: Error in "normFactors"

0

Entering edit mode

varunorama • 0

@varunorama-20073

Last seen 4.5 years ago

Hello,

I would like to use coseq to cluster my results based upon transcripts that were identified as differentially expressed by DESeq2. Coseq mentions that I can take the object (e.g. dds) and use that directly.

With this said, I implemented coseq as such:

run <- coseq(dds2_sig, normFactors = "none", transformation = "none", model = "logclr", 
             parallel = TRUE, meanFilterCutoff = 50, nstart = 1, iter.max = 10, K=2:15, alpha=0.05)

but I receive an error:

****************************************
Co-expression analysis on DESeq2 output:
7479 DE genes at p-adj < 0.05 
Error in transformRNAseq(y = y, normFactors = normFactors, transformation = "profile",  : 
  ‘normFactors’ must be one of
     the following: none, TC, DESeq, TMM, or a vector oflength equal to the number of columns
     in ‘y’

Looking at the code, I did specify a valid option for normFactors, but R is registering as invalid.

Users of coseq, any thoughts as to why this problem might be occuring?

Thanks.

coseq rna-seq • 1.8k views

ADD COMMENT • link updated 5.1 years ago by andrea.rau ▴ 80 • written 5.1 years ago by varunorama • 0

0

Entering edit mode

apologies, here is my session information

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] coseq_1.6.1                 doParallel_1.0.14           iterators_1.0.10            foreach_1.4.4               tximport_1.10.1             HTSFilter_1.22.1           
 [7] DESeq2_1.22.2               SummarizedExperiment_1.12.0 DelayedArray_0.8.0          BiocParallel_1.16.6         matrixStats_0.54.0          Biobase_2.42.0             
[13] GenomicRanges_1.34.0        GenomeInfoDb_1.18.2         IRanges_2.16.0              S4Vectors_0.20.1            BiocGenerics_0.28.0        

loaded via a namespace (and not attached):
 [1] bitops_1.0-6           bit64_0.9-7            RColorBrewer_1.1-2     tensorA_0.36.1         tools_3.5.1            backports_1.1.3        R6_2.4.0              
 [8] rpart_4.1-13           Hmisc_4.2-0            DBI_1.0.0              lazyeval_0.2.1         colorspace_1.4-0       nnet_7.3-12            tidyselect_0.2.5      
[15] gridExtra_2.3          bayesm_3.1-1           bit_1.1-14             Rmixmod_2.1.2          compiler_3.5.1         compositions_1.40-2    htmlTable_1.13.1      
[22] scales_1.0.0           checkmate_1.9.1        mvtnorm_1.0-8          DEoptimR_1.0-8         robustbase_0.93-3      readr_1.3.1            genefilter_1.64.0     
[29] DESeq_1.34.1           stringr_1.4.0          digest_0.6.18          foreign_0.8-70         XVector_0.22.0         base64enc_0.1-3        pkgconfig_2.0.2       
[36] htmltools_0.3.6        plotrix_3.7-4          limma_3.38.3           htmlwidgets_1.3        rlang_0.3.1            rstudioapi_0.9.0       RSQLite_2.1.1         
[43] energy_1.7-5           acepack_1.4.1          dplyr_0.8.0.1          HTSCluster_2.0.8       RCurl_1.95-4.11        magrittr_1.5           GenomeInfoDbData_1.2.0
[50] Formula_1.2-3          Matrix_1.2-14          Rcpp_1.0.0             munsell_0.5.0          stringi_1.3.1          yaml_2.2.0             edgeR_3.24.3          
[57] MASS_7.3-50            zlibbioc_1.28.0        plyr_1.8.4             grid_3.5.1             blob_1.1.1             crayon_1.3.4           lattice_0.20-35       
[64] splines_3.5.1          annotate_1.60.0        hms_0.4.2              locfit_1.5-9.1         capushe_1.1.1          knitr_1.21             pillar_1.3.1          
[71] boot_1.3-20            geneplotter_1.60.0     codetools_0.2-15       XML_3.98-1.17          glue_1.3.0             latticeExtra_0.6-28    BiocManager_1.30.4    
[78] data.table_1.12.0      gtable_0.2.0           purrr_0.3.0            assertthat_0.2.0       ggplot2_3.1.0          xfun_0.5               xtable_1.8-3          
[85] e1071_1.7-0.1          class_7.3-14           survival_2.42-3        tibble_2.0.1           AnnotationDbi_1.44.0   memoise_1.1.0          corrplot_0.84         
[92] cluster_2.0.7-1

ADD REPLY • link 5.1 years ago varunorama • 0

0

Entering edit mode

Hi @varunorama,

I wanted to clarify the specific analysis you're trying to perform here:

You've specified normFactors = "none", which means that the clustering analysis will not make use of the library size normalization factors that were calculated by DESeq2. I would strongly recommend that you leave this to its default value (in the case of a DESeqDataSet object, normFactors = "DESeq") unless you have a good reason for not wanting to normalize the raw counts for the clustering step.
You've specified transformation = "none" as well, which means that your raw counts will not be transformed prior to performing the K-means algorithm (which is the default) for clustering. Typically I would recommend using either the logCLR or CLR tranformation of expression profiles (transformation = "logclr", which is the default, or transformation = "clr") in conjunction with the K-means algorithm, unless you have a good reason for wanting to use the raw counts.
Finally, you've specified model = "logCLR", but the only valid model choices are model = "kmeans" (defautl) or model = "Normal" for a Gaussian mixture model.

At the very least, you should thus change the value for the model argument in your call to coseq (although again, I would also recommend using default values for normFactors and transformation) and see if that helps things.

Andrea (the coseq developer)

ADD REPLY • link 5.1 years ago andrea.rau ▴ 80

0

Entering edit mode

Hello Andrea,

Thank you very much for your response. Yes, looking back at your comments I realized the errors in my parameters. I am running it again using this command:

run <- coseq(dds2_sig, normFactors = "DESeq", transformation = "logclr", model = "kmeans", 
             parallel = TRUE, meanFilterCutoff = 50, nstart = 1, iter.max = 10, K=2:15, alpha=0.05)

However, I still get the same error:

****************************************
Co-expression analysis on DESeq2 output:
7479 DE genes at p-adj < 0.05 
Error in transformRNAseq(y = y, normFactors = normFactors, transformation = "profile",  : 
  ‘normFactors’ must be one of
     the following: none, TC, DESeq, TMM, or a vector oflength equal to the number of columns
     in ‘y’

I noticed that in some other versions, normFactors was written as norm, but changing that did not alleviate the error.

Do you have any suggestions moving forward?

ADD REPLY • link 5.1 years ago varunorama • 0

0

Entering edit mode

Thanks for double-checking your parameters. Is there any way you could send a fully reproducible example? I'm having trouble reproducing your error.

When you run the analogous example from the vignette, do you get the same error you're seeing with your own data?

library(coseq)
library(DESeq2)
library(Biobase)
data("fietz")
counts <- exprs(fietz)
conds <- pData(fietz)$tissue
dds <- DESeqDataSetFromMatrix(counts, DataFrame(group=factor(conds)), ~group)
dds <- DESeq(dds, test="LRT", reduced = ~1)
run <- coseq(dds, normFactors = "DESeq", transformation = "logclr", 
             model = "kmeans", parallel = TRUE, meanFilterCutoff = 50, 
             nstart = 1, iter.max = 10, K=2:15, alpha=0.05)

Also, regarding norm versus normFactors, R uses exact or unique partial matches for arguments, so using either of them is the same as using the full argument name normFactors.

ADD REPLY • link 5.1 years ago andrea.rau ▴ 80

0

Entering edit mode

Hello again Andrea,

This will be a somewhat long message but I ran the vignette by itself and it worked. As the only difference between your code and mine was the usage of HTSFilter, I incorporated that into the vignette as such:

library(coseq)
library(DESeq2)
library(Biobase)
data("fietz")
counts <- exprs(fietz)
conds <- pData(fietz)$tissue
dds <- DESeqDataSetFromMatrix(counts, DataFrame(group=factor(conds)), ~group)
dds <- DESeq(dds, test="LRT", reduced = ~1)
filter_ex <- HTSFilter(dds, s.len=100, normalization = "DESeq")$filteredData
run <- coseq(filter_ex, normFactors = "DESeq", transformation = "logclr", 
             model = "kmeans", parallel = TRUE, meanFilterCutoff = 50, 
             nstart = 1, iter.max = 10, K=2:15, alpha=0.05)

However, I stylized the code according to the vignette, but still continued to have the error in my code.

dir <- system.file("extdata", package = "tximportData")
files = list.files(pattern = "\\.RSEM.genes.results$")
names(files) = c("AND_D0_1","AND_D0_2","AND_D0_3",
                 "AND_D1_1","AND_D1_2","AND_D1_3",
                 "AND_D2_1","AND_D2_2","AND_D2_3",
                 "AND_D3_1","AND_D3_2","AND_D3_3",
                 "AND_D4_1","AND_D4_2","AND_D4_3",
                 "AND_D5_1","AND_D5_2","AND_D5_3")
and.txi.rsem <- tximport(files, type = "rsem", txIn = FALSE, txOut = FALSE)
head(and.txi.rsem$counts)

sampleTable <- data.frame(dpa = factor(c(0,0,0,1,1,1,2,2,2,3,3,3,4,4,4,5,5,5)))
rownames(sampleTable) <- colnames(and.txi.rsem$counts)
dds <- DESeqDataSetFromTximport(and.txi.rsem, colData = sampleTable, design = ~ dpa)
dds2 <- DESeq(dds, test=c("LRT"), reduced =~1)
filtered <- HTSFilter(dds2, s.len=100, normalization = "DESeq")$filteredData
res.HTSfilter <- results(filtered, independentFiltering=FALSE)
#hist(res.HTSfilter$pvalue)
#head(res.HTSfilter)
res.HTS_filter_sig = res.HTSfilter[which(res.HTSfilter$padj < 0.05),]
library(coseq)
dds2_sig = dds2[rownames(dds2)%in%rownames(res.HTS_filter_sig),]
set.seed(12)
run <- coseq(assays(dds2_sig)$counts, normFactors = "DESeq", transformation = "logclr", 
             model = "kmeans", parallel = TRUE, meanFilterCutoff = 50, 
             nstart = 1, iter.max = 10, K=2:30, alpha=0.05)

****************************************
Co-expression analysis on DESeq2 output:
7947 DE genes at p-adj < 0.05 
Error in transformRNAseq(y = y, normFactors = normFactors, transformation = "profile",  : 
  ‘normFactors’ must be one of
     the following: none, TC, DESeq, TMM, or a vector oflength equal to the number of columns
     in ‘y’

However, looking into it more, I managed to get an output when I called the counts specifically from dds2 as such:

run <- coseq(assays(dds2_sig)$counts, normFactors = "DESeq", transformation = "logclr", 
             model = "kmeans", parallel = TRUE, meanFilterCutoff = 50, 
             nstart = 1, iter.max = 10, K=2:30, alpha=0.05)
****************************************
coseq analysis: kmeans approach & logclr transformation
K = 2 to 30 
Use set.seed() prior to running coseq for reproducible results.****************************************
Running K = 2 ...
Running K = 3 ...
Running K = 4 ...
Running K = 5 ...
Running K = 6 ...
Running K = 7 ...
Running K = 8 ...
Running K = 9 ...
Running K = 10 ...
Running K = 11 ...
Running K = 12 ...
Running K = 13 ...
Running K = 14 ...
Running K = 15 ...
Running K = 16 ...
Running K = 17 ...
Running K = 18 ...
Running K = 19 ...
Running K = 20 ...
Running K = 21 ...
Running K = 22 ...
Running K = 23 ...
Running K = 24 ...
Running K = 25 ...
Running K = 26 ...
Running K = 27 ...
Running K = 28 ...
Running K = 29 ...
Running K = 30 ...

The package I was able to go through the clusters and coseq ran successfully, and it seems to work! I am expecting that this route will produce the same results moving forward but I'd like to get your input on this.

ADD REPLY • link 5.1 years ago varunorama • 0

0

Entering edit mode

EDIT (3/4/19 3:54 EST): Looking at the summary of my results, I noticed that a lot of Clusters had errors:

> run <- coseq(assays(dds2_sig)$counts, normFactors = "DESeq", transformation = "logclr", 
+              model = "kmeans", parallel = TRUE, meanFilterCutoff = 50, 
+              nstart = 1, iter.max = 10, K=2:30, alpha=0.05)
****************************************
coseq analysis: kmeans approach & logclr transformation
K = 2 to 30 
Use set.seed() prior to running coseq for reproducible results.****************************************
Running K = 2 ...
Running K = 3 ...
Running K = 4 ...
Running K = 5 ...
Running K = 6 ...
Running K = 7 ...
Running K = 8 ...
Running K = 9 ...
Running K = 10 ...
Running K = 11 ...
Running K = 12 ...
Running K = 13 ...
Running K = 14 ...
Running K = 15 ...
Running K = 16 ...
Running K = 17 ...
Running K = 18 ...
Running K = 19 ...
Running K = 20 ...
Running K = 21 ...
Running K = 22 ...
Running K = 23 ...
Running K = 24 ...
Running K = 25 ...
Running K = 26 ...
Running K = 27 ...
Running K = 28 ...
Running K = 29 ...
Running K = 30 ...
Warning messages:
1: In serialize(data, node$con) :
  'package:stats' may not be available when loading
2: In serialize(data, node$con) :
  'package:stats' may not be available when loading
3: In serialize(data, node$con) :
  'package:stats' may not be available when loading
4: In serialize(data, node$con) :
  'package:stats' may not be available when loading
5: In serialize(data, node$con) :
  'package:stats' may not be available when loading
6: In serialize(data, node$con) :
  'package:stats' may not be available when loading
> summary(run)
*************************************************
K-means algorithm
Transformation: logclr
*************************************************
Clusters fit: 2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30
Clusters with errors: 3,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30
Selected number of clusters via capushe: 14
*************************************************
Number of clusters = 14
*************************************************
Cluster sizes:
 Cluster 1  Cluster 2  Cluster 3  Cluster 4  Cluster 5  Cluster 6  Cluster 7  Cluster 8  Cluster 9 Cluster 10 
        13       1997       1420       1431         80        255        301         27        158         90 
Cluster 11 Cluster 12 Cluster 13 Cluster 14 
        20        148        167        158

I take it that it may not be working as well as I assumed initially... :(

ADD REPLY • link 5.1 years ago varunorama • 0

0

Entering edit mode

Thanks for the extra detail -- I think I know what's going on here. Because the data here were processed using tximport prior to the differential analysis, the DESeqDataSet object does not include any normalization factors that can be accessed via sizeFactors(dds) (which is how coseq attempts to identify the normalization factors used in the differential analysis). Since sizeFactors(dds) = NULL and coseq is expecting one of the supported normalization factor types, that throws the error you saw.

For an initial work-around, I would suggest something like the following:

dds <- DESeqDataSetFromTximport(and.txi.rsem, colData = sampleTable, design = ~ dpa)
dds2 <- DESeq(dds, test=c("LRT"), reduced =~1)
res <- results(dds2, independentFiltering=FALSE)
run <- coseq(counts(dds2, norm=TRUE)[which(res$padj < 0.05),], 
             normFactors = "none", transformation = "logclr", 
             model = "kmeans", parallel = TRUE, meanFilterCutoff = 50, K=2:30)

Note that I extracted the normalized counts from the dds2 object using the counts accessor function from DESeq2 with norm = TRUE and then specified normFactors = "none" in coseq so that it wouldn't try to renormalize counts. Could you try this and see if it works for you?

Regarding the clusters that have errors, those represent clusters for with ifault from the kmeans function returned an indicator for a possible algorithm problem (i.e., lack of convergence). I would try leaving the nstart and iter.max parameters at their default values (iter.max = 50 and and nstart = 50) to fix this -- I used smaller values for these in the vignette simply to improve compilation time.

As to the usage of HTSFilter here (which is actually another package I developed, to filter weakly expressed genes in RNA-seq data prior to running a differential analysis), there is no reason you need to use it here unless you want to; it is not at all necessary for or connected to coseq for clustering.

ADD REPLY • link 5.1 years ago andrea.rau ▴ 80

0

Entering edit mode

Hello Andrea,

Thank you so very much - I followed your advice and instructions which allowed me to run coseq successfully! Your explanations were very helpful as to why the workaround was successful - thank you for a patient answer.

As for HTSFilter (thank you for this package btw, I've used it endlessly), yes I can see why it's not needed to input into coseq (I take it the [which(res$padj < 0.05),] part of the command will only consider the significant genes from the LRT test). Thanks for pointing that out.

Thanks for a great package!

ADD REPLY • link 5.1 years ago varunorama • 0

0

Entering edit mode

Wonderful, I'm glad it worked! And yes you're right, the which index will indeed only consider the genes significant at padj < 0.05 from the LRT test in my above code.

I'm going to add an answer below to close out this question, and I will try to add something in the coseq documentation to indicate what needs to be done with data processed via tximport.

Thanks for using both coseq and HTSFilter!

ADD REPLY • link 5.1 years ago andrea.rau ▴ 80

score 2 · Answer 1 · 2019-03-05

Once the model and transformation arguments of coseq were correctly specified, the problem here arose from the fact that the data were processed using tximport prior to the DESeq2 analysis, which meant that no normalization size factors were available in the DESeqDataSet object for coseq to use.

For a work-around, users with data processed using tximport can feed in the normalized counts from the DESeq2 object using the counts accessor function with norm = TRUE (and optionally, filter those normalized counts to include only those significant at the desired adjusted p-value threshold, i.e. res$padj < 0.05), and then specify normFactors = "none" in coseq to avoid renormalizing counts.

dds  <- DESeqDataSetFromTximport(...)
dds2 <- DESeq(dds, ...)
res  <- results(dds2, independentFiltering=FALSE)
run  <- coseq(counts(dds2, norm=TRUE)[which(res$padj < 0.05),], 
             normFactors = "none",  K=2:30)