collapseReplicates error after tximeta
1
0
Entering edit mode
@79861138
Last seen 20 months ago
Latvia

Hi.

I`m trying to perform a DTE test using Swish as per fishpond vignette, but i would like to collapse technical replicates for some samples using the collapseReplicates function from DESeq2 package. First i read in the Salmon(v1.6.0) quantification data using the tximeta package and then try to run the collapseReplicates function on the resulting object. Unfortunately i get an error message for which i can not find any solution in the documentation or online forums (see code and output below). However, when i summarizeToGene and convert to DESeqDataSet, everything seems to be working perfectly fine in regards to the collapsing of technical replicates. From the collapseReplicates documentation it seems that it should work perfectly fine with tximeta output. What course of action should i take to collapse the replicates or am i mistaken and it's not possible unless summarized to gene level counts?

Thank you in advance.

> head(coldata, c(5,4))
               names ID group run
1  V300082504_L01_73 2T     T  73
2  V300082504_L01_75 3T     T  75
3  V300082504_L01_77 4T     T  77
4  V300082504_L01_78 4T     T  78
5 V300095289_L01_100 6T     T 100

> se = tximeta(coldata, type = 'salmon', )
importing quantifications
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 
found matching transcriptome:
[ GENCODE - Homo sapiens - release 38 ]
loading existing TxDb created: 2021-11-13 00:50:22
loading existing transcript ranges created: 2021-11-13 00:50:23
fetching genome info for GENCODE

> se
class: RangedSummarizedExperiment 
dim: 236186 37 
metadata(6): tximetaInfo quantInfo ... txomeInfo txdbInfo
assays(23): counts abundance ... infRep19 infRep20
rownames(236186): ENST00000456328.2 ENST00000450305.2 ... ENST00000387460.2 ENST00000387461.2
rowData names(3): tx_id gene_id tx_name
colnames(37): V300082504_L01_73 V300082504_L01_75 ... V300095289_L01_101 V300095289_L01_103
colData names(4): names ID group run

> se = collapseReplicates(object = se, 
+                         groupby = se$ID,
+                         run = se$run,
+                         renameCols = T)
Error in collapseReplicates(object = se, groupby = se$ID, run = se$run,  : 
  sum(as.numeric(assay(object))) == sum(as.numeric(assay(collapsed))) is not TRUE

Session info:

> sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_Europe.utf8  LC_CTYPE=English_Europe.utf8    LC_MONETARY=English_Europe.utf8 LC_NUMERIC=C                   
[5] LC_TIME=English_Europe.utf8    

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] fishpond_2.4.1              GenomicFeatures_1.50.4      AnnotationDbi_1.60.0        DESeq2_1.38.3               SummarizedExperiment_1.28.0
 [6] Biobase_2.58.0              MatrixGenerics_1.10.0       matrixStats_0.63.0          GenomicRanges_1.50.2        GenomeInfoDb_1.34.8        
[11] IRanges_2.32.0              S4Vectors_0.36.1            BiocGenerics_0.44.0         tximeta_1.16.0              stringr_1.5.0              

loaded via a namespace (and not attached):
  [1] minqa_1.2.5                   colorspace_2.0-3              rjson_0.2.21                  ellipsis_0.3.2               
  [5] qvalue_2.30.0                 XVector_0.38.0                rstudioapi_0.14               bit64_4.0.5                  
  [9] interactiveDisplayBase_1.36.0 fansi_1.0.4                   xml2_1.3.3                    codetools_0.2-19             
 [13] splines_4.2.2                 tximport_1.26.1               cachem_1.0.6                  geneplotter_1.76.0           
 [17] jsonlite_1.8.4                nloptr_2.0.3                  Rsamtools_2.14.0              annotate_1.76.0              
 [21] dbplyr_2.3.0                  png_0.1-8                     pheatmap_1.0.12               shiny_1.7.4                  
 [25] readr_2.1.3                   BiocManager_1.30.19           compiler_4.2.2                httr_1.4.4                   
 [29] assertthat_0.2.1              Matrix_1.5-3                  fastmap_1.1.0                 lazyeval_0.2.2               
 [33] cli_3.6.0                     later_1.3.0                   htmltools_0.5.4               prettyunits_1.1.1            
 [37] tools_4.2.2                   gtable_0.3.1                  glue_1.6.2                    GenomeInfoDbData_1.2.9       
 [41] reshape2_1.4.4                dplyr_1.1.0                   rappdirs_0.3.3                Rcpp_1.0.10                  
 [45] vctrs_0.5.2                   Biostrings_2.66.0             nlme_3.1-162                  rtracklayer_1.58.0           
 [49] lme4_1.1-31                   mime_0.12                     lifecycle_1.0.3               restfulr_0.0.15              
 [53] ensembldb_2.22.0              gtools_3.9.4                  XML_3.99-0.13                 AnnotationHub_3.6.0          
 [57] zlibbioc_1.44.0               MASS_7.3-58.2                 scales_1.2.1                  vroom_1.6.1                  
 [61] hms_1.1.2                     promises_1.2.0.1              ProtGenerics_1.30.0           parallel_4.2.2               
 [65] AnnotationFilter_1.22.0       RColorBrewer_1.1-3            SingleCellExperiment_1.20.0   yaml_2.3.7                   
 [69] curl_5.0.0                    memoise_2.0.1                 ggplot2_3.4.0                 biomaRt_2.54.0               
 [73] stringi_1.7.12                RSQLite_2.2.20                BiocVersion_3.16.0            BiocIO_1.8.0                 
 [77] filelock_1.0.2                boot_1.3-28.1                 BiocParallel_1.32.5           rlang_1.0.6                  
 [81] pkgconfig_2.0.3               bitops_1.0-7                  archive_1.1.5                 lattice_0.20-45              
 [85] purrr_1.0.1                   GenomicAlignments_1.34.0      bit_4.0.5                     tidyselect_1.2.0             
 [89] plyr_1.8.8                    magrittr_2.0.3                R6_2.5.1                      generics_0.1.3               
 [93] DelayedArray_0.24.0           DBI_1.1.3                     withr_2.5.0                   pillar_1.8.1                 
 [97] svMisc_1.2.3                  abind_1.4-5                   KEGGREST_1.38.0               RCurl_1.98-1.10              
[101] tibble_3.1.8                  crayon_1.5.2                  utf8_1.2.3                    BiocFileCache_2.6.0          
[105] tzdb_0.3.0                    progress_1.2.2                locfit_1.5-9.7                grid_4.2.2                   
[109] blob_1.2.3                    digest_0.6.31                 xtable_1.8-4                  httpuv_1.6.8                 
[113] munsell_0.5.0
fishpond tximeta DESeq2 • 1.1k views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 16 hours ago
United States

The issue is that collapseReplicates just collapses counts and doesn't account for multiple assays which you need for e.g. Swish or other applications.

If it's just a few samples, a fast solution would be to quantify the runs together at the Salmon step:

salmon quant -i transcripts_index -l <LIBTYPE> \
  -1 reads01_1.fq  reads02_1.fq \
  -2 reads01_2.fq reads02_2.fq \
  -o sample

Given that Salmon takes a few minutes for a typical RNA-seq, this would just simplify your input, and simplify sharing the quantification directories later (we recommend uploading entire Salmon sample output directories when sharing summary data.

ADD COMMENT
0
Entering edit mode

Completely forgot about other assays i would need for swish.

Thank you, Michael!

ADD REPLY
0
Entering edit mode

A note: I've now changed collapseReplicates() to output a loud warning when it is run on a DESeqDataSet (SummarizedExperiment) with more than just a "counts" assay, and furthermore it will drop the other assays. This is to warn the user that other assays require some other form of collapsing (it's not safe to assume they should be summed, e.g. gene lengths don't sum, nor do TPM).

ADD REPLY

Login before adding your answer.

Traffic: 1031 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6