Rsubread::cellCounts() result:

Question

cellCounts() function floating point error

1

Entering edit mode

J-E • 0

@82bf0bc5

Last seen 2.4 years ago

United States

I tested the R function 'cellCounts' [Rsubread package] on a simplisitc 2 samples scRNAseq data from a Mouse experiment. The function inputs reads from FastQ files. The cellCOunts() function requested 24 threads on a 24-CPU node with 90Gb of RAM. The job does not appear to have run out of memory and there has not been an alert about quotas in the last 3 days, so it does not appear disk space was an issue. In addition, the 'cellCounts' function seemed to have finished: the output says that it could map and count 554,000,000 reads in 41.5 mins. However, upon closer inspection of the log file and R output, 'cellCounts' did not return any value as expected, and the log file actually indicates that the job seems to have aborted at the very end due to a floating point exception (core dumped). See R output log below (edited for conciseness and privacy).

Floating point exceptions are caused by things like division by zero or overflow (trying to store a number too big for the data type). I believe R handles division by 0 gracefully, but the underlying C/C++ code would not.

Hopefully the developers/maintainers or someone can suggest some troubleshooting steps.

# 'M29.index' is the reference sequence index, succesfully built
# using Mouse Genome Reference (Release M29 - GRCm39,
# 'gencode.vM29.transcripts.fa.gz') from the GENCODE database.

# 'sample.sheet' is the dataframe referring to fastQ files
# (properly formatted - see output below).

fc <- cellCounts(index=file.path("M29.index", fsep=.Platform$file.sep),
                 sample=sample.sheet,
                 input.mode="FASTQ",
                 cell.barcode=NULL,
                 nsubreads=15,
                 minVotes=1,
                 maxMismatches=10,
                 minMappedLength=1,
                 annot.inbuilt="mm39",
                 nBestLocations=1,
                 useMetaFeatures=TRUE,
                 umi.cutoff=NULL,
                 nthreads=24)

# Results of running the following in an R session:
sessionInfo( )
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux

Matrix products: default
BLAS:   /usr/local/gcc-6_3_0/lapack/3.7.0/lib/libblas.so.3.7.0
LAPACK: /usr/local/gcc-6_3_0/lapack/3.7.0/lib/liblapack.so.3.7.0

Random number generation:
 RNG:     L'Ecuyer-CMRG
 Normal:  Inversion
 Sample:  Rejection

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Rsubread_2.10.0 edgeR_3.36.0    limma_3.50.3

loaded via a namespace (and not attached):
[1] compiler_4.0.2  Matrix_1.2-18   Rcpp_1.0.8.3    grid_4.0.2
[5] locfit_1.5-9.4  lattice_0.20-41


# R output log:
-------------------------------
My job will use the following nodes:
-------------------------------
compt253
-------------------------------
My job will use the following number of tasks:
-------------------------------
24
-------------------------------
Temporarily Directory ($PFSDIR):
-------------------------------
/scratch/pbsjobs/job.16967554.hpc
-------------------------------

> library("limma")
> library("edgeR")
> library("Rsubread")
> fc <- cellCounts(index=file.path("M29.index", fsep=.Platform$file.sep),
+                  sample=sample.sheet,
+                  input.mode="FASTQ",
+                  cell.barcode=NULL,
+                  nsubreads=15,
+                  minVotes=1,
+                  maxMismatches=10,
+                  minMappedLength=1,
+                  annot.inbuilt="mm39",
+                  nBestLocations=1,
+                  useMetaFeatures=TRUE,
+                  umi.cutoff=NULL,
+                  nthreads=24)

NCBI RefSeq annotation for mm39 (build 39) is used.
    SampleName
1 GC-CC-1081_1
2 GC-CC-1081_1
3 GC-CC-1081_2
4 GC-CC-1081_2
                                                                                                              BarcodeUMIFile
1   /home/.../GC-CC-1081_1_S2_L001_R1_001.fastq.gz
2   /home/.../GC-CC-1081_1_S2_L002_R1_001.fastq.gz
3 /home/.../GC-CC-1081_2_S3_L001_R1_001.fastq.gz
4 /home/.../GC-CC-1081_2_S3_L002_R1_001.fastq.gz
                                                                                                                    ReadFile
1   /home/.../GC-CC-1081_1_S2_L001_R2_001.fastq.gz
2   /home/.../GC-CC-1081_1_S2_L002_R2_001.fastq.gz
3 /home/.../GC-CC-1081_2_S3_L001_R2_001.fastq.gz
4 /home/.../GC-CC-1081_2_S3_L002_R2_001.fastq.gz

Found 3 known cell barcode sets.
Testing the cell barcodes in 3M-february-2018.txt.gz.
Loaded 6794880 cell barcodes from /home/.../.Rsubread/cellCounts/3M-february-2018.txt.gz
Cell barcode supporting rate : 97.5%.
Found cell-barcode list '3M-february-2018.txt.gz' for the input data: supported by 97.5% reads.
Param #0 = 'R_cellCounts'
Param #1 = '--inputMode'
Param #2 = 'FASTQ'
Param #3 = '--cellBarcodeFile'
Param #4 = '/home/.../.Rsubread/cellCounts/3M-february-2018.txt.gz'
Param #5 = '--dataset'
Param #6 = '/home/.../GC-CC-1081_1_S2_L001_R1_001.fastq.gz|Rsd:cCounts:1mFQ|.|Rsd:cCounts:1mFQ|/home/.../GC-CC-1081_1_S2_L001_R2_001.fastq.gz|Rsd:cCounts:mFQs|/home/.../GC-CC-1081_1_S2_L002_R1_001.fastq.gz|Rsd:cCounts:1mFQ|.|Rsd:cCounts:1mFQ|/home/.../GC-CC-1081_1_S2_L002_R2_001.fastq.gz|Rsd:cCounts:mFQs|/home/.../GC-CC-1081_2_S3_L001_R1_001.fastq.gz|Rsd:cCounts:1mFQ|.|Rsd:cCounts:1mFQ|/home/.../GC-CC-1081_2_S3_L001_R2_001.fastq.gz|Rsd:cCounts:mFQs|/home/.../GC-CC-1081_2_S3_L002_R1_001.fastq.gz|Rsd:cCounts:1mFQ|.|Rsd:cCounts:1mFQ|/home/.../GC-CC-1081_2_S3_L002_R2_001.fastq.gz'
Param #7 = '--sampleSheetFile'
Param #8 = './.Rsubread_cCounts_Tmp_for_Pid_22098_Rproc.samplesheet'
Param #9 = '--index'
Param #10 = '/home/.../M29.index'
Param #11 = '--annotation'
Param #12 = '/home/.../.usr/local/R_packages/R-4.0.2/Rsubread/annot/mm39_RefSeq_exon.txt.gz'
Param #13 = '--geneIdColumn'
Param #14 = 'gene_id'
Param #15 = '--annotationType'
Param #16 = 'exon'
Param #17 = '--threads'
Param #18 = '24'
Param #19 = '--output'
Param #20 = './.Rsubread_cCounts_Tmp_for_Pid_22098_Rproc'
Param #21 = '--maxMismatch'
Param #22 = '10'
Param #23 = '--minVotesPerRead'
Param #24 = '1'
Param #25 = '--subreadsPerRead'
Param #26 = '15'
Param #27 = '--reportedAlignmentsPerRead'
Param #28 = '1'
Param #29 = '--maxDiffToTopVotes'
Param #30 = '2'
Param #31 = '--minMappedLength'
Param #32 = '1'
Param #33 = '--umiCutoff'
Param #34 = '-999'
Param #35 = '--reportMultiMappingReads'
UMI_CUT=-999.00
INIT LOCKER 1 at 0x7f957c9b9610
|| Load the 1-th index block...                                               ||
|| The index block has been loaded.                                           ||
Mapping and counting: 0  ; 2.0 mins
Mapping and counting: 1000000  ; 2.6 mins
Mapping and counting: 2000000  ; 2.7 mins
Mapping and counting: 3000000  ; 2.7 mins
Mapping and counting: 4000000  ; 2.8 mins
Mapping and counting: 5000000  ; 2.9 mins
Mapping and counting: 6000000  ; 2.9 mins
Mapping and counting: 7000000  ; 3.0 mins
Mapping and counting: 8000000  ; 3.1 mins
Mapping and counting: 9000000  ; 3.1 mins
Mapping and counting: 10000000  ; 3.2 mins
Mapping and counting: 11000000  ; 3.3 mins
Mapping and counting: 12000000  ; 3.3 mins
Mapping and counting: 13000000  ; 3.4 mins
Mapping and counting: 14000000  ; 3.5 mins
Mapping and counting: 15000000  ; 3.5 mins
Mapping and counting: 16000000  ; 3.6 mins
Mapping and counting: 17000000  ; 3.7 mins
Mapping and counting: 18000000  ; 3.7 mins
Mapping and counting: 19000000  ; 3.8 mins
Mapping and counting: 20000000  ; 3.9 mins
Mapping and counting: 21000000  ; 3.9 mins
Mapping and counting: 22000000  ; 4.0 mins
Mapping and counting: 23000000  ; 4.1 mins
Mapping and counting: 24000000  ; 4.1 mins
Mapping and counting: 25000000  ; 4.2 mins
Mapping and counting: 26000000  ; 4.3 mins
Mapping and counting: 27000000  ; 4.3 mins
Mapping and counting: 28000000  ; 4.4 mins
Mapping and counting: 29000000  ; 4.5 mins

[...]

HICONF MAPPING (SIMPLE) = 0, LOWCONF MAPPING (ALL SUBREADS, NOT SIMPLE) = 0
Mapping and counting: 540000000  ; 40.3 mins
Mapping and counting: 541000000  ; 40.3 mins
Mapping and counting: 542000000  ; 40.4 mins
Mapping and counting: 543000000  ; 40.5 mins
Mapping and counting: 544000000  ; 40.6 mins
Mapping and counting: 545000000  ; 40.6 mins
Mapping and counting: 546000000  ; 40.7 mins
Mapping and counting: 547000000  ; 41.0 mins
Mapping and counting: 548000000  ; 41.1 mins
Mapping and counting: 549000000  ; 41.2 mins
Mapping and counting: 550000000  ; 41.2 mins
Mapping and counting: 551000000  ; 41.3 mins
Mapping and counting: 552000000  ; 41.4 mins
Mapping and counting: 553000000  ; 41.4 mins
Mapping and counting: 554000000  ; 41.5 mins
HICONF MAPPING (SIMPLE) = 0, LOWCONF MAPPING (ALL SUBREADS, NOT SIMPLE) = 0

After processing batches, 0 UMIs were removed in step2 of UMI merging.
/tmp/slurm/job16967554/slurm_script: line 40: 22098 Floating point exception(core dumped) Rscript APOE_hpc.r

Rsubread • 4.2k views

ADD COMMENT • link 2.6 years ago • updated 2.4 years ago J-E • 0

0

Entering edit mode

Your R is almost 2 years old. You will need to update it to its latest version (4.2) to run it with latest Rsubread.

ADD REPLY • link 2.6 years ago Wei Shi ★ 3.6k

0

Entering edit mode

1) I initially did not think it was going to be an issue. I actually just tried under a newer version of R already installed on the cluster R-4.1.1. I experienced the same floating point error (running the same code, same machine). This apparently happens during feature counting because the "bam" files are generated. I'm honestly surprised that R.4.2.0 is absolutely required to run Rsubread 2.10.0 and that treatment of scRNAseq by cellCounts() did not work until R 4.2.0 was released approx 2 weeks ago.

2) Can align() and featureCounts() also be used for scRNAseq data?

ADD REPLY • link 2.6 years ago J-E • 0

1

Entering edit mode

You can use align() to map FASTQ-format scRNAseq reads, however this is not recommended. cellCounts implements a read mapping algorithm that is similar to align() but is more sensitive (we found that more sensitive mapping works better for 10x reads). So you should use cellCounts for 10x read mapping instead of align().

featureCounts() generates read counts, but it is unable to generate UMI counts. So you cant use it for scRNAseq data. However, featureCounts() is used in cellCounts to assign reads to genes and then cellCounts will create UMI counts based on the read assignment results and other information.

ADD REPLY • link 2.6 years ago Wei Shi ★ 3.6k

0

Entering edit mode

Wei, could you provide a link to a dummy synthetic example (e.g. as in the Rsubread vignette) or a small real example (as in the Rsubread UserGuide Case Studies) of scRNAseq that you know works with cellCounts() with Rsubread 2.10.0 under R.4.2.0. That way, I can test whether the error I encounter comes from my data or not. Thanks!

ADD REPLY • link 2.6 years ago J-E • 0

1

Entering edit mode

Yes we are currently working on generating a small example for cellCounts.

ADD REPLY • link 2.6 years ago Wei Shi ★ 3.6k

0

Entering edit mode

I tested the cellCounts() function again on the new example provided in version 2.10.3, using R 4.2.0 and Biocionductor 3.15. The example runs successfully until after the internal call to featureCounts. However, it fails at the end when it tries to re-collect the results using 'readMM()' from dependecy 'Matrix'. See error message :

Error in readMM(paste0(fn, ".spmtx")) : could not find function "readMM"

I think you should prefix 'readMM' with 'Matrix::readMM()' in internal function .read.sparse.mat(). And no need to attach the library 'Matrix' since it's already in the namespace.

ADD REPLY • link 2.5 years ago J-E • 0

0

Entering edit mode

Thanks for the report but the error cannot be reproduced. We tested Rsubread 2.10.3 in R 4.2.0 and Bioconductor 3.15, and cellCounts didn't report any error messages. We always test our programs before every release of Rsubread in many platforms, and we haven't had similar errors before. Given that the readMM function had been in the Matrix package for many years (at least in 2015), we always assume that this function is available when cellCounts loaded the Matrix package.

Can you provide the sessionInfo() output from R 4.2.0 and Rsubread 2.10.3 after cellCounts reports the error? Also, can you try running this in your R environment:

> library(Matrix)
> readMM

to see if you can have the sourcecode of the readMM function shown on screen?

ADD REPLY • link 2.5 years ago Yang Liao ▴ 450

0

Entering edit mode

I tested again the cellCounts() function on the new scNAseq dataset and example provided in Rsubread 2.10.4, using R 4.2.0 and Biocionductor 3.15.

1) Even though the namespace correctly includes the line 'importFrom("Matrix", "readMM")', Matrix::readMM would import only if the package 'Matrix' is manually attached!

2) cellCounts() successfully runs the Rsubread example up to the building of the index and the read mapping part, but fails catastrophically in the featureCounts part. The R session crashes! In version 2.10.0, it was due to a floating point error. In this version, it's due to a memory allocation overflow. I reproduced the problem in a Windows and Linux machine, each with Rsubread 2.10.4, R 4.2.0 and Biocionductor 3.15. See attached sesionInfo() output and crash screenshot below. So, these tests conclusively prove that the problem was neither due to the data nor the R version that I was using as previously suspected by the developper.

sessionInfo() R version 4.2.0 (2022-04-22) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 20.04.4 LTS

Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale: 1 LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: 1 parallel stats graphics grDevices utils datasets methods base

other attached packages: 1 BiocParallel_1.30.3 Matrix_1.4-1 Rsubread_2.10.4

loaded via a namespace (and not attached): 1 Rcpp_1.0.8.3 codetools_0.2-18 lattice_0.20-45 grid_4.2.0 R6_2.5.1
[6] lifecycle_1.0.1 scales_1.2.0 rlang_1.0.3 cli_3.3.0 rstudioapi_0.13
[11] tools_4.2.0 munsell_0.5.0 compiler_4.2.0 colorspace_2.0-3 BiocManager_1.30.18

    ==========     _____ _    _ ____  _____  ______          _____  
    =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
      =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
        ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
          ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
    ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
   Rsubread 2.10.4

//================================= setting ==================================\ || || || Function : Read alignment (RNA-Seq) || || Input file : 1 samples from scRNA-seq || || Output file : .Rsubread_cCounts_Tmp_for_Pid_623021_Rproc (BAM) || || Index name : chr1.index || || || || ------------------------------------ || || || || Threads : 18 || || Phred offset : 33 || || Min votes : 3 / 10 || || Max mismatches : 3 || || Max indel length : 5 || || Report multi-mapping reads : yes || || Max alignments per multi-mapping read : 1 || || Annotations : inbuilt (hg38) || || || \============================================================================//

//================ Running (30-Jun-2022 13:19:47, pid=623021) ================\ || || ...

|| || || Completed successfully. || || || \==================================== ====================================//

//================================ Summary =================================\ || || || Total reads : 403,390 || || Mapped : 401,740 (99.6%) || || Uniquely mapped : 382,378 || || Multi-mapping : 19,362 || || || || Unmapped : 1,650 || || || || Indels : 11 || || || || Running time : 0.6 minutes || || || \============================================================================//

NCBI RefSeq annotation for hg38 (build 38.2) is used.

    ==========     _____ _    _ ____  _____  ______          _____  
    =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
      =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
        ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
          ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
    ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
   Rsubread 2.10.4

//========================== featureCounts setting ===========================\ || || || Input files : 1 BAM file || || || || .Rsubread_cCounts_Tmp_for_Pid_623021_Rproc || || || || Paired-end : no || || Count read pairs : no || || Annotation : inbuilt (hg38) || || Dir for temp files : . || || || || scRNA count table : <input_file>.scRNA.table || || scRNA sample sheet : .Rsubread_cCounts_Tmp_for_Pid_623021_Rproc.s ... || || scRNA barcode list : Barcodes || || Threads : 18 || || Level : meta-feature level || || Multimapping reads : counted || || Multi-overlapping reads : not counted || || Min overlapping bases : 1 || || || \============================================================================//

//================================= Running ==================================\ || || || Load annotation file hg38_RefSeq_exon.txt ... || || Features : 261752 || || Meta-features : 28395 || || Chromosomes/contigs : 55 || || || || Load scRNA-related files... || || scRNA samples : 1 || || scRNA cell barcodes : 0 || || || || Process BAM file .Rsubread_cCounts_Tmp_for_Pid_623021_Rproc... || || Strand specific : stranded || || scRNA quality control in first 20,000 reads: || || 100.0 pct reads have valid sample indices. || || 0.0 pct reads have valid cell barcodes. || || ||

ADD REPLY • link 2.5 years ago J-E • 0

score 3 · Accepted Answer · 2022-07-27

3

Entering edit mode

Wei Shi ★ 3.6k

@wei-shi-2183

Last seen 20 hours ago

Australia/Melbourne

Yes the use of different versions of mouse reference genome will cause differences in gene number and cell number. The number of cells called is determined based on the gene expression profiles in individual cells. The cellCounts inbuilt annotation is an RefSeq annotation, which is quite different from the Ensembl annotation used by CellRanger. This annotation difference contributed to the difference in quantification results as well. Also, cellCounts and CellRanger use different algorithms for read mapping and read assignment. In cellCounts, read mapping was performed based on a modified Subread mapping algorithm and read assignment is performed based on the featureCounts method.

ADD COMMENT • link 2.4 years ago Wei Shi ★ 3.6k

1

Entering edit mode

Thanks for your time and patience. Problems solved!

The following are just suggestions for improvement:

1) Your namespace correctly includes the line 'importFrom("Matrix", "readMM")'. So you don't have to attach the library 'Matrix' internally. If you want, for safety (not required), just prefix 'readMM' with 'Matrix::readMM()' in internal function .read.sparse.mat().

2) There is a warning during the Windows installation: "this package has a non-empty 'configure.win' file, so building only the main architecture". The fix is to force Biarch to true ('Biarch: true' in your DESCRIPTION file) to have it build on i386 or x64 when a configure.win file is present. See post at: https://community.rstudio.com/t/configure-win-and-cran-submission/24684

ADD REPLY • link 2.4 years ago J-E • 0

score 2 · Accepted Answer · 2022-06-30

2

Entering edit mode

Wei Shi ★ 3.6k

@wei-shi-2183

Last seen 20 hours ago

Australia/Melbourne

Hi J-E, thank you for providing further information. We had a close look at the information you provided and found that although your session info shows that the latest version of Rsubread was imported, your cellCounts function was actually not from the latest Rsubread. See the figure below for what the screen output of cellCounts (version 2.10.2 and newer) should look like for this example. There seems to be issues with your Rsubread installation and I would suggest you removing all versions of Rsubread installed on your computer and then reinstalling the latest Rsubread.

cellCounts screen ouput

ADD COMMENT • link 2.5 years ago Wei Shi ★ 3.6k

0

Entering edit mode

1) The output you can see is from the Rsubread 2.10.4 package that was downloaded from Bioconductor and installed from the source tar.gz file. As you can see from the screen shot you provided, the cellCounts() function that you use is from the Rsubread 2.10.2! So, what version are we supposed to use, and where are we supposed to get it from?? Thanks

2) Also, FYI, I'm providing below the output from ANOTHER computer, and this time running Windows (10). First, see message below during installation. "this package has a non-empty 'configure.win' file, so building only the main architecture". Based on some research I just did, the fix would be to Force Biarch to true ('Biarch: true') in your DESCRIPTION file to have it build on i386 or x64 when a configure.win file is present. Thanks.

Second, using now a fresh and clean installation of Rsubread 2.10.4 ... R crashes!

sessionInfo() R version 4.2.0 (2022-04-22 ucrt) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale: [1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8 LC_MONETARY=English_United States.utf8 [4] LC_NUMERIC=C LC_TIME=English_United States.utf8

attached base packages: [1] parallel stats graphics grDevices utils datasets methods base

other attached packages: [1] BiocParallel_1.30.3 Matrix_1.4-1 Rsubread_2.10.4 edgeR_3.38.0 limma_3.52.0

loaded via [1] MatrixGenerics_1.8.0 [5] assertthat_0.2.1 [9] GenomeInfoDbData_1.2.8 [13] pillar_1.7.0 [17] digest_0.6.29 [21] pkgconfig_2.0.3 [25] tibble_3.1.7 [29] ellipsis_0.3.2 [33] BiocGenerics_0.42.0 [37] memoise_2.0.1 [41] prettyunits_1.1.1 [45] lifecycle_1.0.1 [49] DelayedArray_0.22.0 [53] GenomeInfoDb_1.32.2 [57] rstudioapi_0.13 [61] restfulr_0.0.13 [65] R6_2.5.1 [69] fastmap_1.1.0 [73] stringi_1.7.6 [77] dbplyr_2.2.0 a namespace (and not attached): Biobase_2.56.0 httr_1.4.3 bit64_4.0.5
stats4_4.2.0 BiocFileCache_2.4.0 blob_1.2.3
Rsamtools_2.12.0 yaml_2.3.5 progress_1.2.2
RSQLite_2.2.14 lattice_0.20-45 glue_1.6.2
GenomicRanges_1.48.0 XVector_0.36.0 XML_3.99-0.9
biomaRt_2.52.0 zlibbioc_1.42.0 purrr_0.3.4
KEGGREST_1.36.2 generics_0.1.2 IRanges_2.30.0
cachem_1.0.6 SummarizedExperiment_1.26.1 GenomicFeatures_1.48.0
cli_3.3.0 magrittr_2.0.3 crayon_1.5.1
fansi_1.0.3 xml2_1.3.3 tools_4.2.0
hms_1.1.1 matrixStats_0.62.0 BiocIO_1.6.0
stringr_1.4.0 S4Vectors_0.34.0 locfit_1.5-9.5
AnnotationDbi_1.58.0 Biostrings_2.64.0 compiler_4.2.0
rlang_1.0.2 grid_4.2.0 RCurl_1.98-1.7
rjson_0.2.21 rappdirs_0.3.3 bitops_1.0-7
codetools_0.2-18 DBI_1.1.3 curl_4.3.2
GenomicAlignments_1.32.0 dplyr_1.0.9 rtracklayer_1.56.0
bit_4.0.4 utf8_1.2.2 filelock_1.0.2
Rcpp_1.0.8.3 vctrs_0.4.1 png_0.1-7
tidyselect_1.1.2

SampleName BarcodeUMIFile ReadFile 1 Example C:/Users/Admin/Documents/CODES/R/BIOINFO/Rsubread/Data/Real2/reads_R1.fastq.gz .../reads_R2.fastq.gz

readfile1 (b4 check): ...\reads_R1.fastq.gz|Rsd:cCounts:1mFQ|.|Rsd:cCounts:1mFQ|...\reads_R2.fastq.gz

readfile1 (after check): ...\reads_R1.fastq.gz|Rsd:cCounts:1mFQ|.|Rsd:cCounts:1mFQ|...\reads_R2.fastq.gz

    ==========     _____ _    _ ____  _____  ______          _____  
    =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
      =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
        ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
          ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
    ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
   Rsubread 2.10.4

//================================= setting ==================================\ || || || Function : Read alignment (RNA-Seq) || || Input file : 1 samples from scRNA-seq || || Output file : .Rsubread_cCounts_Tmp_for_Pid_35648_Rproc (BAM) || || Index name : chr1.index || || || || ------------------------------------ || || || || Threads : 8 || || Phred offset : 33 || || Min votes : 3 / 10 || || Max mismatches : 3 || || Max indel length : 5 || || Report multi-mapping reads : yes || || Max alignments per multi-mapping read : 1 || || Annotations : inbuilt (hg38) || || || \============================================================================//

//================ Running (30-Jun-2022 23:18:24, pid=35648) =================\ || || || Check the input reads. || || The input file contains base space reads. || || Initialise the memory objects. || || Estimate the mean read length. || || Create the output BAM file. || || Check the index. || || Init the voting space. || || Load the annotation file. || || 261752 annotation records were loaded. || || || || Chromosomes/contigs in annotation but not in index : ||

[...]

|| Load the 1-th index block... || || The index block has been loaded. || || Start read mapping in chunk. || || || || Completed successfully. || || || \==================================== ====================================//

//================================ Summary =================================\ || || || Total reads : 403390 || || Mapped : 401740 (99.6%) || || Uniquely mapped : 382378 || || Multi-mapping : 19362 || || || || Unmapped : 1650 || || || || Indels : 11 || || || || Running time : 1.0 minutes || || || \============================================================================//

NCBI RefSeq annotation for hg38 (build 38.2) is used.

[R session crashed (Bomb)]

ADD REPLY • link 2.5 years ago J-E • 0

1

Entering edit mode

Sorry for the confusion on the version numbers. I have modified my reply to clarify this. The screen output of cellCounts version 2.10.2 is identical to its version 2.10.4.

Regarding the installation of Rsubread, if you have looked at the Bioc webpage for Rsubread package you will find the recommended (also easiest) way of installing Rsubread is this:

To install this package, start R (version "4.2") and enter:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("Rsubread")

ADD REPLY • link 2.5 years ago Wei Shi ★ 3.6k

0

Entering edit mode

Thanks for your patience. There was an object 'cellCounts' in my workspace that was masking the most updated function from Rsubread 2.10.4. A rookie oversight on my end... So, that explains the cellCounts() output problem that I had. Moving forward, I tested the Rsubread 2.10.4 cellCounts() function on the example scNAseq dataset. This example runs well on different machines R 4.2.0 or even running R 4.1.0! First issue resolved.

Remaining major issue:

However, when I tested again Rsubread 2.10.4 cellCounts() on my real scRNAseq dataset (same FastQ input files as in the initial post) on a machine running R 4.2.0 and Biocionductor 3.15 (as instructed by developper) the R session crashes. The function seems to bump into memory issues after the alignment/mapping part is finished. We checked that it was not due to a resources issue: the session used only 80% of memory and there was enough disk space as well. Also, we checked that the 10X Genomics cellRanger count function could successfully map and generate raw counts with the the same FastQ input files. So, there is no issue with this dataset either. See attached R sessionInfo() and cellCOunts() output.

Thanks for your help.

|| Mapping and counting : 277000000 reads; time elapsed : 16.5 mins || buffer overflow detected : terminated [709350:709350:20220705,105008.257070:ERROR process_memory_range.cc:86] read out of range [709350:709350:20220705,105008.257148:ERROR elf_image_reader.cc:558] missing nul-terminator [709350:709350:20220705,105008.257334:ERROR elf_dynamic_array_reader.h:61] tag not found [709350:709350:20220705,105008.265038:ERROR elf_dynamic_array_reader.h:61] tag not found [709350:709350:20220705,105008.265118:ERROR elf_dynamic_array_reader.h:61] tag not found [709350:709350:20220705,105008.265188:ERROR elf_dynamic_array_reader.h:61] tag not found [709350:709350:20220705,105008.265255:ERROR elf_dynamic_array_reader.h:61] tag not found [709350:709350:20220705,105008.265323:ERROR elf_dynamic_array_reader.h:61] tag not found [709350:709350:20220705,105008.265395:ERROR elf_dynamic_array_reader.h:61] tag not found [709350:709350:20220705,105008.265984:ERROR elf_dynamic_array_reader.h:61] tag not found [709350:709350:20220705,105008.266049:ERROR elf_dynamic_array_reader.h:61] tag not found [709350:709350:20220705,105008.266586:ERROR elf_dynamic_array_reader.h:61] tag not found [709350:709350:20220705,105008.266649:ERROR elf_dynamic_array_reader.h:61] tag not found [709350:709350:20220705,105008.266710:ERROR elf_dynamic_array_reader.h:61] tag not found [709350:709350:20220705,105008.266776:ERROR elf_dynamic_array_reader.h:61] tag not found [709350:709350:20220705,105008.266840:ERROR elf_dynamic_array_reader.h:61] tag not found [709350:709350:20220705,105008.266904:ERROR elf_dynamic_array_reader.h:61] tag not found [709350:709350:20220705,105008.266967:ERROR elf_dynamic_array_reader.h:61] tag not found [709350:709350:20220705,105008.267027:ERROR elf_dynamic_array_reader.h:61] tag not found [709350:709350:20220705,105008.267088:ERROR elf_dynamic_array_reader.h:61] tag not found [709350:709350:20220705,105008.267152:ERROR elf_dynamic_array_reader.h:61] tag not found [709350:709350:20220705,105008.267240:ERROR elf_dynamic_array_reader.h:61] tag not found [709350:709350:20220705,105008.267302:ERROR elf_dynamic_array_reader.h:61] tag not found [709350:709350:20220705,105008.267362:ERROR elf_dynamic_array_reader.h:61] tag not found [709350:709350:20220705,105008.267428:ERROR elf_dynamic_array_reader.h:61] tag not found [709350:709350:20220705,105008.267489:ERROR elf_dynamic_array_reader.h:61] tag not found [709350:709350:20220705,105008.267549:ERROR elf_dynamic_array_reader.h:61] tag not found [709350:709350:20220705,105008.267615:ERROR elf_dynamic_array_reader.h:61] tag not found [709350:709350:20220705,105008.267677:ERROR elf_dynamic_array_reader.h:61] tag not found [709350:709352:20220705,105008.291238:ERROR directory_reader_posix.cc:42] opendir: No such file or directory (2)

===============================================================================================================================

sessionInfo() R version 4.2.0 (2022-04-22) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 20.04.4 LTS

Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] parallel stats graphics grDevices utils datasets methods base

other attached packages: [1] devtools_2.4.3 usethis_2.1.6 Rsubread_2.10.4 BiocParallel_1.30.3

loaded via a namespace (and not attached): [1] rstudioapi_0.13 magrittr_2.0.3 munsell_0.5.0 pkgload_1.3.0 colorspace_2.0-3
[6] lattice_0.20-45 R6_2.5.1 rlang_1.0.3 fastmap_1.1.0 tools_4.2.0
[11] grid_4.2.0 pkgbuild_1.3.1 sessioninfo_1.2.2 cli_3.3.0 ellipsis_0.3.2
[16] remotes_2.4.2 lifecycle_1.0.1 crayon_1.5.1 Matrix_1.4-1 processx_3.6.1
[21] BiocManager_1.30.18 purrr_0.3.4 callr_3.7.0 fs_1.5.2 ps_1.7.1
[26] codetools_0.2-18 memoise_2.0.1 glue_1.6.2 cachem_1.0.6 compiler_4.2.0
[31] scales_1.2.0 prettyunits_1.1.1

ADD REPLY • link 2.5 years ago J-E • 0

0

Entering edit mode

I assume that you downloaded the source code of Rsubread (v2.10.4) from https://bioconductor.org/packages/release/bioc/html/Rsubread.html ?

I would suggest you to open the source package that you downloaded, and see the content of cellCounts.R in the /R/ directory, and see if it calls the featureCounts function (not in the .old_cellCounts function, but in the cellCounts function in this R file). That is why I don't know why you consistently say about featureCounts, and gave the screen output from featureCounts in your posts. CellCounts has stopped calling both align() and featureCounts() from late last year.

Also, I suggest you to open /src/cell-counts.c to see if it contains the cellCounts ASCII art letters from line 577 to line 581. That is where the ASCII art letters generated in the screen output that you saw in this answer.

I have just downloaded the Rsubread_2.10.4.tar.gz source code package from the aforementioned URL (SHA256 checksum 349fe4942612b7192d6de0b31b7e686d4a10e90934c67cf9d1163fb55e11b094; MD5 checksum 2f1c3c2c84058dd7679d038b108e10e4; SHA1 chicksum 4d44dda2b3e1fdf21d4fb91396836a937c26d5de) and just checked both files. I am sure that they should have the behaviour that I expected.

ADD REPLY • link 2.5 years ago Yang Liao ▴ 450

0

Entering edit mode

Another test on another Linux machine with a larger RAM and processor resource (requested 24 threads and mem=180gb on icosa256gb chip with 40 cpus). Once again, I tested Rsubread 2.10.4 cellCounts() on my real scRNAseq dataset (same FastQ input files as in the initial post) on a machine running R 4.2.0 and Biocionductor 3.15 (as instructed by developper). The R session crashes after the alignment/mapping part is finished. This time, the error is the exact same "Floating point exception(core dumped)" as initially reported (05-06-2022). Don't know it it is the same as the the one last reported that seems to be related to a memory issue (07-05-2022). We checked that it was not due to a resources issue, and there was enough disk space as well. As previously mentioned the 10X Genomics cellRanger count function could successfully map and generate raw counts with the the same FastQ input files. So, there is no issue with this dataset either. See attached R sessionInfo() and cellCOunts() output.

Thanks for your help.

sessionInfo() R version 4.2.0 (2022-04-22) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Red Hat Enterprise Linux 8.6 (Ootpa)

Matrix products: default BLAS/LAPACK: /usr/local/easybuild_allnodes/software/FlexiBLAS/3.0.4-GCC-11.2.0/lib64/libflexiblas.so.3.0

Random number generation: RNG: L'Ecuyer-CMRG Normal: Inversion Sample: Rejection

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] parallel stats graphics grDevices utils datasets methods
[8] base

other attached packages: [1] devtools_2.4.3 usethis_2.1.6 Rsubread_2.10.4 edgeR_3.38.1
[5] limma_3.52.2

loaded via a namespace (and not attached): [1] Rcpp_1.0.9 magrittr_2.0.3 pkgload_1.2.4 lattice_0.20-45
[5] R6_2.5.1 rlang_1.0.4 fastmap_1.1.0 tools_4.2.0
[9] pkgbuild_1.3.1 grid_4.2.0 sessioninfo_1.2.2 cli_3.3.0
[13] withr_2.5.0 remotes_2.4.2 ellipsis_0.3.2 rprojroot_2.0.3
[17] lifecycle_1.0.1 crayon_1.5.1 brio_1.1.3 processx_3.6.1
[21] Matrix_1.4-1 purrr_0.3.4 callr_3.7.0 fs_1.5.2
[25] ps_1.7.1 testthat_3.1.4 memoise_2.0.1 glue_1.6.2
[29] cachem_1.0.6 compiler_4.2.0 desc_1.4.1 prettyunits_1.1.1 [33] locfit_1.5-9.6

Aorta_PM_F_1.cc <- cellCounts(index=file.path(HOME.path, "DATASETS/INDEX_FILES/M29/M29.index", fsep=.Platform$file.sep),

sample=sample.sheet,

input.mode="FASTQ",

cell.barcode=file.path(HOME.path, "DATASETS/10X_BARCODES/cellCounts", fsep=.Platform$file.sep),

annot.inbuilt="mm39",

nBestLocations=1,

useMetaFeatures=TRUE,

umi.cutoff=NULL,

nthreads=24) NCBI RefSeq annotation for mm39 (build 39) is used.
| | | / ___ _ _ _ | |_ ___ / __/ _ \ | |/ / / _ | | | | '_ | __/ __| | (_| __/ | / /__| (_) | |_| | | | | |__ \ ___|_|___/_/ \,_|_| |_|_|__/ Rsubread 2.10.4

//=========================== cellCounts settings ============================\ || || || Index : /home/[...]/INDEX_FILES/M29/M29.index || || Input mode : FASTQ files || || || \============================================================================//

//============================================================================\ || || || Started running. || || || || Load the 1-th index block... || || The index block has been loaded. Now map the reads... || || Mapping and counting : 0 reads; time elapsed : 0.7 mins || || Mapping and counting : 1000000 reads; time elapsed : 0.8 mins || || Mapping and counting : 2000000 reads; time elapsed : 0.8 mins || || Mapping and counting : 3000000 reads; time elapsed : 0.9 mins ||

[...]

|| Mapping and counting : 276000000 reads; time elapsed : 18.8 mins || || Mapping and counting : 277000000 reads; time elapsed : 18.9 mins || || Generate UMI counts... || /var/spool/slurmd/job13513/slurm_script: line 43: 21056 Floating point exception(core dumped) Rscript /home/[...]

ADD REPLY • link 2.4 years ago J-E • 0

1

Entering edit mode

Do the chr names in the reference genome you used for index building match the chr names in the inbuilt gene annotation? The inbuilt annotation has chr name like 'Chr1', 'Chr2' ... . If the chr names do not match, then no reads can be assigned and no UMI count will be generated and this could cause a crash. Reference genomes included in the GENCODE database have the chr name format same as that in the inbuilt annotation.

ADD REPLY • link 2.4 years ago Wei Shi ★ 3.6k

0

Entering edit mode

To make sure that my reference genome has the same chr name format as that in the inbuilt annotation (as instructed), I have rebuilt my index files using the primary genome assembly file "GRCm39.primary_assembly.genome.fa.gz" from the GENCODE database: Mouse (Release M30 - GRCm39) at https://www.gencodegenes.org/mouse/release_M30.html. The build was successful according to the buildindex() output. I ran the exact same test as before with same resources and the same machine. Surprisingly, the cellCounts() output did change with a different error after the alignment/mapping part is finished ( caught segfault address 0x7ee2d3083018, cause 'invalid permissions'). See attached R sessionInfo() and cellCOunts() output. Please, note that the same error happens if I use the Ensembl reference genome primary genome assembly file (available at https://useast.ensembl.org/Mus_musculus/Info/Index).

Thanks for your help.

sessionInfo() R version 4.2.0 (2022-04-22) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Red Hat Enterprise Linux 8.6 (Ootpa)

Matrix products: default BLAS/LAPACK: /usr/local/easybuild_allnodes/software/FlexiBLAS/3.0.4-GCC-11.2.0/lib64/libflexiblas.so.3.0

Random number generation: RNG: L'Ecuyer-CMRG Normal: Inversion Sample: Rejection

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] parallel stats graphics grDevices utils datasets methods
[8] base

other attached packages: [1] devtools_2.4.3 usethis_2.1.6 Rsubread_2.10.4 edgeR_3.38.1
[5] limma_3.52.2

loaded via a namespace (and not attached): [1] Rcpp_1.0.9 magrittr_2.0.3 pkgload_1.2.4 lattice_0.20-45
[5] R6_2.5.1 rlang_1.0.4 fastmap_1.1.0 tools_4.2.0
[9] pkgbuild_1.3.1 grid_4.2.0 sessioninfo_1.2.2 cli_3.3.0
[13] withr_2.5.0 remotes_2.4.2 ellipsis_0.3.2 rprojroot_2.0.3
[17] lifecycle_1.0.1 crayon_1.5.1 brio_1.1.3 processx_3.6.1
[21] Matrix_1.4-1 purrr_0.3.4 callr_3.7.0 fs_1.5.2
[25] ps_1.7.1 testthat_3.1.4 memoise_2.0.1 glue_1.6.2
[29] cachem_1.0.6 compiler_4.2.0 desc_1.4.1 prettyunits_1.1.1 [33] locfit_1.5-9.6

Aorta_PM_F_1.cc <- cellCounts(index=file.path(HOME.path, "DATASETS/INDEX_FILES/Gencode_Index/GRCm39/GRCm39.index", fsep=.Platform$file.sep),

sample=sample.sheet,

input.mode="FASTQ",

cell.barcode=file.path(HOME.path, "DATASETS/10X_BARCODES/cellCounts", fsep=.Platform$file.sep),

annot.inbuilt="mm39",

nBestLocations=1,

useMetaFeatures=TRUE,

umi.cutoff=NULL,

nthreads=24)

NCBI RefSeq annotation for mm39 (build 39) is used. _ _ ___ _
___ ___| | | / ___ _ _ _ | |_ ___ / __/ _ \ | |/ / / _ | | | | '_ | __/ __| | (_| __/ | / /__| (_) | |_| | | | | |__ \ ___|_|___/_/ \,_|_| |_|_|__/ Rsubread 2.10.4

//=========================== cellCounts settings ============================\ || || || Index : /home/[...]/Gencode_Index/GRCm3 ... || || Input mode : FASTQ files || || || \============================================================================//

//============================================================================\ || || || Started running. || || || || Load the 1-th index block... || || The index block has been loaded. Now map the reads... || || Mapping and counting : 0 reads; time elapsed : 0.7 mins || || Mapping and counting : 1000000 reads; time elapsed : 0.8 mins || || Mapping and counting : 2000000 reads; time elapsed : 0.9 mins || || Mapping and counting : 3000000 reads; time elapsed : 1.0 mins ||

[...]

|| Mapping and counting : 275000000 reads; time elapsed : 20.0 mins || || Mapping and counting : 276000000 reads; time elapsed : 20.1 mins || || Mapping and counting : 277000000 reads; time elapsed : 20.2 mins ||

caught segfault address 0x7ee2d3083018, cause 'invalid permissions'

ADD REPLY • link 2.4 years ago J-E • 0

1

Entering edit mode

Can you show the first few rows of the file you provided to the parameter cell.barcode? You don't have to set this parameter as cellCounts is able to automatically detect barcodes for you.

ADD REPLY • link 2.4 years ago Wei Shi ★ 3.6k

1

Entering edit mode

Yep. Finally, that solved it! See attached R sessionInfo() and cellCounts() output.

However, I have a follow up question. The sizes of count matrices generated by Rsubread::cellCounts (40838 x 36418) and 10X-Genomics Cell Ranger (32285 x 39322) do not match in sizes. The difference is big, even in the number of cells. Could this be because 10X-Genomics uses a different (proprietary) index file for mapping to the reference genome (mm10)? That could explain the difference in number of genes, but would that account for the difference in the number of cells as well?

Rsubread::cellCounts() uses the primary genome assembly file "GRCm39.primary_assembly.genome.fa.gz" from the GENCODE database: Mouse Release M30 - GRCm39 at https://www.gencodegenes.org/mouse/release_M30.html

10X-Genomics CellRanger() uses a pre-built reference genome data file containing a set of pre-generated indices and other data required by Cell Ranger. References - 2020-A (July 7, 2020). Mouse reference mm10 (GENCODE vM23/Ensembl 98) at https://cf.10xgenomics.com/supp/cell-exp/refdata-gex-mm10-2020-A.tar.gz

Thanks for your help.

sessionInfo() R version 4.2.0 (2022-04-22) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Red Hat Enterprise Linux 8.6 (Ootpa)

Matrix products: default BLAS/LAPACK: /usr/local/easybuild_allnodes/software/FlexiBLAS/3.0.4-GCC-11.2.0/lib64/libflexiblas.so.3.0

Random number generation: RNG: L'Ecuyer-CMRG Normal: Inversion Sample: Rejection

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] parallel stats graphics grDevices utils datasets methods
[8] base

other attached packages: [1] devtools_2.4.3 usethis_2.1.6 Rsubread_2.10.4 edgeR_3.38.1
[5] limma_3.52.2

loaded via a namespace (and not attached): [1] Rcpp_1.0.9 magrittr_2.0.3 pkgload_1.3.0 lattice_0.20-45
[5] R6_2.5.1 rlang_1.0.4 fastmap_1.1.0 tools_4.2.0
[9] pkgbuild_1.3.1 grid_4.2.0 sessioninfo_1.2.2 cli_3.3.0
[13] remotes_2.4.2 ellipsis_0.3.2 lifecycle_1.0.1 crayon_1.5.1
[17] processx_3.6.1 Matrix_1.4-1 purrr_0.3.4 callr_3.7.0
[21] fs_1.5.2 ps_1.7.1 memoise_2.0.1 glue_1.6.2
[25] cachem_1.0.6 compiler_4.2.0 prettyunits_1.1.1 locfit_1.5-9.6

Aorta_PM_F_1.cc <- cellCounts(index=file.path(HOME.path, "DATASETS/INDEX_FILES/Gencode_Index/GRCm39/GRCm39.index", fsep=.Platform$file.sep),

sample=sample.sheet,

input.mode="FASTQ",

annot.inbuilt="mm39",

nBestLocations=1,

useMetaFeatures=TRUE,

umi.cutoff=NULL,

nthreads=24) NCBI RefSeq annotation for mm39 (build 39) is used. Found 3 known cell barcode sets. Testing the cell barcodes in 3M-february-2018.txt.gz. Loaded 6794880 cell barcodes from /home/jxd101/.Rsubread/cellCounts/3M-february-2018.txt.gz Cell barcode supporting rate : 97.5%. Found cell-barcode list '3M-february-2018.txt.gz' for the input data: supported by 97.5% reads.
| | | / ___ _ _ _ | |_ ___ / __/ _ \ | |/ / / _ | | | | '_ | __/ __| | (_| __/ | / /__| (_) | |_| | | | | |__ \ ___|_|___/_/ \,_|_| |_|_|__/ Rsubread 2.10.4

//=========================== cellCounts settings ============================\ || || || Index : /home/jxd101/DATASETS/INDEX_FILES/Gencode_Index/GRCm3 ... || || Input mode : FASTQ files || || || \============================================================================//

//============================================================================\ || || || Started running. || || || || Load the 1-th index block... || || The index block has been loaded. Now map the reads... || || Mapping and counting : 0 reads; time elapsed : 1.1 mins || || Mapping and counting : 1000000 reads; time elapsed : 1.1 mins || || Mapping and counting : 2000000 reads; time elapsed : 1.2 mins ||

[...]

|| Mapping and counting : 275000000 reads; time elapsed : 16.4 mins || || Mapping and counting : 276000000 reads; time elapsed : 16.5 mins || || Mapping and counting : 277000000 reads; time elapsed : 16.5 mins || || Generate UMI counts... || || || || Read mapping finished successfully. || || || \============================================================================//

Process the UMI table for sample 1 ... Temporary files generated for read mapping and counting have been deleted.

The cellCounts program has finished successfully.

Rsubread::cellCounts() result:

dim(Aorta_PM_F_1.cc$counts[[1]])

[1] 40838 36418

CellRanger() result:

Aorta_PM_F_1.mat <- as.matrix(assay(Aorta_PM_F_1)) dim(Aorta_PM_F_1.mat)

[1] 32285 39322

ADD REPLY • link 2.4 years ago J-E • 0