profileplyr: how to access and replace data from a profileplyr object
0
1
Entering edit mode
@alexandreblais-23753
Last seen 27 days ago

Hello I am using profileplyr to manipulate data matrices generated by deepTools computeMatrix. I am finding it very useful in a number of application scenarios.

Now I would like to know if there is a proper way to access and to replace the numerical data stored in the matrix.

My computeMatrix object comes from RNA-seq. It has 17k rows (genes), 400 columns (region around the TSS of genes), in each of 3 samples. The 3 samples are A (all read pairs), B (read pairs subsetted to only those matching genes on the top strand), and C (read pairs matching genes on the bottom strand only). Thus I see me data as 3 matrices of 17000x400. Each row is associated to a GRanges where I have strand information.

My goal is to move the numerical data of a gene from sample B to sample C if this gene falls on the negative strand, and moving numerical data from sample C to sample B. This will help me display properly RNA-seq reads that come from upstream anti-sense transcription.

My profileplyr object is named "ppo_fixed"

I found that I can get the numerical data of the first row of samples B and C using :

assay(ppo_fixed[1,,2])
assay(ppo_fixed[1,,3])

This seems to work fine and I get the following, a matrix of one row and 400 columns (first few columns only from sample C):

```r [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [1,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [1,] 0 0 0 0 0 0 0 0 0 0 0 0 [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37] [,38] [1,] 0 0 0 0 0 0 0 0 0 0 0 0 [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48] [,49] [,50] [1,] 0 0 0 0 0.006 0.015 0.015 0.015 0.015 0.015 0.009 0 [,51] [,52] [,53] [,54] [,55] [,56] [,57] [,58] [,59] [,60] [,61] [,62] [1,] 0 0 0.006 0.015 0.015 0.015 0.015 0.015 0.021 0.036 0.045 0.045


However if I try to replace the row of data from one sample with the data from the other sample, it doesn't work:
```r
# swapping gene #1 data from sample 3 to sample 2
assay(ppo_fixed[1,,2]) <- assay(ppo_fixed[1,,3])
Error in slot(object, slot_name) :
  attempt to use zero-length variable name

I also tried to indicate the gene in the returned matrix using [1,] :

# swapping first row of the data matrix from sample 2 with first row of data from matrix of sample 3
assay(ppo_fixed[,,2])[1,] <- assay(ppo_fixed[,,3])[1,]

This actually returns no error upon execution, but I can no longer access the element, no matter how I proceed:

> assay(ppo_fixed[,,2])[1,] <- assay(ppo_fixed[,,3])[1,]
> assay(ppo_fixed[1,,2])
Error: subscript contains out-of-bounds indices
> assay(ppo_fixed[,,2])[1,]
Error: subscript contains out-of-bounds indices

**EDIT: Also, I find that with this method now ppo_fixed has only one sample, instead of the original 3

Would there be another way of achieving this?

Ultimately, my goal is to swap like this for the genes that are on the - strand:

ppo_fixed <- ppo_to_use
for(i in 1:nrow(ppo_to_use)){
    assay(ppo_fixed[i, ,2]) <- ifelse(as.character(strand(rowRanges(ppo_to_use))[i]) == "-", assay(ppo_to_use[i, , 3]), assay(ppo_to_use[i, , 2]))
    assay(ppo_fixed[i, ,3]) <- ifelse(as.character(strand(rowRanges(ppo_to_use))[i]) == "-", assay(ppo_to_use[i, , 2]), assay(ppo_to_use[i, , 3]))
}

Any help will be greatly appreciated. Thanks Alex

> sessionInfo( )
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/imkl/2018.3.222/compilers_and_libraries_2018.3.222/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
 [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] rtracklayer_1.44.4          profileplyr_1.0.1
 [3] SummarizedExperiment_1.14.1 DelayedArray_0.10.0
 [5] BiocParallel_1.18.1         matrixStats_0.57.0
 [7] Biobase_2.44.0              GenomicRanges_1.36.1
 [9] GenomeInfoDb_1.20.0         IRanges_2.18.3
[11] S4Vectors_0.22.1            BiocGenerics_0.30.0

loaded via a namespace (and not attached):
  [1] ChIPseeker_1.20.0
  [2] circlize_0.4.11
  [3] fastmatch_1.1-0
  [4] soGGi_1.16.0
  [5] plyr_1.8.6
  [6] igraph_1.2.6
  [7] splines_3.6.0
  [8] ggplot2_3.3.2
  [9] gridBase_0.4-7
 [10] urltools_1.7.3
 [11] digest_0.6.27
 [12] GOSemSim_2.10.0
 [13] viridis_0.5.1
 [14] GO.db_3.8.2
 [15] magrittr_2.0.1
 [16] EnrichedHeatmap_1.14.0
 [17] memoise_1.1.0
 [18] cluster_2.0.8
 [19] ComplexHeatmap_2.0.0
 [20] Biostrings_2.52.0
 [21] graphlayouts_0.7.1
 [22] R.utils_2.10.1
 [23] enrichplot_1.4.0
 [24] prettyunits_1.1.1
 [25] jpeg_0.1-8.1
 [26] colorspace_2.0-0
 [27] blob_1.2.1
 [28] ggrepel_0.8.2
 [29] dplyr_1.0.2
 [30] crayon_1.3.4
 [31] RCurl_1.98-1.2
 [32] jsonlite_1.7.1
 [33] org.Mm.eg.db_3.8.2
 [34] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
 [35] chipseq_1.34.0
 [36] glue_1.4.2
 [37] polyclip_1.10-0
 [38] gtable_0.3.0
 [39] zlibbioc_1.30.0
 [40] XVector_0.24.0
 [41] UpSetR_1.4.0
 [42] GetoptLong_1.0.4
 [43] TxDb.Hsapiens.UCSC.hg38.knownGene_3.4.6
 [44] shape_1.4.5
 [45] scales_1.1.1
 [46] DOSE_3.10.2
 [47] pheatmap_1.0.12
 [48] DBI_1.1.0
 [49] TxDb.Mmusculus.UCSC.mm9.knownGene_3.2.2
 [50] Rcpp_1.0.5
 [51] plotrix_3.7-8
 [52] viridisLite_0.3.0
 [53] progress_1.2.2
 [54] clue_0.3-57
 [55] gridGraphics_0.5-0
 [56] bit_4.0.4
 [57] europepmc_0.4
 [58] preprocessCore_1.46.0
 [59] httr_1.4.2
 [60] fgsea_1.10.1
 [61] gplots_3.1.0
 [62] RColorBrewer_1.1-2
 [63] ellipsis_0.3.1
 [64] pkgconfig_2.0.3
 [65] XML_3.99-0.3
 [66] R.methodsS3_1.8.1
 [67] farver_2.0.3
 [68] locfit_1.5-9.4
 [69] ggplotify_0.0.5
 [70] tidyselect_1.1.0
 [71] rlang_0.4.8
 [72] reshape2_1.4.4
 [73] AnnotationDbi_1.46.1
 [74] munsell_0.5.0
 [75] tools_3.6.0
 [76] generics_0.1.0
 [77] RSQLite_2.2.1
 [78] ggridges_0.5.2
 [79] stringr_1.4.0
 [80] org.Hs.eg.db_3.8.2
 [81] bit64_4.0.5
 [82] tidygraph_1.2.0
 [83] caTools_1.18.0
 [84] purrr_0.3.4
 [85] ggraph_2.0.4
 [86] R.oo_1.24.0
 [87] DO.db_2.9
 [88] xml2_1.3.2
 [89] biomaRt_2.40.5
 [90] compiler_3.6.0
 [91] png_0.1-7
 [92] tibble_3.0.4
 [93] tweenr_1.0.1
 [94] stringi_1.5.3
 [95] GenomicFeatures_1.36.4
 [96] lattice_0.20-38
 [97] Matrix_1.2-17
 [98] vctrs_0.3.5
 [99] pillar_1.4.6
[100] lifecycle_0.2.0
[101] BiocManager_1.30.10
[102] triebeard_0.3.0
[103] GlobalOptions_0.1.2
[104] data.table_1.13.2
[105] cowplot_1.1.0
[106] bitops_1.0-6
[107] TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.7
[108] qvalue_2.16.0
[109] latticeExtra_0.6-29
[110] hwriter_1.3.2
[111] R6_2.5.0
[112] ShortRead_1.42.0
[113] KernSmooth_2.23-15
[114] gridExtra_2.3
[115] boot_1.3-25
[116] MASS_7.3-51.4
[117] gtools_3.8.2
[118] rjson_0.2.20
[119] GenomicAlignments_1.20.1
[120] Rsamtools_2.0.3
[121] GenomeInfoDbData_1.2.1
[122] hms_0.5.3
[123] grid_3.6.0
[124] rGREAT_1.16.1
[125] tidyr_1.1.2
[126] rvcheck_0.1.8
[127] ggforce_0.3.2
profileplyr • 108 views
ADD COMMENT
0
Entering edit mode

Hi Alex, I'm glad you're finding profileplyr useful. The third slot in the bracket-based subsetting method for the profileplyr RangedSummarizedExperiment object (e.g. object[1,2,3]) was something that we added into the package, and is not a normal attribute of this type of object. While that functionality works great for subsetting the entire matrix within the object, sometimes it doesn't fully integrate as you would think.

To achieve what you'd like, you can use normal RangedSummarizedExperiment functionality as follows:

assays(ppo_fixed)[[2]][1,] <- assays(ppo_fixed)[[3]][1,]

or

assay(ppo_fixed, 2)[1, ] <- assay(ppo_fixed[1,,3])

Hopefully that answers your question and let me know if it's not clear!

ADD REPLY
0
Entering edit mode

Thank you very much Doug, it works. I did it with the first option you provided.

BTW, in my post there was an error in my "for" loop, as the ifelse statement (testing which strand each gene/row is on) was not compatible with returning a vector of values, so only the first element was returned by the code I indicated above. Changing to combined if/else statements worked fine.

ADD REPLY

Login before adding your answer.

Traffic: 443 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6