Question

SGSeq: class of "sample_info" argument of analyzeFeatures, data.frame or DataFrame!


Chao-Jen Wong ▴ 40

@chao-jen-wong-7035

Last seen 2.4 years ago

USA/Seattle/Fred Hutchinson Cancer Rese…

This is a minor problem that most people won't encounter if they set up the "sample_info" argument as a data.frame instance.

From the man page of analyzeFeatures, I understood that the "sample_info" argument can be either a data.frame or a DataFrame. Both are in fact accepted, but under different constraints. If sample_info is a DataFrame, its rownames must be either NULL or identical to the "sample_name" column; otherwise, an error occurs when generating the SummarizedExperiment for the feature counts. If sample_info is a data.frame, its rownames cause no problem whatever they are, because the internal function converts the data.frame to a DataFrame and the rownames are automatically set to NULL. Below is a reproducible example.

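library(SGSeq)
# si (sample information) and gr (genomic range) are example objects provided by SGSeq
path <- system.file("extdata", package = "SGSeq")
si$file_bam <- file.path(path, "bams", si$file_bam)
rownames(si)
# convert the data.frame to a DataFrame
si <- DataFrame(si)
rownames(si)
# set rownames that differ from the sample_name column
rownames(si) <- 1:nrow(si)
# the error is raised here, when the feature counts are assembled
sgfc <- analyzeFeatures(si[1,], gr)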


Predict features...
N1 complete.
Process features...
Obtain counts...
N1 complete.
Error in FUN(X[[i]], ...) :
  assay colnames() must be NULL or equal colData rownames()

I encountered this problem because I created "sample_info" as a DataFrame and also use it for other purposes. I know it can be fixed by converting it to a data.frame before using the SGSeq package. But it is kind of funny, right?
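The constraint described above suggests three ways to prepare the sample table before calling analyzeFeatures. The following is only a sketch: the two-column table is a hypothetical stand-in for the real sample information (which has more columns), and it assumes the S4Vectors package is installed.

```r
library(S4Vectors)

# Hypothetical stand-in for the sample table (the real one has more columns).
si <- DataFrame(sample_name = c("N1", "N2"),
                file_bam = c("N1.bam", "N2.bam"))

# Rownames that differ from sample_name, as in the report above.
rownames(si) <- as.character(seq_len(nrow(si)))

# Option 1: convert to a base data.frame before passing it on.
si_df <- as.data.frame(si)

# Option 2: keep the DataFrame but drop its rownames.
si_null <- si
rownames(si_null) <- NULL

# Option 3: keep the DataFrame and make rownames match sample_name.
si_named <- si
rownames(si_named) <- si_named$sample_name
```

Any of si_df, si_null, or si_named should then satisfy the condition stated in the error message, for example in a call such as analyzeFeatures(si_df[1, ], gr).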

SessionInfo()

> sessionInfo()
R Under development (unstable) (2016-02-27 r70236)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] SGSeq_1.5.5                 SummarizedExperiment_1.1.18
[3] Biobase_2.31.3              GenomicRanges_1.23.13
[5] GenomeInfoDb_1.7.6          IRanges_2.5.24
[7] S4Vectors_0.9.27            BiocGenerics_0.17.3
[9] BiocInstaller_1.21.3

loaded via a namespace (and not attached):
 [1] igraph_1.0.1             AnnotationDbi_1.33.7     XVector_0.11.4
 [4] magrittr_1.5             zlibbioc_1.17.0          GenomicAlignments_1.7.13
 [7] BiocParallel_1.5.16      tools_3.3.0              DBI_0.3.1
[10] rtracklayer_1.31.6       bitops_1.0-6             RUnit_0.4.31
[13] RCurl_1.95-4.7           biomaRt_2.27.2           RSQLite_1.0.0
[16] GenomicFeatures_1.23.22  Biostrings_2.39.7        Rsamtools_1.23.3
[19] XML_3.98-1.3

sgseq • 1.9k views

ADD COMMENT • link updated 9.8 years ago by Leonard Goldstein ▴ 260 • written 9.8 years ago by Chao-Jen Wong ▴ 40

score 1 · Answer 1 · 2016-03-01


Leonard Goldstein ▴ 260

@leonard-goldstein-6845

Last seen 19 months ago

Australia

Thanks for reporting. Looks like this is caused by changes in the SummarizedExperiment package in bioc-devel. This is fixed in SGSeq 1.5.8 - available via svn now or biocLite() tomorrow.

Leonard
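To check whether an installation already includes the fix mentioned above, the installed version can be compared against 1.5.8. This is a small sketch; the helper name sgseq_has_fix is hypothetical and not part of SGSeq.

```r
# Hypothetical helper: TRUE if the given version string is at least 1.5.8,
# the SGSeq version reported above as containing the fix.
sgseq_has_fix <- function(version) {
  package_version(version) >= "1.5.8"
}

# Only query the installed version if SGSeq is actually available.
if (requireNamespace("SGSeq", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("SGSeq"))
  message("SGSeq ", installed, " includes the fix: ", sgseq_has_fix(installed))
}
```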

ADD COMMENT • link 9.8 years ago Leonard Goldstein ▴ 260


Thanks a lot.

ADD REPLY • link 9.8 years ago Chao-Jen Wong ▴ 40