Problems with iteration (sapply) over RNAStringSet
Kemal Akat ▴ 120
@kemal-akat-4351
Last seen 9.6 years ago
Hi,

I want to iterate over an RNAStringSet (rs) and do a calculation for each of the sequences, in the form of:

1) get the sequence
2) do the calculations
3) plot the results
4) use the sequence name (names(rs)) in plot legends and titles, e.g. plot(x, main = paste(sequence_name, 'in condition X', sep = ' '))

The name I want to use is the first field of the FASTA description line; I don't want to use the rest of the description. However, extracting the name does not work as I assumed.

The input FASTA file looks like this:

    >Gene1 Description
    UUUUUUUUUUUUUUUUUUUUUUU
    >Gene2 Description
    AAAAAAAAAAAAAAAAAAAAAAA
    >Gene3 Description
    GGGGGGGGGGGGGGGGGGGGGGG
    >Gene4 Description
    CCCCCCCCCCCCCCCCCCCCCCC

    library("Biostrings")
    rs = read.RNAStringSet('test.fa')

    R> rs
      A RNAStringSet instance of length 4
        width seq                      names
    [1]    23 UUUUUUUUUUUUUUUUUUUUUUU  Gene1 Description
    [2]    23 AAAAAAAAAAAAAAAAAAAAAAA  Gene2 Description
    [3]    23 GGGGGGGGGGGGGGGGGGGGGGG  Gene3 Description
    [4]    23 CCCCCCCCCCCCCCCCCCCCCCC  Gene4 Description

The following commands return what I was expecting:

    R> strsplit(names(rs), split = ' ')[[1]][1]
    [1] "Gene1"
    R> strsplit(toString(rs), split = ',')[[1]][1]
    [1] "UUUUUUUUUUUUUUUUUUUUUUU"

To iterate, I wrote this function:

    myFun = function(x){
        name = strsplit(names(x), split = ' ')[[1]][1]
        seq = strsplit(toString(x), split = ',')[[1]][1]
        names(seq) = name
        return(seq)
    }

However, applying it with sapply returns an error:

    R> sapply(rs, myFun)
    Error in strsplit(names(x), split = " ") : non-character argument
    Calls: sapply ... lapply -> lapply -> lapply -> FUN -> FUN -> strsplit

Simplifying the function to

    R> myFun = function(x){
    +     seq = strsplit(toString(x), split = ',')[[1]][1]
    + }

returns the sequences, but labelled with the full names as entered in the original FASTA file:

    R> sapply(rs, myFun)
            Gene1 Description         Gene2 Description         Gene3 Description
    "UUUUUUUUUUUUUUUUUUUUUUU" "AAAAAAAAAAAAAAAAAAAAAAA" "GGGGGGGGGGGGGGGGGGGGGGG"
            Gene4 Description
    "CCCCCCCCCCCCCCCCCCCCCCC"

I would appreciate it if anyone could offer a solution or explain why strsplit does not work inside the loop (sapply).

Thank you!
Kemal

    R> sessionInfo()
    R version 2.15.0 (2012-03-30)
    Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
     [1] illuminaHumanv4.db_1.14.0 org.Hs.eg.db_2.7.1
     [3] RSQLite_0.11.1            DBI_0.2-5
     [5] AnnotationDbi_1.18.1      beadarray_2.6.0
     [7] Biobase_2.16.0            ShortRead_1.14.4
     [9] latticeExtra_0.6-19       RColorBrewer_1.0-5
    [11] Rsamtools_1.8.5           lattice_0.20-6
    [13] GenomicRanges_1.8.7       ggplot2_0.9.1
    [15] edgeR_2.6.7               limma_3.12.1
    [17] Biostrings_2.24.1         IRanges_1.14.3
    [19] BiocGenerics_0.2.0        colorout_0.9-9

    loaded via a namespace (and not attached):
     [1] BeadDataPackR_1.8.0 bitops_1.0-4.1      colorspace_1.1-1
     [4] dichromat_1.2-4     digest_0.5.2        grid_2.15.0
     [7] hwriter_1.3         labeling_0.1        MASS_7.3-18
    [10] memoise_0.1         munsell_0.3         plyr_1.7.1
    [13] proto_0.3-9.2       reshape2_1.2.1      scales_0.2.1
    [16] stats4_2.15.0       stringr_0.6         tools_2.15.0
    [19] zlibbioc_1.2.0
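
A possible explanation, offered as a hedged reading of the error rather than a confirmed diagnosis: sapply likely hands each element to the function as a single RNAString, which carries no names of its own, so names(x) is NULL inside myFun and strsplit rejects it as a non-character argument. The names in the simplified output would then come from sapply itself, not from the function. The sketch below sidesteps this by splitting the names once, outside the loop, and then iterating by index. It uses base R only: the vector seqs is a stand-in for what as.character(rs) is assumed to return (a character vector named by the FASTA description lines), and first_field is a hypothetical helper, not a Biostrings function.

```r
# Stand-in for as.character(rs): a character vector named by the full
# FASTA description lines (assumption; built by hand to stay self-contained).
seqs <- c(
  "Gene1 Description" = strrep("U", 23),
  "Gene2 Description" = strrep("A", 23),
  "Gene3 Description" = strrep("G", 23),
  "Gene4 Description" = strrep("C", 23)
)

# Hypothetical helper: keep only the first space-delimited field of each name.
first_field <- function(x) {
  vapply(strsplit(x, " ", fixed = TRUE), `[`, character(1), 1)
}

# Shorten all names in one vectorised step, before any looping.
names(seqs) <- first_field(names(seqs))

# Iterate by index so that the name and the sequence are both available.
titles <- character(length(seqs))
for (i in seq_along(seqs)) {
  sequence_name <- names(seqs)[i]
  sequence <- seqs[[i]]
  titles[i] <- paste(sequence_name, "in condition X")
  # The per-sequence calculations and plot(..., main = titles[i]) would go here.
}
```

With a real RNAStringSet, the same idea could presumably be applied directly with names(rs) = first_field(names(rs)), followed by a loop over seq_along(rs) that uses rs[[i]] and names(rs)[i].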