limma_3.17.23 - missing ILMN identifiers in EList objects after read.ilmn
Kemal Akat ▴ 120
@kemal-akat-4351
Last seen 7.7 years ago
Dear colleagues,

I am currently analyzing an Illumina Mouse v2 bead array dataset using limma and ran across an error I don't quite understand. I came across it when trying to annotate the differentially expressed genes later on in the analysis. The problem seems to stem from empty strings in the vector I provide to retrieve the annotation info, but I don't understand how this can happen in the first place.

The probe and control profiles were exported from GenomeStudio without background correction and normalization.

Here is the code I ran:

R> x = read.ilmn(files = "ProbeProfile.txt", ctrlfiles = "ControlProbeProfile.txt", probeid = "Probe_ID", annotation = "TargetID", other.columns = c("Detection", "Avg_NBEADS"), verbose = FALSE)
R> y = neqc(x)
R> expressed = rowSums(y$other$Detection < 0.05) > 4
R> y = y[expressed, ]
R> ids = rownames(y)
R> entrez = unlist(mget(ids, illuminaMousev2ENTREZID, ifnotfound = NA))

Error in unlist(mget(ids, illuminaMousev2ENTREZID, ifnotfound = NA)) :
  error in evaluating the argument 'x' in selecting a method for function 'unlist': Error in FUN(c("ILMN_2735294", "ILMN_2417611", "ILMN_2545897", "ILMN_2762289", :
  attempt to use zero-length variable name
Calls: mget ... as.list -> as.list -> .formatList -> lapply -> lapply -> FUN

R> traceback()
1: unlist(mget(ids, illuminaMousev2ENTREZID, ifnotfound = NA))

R> ids[ids == ""]
  [1] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
 [...]
[973] "" ""

So there seem to be 974 empty strings in the row names, but there is nothing like that in the original data file. Besides, shouldn't empty row names be impossible in R in the first place?

Here is what the EListRaw object looks like after reading it into R:

R> x = read.ilmn(files = "ProbeProfile.txt", ctrlfiles = "ControlProbeProfile.txt", probeid = "Probe_ID", annotation = "TargetID", other.columns = c("Detection", "Avg_NBEADS"), verbose = FALSE)
R> x
An object of class "EListRaw"
$source
[1] "illumina"

$E
             9379087005_A 9379087005_B 9379087022_A 9379087022_B 9379087005_C 9379087005_D 9379087022_C 9379087022_D 9379087005_E 9379087005_F 9379087022_E
ILMN_2735294        420.8        401.8        395.8        422.9        360.1        358.5        420.7        327.1        178.8        343.4        425.5
ILMN_2417611        323.8        280.2        294.1        315.5        542.5        301.0        398.0        133.7        235.9        382.0        512.7
ILMN_2545897         98.3        109.2        128.0        124.5        111.3        102.6        110.2        106.6         87.2        104.6        101.8
ILMN_2762289         91.7         88.3         94.2         95.5         88.1         81.2         88.5         88.0         79.4         85.3         84.5
ILMN_1248788         87.6         84.7         92.0         92.9         85.9         84.0         93.8         86.9         77.5         84.9         86.3
             9379087022_F
ILMN_2735294        322.0
ILMN_2417611        185.7
ILMN_2545897        107.8
ILMN_2762289         88.8
ILMN_1248788         85.1
46250 more rows ...

$genes
       TargetID  Status
1 0610005A07RIK regular
2 0610005C13RIK regular
3 0610005H09RIK regular
4    0610005I04 regular
5 0610005K03RIK regular
46250 more rows ...

$other
$Detection
             9379087005_A 9379087005_B 9379087022_A 9379087022_B 9379087005_C 9379087005_D 9379087022_C 9379087022_D 9379087005_E 9379087005_F 9379087022_E
ILMN_2735294      0.00000      0.00000       0.0000       0.0000       0.0000       0.0000      0.00000       0.0000      0.00000      0.00000      0.00000
ILMN_2417611      0.00000      0.00000       0.0000       0.0000       0.0000       0.0000      0.00000       0.0000      0.00000      0.00000      0.00000
ILMN_2545897      0.08974      0.00321       0.0000       0.0000       0.0000       0.0000      0.00107       0.0000      0.00214      0.00214      0.00107
ILMN_2762289      0.34402      0.49359       0.1998       0.1827       0.6068       0.9220      0.71047       0.4776      0.27350      0.58654      0.77991
ILMN_1248788      0.76603      0.86004       0.3472       0.3718       0.8440       0.6645      0.21902       0.6004      0.58120      0.63675      0.53419
             9379087022_F
ILMN_2735294       0.0000
ILMN_2417611       0.0000
ILMN_2545897       0.0000
ILMN_2762289       0.3440
ILMN_1248788       0.7949
46250 more rows ...

$Avg_NBEADS
             9379087005_A 9379087005_B 9379087022_A 9379087022_B 9379087005_C 9379087005_D 9379087022_C 9379087022_D 9379087005_E 9379087005_F 9379087022_E
ILMN_2735294           51           63           58           57           36           46           49           60           62           50           58
ILMN_2417611           44           56           46           51           66           51           42           66           40           47           57
ILMN_2545897           51           69           45           67           47           39           44           56           59           43           50
ILMN_2762289           48           49           53           59           43           55           47           49           54           41           53
ILMN_1248788           43           42           29           38           39           42           36           36           29           31           45
             9379087022_F
ILMN_2735294           50
ILMN_2417611           56
ILMN_2545897           58
ILMN_2762289           42
ILMN_1248788           38
46250 more rows ...

Now looking at the end of the expression matrix (note the missing row names):

R> tail(x$E)
 9379087005_A 9379087005_B 9379087022_A 9379087022_B 9379087005_C 9379087005_D 9379087022_C 9379087022_D 9379087005_E 9379087005_F 9379087022_E 9379087022_F
         92.2         92.6         92.6         93.8         92.1         86.9         91.4         85.7         78.9         86.5         89.0         91.7
         89.2         85.7         92.3         89.9         85.9         83.7         91.3         89.5         76.6         91.4         86.3         85.8
         89.8         85.5         92.7         92.1         92.7         87.3         90.1         86.2         79.1         83.7         86.4         84.9
         96.9         88.9         92.4         94.6         90.7         87.9         96.2         85.6         78.0         82.0         86.4         84.1
         87.8         83.5         85.9         90.2         81.6         81.5         92.5         83.8         73.1         80.6         86.1         86.8
         89.8         87.4         87.1         89.6         88.1         84.4         91.9         85.7         80.5         88.3         86.8         86.3

R> sessionInfo()
R Under development (unstable) (2013-06-26 r63071)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] splines parallel stats graphics grDevices utils datasets methods base

other attached packages:
 [1] xtable_1.7-1 vsn_3.29.1 reshape2_1.2.2 ratr_1.0 pheatmap_0.7.4 illuminaMousev2.db_1.18.0
 [7] org.Mm.eg.db_2.9.0 GOstats_2.27.1 graph_1.39.3 ggplot2_0.9.3.1 edgeR_3.3.8 limma_3.17.23
[13] codetools_0.2-8 Category_2.27.3 GO.db_2.9.0 RSQLite_0.11.4 DBI_0.2-7 Matrix_1.0-12
[19] lattice_0.20-15 Biostrings_2.29.19 XVector_0.1.4 IRanges_1.19.37 AnnotationDbi_1.23.23 Biobase_2.21.7
[25] BiocGenerics_0.7.5 knitr_1.4.1 setwidth_1.0-3

loaded via a namespace (and not attached):
 [1] affy_1.39.2 affyio_1.29.0 annotate_1.39.0 AnnotationForge_1.3.22 BiocInstaller_1.11.4 colorspace_1.2-2 dichromat_2.0-0
 [8] digest_0.6.3 evaluate_0.4.7 formatR_0.9 genefilter_1.43.0 grid_3.1.0 GSEABase_1.23.0 gtable_0.1.2
[15] highr_0.2.1 labeling_0.2 MASS_7.3-26 munsell_0.4 plyr_1.8 preprocessCore_1.23.0 proto_0.3-10
[22] RBGL_1.37.2 RColorBrewer_1.0-5 scales_0.2.3 stats4_3.1.0 stringr_0.6.2 survival_2.37-4 tools_3.1.0
[29] XML_3.98-1.1 zlibbioc_1.7.0

Any help and explanations appreciated!

Cheers,
Kemal
--
Kemal Akat
Laboratory of RNA Molecular Biology
The Rockefeller University
1230 York Avenue, Box #186
New York, NY 10065
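The failing step can be guarded with a quick check for empty row names before the annotation lookup. The following is only a minimal sketch in base R, not the code from the post: the `detection` matrix is a toy stand-in for `y$other$Detection`, the sample names `s1` to `s3` are made up, and the `annotation` environment with its `ENTREZ_A` value is a hypothetical substitute for `illuminaMousev2ENTREZID`, so that the example runs without Bioconductor.

```r
# Toy detection p-values standing in for y$other$Detection (assumption).
# The last two rows have empty row names, like the rows shown by tail(x$E).
detection <- matrix(
  c(0.00000, 0.00000, 0.0000,
    0.08974, 0.00321, 0.0000,
    0.34402, 0.49359, 0.1998,
    0.00000, 0.00000, 0.0000,
    0.76603, 0.86004, 0.3472),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("ILMN_2735294", "ILMN_2545897", "ILMN_2762289", "", ""),
                  c("s1", "s2", "s3"))
)

# Keep probes detected (p < 0.05) in more than one sample.
expressed <- rowSums(detection < 0.05) > 1
ids <- rownames(detection)[expressed]

# Flag identifiers that are missing or empty before any lookup.
has_id <- !is.na(ids) & nzchar(ids)
n_empty <- sum(!has_id)
ids_ok <- ids[has_id]

# Hypothetical annotation map; a plain environment replaces the real
# annotation package object here.
annotation <- new.env()
assign("ILMN_2735294", "ENTREZ_A", envir = annotation)

# Look up only the non-empty identifiers; unknown probes become NA.
entrez <- unlist(mget(ids_ok, envir = annotation, ifnotfound = NA))
```

Here `n_empty` reports how many identifiers would have been passed to the lookup as empty strings, which is the number to investigate before simply dropping them.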
Annotation Normalization GO probe annotate • 1.0k views
Wei Shi ★ 3.4k
@wei-shi-2183
Last seen 4 days ago
Australia/Melbourne/Olivia Newton-John …
Dear Kemal,

Those rows with empty names are likely to be control probes, because read.ilmn always puts control probes at the end of the data matrix (x in your data). These probes should, however, have been removed when you ran the neqc function, but this doesn't seem to have been the case. Could you please run the following command so that I can see whether neqc successfully identified the control probes?

table(x$genes$Status)

Best regards,
Wei
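One way to test the hypothesis that the unnamed rows are the control probes is to cross-tabulate the probe status against whether the row name is empty. This is a sketch under assumptions: `probe_ids` and `status` are toy vectors standing in for `rownames(x$E)` and `x$genes$Status`, and their values are illustrative rather than taken from the data.

```r
# Toy stand-ins (assumption) for rownames(x$E) and x$genes$Status.
probe_ids <- c("ILMN_2735294", "ILMN_2417611", "ILMN_2545897", "", "", "")
status    <- c("regular", "regular", "regular",
               "NEGATIVE", "NEGATIVE", "HOUSEKEEPING")

# Cross-tabulate status against "row name is empty".
tab <- table(Status = status, EmptyName = probe_ids == "")

# TRUE only if exactly the non-regular (control) probes lack a name.
all_controls_unnamed <- all((status != "regular") == (probe_ids == ""))
```

If `all_controls_unnamed` is `TRUE` on the real object, the empty names belong to control probes only, and the remaining question is why those rows survived into the object used for annotation.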
Dear Wei,

Thank you for getting back to me. The limma::neqc function seemed to work fine:

R> table(x$genes$Status)

            BIOTIN            CY3_HYB       HOUSEKEEPING           LABELING LOW_STRINGENCY_HYB
                 2                  6                 14                  8                  8
          NEGATIVE            regular
               936              45281

R> y = neqc(x)
R> expressed <- rowSums(y$other$Detection < 0.05) > 1
R> y <- y[expressed, ]
R> tail(y$E)
             9379087005_A 9379087005_B 9379087022_A 9379087022_B 9379087005_C 9379087005_D 9379087022_C
ILMN_2498108        6.440        6.524        7.269        6.651        6.159        6.827        6.309
ILMN_2499888        4.408        4.492        4.336        4.505        4.272        4.277        4.404
ILMN_2432039        4.439        4.695        4.410        4.435        4.407        4.277        4.375
ILMN_2475617        4.306        4.216        4.270        4.259        4.267        4.476        4.175
ILMN_2432040        4.477        4.590        4.326        4.445        4.275        4.457        4.386
ILMN_2424408        4.319        4.332        4.744        4.909        4.481        4.899        4.541
             9379087022_D 9379087005_E 9379087005_F 9379087022_E 9379087022_F
ILMN_2498108        6.929        7.220        7.270        7.300        6.797
ILMN_2499888        4.511        4.696        4.505        4.455        4.682
ILMN_2432039        4.469        4.256        4.243        4.271        4.298
ILMN_2475617        4.229        5.397        4.278        4.157        4.370
ILMN_2432040        4.601        5.381        4.526        4.442        4.440
ILMN_2424408        4.968        4.866        5.132        4.586        5.145

A bit puzzling at the moment is that the code that previously failed (below) works now.

R> ids = rownames(y)
R> entrez = unlist(mget(ids, illuminaMousev2ENTREZID, ifnotfound = NA))
R> symbol = unlist(mget(ids, illuminaMousev2SYMBOL, ifnotfound = NA))
R> ensembl = unlist(mget(ids, illuminaMousev2ENSEMBL, ifnotfound = NA))

I don't really have a good explanation for that change. I tried a few things (made sure there are no superfluous rows in the probe profiles, tried the ProbeID field, etc.), but nothing fundamental. Note that mget previously failed on the normalized object (y), as this also had the "nameless" probes. Right now, it looks like a user mistake, and a therapeutic post. :-)

Best,
Kemal
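The status counts reported in this thread can be checked against the earlier numbers with a little arithmetic. The sketch below only re-enters the counts from the `table(x$genes$Status)` output as a named vector; the variable names are illustrative.

```r
# Counts copied from the table(x$genes$Status) output in this thread.
status_counts <- c(BIOTIN = 2, CY3_HYB = 6, HOUSEKEEPING = 14, LABELING = 8,
                   LOW_STRINGENCY_HYB = 8, NEGATIVE = 936, regular = 45281)

# Everything that is not "regular" is a control probe.
n_control <- sum(status_counts[names(status_counts) != "regular"])
n_total   <- sum(status_counts)
```

The control probes add up to 974, the same as the number of empty row names reported in the original post, and the total of 46255 matches the 5 displayed rows plus "46250 more rows" in the printed EListRaw object. This is consistent with the empty names belonging to the control probes, although it does not by itself explain why they were still present after `neqc` in the first run.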