Question

Using custom CDF with 'make.cdf.env'

Guest User ★ 13k

@guest-user-4897

Dear List,

I made a custom CDF by modifying the original Affymetrix miRNA v1 file. Because this chip has a high level of redundancy, I condensed the original 7815 probe sets into 6190 probe sets (by 'moving' probes from one set to another). However, when I build and attach the new CDF environment I still get 7815 probe sets, so presumably I have done something wrong.

I have read the vignette and many posts similar to mine, but I still cannot work out what I am doing wrong. Perhaps the problem is with the CDF itself? Below is the output of a short script that tests the functionality. I will gladly attach the script, the CDFs and an example CEL file if there is nothing obviously wrong with the code; I would do so now, but there doesn't appear to be an option on the web form.

Many thanks,
Scott

> folder <- "C:\\Work\\COPD-ASTHMA\\microRNA files\\newCDF\\test\\"
>
> setwd(paste0(folder,"CEL"))
> options(stringsAsFactors=FALSE)
> library(affy)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following object is masked from ‘package:stats’:

    xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, as.data.frame, cbind, colnames, duplicated, eval,
    Filter, Find, get, intersect, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rep.int, rownames, sapply, setdiff, sort, table,
    tapply, union, unique, unlist

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

> library(makecdfenv)
Loading required package: affyio
>
> cleancdfname("newmir1.cdf")
[1] "newmir1.cdf"
> newmir1 = make.cdf.env("newmir1.cdf")
Reading CDF file.
Creating CDF environment
Wait for about 78 dots.......................................................................
> Data <- ReadAffy()
> Data@cdfName <- "newmir1"
>
> Data
AffyBatch object
size of arrays=230x230 features (17 kb)
cdf=newmir1 (7815 affyids)
number of samples=1
number of genes=7815
annotation=mirna102xgain
notes=
>
> dim(exprs(rma(Data)))
Background correcting
Normalizing
Calculating Expression
[1] 7815    1

-- output of sessionInfo():

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] makecdfenv_1.36.0  affyio_1.28.0      affy_1.38.1        Biobase_2.20.1
[5] BiocGenerics_0.6.0

loaded via a namespace (and not attached):
[1] BiocInstaller_1.10.4  preprocessCore_1.22.0 tools_3.0.2
[4] zlibbioc_1.6.0

-- Sent via the guest posting facility at bioconductor.org.
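One way to check an edited CDF before calling make.cdf.env() is to count the distinct probe set IDs carried by its Cell rows. The base-R sketch below is a hypothetical helper (cdf_probeset_ids() is not part of makecdfenv or affyio). It assumes the ASCII CDF layout visible in this thread: tab-separated `CellN=` rows whose fifth field is the QUAL column holding the probe set ID. Sections outside `[Unit...]` (for example QC sections) are skipped, on the assumption that their Cell rows use a different column layout.

```r
# Hypothetical helper (not part of makecdfenv/affyio): return the probe set ID
# of every Cell row inside the [Unit...] sections of an ASCII CDF.
# Assumes tab-separated Cell rows laid out as X Y PROBE FEAT QUAL ...,
# so the probe set ID is the fifth field once "CellN=" is stripped.
cdf_probeset_ids <- function(cdf_lines) {
  # Work out which [Section] each line belongs to
  is_header <- grepl("^\\[", cdf_lines)
  headers <- c("", cdf_lines[is_header])   # "" for lines before any header
  section <- headers[cumsum(is_header) + 1]
  in_unit <- grepl("^\\[Unit", section)

  # Keep only Cell rows (CellHeader= does not match "^Cell[0-9]+=")
  cells <- cdf_lines[in_unit & grepl("^Cell[0-9]+=", cdf_lines)]
  body <- sub("^Cell[0-9]+=", "", cells)
  fields <- strsplit(body, "\t", fixed = TRUE)
  vapply(fields, function(f) f[5], character(1))
}

# Usage with a real file (the file name is taken from the post):
# ids <- cdf_probeset_ids(readLines("newmir1.cdf"))
# length(unique(ids))
```

If the edit only moved rows between blocks without changing their IDs, the count of unique IDs would be expected to stay at the original number rather than drop to the intended one.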

miRNA cdf probe • 1.9k views

updated 9.7 years ago by Scott Robinson ▴ 130 • written 9.7 years ago by Guest User ★ 13k

score 0 · Answer 1 · 2014-08-27

Scott Robinson ▴ 130

@scott-robinson-5804

Dear All,

Since the files exceed 1 MB, here is a link to the old ("miRNA-1_0.CDF") and new ("newmir1.cdf") CDFs, the test script and an example CEL file:

http://www.files.com/set/53fdeb0aa2176

Thanks,

Scott

9.7 years ago • Scott Robinson ▴ 130

Hi Scott,

As far as I can tell, you haven't made any changes to the cdf at all:

> z <- make.cdf.env("newmir1.cdf")
Reading CDF file.
Creating CDF environment
Wait for about 78 dots.........................................................................
> z
<environment: 0x00000000113d5c08>
> length(ls(z))
[1] 7815
> zz <- as.list(z)
> table(sapply(zz, nrow))

   4    8    9   10   11   20   25   40   50   67   73   88   89   90   91   92   94
6703    8   14   32  959    9    1    1    2    1    1    1    2    1    1    1   78
> y <- make.cdf.env("miRNA-1_0.CDF")
Reading CDF file.
Creating CDF environment
Wait for about 78 dots..........................................................................
> yy <- as.list(y)
> length(yy)
[1] 7815
> table(sapply(yy, nrow))

   4    8    9   10   11   20   25   40   50   67   73   88   89   90   91   92   94
6703    8   14   32  959    9    1    1    2    1    1    1    2    1    1    1   78
> all.equal(names(zz), names(yy))
[1] TRUE

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
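The comparison above (probe set count, probe set names, and the probes-per-set table) can be wrapped into one helper. This is a hypothetical base-R sketch, not a makecdfenv function; it assumes each environment maps a probe set ID to a matrix with one row per probe, which is what the `table(sapply(zz, nrow))` call above relies on.

```r
# Hypothetical helper (not part of makecdfenv or affy): summarise how two
# CDF environments differ. Each environment is assumed to map a probe set ID
# to a matrix with one row per probe.
compare_cdf_envs <- function(a, b) {
  la <- as.list(a)
  lb <- as.list(b)
  list(
    n_a = length(la),
    n_b = length(lb),
    # as.list() on an environment does not guarantee any ordering,
    # so compare the sets of IDs rather than the name vectors position by position
    same_ids = setequal(names(la), names(lb)),
    # Distribution of probes per probe set, like table(sapply(zz, nrow))
    sizes_a = table(vapply(la, nrow, integer(1))),
    sizes_b = table(vapply(lb, nrow, integer(1)))
  )
}

# Usage with real CDF files (requires makecdfenv):
# library(makecdfenv)
# res <- compare_cdf_envs(make.cdf.env("newmir1.cdf"),
#                         make.cdf.env("miRNA-1_0.CDF"))
# res$n_a; res$n_b; res$same_ids
```

Using setequal() instead of all.equal() on the name vectors is a design choice: it avoids reporting a difference when two environments hold the same IDs in a different internal order.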

9.7 years ago • James W. MacDonald 65k

Hi Scott,

I see some of what you have done. For example, you moved things around and changed the 'Cell' number:

C:\Users\BioinfAdmin\Desktop>grep -n bta-let-7a_st miRNA-1_0.CDF
129939:Name=bta-let-7a_st
129946:Cell1=185 178 ACTCCATCATCCAACATATCAA control bta-let-7a_st 0
129947:Cell2=197 180 ACTCCATCATCCAACATATCAA control bta-let-7a_st 1
129948:Cell3=83 156 ACTCCATCATCCAACATATCAA control bta-let-7a_st 2 11
129949:Cell4=210 187 ACTCCATCATCCAACATATCAA control bta-let-7a_st 3

C:\Users\BioinfAdmin\Desktop>grep -n bta-let-7a_st newmir1.cdf
43056:Cell5=185 178 ACTCCATCATCCAACATATCAA control bta-let-7a_st 4 11
43057:Cell6=197 180 ACTCCATCATCCAACATATCAA control bta-let-7a_st 5 11
43058:Cell7=83 156 ACTCCATCATCCAACATATCAA control bta-let-7a_st 6 11
43059:Cell8=210 187 ACTCCATCATCCAACATATCAA control bta-let-7a_st 7 11

This won't change anything. In both cases there is a probeset called bta-let-7a_st that has four identical probes. Putting these data somewhere else in the cdf won't change the way it is parsed. In other words, this:

C:\Users\BioinfAdmin\Desktop> sed -n '43050,43111p' newmir1.cdf
StopPosition=59
CellHeader=X Y PROBE FEAT QUAL EXPOS POS CBASE PBASE TBA
Cell1=2 190 ACTCCATCATCCAACATATCAA control hsa-let-7a_st 0 11 G
Cell2=196 180 ACTCCATCATCCAACATATCAA control hsa-let-7a_st 1 11
Cell3=211 187 ACTCCATCATCCAACATATCAA control hsa-let-7a_st 2 11
Cell4=29 205 ACTCCATCATCCAACATATCAA control hsa-let-7a_st 3 11
Cell5=185 178 ACTCCATCATCCAACATATCAA control bta-let-7a_st 4 11
Cell6=197 180 ACTCCATCATCCAACATATCAA control bta-let-7a_st 5 11
Cell7=83 156 ACTCCATCATCCAACATATCAA control bta-let-7a_st 6 11
Cell8=210 187 ACTCCATCATCCAACATATCAA control bta-let-7a_st 7 11
Cell9=2 189 ACTCCATCATCCAACATATCAA control cbr-let-7_st 8 11 G
Cell10=178 178 ACTCCATCATCCAACATATCAA control cbr-let-7_st 9 11
Cell11=212 189 ACTCCATCATCCAACATATCAA control cbr-let-7_st 10 11
Cell12=189 181 ACTCCATCATCCAACATATCAA control cbr-let-7_st 11 11
Cell13=179 178 ACTCCATCATCCAACATATCAA control cel-let-7_st 12 11
Cell14=80 157 ACTCCATCATCCAACATATCAA control cel-let-7_st 13 11
Cell15=215 191 ACTCCATCATCCAACATATCAA control cel-let-7_st 14 11
Cell16=190 181 ACTCCATCATCCAACATATCAA control cel-let-7_st 15 11
Cell17=79 157 ACTCCATCATCCAACATATCAA control cfa-let-7a_st 16 11
Cell18=213 189 ACTCCATCATCCAACATATCAA control cfa-let-7a_st 17 11
Cell19=182 179 ACTCCATCATCCAACATATCAA control cfa-let-7a_st 18 11
Cell20=196 181 ACTCCATCATCCAACATATCAA control cfa-let-7a_st 19 11
Cell21=205 184 ACTCCATCATCCAACATATCAA control dre-let-7a_st 20 11
Cell22=188 181 ACTCCATCATCCAACATATCAA control dre-let-7a_st 21 11
Cell23=216 191 ACTCCATCATCCAACATATCAA control dre-let-7a_st 22 11
Cell24=83 157 ACTCCATCATCCAACATATCAA control dre-let-7a_st 23 11
Cell25=77 157 ACTCCATCATCCAACATATCAA control fru-let-7a_st 24 11
Cell26=212 188 ACTCCATCATCCAACATATCAA control fru-let-7a_st 25 11
Cell27=193 181 ACTCCATCATCCAACATATCAA control fru-let-7a_st 26 11
Cell28=182 180 ACTCCATCATCCAACATATCAA control fru-let-7a_st 27 11
Cell29=188 180 ACTCCATCATCCAACATATCAA control gga-let-7a_st 28 11
Cell30=211 189 ACTCCATCATCCAACATATCAA control gga-let-7a_st 29 11
Cell31=78 157 ACTCCATCATCCAACATATCAA control gga-let-7a_st 30 11
Cell32=199 180 ACTCCATCATCCAACATATCAA control gga-let-7a_st 31 11
Cell33=214 188 ACTCCATCATCCAACATATCAA control gga-let-7j_st 32 11
Cell34=191 181 ACTCCATCATCCAACATATCAA control gga-let-7j_st 33 11
Cell35=180 177 ACTCCATCATCCAACATATCAA control gga-let-7j_st 34 11
Cell36=203 180 ACTCCATCATCCAACATATCAA control gga-let-7j_st 35 11
Cell37=211 188 ACTCCATCATCCAACATATCAA control mdo-let-7a_st 36 11
Cell38=184 179 ACTCCATCATCCAACATATCAA control mdo-let-7a_st 37 11
Cell39=195 181 ACTCCATCATCCAACATATCAA control mdo-let-7a_st 38 11
Cell40=82 157 ACTCCATCATCCAACATATCAA control mdo-let-7a_st 39 11
Cell41=179 177 ACTCCATCATCCAACATATCAA control mml-let-7a_st 40 11
Cell42=190 182 ACTCCATCATCCAACATATCAA control mml-let-7a_st 41 11
Cell43=214 191 ACTCCATCATCCAACATATCAA control mml-let-7a_st 42 11
Cell44=202 180 ACTCCATCATCCAACATATCAA control mml-let-7a_st 43 11
Cell45=183 179 ACTCCATCATCCAACATATCAA control mmu-let-7a_st 44 11
Cell46=84 157 ACTCCATCATCCAACATATCAA control mmu-let-7a_st 45 11
Cell47=194 181 ACTCCATCATCCAACATATCAA control mmu-let-7a_st 46 11
Cell48=212 187 ACTCCATCATCCAACATATCAA control mmu-let-7a_st 47 11
Cell49=76 157 ACTCCATCATCCAACATATCAA control rno-let-7a_st 48 11
Cell50=192 181 ACTCCATCATCCAACATATCAA control rno-let-7a_st 49 11
Cell51=181 177 ACTCCATCATCCAACATATCAA control rno-let-7a_st 50 11
Cell52=212 191 ACTCCATCATCCAACATATCAA control rno-let-7a_st 51 11
Cell53=187 181 ACTCCATCATCCAACATATCAA control tni-let-7a_st 52 11
Cell54=128 77 ACTCCATCATCCAACATATCAA control tni-let-7a_st 53 11
Cell55=81 157 ACTCCATCATCCAACATATCAA control tni-let-7a_st 54 11
Cell56=213 191 ACTCCATCATCCAACATATCAA control tni-let-7a_st 55 11
Cell57=214 189 ACTCCATCATCCAACATATCAA control xtr-let-7a_st 56 11
Cell58=185 179 ACTCCATCATCCAACATATCAA control xtr-let-7a_st 57 11
Cell59=22 202 ACTCCATCATCCAACATATCAA control xtr-let-7a_st 58 11
Cell60=197 181 ACTCCATCATCCAACATATCAA control xtr-let-7a_st 59 11

will not create a single probeset for let-7a over all species. And combining 60 identical 22-mers into a single probeset is about as useless as having 15 individual probesets made up of four identical probes each. You are still running RMA (or whatever) on essentially the same information, and the only differences between probes are due entirely to technical variability. These arrays are, within the constraints of Affy's system, about as good as you can do. Which is to say, not very good.

If you really want to do this, then you also have to make the probeset IDs identical within each block. Here you would have to strip off the prepended species abbreviation and convert the gga-let-7j probes (and the cbr/cel let-7 probes) to let-7a_st; then you would have just one probeset. But that will be a lot of work for what I imagine will be very little gain.

Best,

Jim
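The renaming described above can be sketched on the IDs from the let-7a block. collapse_ids() and its alias table are hypothetical illustrations, not part of makecdfenv; the sketch assumes every species prefix is three lower-case letters followed by a hyphen. It also shows that stripping the prefix alone leaves three IDs in this block (let-7a_st, let-7_st from cbr/cel, and let-7j_st from gga), so both variants need an alias before the 60 probes share one ID.

```r
# Hypothetical sketch (not part of makecdfenv): rename per-cell probe set IDs
# so that cross-species copies of the same probe share one ID.
# Assumes a three-letter lower-case species prefix such as "hsa-" or "gga-".
collapse_ids <- function(ids, aliases = character()) {
  # Drop the species prefix, e.g. "bta-let-7a_st" -> "let-7a_st"
  stripped <- sub("^[a-z]{3}-", "", ids)
  # Map variant names onto a common ID using the (assumed) alias table
  hit <- stripped %in% names(aliases)
  stripped[hit] <- unname(aliases[stripped[hit]])
  stripped
}

# The 15 probe sets x 4 probes from the sed output of the let-7a block
block_ids <- rep(c("hsa-let-7a_st", "bta-let-7a_st", "cbr-let-7_st",
                   "cel-let-7_st", "cfa-let-7a_st", "dre-let-7a_st",
                   "fru-let-7a_st", "gga-let-7a_st", "gga-let-7j_st",
                   "mdo-let-7a_st", "mml-let-7a_st", "mmu-let-7a_st",
                   "rno-let-7a_st", "tni-let-7a_st", "xtr-let-7a_st"),
                 each = 4)

length(unique(block_ids))                 # 15 probe sets
length(unique(collapse_ids(block_ids)))   # 3: let-7a_st, let-7_st, let-7j_st
merged <- collapse_ids(block_ids,
                       aliases = c("let-7j_st" = "let-7a_st",
                                   "let-7_st"  = "let-7a_st"))
table(merged)                             # one ID, let-7a_st, with 60 probes
```

As the reply points out, merging these 60 identical probes mainly pools technical replicates, so the gain from doing this may be small.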

9.7 years ago • James W. MacDonald 65k

Ah! Sorry, reading that reply I instantly saw the problem ? I forgot to change the probe set ID for the individual rows. Thanks very much James From: James W. MacDonald [mailto:jmacdon@uw.edu] Sent: 27 August 2014 16:52 To: Scott Robinson Cc: bioconductor at r-project.org Subject: Re: Using custom CDF with 'make.cdf.env' Hi Scott, I see some of what you have done. As an example, you moved things around, and changed the 'Cell' number: C:\Users\BioinfAdmin\Desktop>grep -n bta-let-7a_st miRNA-1_0.CDF 129939:Name=bta-let-7a_st 129946:Cell1=185 178 ACTCCATCATCCAACATATCAA control bta- let-7a_st 0 129947:Cell2=197 180 ACTCCATCATCCAACATATCAA control bta- let-7a_st 1 129948:Cell3=83 156 ACTCCATCATCCAACATATCAA control bta-let-7a_st 2 11 129949:Cell4=210 187 ACTCCATCATCCAACATATCAA control bta- let-7a_st 3 C:\Users\BioinfAdmin\Desktop>grep -n bta-let-7a_st newmir1.cdf 43056:Cell5=185 178 ACTCCATCATCCAACATATCAA control bta-let-7a_st 4 11 43057:Cell6=197 180 ACTCCATCATCCAACATATCAA control bta-let-7a_st 5 11 43058:Cell7=83 156 ACTCCATCATCCAACATATCAA control bta-let-7a_st 6 11 43059:Cell8=210 187 ACTCCATCATCCAACATATCAA control bta-let-7a_st 7 11 This won't change anything. In both cases, there is a probeset called bta-let-7a_st, that has four identical probes. Putting these data somewhere else in the cdf won't change the way it is parsed. 
In other words, this: C:\Users\BioinfAdmin\Desktop> sed -n '43050,43111p' newmir1.cdf StopPosition=59 CellHeader=X Y PROBE FEAT QUAL EXPOS POS CBASE PBASE TBA Cell1=2 190 ACTCCATCATCCAACATATCAA control hsa-let-7a_st 0 11 G Cell2=196 180 ACTCCATCATCCAACATATCAA control hsa-let-7a_st 1 11 Cell3=211 187 ACTCCATCATCCAACATATCAA control hsa-let-7a_st 2 11 Cell4=29 205 ACTCCATCATCCAACATATCAA control hsa-let-7a_st 3 11 Cell5=185 178 ACTCCATCATCCAACATATCAA control bta-let-7a_st 4 11 Cell6=197 180 ACTCCATCATCCAACATATCAA control bta-let-7a_st 5 11 Cell7=83 156 ACTCCATCATCCAACATATCAA control bta-let-7a_st 6 11 Cell8=210 187 ACTCCATCATCCAACATATCAA control bta-let-7a_st 7 11 Cell9=2 189 ACTCCATCATCCAACATATCAA control cbr-let-7_st 8 11 G Cell10=178 178 ACTCCATCATCCAACATATCAA control cbr-let-7_st 9 11 Cell11=212 189 ACTCCATCATCCAACATATCAA control cbr-let-7_st 10 11 Cell12=189 181 ACTCCATCATCCAACATATCAA control cbr-let-7_st 11 11 Cell13=179 178 ACTCCATCATCCAACATATCAA control cel-let-7_st 12 11 Cell14=80 157 ACTCCATCATCCAACATATCAA control cel-let-7_st 13 11 Cell15=215 191 ACTCCATCATCCAACATATCAA control cel-let-7_st 14 11 Cell16=190 181 ACTCCATCATCCAACATATCAA control cel-let-7_st 15 11 Cell17=79 157 ACTCCATCATCCAACATATCAA control cfa-let-7a_st 16 11 Cell18=213 189 ACTCCATCATCCAACATATCAA control cfa-let-7a_st 17 11 Cell19=182 179 ACTCCATCATCCAACATATCAA control cfa-let-7a_st 18 11 Cell20=196 181 ACTCCATCATCCAACATATCAA control cfa-let-7a_st 19 11 Cell21=205 184 ACTCCATCATCCAACATATCAA control dre-let-7a_st 20 11 Cell22=188 181 ACTCCATCATCCAACATATCAA control dre-let-7a_st 21 11 Cell23=216 191 ACTCCATCATCCAACATATCAA control dre-let-7a_st 22 11 Cell24=83 157 ACTCCATCATCCAACATATCAA control dre-let-7a_st 23 11 Cell25=77 157 ACTCCATCATCCAACATATCAA control fru-let-7a_st 24 11 Cell26=212 188 ACTCCATCATCCAACATATCAA control fru-let-7a_st 25 11 Cell27=193 181 ACTCCATCATCCAACATATCAA control fru-let-7a_st 26 11 Cell28=182 180 ACTCCATCATCCAACATATCAA control fru-let-7a_st 27 11 Cell29=188 180 
ACTCCATCATCCAACATATCAA control gga-let-7a_st 28 11 Cell30=211 189 ACTCCATCATCCAACATATCAA control gga-let-7a_st 29 11 Cell31=78 157 ACTCCATCATCCAACATATCAA control gga-let-7a_st 30 11 Cell32=199 180 ACTCCATCATCCAACATATCAA control gga-let-7a_st 31 11 Cell33=214 188 ACTCCATCATCCAACATATCAA control gga-let-7j_st 32 11 Cell34=191 181 ACTCCATCATCCAACATATCAA control gga-let-7j_st 33 11 Cell35=180 177 ACTCCATCATCCAACATATCAA control gga-let-7j_st 34 11 Cell36=203 180 ACTCCATCATCCAACATATCAA control gga-let-7j_st 35 11 Cell37=211 188 ACTCCATCATCCAACATATCAA control mdo-let-7a_st 36 11 Cell38=184 179 ACTCCATCATCCAACATATCAA control mdo-let-7a_st 37 11 Cell39=195 181 ACTCCATCATCCAACATATCAA control mdo-let-7a_st 38 11 Cell40=82 157 ACTCCATCATCCAACATATCAA control mdo-let-7a_st 39 11 Cell41=179 177 ACTCCATCATCCAACATATCAA control mml-let-7a_st 40 11 Cell42=190 182 ACTCCATCATCCAACATATCAA control mml-let-7a_st 41 11 Cell43=214 191 ACTCCATCATCCAACATATCAA control mml-let-7a_st 42 11 Cell44=202 180 ACTCCATCATCCAACATATCAA control mml-let-7a_st 43 11 Cell45=183 179 ACTCCATCATCCAACATATCAA control mmu-let-7a_st 44 11 Cell46=84 157 ACTCCATCATCCAACATATCAA control mmu-let-7a_st 45 11 Cell47=194 181 ACTCCATCATCCAACATATCAA control mmu-let-7a_st 46 11 Cell48=212 187 ACTCCATCATCCAACATATCAA control mmu-let-7a_st 47 11 Cell49=76 157 ACTCCATCATCCAACATATCAA control rno-let-7a_st 48 11 Cell50=192 181 ACTCCATCATCCAACATATCAA control rno-let-7a_st 49 11 Cell51=181 177 ACTCCATCATCCAACATATCAA control rno-let-7a_st 50 11 Cell52=212 191 ACTCCATCATCCAACATATCAA control rno-let-7a_st 51 11 Cell53=187 181 ACTCCATCATCCAACATATCAA control tni-let-7a_st 52 11 Cell54=128 77 ACTCCATCATCCAACATATCAA control tni-let-7a_st 53 11 Cell55=81 157 ACTCCATCATCCAACATATCAA control tni-let-7a_st 54 11 Cell56=213 191 ACTCCATCATCCAACATATCAA control tni-let-7a_st 55 11 Cell57=214 189 ACTCCATCATCCAACATATCAA control xtr-let-7a_st 56 11 Cell58=185 179 ACTCCATCATCCAACATATCAA control xtr-let-7a_st 57 11 Cell59=22 202 ACTCCATCATCCAACATATCAA 
control xtr-let-7a_st 58 11 Cell60=197 181 ACTCCATCATCCAACATATCAA control xtr-let-7a_st 59 11

will not create a single probeset for let-7a across all species. And trying to combine 60 identical 25-mers into a single probeset is about as useless as having 15 individual probesets made up of four identical probes. You are still running RMA (or whatever) on essentially the same information; the only differences between the probes are due to technical variability.

These arrays are, within the constraints of Affy's system, about as good as you can do. Which is to say, not very good.

If you really want to go ahead with this, you also have to make the probeset IDs identical within each block. Here that means stripping off the prepended species abbreviation and converting the gga-let-7j probes to let-7a_st; then you would have just one probeset. But that will be a lot of work for what I imagine will be very little gain.

Best,
Jim

On Wed, Aug 27, 2014 at 11:19 AM, James W. MacDonald <jmacdon at uw.edu> wrote:

Hi Scott,

As far as I can tell, you haven't made any changes to the cdf at all:

    > z <- make.cdf.env("newmir1.cdf")
    Reading CDF file.
    Creating CDF environment
    Wait for about 78 dots................................................
    .........................
    > z
    <environment: 0x00000000113d5c08>
    > length(ls(z))
    [1] 7815
    > zz <- as.list(z)
    > table(sapply(zz, nrow))
       4 8  9 10  11 20 25 40 50 67 73 88 89 90 91 92 94
    6703 8 14 32 959  9  1  1  2  1  1  1  2  1  1  1 78
    > y <- make.cdf.env("miRNA-1_0.CDF")
    Reading CDF file.
    Creating CDF environment
    Wait for about 78 dots................................................
    ..........................
    > yy <- as.list(y)
    > length(yy)
    [1] 7815
    > table(sapply(yy, nrow))
       4 8  9 10  11 20 25 40 50 67 73 88 89 90 91 92 94
    6703 8 14 32 959  9  1  1  2  1  1  1  2  1  1  1 78
    > all.equal(names(zz), names(yy))
    [1] TRUE

Best,
Jim

On Wed, Aug 27, 2014 at 10:31 AM, Scott Robinson <scott.robinson at glasgow.ac.uk> wrote:

Dear All,

Since it exceeds 1MB, here is a link to the old ("miRNA-1_0.CDF") and new ("newmir1.cdf") CDFs, test script and example CEL file: http://www.files.com/set/53fdeb0aa2176

Thanks,
Scott

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
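Jim's suggestion of making the probeset IDs identical within each block can be scripted rather than edited by hand. The following is a minimal base-R sketch, not part of makecdfenv, under several assumptions: the CDF is in text format, Cell lines are tab-separated as CellN=X Y PROBE FEAT QUAL ..., and the probe set name sits in the fifth (QUAL) field, as in the excerpt quoted above. The helper names `rename_cdf_cells` and `strip_species`, the species-prefix regex, and the manual `let-7j_st` mapping are all hypothetical.

```r
## Hypothetical helper (not part of makecdfenv): rewrite the probe set
## name on every Cell line of a text-format CDF. Assumes tab-separated
## Cell lines laid out as CellN=X Y PROBE FEAT QUAL ..., with the probe
## set name in the fifth (QUAL) field.
rename_cdf_cells <- function(lines, rename) {
  for (i in which(grepl("^Cell[0-9]+=", lines))) {       # skips CellHeader=
    key    <- sub("=.*$", "", lines[i])                   # e.g. "Cell60"
    fields <- strsplit(sub("^[^=]*=", "", lines[i]), "\t", fixed = TRUE)[[1]]
    fields[5] <- rename(fields[5])                        # QUAL field
    lines[i] <- paste0(key, "=", paste(fields, collapse = "\t"))
  }
  lines
}

## Hypothetical renaming rule: drop a three-letter species prefix that
## precedes "let-", "mir-" or "miR-", then apply manual fix-ups for IDs
## that differ beyond the prefix (this mapping is illustrative only).
strip_species <- function(id) {
  id <- sub("^[a-z]{3}-(let-|mir-|miR-)", "\\1", id)
  manual <- c("let-7j_st" = "let-7a_st")
  if (id %in% names(manual)) manual[[id]] else id
}

## Toy input: Cell60 is the line quoted above; Cell61 is made up to stand
## in for a gga-let-7j probe, and the header is abbreviated.
cdf_lines <- c(
  "CellHeader=X\tY\tPROBE\tFEAT\tQUAL\tEXPOS\tPOS",
  "Cell60=197\t181\tACTCCATCATCCAACATATCAA\tcontrol\txtr-let-7a_st\t59\t11",
  "Cell61=198\t181\tACTCCATCATCCAACATATCAA\tcontrol\tgga-let-7j_st\t60\t11"
)
out <- rename_cdf_cells(cdf_lines, strip_species)
writeLines(out)   # both Cell lines now carry the name let-7a_st

## To apply to a real file (output file name is a placeholder):
## writeLines(rename_cdf_cells(readLines("newmir1.cdf"), strip_species),
##            "newmir1_renamed.cdf")
```

The `Name=` line of each `[UnitN_Block1]` section probably needs the same renaming; that is an assumption this sketch does not cover. If makecdfenv keys probe sets on these names, as the unchanged count of 7815 suggests, `length(ls(make.cdf.env(...)))` on the rewritten file should drop accordingly.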

9.7 years ago, Scott Robinson ▴ 130

Hi James,

Thanks for the quick response. If you open the new CDF in a text editor such as Notepad++ and use a find-and-count function, you will find only 12,380 matches to the text string "[Unit". This string matches both the "Unit###" section and the "Unit###_Block#" section of each probe set, so dividing by 2 gives 6190 probe sets. Doing the same with the original CDF gives 15,630 / 2 = 7815 probe sets.

I read through the Affymetrix CDF format documentation pretty thoroughly and checked how it corresponded to the original CDF file, but couldn't see anything I had done wrong.

Thanks,
Scott
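The find-and-count check above measures unit headers only. If makecdfenv builds probe sets from the names on the Cell lines, as Jim's explanation in this thread implies, the number that matters is the count of distinct names. A minimal base-R sketch that reports both, assuming a text-format CDF whose tab-separated Cell lines carry the probe set name in the fifth (QUAL) field; `cdf_summary` is a hypothetical helper, not a makecdfenv function.

```r
## Hypothetical diagnostic (not part of makecdfenv): compare the number of
## [UnitN] and [UnitN_BlockM] sections in a text-format CDF with the number
## of distinct probe set names found on its Cell lines. Assumes tab-separated
## Cell lines with the name in the fifth (QUAL) field.
cdf_summary <- function(lines) {
  units  <- sum(grepl("^\\[Unit[0-9]+\\]", lines))
  blocks <- sum(grepl("^\\[Unit[0-9]+_Block[0-9]+\\]", lines))
  cells  <- lines[grepl("^Cell[0-9]+=", lines)]
  qual   <- vapply(strsplit(sub("^[^=]*=", "", cells), "\t", fixed = TRUE),
                   function(f) f[5], character(1))
  c(units = units, blocks = blocks, probeset_names = length(unique(qual)))
}

## Toy input: one unit whose single block holds probes that still carry
## two different names (coordinates for Cell2 are made up).
toy <- c(
  "[Unit1]", "Name=NONE",
  "[Unit1_Block1]", "Name=let-7a_st",
  "Cell1=197\t181\tACTCCATCATCCAACATATCAA\tcontrol\txtr-let-7a_st\t59\t11",
  "Cell2=198\t181\tACTCCATCATCCAACATATCAA\tcontrol\tgga-let-7j_st\t60\t11"
)
print(cdf_summary(toy))   # units 1, blocks 1, probeset_names 2

## For the real file: cdf_summary(readLines("newmir1.cdf"))
```

A gap between `units` and `probeset_names` on the new CDF would be consistent with 6190 units still yielding 7815 probe sets in the CDF environment.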

9.7 years ago, Scott Robinson ▴ 130