Question: how to build a R package with the inclusion of inst/extdata
7.1 years ago by
Yue Li
USA
Yue Li wrote:
Dear list,

I wonder if anyone could give me a hint on how to build an R package that includes "inst/extdata". Basically, I'm trying to build an R package with some BAM files as test data used in the examples section of the Rd files.

If I simply run:

package.skeleton(name="myPackage", code_files=sourceCode, path="packageBuild")

and manually copy and paste the folder "inst" into the outDir folder, then try:

R CMD build packageBuild

I get:

ERROR
packaging into .tar.gz failed

Section "1.1.5 Data in packages" of "Writing R Extensions" does not seem to help with my problem.

Thanks in advance,
Yue
modified 7.1 years ago by Steve Lianoglou • written 7.1 years ago by Yue Li
Answer: how to build a R package with the inclusion of inst/extdata
7.1 years ago by
Denali
Steve Lianoglou wrote:
Hi,

On Thu, Sep 6, 2012 at 6:25 PM, Yue Li <gorillayue at gmail.com> wrote:
> Dear list,
>
> I wonder if anyone could give me a hint on how to build an R package that includes "inst/extdata". Basically, I'm trying to build an R package with some BAM files as test data used in the examples section of the Rd files.
>
> If I simply run:
>
> package.skeleton(name="myPackage", code_files=sourceCode, path="packageBuild")
>
> and manually copy and paste the folder "inst" into the outDir folder, then try:
>
> R CMD build packageBuild

Shouldn't you rather be cd-ing into the packageBuild directory, then running R CMD INSTALL myPackage?

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
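The point about cd-ing into packageBuild can be sketched in shell. package.skeleton(name="myPackage", path="packageBuild") creates packageBuild/myPackage, so the directory to hand to R CMD build (or R CMD INSTALL) is the one that directly contains DESCRIPTION. The temporary directory, the placeholder DESCRIPTION contents, and the helper name is_pkg_root are assumptions made for illustration; R itself is not invoked, so the sketch runs anywhere.

```shell
#!/bin/sh
# Sketch only: recreate the layout from the question in a temp directory and
# show which directory is the package root. Nothing here calls R.
set -eu

work=$(mktemp -d)
mkdir -p "$work/packageBuild/myPackage/R" \
         "$work/packageBuild/myPackage/inst/extdata"
# Placeholder DESCRIPTION, standing in for the one package.skeleton() writes.
printf 'Package: myPackage\nVersion: 0.99.0\n' \
  > "$work/packageBuild/myPackage/DESCRIPTION"

# Hypothetical helper: a directory is a package root only if DESCRIPTION
# sits directly inside it.
is_pkg_root() {
  [ -f "$1/DESCRIPTION" ]
}

for dir in "$work/packageBuild" "$work/packageBuild/myPackage"; do
  if is_pkg_root "$dir"; then
    echo "package root:  ${dir#"$work"/}"
    build_target=$dir
  else
    echo "not a package: ${dir#"$work"/}"
  fi
done

# With R installed, the build would then be run from the parent directory:
echo "would run: (cd packageBuild && R CMD build myPackage)"
```

Running R CMD build on packageBuild itself fails this check, because DESCRIPTION lives one level further down.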
Hi Steven,

Thanks for the quick response. I think I probably didn't articulate my intent clearly. Basically, I'm trying to develop an R package rather than use someone else's package. In order to run the examples for the functions I wrote, I need to have BAM data saved in "inst/extdata" (or anywhere, for that matter). So when I call:

R CMD check mypackage

an example that says something like

testfiles <- system.file("inst/extdata/*bam$", package = "mypackage", )

can give me the BAM files saved in the inst/extdata/ that comes with the tarball package. But I'm too ignorant to figure out how to do that.

Yue

written 7.1 years ago by Yue Li

Hi,

On Thu, Sep 6, 2012 at 7:21 PM, Yue Li <gorillayue at gmail.com> wrote:
> Hi Steven,
>
> Thanks for the quick response. I think I probably didn't articulate my intent clearly.

I actually understood your intent -- I thought you were confused about why you were getting an error when you ran the R CMD build ... command you posted previously.

The problem was that you were trying to build something that wasn't really a package -- it seemed as if you were trying to build the *parent* directory your package directory was living in.

> [...] So when I call:
>
> R CMD check mypackage
>
> an example that says something like
>
> testfiles <- system.file("inst/extdata/*bam$", package = "mypackage", )
>
> can give me the BAM files saved in the inst/extdata/ that comes with the tarball package. But I'm too ignorant to figure out how to do that.

If you want to do this pattern matching on *.bam, I'm pretty sure you can't do it in a call to system.file, so you'd first get a handle on your extdata directory, then call dir on it. For example (and to be extra explicit), assuming you install your package successfully, you would then do in R:

R> extdata.dir <- system.file("extdata", package="myPackage")
R> bamfiles <- dir(extdata.dir, pattern="\\.bam$", full.names=TRUE)

The directory structure of your package would look something like this:

myPackage
  - inst
    - extdata
      - data1.bam
      - data2.bam
  - R
    - ...
  - NAMESPACE
  - DESCRIPTION

And note that when you actually install the package, the contents of the inst directory get "hoisted" out of it and dropped into the directory of your package, e.g. after installation, on your filesystem the extdata directory would be something like:

/path/to/your/R/library/myPackage/extdata/

Download the source code of, say, the ShortRead package to see the structure you want to follow:

http://www.bioconductor.org/packages/2.10/bioc/src/contrib/ShortRead_1.14.4.tar.gz

HTH,
-steve

written 7.1 years ago by Steve Lianoglou

Sorry Steve, I'm actually stuck at building the package with inst/extdata. This is my first time trying to build an R package, so please bear with me. Let me walk you through my (incorrect) approach:

I have a set of R scripts and Rd files that need to be built into a package. I deliberately made all examples in my Rd files trivial, such as simply running ls(), to pass the check. I can successfully build the package by running the following steps:

(1) construct the package skeleton in the R console:

scriptDir <- "~/Desktop/myRscripts/"
outDir <- "~/Desktop/"
sourceFiles <- list.files(path=scriptDir, pattern="[a-zA-Z]+\\.R$", full.names=TRUE, recursive=TRUE)
package.skeleton(name="mypackage", code_files=sourceFiles, path=outDir)

I now have a folder named "mypackage" sitting on my ~/Desktop. In a shell script, I do this:

(2) replace the skeleton Rd files in ~/Desktop/mypackage/man with my prepared Rd files:

cp ~/Desktop/myRDfiles/*.Rd ~/Desktop/mypackage/man/

(3) R CMD build ~/Desktop/mypackage

(4) R CMD check ~/Desktop/mypackage_0.99.0.tar.gz

(5) R CMD INSTALL ~/Desktop/mypackage_0.99.0.tar.gz

All of the above steps work fine. But now I am at the stage of writing concrete examples for each function, and I use R CMD check in step (4) to make sure that the examples run successfully during the check.
Some of the examples involve using BAM files, and I need to put them into the package so that it ships with these BAM files as test data, exactly as the ShortRead package does.

I learned that creating a subdirectory called "inst/extdata" inside the package folder (as in ShortRead) is the conventional way to include test data. So after step (2), I do this:

cp inst/extdata ~/Desktop/mypackage

But then I cannot successfully perform (3), as it returns an error:

$ R CMD build mypackage/
* checking for file ‘mypackage/DESCRIPTION’ ... OK
* preparing ‘mypackage’:
* checking DESCRIPTION meta-information ... OK
* excluding invalid files
Subdirectory 'man' contains invalid file names:
  ‘.Rhistory’
* checking for LF line-endings in source and make files
* checking for empty or unneeded directories
* building ‘mypackage_0.99.0.tar.gz’
/usr/bin/gnutar: mypackage/inst/extdata/expt1/accepted_hits_noDup.bam: file changed as we read it
/usr/bin/gnutar: mypackage/inst/extdata/expt2/accepted_hits_noDup.bam: file changed as we read it
/usr/bin/gnutar: mypackage/inst/extdata/expt3/accepted_hits_noDup.bam: file changed as we read it
ERROR
packaging into .tar.gz failed

I'm just wondering at which step between (1) and (5) I could incorporate inst/extdata into the package so that the tarball contains it.

Thanks much for your patient help!
Yue

written 7.1 years ago by Yue Li

On 09/06/2012 06:25 PM, Yue Li wrote:
> Sorry Steve, I'm actually stuck at building the package with inst/extdata. [...]
>
> All of the above steps work fine.
> But now I am at the stage of writing concrete examples for each function, and I use R CMD check in step (4) to make sure that the examples run successfully during the check. Some of the examples involve using BAM files, and I need to put them into the package so that it ships with these BAM files as test data, exactly as the ShortRead package does.
>
> I learned that creating a subdirectory called "inst/extdata" inside the package folder (as in ShortRead) is the conventional way to include test data. So after step (2), I do this:
>
> cp inst/extdata ~/Desktop/mypackage

This is a bit unusual -- cp inst/extdata should complain that you're trying to copy a directory, and you should use cp -r instead. I draw attention to this because otherwise it sounds like you've done things correctly...

> But then I cannot successfully perform (3), as it returns an error:
>
> $ R CMD build mypackage/
> * checking for file ‘mypackage/DESCRIPTION’ ... OK
> * preparing ‘mypackage’:
> * checking DESCRIPTION meta-information ... OK
> * excluding invalid files
> Subdirectory 'man' contains invalid file names:
>   ‘.Rhistory’
> * checking for LF line-endings in source and make files
> * checking for empty or unneeded directories
> * building ‘mypackage_0.99.0.tar.gz’
> /usr/bin/gnutar: mypackage/inst/extdata/expt1/accepted_hits_noDup.bam: file changed as we read it
> /usr/bin/gnutar: mypackage/inst/extdata/expt2/accepted_hits_noDup.bam: file changed as we read it
> /usr/bin/gnutar: mypackage/inst/extdata/expt3/accepted_hits_noDup.bam: file changed as we read it

These messages are unusual. It looks to me like your package structure is correct, and that tar is failing because of some unfortunate interaction with your file system.

Are these BAM files large?
A first suggestion would be to try with smaller 'toy' files, e.g. (and assuming you have backups):

rm -rf mypackage/inst/extdata/expt*
touch mypackage/inst/extdata/toy.file

Also, might as well clean up while we're at it:

rm mypackage/man/.Rhistory

and then

R CMD build mypackage
R CMD INSTALL mypackage_0.99.0.tar.gz

and then in R

library(mypackage)
extdata.dir = system.file(package="mypackage", "extdata")
dir(extdata.dir, full.names=TRUE)

Martin

--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

written 7.1 years ago by Martin Morgan

Thanks Martin. I got it to work somehow by following exactly the same workflow ...
the file size is about 80 Mb but it manages to squeeze them into the tarball eventually.
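For later readers, the step that adds inst/extdata between steps (2) and (3) might be sketched as below. This is only an illustration under stated assumptions: the temporary directory, the testdata source folder, and the empty toy.bam placeholder are hypothetical, and plain tar stands in for R CMD build so that the sketch runs without R installed. It uses cp -r as suggested in the thread and copies into inst/ so that the files land in mypackage/inst/extdata.

```shell
#!/bin/sh
# Sketch: add inst/extdata to an existing package directory, then confirm the
# files end up in a tarball. Plain tar is a stand-in for R CMD build here;
# all paths and file names are hypothetical.
set -eu

work=$(mktemp -d)
pkg="$work/mypackage"
src="$work/testdata"          # where the example files live before copying

# Stand-in for the skeleton from step (1) and the Rd files from step (2).
mkdir -p "$pkg/R" "$pkg/man" "$src/extdata"
printf 'Package: mypackage\nVersion: 0.99.0\n' > "$pkg/DESCRIPTION"
: > "$pkg/man/.Rhistory"      # stray file like the one flagged in the build log
: > "$src/extdata/toy.bam"    # empty placeholder, not a real BAM file

# Between steps (2) and (3): copy the directory recursively into inst/.
mkdir -p "$pkg/inst"
cp -r "$src/extdata" "$pkg/inst/"

# Remove the stray history file from man/.
rm -f "$pkg/man/.Rhistory"

# Package the tree and list what was archived.
(cd "$work" && tar -czf mypackage_0.99.0.tar.gz mypackage)
contents=$(tar -tzf "$work/mypackage_0.99.0.tar.gz")
echo "$contents"

# With R installed, the real equivalents would be run from "$work":
#   R CMD build mypackage
#   R CMD INSTALL mypackage_0.99.0.tar.gz
#   Rscript -e 'dir(system.file("extdata", package="mypackage"), full.names=TRUE)'
```

The listing should include mypackage/inst/extdata/toy.bam. After a real R CMD INSTALL, the same file would be found under the installed package's extdata directory, since the contents of inst are moved up one level at install time.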