5.7 years ago by
Sylvain Foisy wrote:
Hi,

I am trying to use easyRNASeq to create count tables for a DESeq analysis. I have about a hundred BAM files with their associated indexes, sorted into folders on my file system: one folder for each provider and, within it, one folder for each tissue. To save space, I created symlinks from each BAM/BAI pair to a single location so that easyRNASeq can read them all. When I run my command, I get this error:

    Error in easyRNASeq(filesDirectory = "/shares/data2/tmp01/inflammgen_temp_transfer/htseq_datastore/RNASeq_datastore/RNASeq_comprehensive_catalog/4.diff_ex_analysis/4.0.link2files/2013_10_29-16_00_07_NCBI_build37.from.2.2.phred_q_trimming_2013_10_24", :
      Index files (bai) are required. They are missing for the files: blablabla...

But the files are there :-( I removed the BAI symlinks and copied the real index files to the same location, and I get the same error. Is symlinking from the source the problem?

Any idea appreciated ;-)

Best regards

Sylvain

Sylvain Foisy, Ph. D.
Diploide BioIT
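Before involving easyRNASeq, it may help to check what R itself sees in the link directory. The following is a minimal sketch in base R, not part of easyRNASeq: `check_bam_index` is a hypothetical helper name, and it assumes each index is named `<file>.bam.bai`, as in the post above. Whether easyRNASeq 1.8.2 performs the same test internally is not confirmed here.

```r
# Hypothetical helper (not an easyRNASeq function): for every BAM file in a
# directory, report whether R can see the file and its "<name>.bam.bai" index.
check_bam_index <- function(dir, pattern = "\\.bam$") {
  bam <- list.files(dir, pattern = pattern, full.names = TRUE)
  # sprintf() keeps a zero-length input zero-length (paste0() would not).
  bai <- sprintf("%s.bai", bam)
  data.frame(
    bam        = basename(bam),
    bam_exists = file.exists(bam),   # FALSE when a symlink is dangling
    bai_exists = file.exists(bai),   # FALSE when the index is missing or dangling
    bam_target = Sys.readlink(bam),  # "" when the file is not a symlink
    stringsAsFactors = FALSE
  )
}
```

Running `check_bam_index()` on the directory given as `filesDirectory` and looking for rows where `bai_exists` is `FALSE` shows which indexes R cannot reach, even if `ls` lists them.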
5.7 years ago by
Sweden
Nicolas Delhomme wrote:
Bonsoir Sylvain,

I have never encountered that situation. I've tried to reproduce it, but to no avail. Can you:

1) report your sessionInfo() to make sure you are using R-3.0.2 and easyRNASeq-1.8.2?

2) give me a glance of your symlink structure to see if my mockup recapitulated your case?

3) try to reproduce the following? Here is what I did to test easyRNASeq on what I guess was your symlink structure:

a) first I created symlinks in my Desktop/tmp dir:

    lrwxr-xr-x 1 delhomme staff 71 Nov 26 17:02 ACACTG.bam -> /Users/delhomme/Library/R/3.0/library/RnaSeqTutorial/extdata/ACACTG.bam
    lrwxr-xr-x 1 delhomme staff 75 Nov 26 17:02 ACACTG.bam.bai -> /Users/delhomme/Library/R/3.0/library/RnaSeqTutorial/extdata/ACACTG.bam.bai
    lrwxr-xr-x 1 delhomme staff 71 Nov 26 17:02 ACTAGC.bam -> /Users/delhomme/Library/R/3.0/library/RnaSeqTutorial/extdata/ACTAGC.bam
    lrwxr-xr-x 1 delhomme staff 75 Nov 26 17:02 ACTAGC.bam.bai -> /Users/delhomme/Library/R/3.0/library/RnaSeqTutorial/extdata/ACTAGC.bam.bai
    lrwxr-xr-x 1 delhomme staff 71 Nov 26 17:02 ATGGCT.bam -> /Users/delhomme/Library/R/3.0/library/RnaSeqTutorial/extdata/ATGGCT.bam
    lrwxr-xr-x 1 delhomme staff 75 Nov 26 17:02 ATGGCT.bam.bai -> /Users/delhomme/Library/R/3.0/library/RnaSeqTutorial/extdata/ATGGCT.bam.bai
    lrwxr-xr-x 1 delhomme staff 71 Nov 26 17:02 TTGCGA.bam -> /Users/delhomme/Library/R/3.0/library/RnaSeqTutorial/extdata/TTGCGA.bam
    lrwxr-xr-x 1 delhomme staff 75 Nov 26 17:02 TTGCGA.bam.bai -> /Users/delhomme/Library/R/3.0/library/RnaSeqTutorial/extdata/TTGCGA.bam.bai

pointing to the example data package (RnaSeqTutorial), the companion of easyRNASeq.

b) then in R I ran:

    counts <- easyRNASeq(filesDirectory="Desktop/tmp",
                         pattern="[A,C,T,G]{6}\\.bam$",
                         readLength=30L, chr.sel="chr2L",
                         organism="Dmelanogaster",
                         annotationMethod="rda",
                         annotationFile=system.file("data","gAnnot.rda",package="RnaSeqTutorial"),
                         count="exons")

and I got the expected count table.

4) btw, how did you create the links? As far as I remember, Windows does not create symlinks in the same way a Unix system does, which is what I tried here.

Thanks,

Nico

Nicolas Delhomme
Nathaniel Street Lab, Department of Plant Physiology, Umeå Plant Science Center

Sylvain Foisy wrote:

Hi Nicolas,

> I have never encountered that situation. I've tried to reproduce it, but to no avail.

Well, that would not be the first time, but usually I am in your position ;-) Ok, here goes:

> 1) report your sessionInfo() to make sure you are using R-3.0.2 and easyRNASeq-1.8.2?

    > sessionInfo()
    R version 3.0.2 (2013-09-25)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] parallel  stats     graphics  grDevices utils     datasets  methods
    [8] base

    other attached packages:
     [1] easyRNASeq_1.8.2       ShortRead_1.20.0       Rsamtools_1.14.2
     [4] GenomicRanges_1.14.3   DESeq_1.14.0           lattice_0.20-24
     [7] locfit_1.5-9.1         Biostrings_2.30.1      XVector_0.2.0
    [10] IRanges_1.20.6         edgeR_3.4.0            limma_3.18.3
    [13] biomaRt_2.18.0         Biobase_2.22.0         genomeIntervals_1.18.0
    [16] BiocGenerics_0.8.0     intervals_0.14.0

    loaded via a namespace (and not attached):
     [1] annotate_1.40.0      AnnotationDbi_1.24.0 bitops_1.0-6
     [4] DBI_0.2-7            genefilter_1.44.0    geneplotter_1.40.0
     [7] grid_3.0.2           hwriter_1.3          latticeExtra_0.6-26
    [10] LSD_2.5              RColorBrewer_1.0-5   RCurl_1.95-4.1
    [13] RSQLite_0.11.4       splines_3.0.2        stats4_3.0.2
    [16] survival_2.37-4      XML_3.98-1.1         xtable_1.7-1
    [19] zlibbioc_1.8.0

> 2) give me a glance of your symlink structure to see if my mockup recapitulated your case?

Basically, my BAM/BAI file pairs are located in something like this:

    given/location/sampleID/tissue/accepted_hits.bam
    given/location/sampleID/tissue/accepted_hits.bam.bai

And I created my symlinks something like this:

    another/location/sample_tissue_accepted_nits.bam
    another/location/sample_tissue_accepted_nits.bam.bai

So that all the symlinks are under a single location instead.

> 3) can you try to reproduce the following: [...]

Ok, along the suggested lines, I installed the RnaSeqTutorial package, symlinked the BAM/BAI files from their location to a ~/test folder, and ran this command:

    count.table <- easyRNASeq(filesDirectory="/shares/home/foisys/test",
                              pattern="[A,C,T,G]{6}\\.bam$",
                              readLength=30L, chr.sel="chr2L",
                              organism="Dmelanogaster",
                              annotationMethod="rda",
                              annotationFile=system.file("data","gAnnot.rda",package="RnaSeqTutorial"),
                              count="exons")

and it is working...

> 4) btw, how did you create the links? As far as I remember, Windows does not create symlinks in the same way a Unix system does, which is what I tried here.

I am 100% Linux ;-)

Best regards

Sylvain

Nicolas Delhomme wrote:

Beats me. I'm really shooting in the dark here, but:

1) Is there any possibility that you have circular symlinks, or symlink chains? I'm not sure how gracefully R would deal with symlink chains.

2) Are there any permission issues on the bam/bai files, or on a subset of them?

3) Could it be that the symlink names are too long? That used to create problems in the distant past on some Linux distros, but I have not seen it occur in years.

Nico
Sylvain Foisy wrote:

Hi,

On 2013-11-26, at 11:55 AM, Nicolas Delhomme wrote:

> 1) Is there any possibility that you have circular symlinks, or symlink chains? I'm not sure how gracefully R would deal with symlink chains.

Nope, I checked for that.

> 2) Are there any permission issues on the bam/bai files, or on a subset of them?

Nope, take 2: both locations are owned by me with full r+w permissions.

> 3) Could it be that the symlink names are too long? That used to create problems in the distant past on some Linux distros, but I have not seen it occur in years.

It works when I run it in the original location (a single BAM/BAI pair), so the long-path hypothesis might be it... I'll see what I can do to rename my stuff with shorter names.

Thanks for the time

Sylvain
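The three hypotheses discussed above (symlink chains, permissions, long names) can each be checked from within R. This is only a sketch under stated assumptions: `diagnose_links` is a hypothetical name, it looks at most two hops deep to flag a chain, and it does not know what path length, if any, would actually trouble easyRNASeq.

```r
# Hypothetical diagnostic helper; the function name and columns are illustrative.
diagnose_links <- function(paths) {
  first_hop <- Sys.readlink(paths)                  # "" if not a symlink, NA on error
  is_link   <- !is.na(first_hop) & nzchar(first_hop)

  # A relative link target is interpreted relative to the link's own directory.
  hop_path <- first_hop
  rel <- is_link & !startsWith(first_hop, "/")
  hop_path[rel] <- file.path(dirname(paths[rel]), first_hop[rel])

  # A chain is a symlink whose target is itself a symlink.
  is_chain <- rep(FALSE, length(paths))
  if (any(is_link)) {
    second_hop <- Sys.readlink(hop_path[is_link])
    is_chain[is_link] <- !is.na(second_hop) & nzchar(second_hop)
  }

  data.frame(
    path           = paths,
    is_symlink     = is_link,
    is_chain       = is_chain,
    readable       = file.access(paths, mode = 4) == 0,  # follows symlinks
    link_nchar     = nchar(paths),                       # length of the link path
    resolved_nchar = nchar(normalizePath(paths, mustWork = FALSE)),
    stringsAsFactors = FALSE
  )
}
```

Passing it the full paths of the `.bam` and `.bam.bai` links, for example from `list.files(dir, full.names = TRUE)`, gives one row per file, so unusually long or unreadable entries stand out.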
On 11/26/2013 09:36 AM, Sylvain Foisy Ph. D. wrote:

> I made it work when I am in the original location (a single BAM/BAI combo) and it works so the hypothesis of long paths might be it... I'll see what I can do to rename my stuff with shorter names.

My guess would be cross-file-system symlinks.

Martin

--
Computational Biology / Fred Hutchinson Cancer Research Center
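One way to test the cross-file-system guess is to compare the device number of each link with that of its target. Base R does not expose device numbers, so this sketch shells out to `stat`; it assumes GNU coreutils on Linux (`-c %d` prints the device number, `-L` follows links), and `device_id` and `crosses_filesystem` are hypothetical names. It only detects the situation; it does not show that easyRNASeq fails because of it.

```r
# Assumption: GNU coreutils `stat` is on the PATH (typical on Linux).
# Returns the device number, as a string, of the link itself or of its target.
device_id <- function(path, follow = TRUE) {
  args <- c(if (follow) "-L", "-c", "%d", shQuote(path))
  system2("stat", args, stdout = TRUE)
}

# TRUE when a link and the file it resolves to live on different devices.
# Only meaningful for links that resolve; a dangling link also reports TRUE.
crosses_filesystem <- function(links) {
  vapply(
    links,
    function(p) !identical(device_id(p, follow = FALSE), device_id(p, follow = TRUE)),
    logical(1),
    USE.NAMES = FALSE
  )
}
```

Applied to the full paths of the symlinks, any `TRUE` marks a link whose target sits on another file system.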