Question: Path to file in Windows
8 months ago, jorrenkuster wrote:

Hi all,

I have a question about specifying the path to a file on Windows in R.

Currently I am using this path for a BAM file when counting reads with DiffBind:

bamReads = "U:/R/win-library/3.0/Data_ChIP_Experiment/aligned_S00D47H1-RUNX1.filtered.bam"

This does not work at the moment. The error is: Some read files could not be accessed. See warnings for details.

Does anyone have a suggestion?

Greetings, Jorren

modified 7 months ago by Gord Brown • written 8 months ago by jorrenkuster

Hi,

Does it warn about all files, or just some of them?  (The warning messages should tell you which files it can't find.)

If you post your sample sheet (or email it to me, email on the DiffBind home page) I can take a look to see if there are possible problems.

 - Gord

written 8 months ago by Gord Brown

Hi Gord,

It warns about all the files I want to use. I will post it there soon, thanks.

Jorren

written 8 months ago by jorrenkuster

If you copy the cell containing the file path from the sample sheet, and, in R, try

file.exists("...paste filename here...")

with the exact string from the sample sheet pasted in place of ...paste filename here..., can R find the file?  The idea is to test whether the problem is your path, some quirk of R, or DiffBind itself (entirely possible!).  (I don't have a Windows machine accessible, so I can't easily test for Windows-related quirks.)
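
A minimal sketch of that check, for anyone following along. check_path is a hypothetical helper (not part of R or DiffBind), and the temporary file only stands in for a BAM file. It also tests the string with surrounding whitespace stripped, since a pasted cell can carry a stray space that is hard to see.

```r
# Hypothetical helper: test one path string exactly as it appears in the sheet,
# and again with leading/trailing whitespace removed.
check_path <- function(path) {
  trimmed <- trimws(path)
  data.frame(
    path            = path,
    exists_as_given = file.exists(path),
    exists_trimmed  = file.exists(trimmed),
    readable        = unname(file.access(trimmed, mode = 4) == 0),
    stringsAsFactors = FALSE
  )
}

# Demonstration on a temporary file standing in for a BAM file.
demo_file <- tempfile(fileext = ".bam")
invisible(file.create(demo_file))

result_ok     <- check_path(demo_file)
result_padded <- check_path(paste0(demo_file, " "))  # same name plus a stray space
print(rbind(result_ok, result_padded))
```

If exists_as_given is FALSE but exists_trimmed is TRUE, the cell in the sample sheet contains extra whitespace.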

written 8 months ago by Gord Brown

If I run the command the output is as follows:

> file.exists("aligned_S00D47h1-RUNX1.filtered.bam")
[1] TRUE

 

written 8 months ago by jorrenkuster

Your original question suggested that the sample sheet had an absolute path for the files (U:/R/win-library/3.0/Data_ChIP_Experiment/aligned_S00D47H1-RUNX1.filtered.bam), but your example above is just the bare filename.  Whatever is in the sample sheet has to be either an absolute path (U:/R/win-library/...), or a path that identifies the file relative to the current directory when running DiffBind.  So... running R in the directory that you use when running DiffBind, try file.exists with the exact contents of the cell.  Do you have all the files in the same directory that you're running DiffBind in?  If so just the filename(s) in the sample sheet should be fine (without the full path).

We typically run DiffBind in a directory with the sample sheet, all the peaks in a subdirectory named "peaks", and all the BAM files in a subdirectory named "bam".  So the sample sheet contains paths like "bam/myfile.bam", relative to the directory R is running in.
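
To illustrate why the working directory matters with that layout, here is a small sketch. It builds a throwaway project under tempdir() (all names are placeholders) and shows that the same relative path is found only while R is running in the project directory.

```r
# Build a throwaway project with the layout described above (placeholder names).
project <- file.path(tempdir(), "diffbind_layout_demo")
dir.create(file.path(project, "bam"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project, "peaks"), showWarnings = FALSE)
invisible(file.create(file.path(project, "bam", "myfile.bam")))

old_wd <- setwd(project)                 # run from the directory holding the sheet
found_in_project <- file.exists("bam/myfile.bam")

setwd(tempdir())                         # same relative path, different directory
found_elsewhere <- file.exists("bam/myfile.bam")

setwd(old_wd)                            # restore the original working directory
print(c(in_project = found_in_project, elsewhere = found_elsewhere))
```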

written 8 months ago by Gord Brown

This is the sample sheet I am using.

The sample sheet is located in the directory where I run DiffBind.

SampleID,Tissue,Factor,Condition,Replicate,bamReads,bamControl,Peaks,PeakCaller,Counts
S00D63H1,iPSC,RUNX1,DOX,1,bam/aligned_S00D63H1.filtered.bam,,peaks/aligned_S00D63H1.filtered.bam.calledpeaks_peaks.bed,macs,counts/aligned_S00D63H1.filtered.bam.count
S00D47H1-RUNX1,iPSC,RUNX1,DOX,1,bam/aligned_S00D47H1-RUNX1.filtered.bam,,peaks/aligned_S00D47H1-RUNX1.filtered.bam.calledpeaks_peaks.bed,macs,counts/aligned_S00D47H1-RUNX1.filtered.bam.count
S00D55H1-2006090-RUNX1,iPSC,RUNX1,DOX,1,bam/aligned_S00D55H1-2006090-RUNX1.filtered.bam,,peaks/aligned_S00D55H1-2006090-RUNX1.filtered.bam.calledpeaks_peaks.bed,macs,counts/aligned_S00D55H1-2006090-RUNX1.filtered.bam.count
S00XUNH1-RUN,iPSC,RUNX1,DOX,1,bam/aligned_S00XUNH1-RUNX1.filtered.bam,,peaks/aligned_S00XUNH1-RUNX1.filtered.bam.calledpeaks_peaks.bed,macs,counts/aligned_S00XUNH1-RUNX1.filtered.bam.count
SRR1536791,CD34+,RUNX1,,1,bam/aligned_SRR1536791.filtered.bam,,peaks/aligned_SRR1536791.filtered.bam.calledpeaks_peaks.bed,macs,counts/aligned_SRR1536791.filtered.bam.count
SRR772111,CD34+,RUNX1,,1,bam/aligned_SRR772111.filtered.bam,,peaks/aligned_SRR772111.filtered.bam.calledpeaks_peaks.bed,macs,counts/aligned_SRR772111.filtered.bam.count
written 7 months ago by jorrenkuster

A couple of points that may be relevant (or may not, but need to be fixed):

1) The Condition column should have a value for every row, even if the value is "Control".

2) You have the peak caller listed as "macs" but the peaks have a ".bed" suffix.  If you want to use bed files, set the PeakCaller to be "bed", or add a  "PeakFormat" column with "bed" as the value for each row.  MACS also produces a file with a ".xls" suffix (though it isn't an Excel file); that's the one that DiffBind is expecting if the PeakCaller is "macs". 

3) You probably don't want to set the "Counts" parameter; DiffBind is designed to do its own counting from the supplied BAM files.
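
As a rough sketch, the three points above could be applied in R like this before the sheet is passed to DiffBind. The data frame is a two-row stand-in with placeholder paths, not the real sheet, and "Control" as the fill value is simply the example from point 1.

```r
# Two-row stand-in for the sample sheet (placeholder paths, not the real files).
samples <- data.frame(
  SampleID   = c("S00D63H1", "SRR1536791"),
  Condition  = c("DOX", ""),
  bamReads   = c("bam/a.bam", "bam/b.bam"),
  Peaks      = c("peaks/a_peaks.bed", "peaks/b_peaks.bed"),
  PeakCaller = c("macs", "macs"),
  Counts     = c("counts/a.count", "counts/b.count"),
  stringsAsFactors = FALSE
)

# 1) Give every row a Condition value.
missing_condition <- is.na(samples$Condition) | samples$Condition == ""
samples$Condition[missing_condition] <- "Control"

# 2) The peak files are BED files, so label them as such.
samples$PeakCaller <- "bed"

# 3) Drop the Counts column so the reads are counted from the BAM files.
samples$Counts <- NULL

print(samples)
```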

Beyond that, please confirm that with R running in the same directory as the sample sheet,

file.exists("bam/aligned_S00D63H1.filtered.bam")

file.exists("peaks/aligned_S00D63H1.filtered.bam.calledpeaks_peaks.bed")

both return TRUE.  (Your previous example didn't include the "bam/" or "peaks/" path component.)

written 7 months ago by Gord Brown

I applied all of your recommendations, and both file.exists() calls return TRUE.

The error unfortunately remains the same 

bam/aligned_S00D63H1.filtered.bam not accessible
written 7 months ago by jorrenkuster

I don't know what to suggest.  Either your current working directory isn't what you think it is when you run R, or there is some mismatch between the paths in your sample sheet and the actual paths on disk.   There is no other possibility that I can think of.
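
One way to test the second possibility is to list the sheet entries that have no matching file on disk. The sketch below uses a temporary directory and made-up file names; missing_files is a hypothetical helper, not a DiffBind function.

```r
# Hypothetical helper: return the sheet entries that do not exist on disk,
# resolved relative to a given base directory.
missing_files <- function(paths, base_dir = getwd()) {
  paths[!file.exists(file.path(base_dir, paths))]
}

# Demonstration with made-up names in a temporary directory.
base <- file.path(tempdir(), "path_mismatch_demo")
dir.create(file.path(base, "bam"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(base, "bam", "sample_A.bam")))

sheet_paths <- c("bam/sample_A.bam", "bam/sample_B-extra.bam")
not_found <- missing_files(sheet_paths, base)
print(not_found)
```

With a real sheet, the call would be something like missing_files(samples$bamReads), run from the directory that holds the sheet. An empty result means every entry matches a file.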

In an R session, type the following commands, and post the *complete* script, including the exact commands you typed and the exact output (*all* of it):

> getwd()

> samples = read.csv('sampleSheet.csv',header=T)

> print(samples)

> list.files()

> list.files(path="bam")

> sessionInfo()

Otherwise perhaps you can find a local person who can give you some help.

written 7 months ago by Gord Brown

> getwd()
[1] "\\\\home2.science.ru.nl/jkuster/R/win-library/3.0/Data_ChIP_Experiment"


> samples <- read.csv('Examplesheet.csv', header=T)

> print(samples)
                SampleID Tissue Factor Condition Replicate                                        bamReads bamControl
1               S00D63H1   iPSC  RUNX1       DOX         1               bam/aligned_S00D63H1.filtered.bam         NA
2         S00D47H1-RUNX1   iPSC  RUNX1       DOX         1         bam/aligned_S00D47H1-RUNX1.filtered.bam         NA
3 S00D55H1-2006090-RUNX1   iPSC  RUNX1       DOX         1 bam/aligned_S00D55H1-2006090-RUNX1.filtered.bam         NA
4           S00XUNH1-RUN   iPSC  RUNX1       DOX         1         bam/aligned_S00XUNH1-RUNX1.filtered.bam         NA
5             SRR1536791  CD34+  RUNX1   Control         1             bam/aligned_SRR1536791.filtered.bam         NA
6              SRR772111  CD34+  RUNX1   Control         1              bam/aligned_SRR772111.filtered.bam         NA
                                                                    Peaks PeakCaller Peakformat
1               peaks/aligned_S00D63H1.filtered.bam.calledpeaks_peaks.bed       macs        bed
2         peaks/aligned_S00D47H1-RUNX1.filtered.bam.calledpeaks_peaks.bed       macs        bed
3 peaks/aligned_S00D55H1-2006090-RUNX1.filtered.bam.calledpeaks_peaks.bed       macs        bed
4         peaks/aligned_S00XUNH1-RUNX1.filtered.bam.calledpeaks_peaks.bed       macs        bed
5             peaks/aligned_SRR1536791.filtered.bam.calledpeaks_peaks.bed       macs        bed
6              peaks/aligned_SRR772111.filtered.bam.calledpeaks_peaks.bed       macs        bed

> list.files()
[1] "aligned_S00D47H1-RUNX1.filtered.bam.bai" "bam"                                     "Examplesheet.csv"                       
[4] "Examplesheet.pdf"                        "Patient.csv"                             "peaks"

> list.files()
[1] "bam"              "Examplesheet.csv" "Patient.csv"      "peaks"

> list.files(path='bam')
[1] "aligned_S00D47H1-RUNX1.filtered.bam" "aligned_S00D55H1.filtered.bam"       "aligned_S00D63H1.filtered.bam"       "aligned_S00XUNH1-RUNX1.filtered.bam"
[5] "aligned_SRR1536791.filtered.bam"     "aligned_SRR772111.filtered.bam"

 

written 7 months ago by jorrenkuster

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] DiffBind_1.8.5       GenomicRanges_1.14.4 XVector_0.2.0        IRanges_1.20.7       BiocGenerics_0.8.0  

loaded via a namespace (and not attached):
 [1] amap_0.8-14        bitops_1.0-6       caTools_1.17.1     edgeR_3.4.2        gdata_2.13.3       gplots_2.16.0      gtools_3.4.2       KernSmooth_2.23-10
 [9] limma_3.18.13      RColorBrewer_1.1-2 stats4_3.0.2       tools_3.0.2        zlibbioc_1.8.0   

written 7 months ago by jorrenkuster

Hi,

Well, the paths look right.  Your sessionInfo shows that your versions of R and DiffBind are close to 4 years out of date, though.  Could you upgrade to R 3.3.3 or 3.4.0 and try again?  We can't support versions of DiffBind that old.
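
A quick way to check the versions before retrying, sketched with base R only. The 3.3.3 threshold is the one suggested above, and the DiffBind lookup is skipped when the package is not installed.

```r
# Compare the running R version with the minimum suggested above.
minimum_r <- "3.3.3"
r_is_current <- getRversion() >= minimum_r
cat("R", as.character(getRversion()),
    if (r_is_current) "meets" else "is older than", minimum_r, "\n")

# Report the installed DiffBind version without loading the package.
diffbind_installed <- nzchar(system.file(package = "DiffBind"))
if (diffbind_installed) {
  cat("DiffBind", as.character(packageVersion("DiffBind")), "\n")
} else {
  cat("DiffBind is not installed in this library\n")
}
```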

written 7 months ago by Gord Brown

It works now, thanks a lot for your help

written 6 months ago by jorrenkuster

My guess would be the network path "//home2...."; you could try mapping it (I googled for 'windows map network drive') to a standard Windows drive letter and using that.
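
A small sketch for spotting this case. The getwd() output posted earlier in the thread starts with a double backslash, which marks a UNC network path, while the path in the original question uses the drive letter U:. is_unc_path is a hypothetical helper, not a base R function.

```r
# Hypothetical helper: TRUE when a path starts with \\ or //, i.e. a UNC path.
is_unc_path <- function(path) {
  grepl("^(\\\\\\\\|//)", path)
}

# The working directory reported by getwd() in the thread, and the
# drive-letter form used in the original question.
unc_wd    <- "\\\\home2.science.ru.nl/jkuster/R/win-library/3.0/Data_ChIP_Experiment"
mapped_wd <- "U:/R/win-library/3.0/Data_ChIP_Experiment"

print(is_unc_path(c(unc_wd, mapped_wd)))
```

If is_unc_path(getwd()) is TRUE, calling setwd() with the drive-letter equivalent before running DiffBind is one thing to try.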

written 7 months ago by Martin Morgan