How to analyze ChIP-seq data
@bioinformatics-10931
United States

I am using your R code to analyze some ChIP-seq data. I have 6 FASTQ files. Your example has two files per sample, while I have only one:

FileName1 FileName2

So I amended the targets file to list one file per sample, and I can read it all in.

I am sure the data are in the folder, but I cannot figure out why it gives me an error.

I attached my targets file.

Let me know your thoughts.

library(systemPipeR)
targetpath <- "~/Desktop/data/targetsPE_chip.txt"
targets <- read.delim(targetpath, comment.char = "#")
dir_path <- system.file("extdata/cwl/preprocessReads/trim-pe",package = "systemPipeR")
trim <- loadWF(targets = targetpath, wf_file = "trim-pe.cwl",
               input_file = "trim-pe.yml", dir_path = dir_path)
trim <- renderWF(trim, inputvars = c(FileName1 = "_FASTQ_PATH1_", SampleName = "_SampleName_"))
trim
output(trim)[1:2]
filterFct <- function(fq, cutoff = 20, Nexceptions = 0) {
  # Retains reads whose Phred scores are all above cutoff, allowing up to
  # Nexceptions positions at or below it
  qcount <- rowSums(as(quality(fq), "matrix") <= cutoff, na.rm = TRUE)
  fq[qcount <= Nexceptions]
}
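To make the filter rule concrete: for each read, filterFct() counts the positions whose quality is at or below cutoff and keeps the read only if that count is at most Nexceptions. A minimal base-R sketch, assuming a plain numeric matrix of made-up Phred scores stands in for as(quality(fq), "matrix") (one row per read, one column per base); the NA is skipped by na.rm = TRUE, as in the original function.

```r
# Made-up Phred scores standing in for as(quality(fq), "matrix").
qual <- rbind(read1 = c(30, 32, 35, 31),
              read2 = c(30, 18, 35, 31),
              read3 = c(12, 15, 35, NA))
cutoff <- 20

# Same counting step as filterFct(): low-quality bases per read.
qcount <- rowSums(qual <= cutoff, na.rm = TRUE)
print(qcount)

# Nexceptions = 0: only reads with no base at or below cutoff survive.
keep0 <- qcount <= 0
print(names(which(keep0)))

# Nexceptions = 1: one low-quality base per read is tolerated.
keep1 <- qcount <= 1
print(names(which(keep1)))
```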


But when I run the following command, it always gives me an error:

preprocessReads(args = trim, Fct = "filterFct(fq, cutoff=20, Nexceptions=0)",
                batchsize = 1e+05)

Error in open.connection(con, "rb") : cannot open the connection
In addition: Warning messages:
1: In normalizePath(subset_input[[i]][["FileName1"]]) :
  path[1]="/Users/admin/Desktop/data/S1_R1_001.fastq.gz ": No such file or directory
2: In open.connection(con, "rb") :
  cannot open file '/Users/admin/Desktop/data/S1_R1_001.fastq.gz ': No such file or directory
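Both warnings quote the path with a space before the closing quote ('.../S1_R1_001.fastq.gz '), so R is looking for a file whose name literally ends in a space. Assuming that space comes from trailing whitespace after the file name in the targets file, a minimal base-R sketch to strip it follows. strip_trailing_spaces() is a hypothetical helper, not a systemPipeR function; it works on raw lines so that "#" comment lines in the targets file are preserved.

```r
# Hypothetical helper (not part of systemPipeR): remove spaces that sit
# right before a tab or at the end of a line in a tab-delimited file.
# "#" comment lines are kept; only their trailing spaces are dropped.
strip_trailing_spaces <- function(lines) {
  lines <- gsub(" +\t", "\t", lines)  # spaces before a column separator
  sub(" +$", "", lines)               # spaces at the end of a line
}

# Self-contained demo on a temporary file that mimics the problem.
tmp <- tempfile(fileext = ".txt")
writeLines(c("# Project: demo",
             "FileName1\tSampleName",
             "./data/S1_R1_001.fastq.gz \tSG-9"), tmp)

before <- read.delim(tmp, comment.char = "#")$FileName1
writeLines(strip_trailing_spaces(readLines(tmp)), tmp)
after <- read.delim(tmp, comment.char = "#")$FileName1

print(before)  # still ends in a space
print(after)   # space removed

# On the real targets file (path from the question), one could run e.g.:
# lines <- readLines("~/Desktop/data/targetsPE_chip.txt")
# writeLines(strip_trailing_spaces(lines), "~/Desktop/data/targetsPE_chip.txt")
```

Because the helper overwrites the file, keeping a backup copy of the original targets file first is a sensible precaution.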

> sessionInfo( )
R version 4.0.5 (2021-03-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] systemPipeR_1.22.0          ShortRead_1.46.0            GenomicAlignments_1.26.0   
 [4] SummarizedExperiment_1.20.0 Biobase_2.50.0              MatrixGenerics_1.2.1       
 [7] matrixStats_0.61.0          BiocParallel_1.28.3         Rsamtools_2.6.0            
[10] Biostrings_2.58.0           XVector_0.30.0              GenomicRanges_1.42.0       
[13] GenomeInfoDb_1.26.7         IRanges_2.24.1              S4Vectors_0.28.1           
[16] BiocGenerics_0.36.1        

loaded via a namespace (and not attached):
  [1] colorspace_2.0-2         rjson_0.2.20             hwriter_1.3.2            ellipsis_0.3.2          
  [5] rstudioapi_0.13          bit64_4.0.5              AnnotationDbi_1.52.0     fansi_0.5.0             
  [9] xml2_1.3.3               splines_4.0.5            cachem_1.0.6             jsonlite_1.7.2          
 [13] annotate_1.68.0          GO.db_3.11.4             dbplyr_2.1.1             png_0.1-7               
 [17] pheatmap_1.0.12          graph_1.66.0             compiler_4.0.5           httr_1.4.2              
 [21] GOstats_2.54.0           backports_1.4.1          assertthat_0.2.1         Matrix_1.4-0            
 [25] fastmap_1.1.0            limma_3.46.0             prettyunits_1.1.1        tools_4.0.5             
 [29] gtable_0.3.0             glue_1.6.0               GenomeInfoDbData_1.2.4   Category_2.54.0         
 [33] dplyr_1.0.7              rsvg_2.1.2               batchtools_0.9.15        rappdirs_0.3.3          
 [37] V8_4.0.0                 Rcpp_1.0.7               vctrs_0.3.8              rtracklayer_1.54.0      
 [41] stringr_1.4.0            lifecycle_1.0.1          restfulr_0.0.13          XML_3.99-0.8            
 [45] edgeR_3.32.1             zlibbioc_1.36.0          scales_1.1.1             BSgenome_1.58.0         
 [49] VariantAnnotation_1.36.0 hms_1.1.1                RBGL_1.64.0              RColorBrewer_1.1-2      
 [53] yaml_2.2.1               curl_4.3.2               memoise_2.0.1            ggplot2_3.3.5           
 [57] biomaRt_2.46.3           latticeExtra_0.6-29      stringi_1.7.6            RSQLite_2.2.9           
 [61] genefilter_1.72.1        BiocIO_1.0.1             checkmate_2.0.0          GenomicFeatures_1.46.2  
 [65] DOT_0.1                  rlang_0.4.12             pkgconfig_2.0.3          bitops_1.0-7            
 [69] lattice_0.20-45          purrr_0.3.4              bit_4.0.4                tidyselect_1.1.1        
 [73] GSEABase_1.50.1          AnnotationForge_1.30.1   magrittr_2.0.1           R6_2.5.1                
 [77] generics_0.1.1           base64url_1.4            DelayedArray_0.16.3      DBI_1.1.2               
 [81] pillar_1.6.4             withr_2.4.3              survival_3.2-13          RCurl_1.98-1.5          
 [85] tibble_3.1.6             crayon_1.4.2             utf8_1.2.2               BiocFileCache_1.14.0    
 [89] jpeg_0.1-9               progress_1.2.2           locfit_1.5-9.4           grid_4.0.5              
 [93] data.table_1.14.2        blob_1.2.2               Rgraphviz_2.32.0         digest_0.6.29           
 [97] xtable_1.8-4             brew_1.0-6               openssl_1.4.6            munsell_0.5.0           
[101] askpass_1.1             
>
dcassol
@dcassol-15717
Riverside/CA

Hi Mohammad,

I already replied by email, but I will add the answer here too.

First, if you have single-end FASTQ files, you need to use the corresponding single-end param files (trim-se.cwl and trim-se.yml rather than the paired-end ones). Second, the targets file must contain the correct path to each file. For example, your targets table should point to the files like this:

> targetspath <- "targets.txt"
> read.delim(targetspath, comment.char = "#")
#                      FileName SampleName Factor SampleLong Experiment        Date
# 1  ./data/SRR446027_1.fastq.gz        M1A     M1  Mock.1h.A          1 23-Mar-2012
# 2  ./data/SRR446028_1.fastq.gz        M1B     M1  Mock.1h.B          1 23-Mar-2012

You can double-check that the file paths are correct:

targets <- read.delim(targetspath, comment.char = "#")
file.exists(targets$FileName)

then,

dir_path <- system.file("extdata/cwl", package="systemPipeR")
args <- loadWF(targets = targetspath, wf_file = "preprocessReads/trim-se.cwl", input_file = "preprocessReads/trim-se.yml", dir_path = dir_path)
args <- renderWF(args, inputvars = c(FileName = "_FASTQ_PATH1_", SampleName = "_SampleName_"))
cmdlist(args[1])
output(args[1])
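Since the single-end and paired-end param files expect different targets columns (FileName versus FileName1 and FileName2), a mismatch between the two is easy to miss. A minimal sketch, using a hypothetical helper pick_trim_params() that is not part of systemPipeR, that picks the param file names from the column names. Only base file names are returned, because the directory layout differs between systemPipeR versions (the question and the answer use different dir_path values).

```r
# Hypothetical helper (not a systemPipeR function): choose the trimming
# param files from the targets column names. Single-end targets use a
# FileName column; paired-end targets use FileName1 and FileName2.
pick_trim_params <- function(targets) {
  cols <- names(targets)
  if (all(c("FileName1", "FileName2") %in% cols)) {
    c(wf_file = "trim-pe.cwl", input_file = "trim-pe.yml")
  } else if ("FileName" %in% cols) {
    c(wf_file = "trim-se.cwl", input_file = "trim-se.yml")
  } else {
    stop("targets needs a FileName column (single-end) ",
         "or FileName1 and FileName2 columns (paired-end)")
  }
}

se <- data.frame(FileName = "./data/SRR446027_1.fastq.gz", SampleName = "M1A")
print(pick_trim_params(se))

# A single-end file listed under FileName1 matches neither layout.
only_r1 <- data.frame(FileName1 = "./data/SRR446027_1.fastq.gz", SampleName = "M1A")
res <- tryCatch(pick_trim_params(only_r1), error = function(e) conditionMessage(e))
print(res)
```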

In your targets file, you are missing some columns, in particular Factor. Also, rename the FileName1 column to FileName.

> targets <- read.delim("targetsPE_chip.txt", comment.char = "#")
> targets
#                     FileName1 SampleName    Factor SampleLong Experiment Date SampleReference
# 1 ~/Desktop/S1_R1_001.fastq.gz      SG-9 1 23-Dec-21       WT-1         NA   NA              NA
# 2  ~/Desktop/S2_R1_001.fastq.gz    SG-10 1 23-Dec-21       KI-2         NA   NA              NA
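The rename and the column check described above can be done in base R before calling loadWF(). A minimal sketch on a small data frame built from the paths and sample names in the question; the set of required columns is an assumption, taken from the example targets file shown earlier (which also carries SampleLong, Experiment and Date).

```r
# Small stand-in for the question's targets table (values from the post).
targets <- data.frame(
  FileName1 = c("~/Desktop/S1_R1_001.fastq.gz", "~/Desktop/S2_R1_001.fastq.gz"),
  SampleName = c("SG-9", "SG-10")
)

# Single-end targets use FileName, so rename the FileName1 column.
names(targets)[names(targets) == "FileName1"] <- "FileName"

# Assumed minimal set of columns to check.
required <- c("FileName", "SampleName", "Factor")
missing_cols <- setdiff(required, names(targets))
print(missing_cols)

# Same check as file.exists(targets$FileName), shown next to each path;
# FALSE unless these files exist on the machine running the code.
found <- file.exists(targets$FileName)
print(data.frame(FileName = targets$FileName, found = found))
```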

I hope this helps you.

All the best,

Daniela
