Workflow Perraudeau F1000 - Bioconductor workflow for scRNA-seq
@vivianamarin-esteban-15328

Hi everybody,

I am learning bioinformatics. I am trying to run the workflow from Perraudeau et al. (https://f1000research.com/articles/6-1158/v1), but I already run into trouble at the first steps (the read.table() and data.frame() calls near the end of the script below):

These are the functions and the messages they give:

read.table(...)

   Warning: EOF within quoted string

   To work around this warning, I added quote = "" to the call.

data.frame(...)

   Error: arguments imply differing number of rows

I would appreciate any help with these points.

Thanks

Viviana

###SCRIPT

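# library() attaches one package per call, so load them one at a time
library(BiocParallel)
library(clusterExperiment)
library(scone)
library(zinbwave)
library(slingshot)
library(doParallel)
library(gam)
library(RColorBrewer)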
set.seed(20)

# Parallel computing
NCORES <- 2
mysystem = Sys.info()[["sysname"]]
if (mysystem == "Darwin"){
  registerDoParallel(NCORES)
  register(DoparParam())
}else if (mysystem == "Linux"){
  register(bpstart(MulticoreParam(workers=NCORES)))
}else{
  print("Please change this to allow parallel computing on your computer.")
  register(SerialParam())
}

# Pre-processing
data_dir <- "/Users/vivi/"
urls = c("https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE95601&format=file&file=GSE95601%5FoeHBCdiff%5FCufflinks%5FeSet%2ERda%2Egz",
         "https://github.com/rufletch/p63-HBC-diff/tree/master/ref/oeHBCdiff_clusterLabels.txt")

     
if(!file.exists(paste0(data_dir, "GSE95601_oeHBCdiff_Cufflinks_eSet.Rda"))) { 
  download.file(urls[1], paste0(data_dir, "GSE95601_oeHBCdiff_Cufflinks_eSet.Rda.gz")) 
  R.utils::gunzip(paste0(data_dir, "GSE95601_oeHBCdiff_Cufflinks_eSet.Rda.gz"))
}
if(!file.exists(paste0(data_dir, "oeHBCdiff_clusterLabels.txt"))) {
  download.file(urls[2], paste0(data_dir, "oeHBCdiff_clusterLabels.txt"))
}
load(paste0(data_dir, "GSE95601_oeHBCdiff_Cufflinks_eSet.Rda"))
  

# Count matrix
E <- assayData(Cufflinks_eSet)$counts_table

# Remove undetected genes
E <- na.omit(E)
E <- E[rowSums(E)>0,]
dim(E)
## [1] 28361   849

# Remove ERCC spike-ins and the CreER gene
cre <- E["CreER",]
ercc <- E[grep("^ERCC-", rownames(E)),]
E <- E[grep("^ERCC-", rownames(E), invert = TRUE), ]
E <- E[-which(rownames(E)=="CreER"), ]
dim(E)
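A side note on the line that drops CreER: indexing with `-which(...)` silently removes every row when the name is absent, because `which()` then returns `integer(0)`. This is a minimal sketch on a hypothetical toy matrix (`toy` and its gene names are made up for illustration) contrasting that pattern with a logical filter; for this dataset CreER is present, so the workflow line works as written.

```r
# Hypothetical toy count matrix standing in for E (rows = genes, columns = cells)
toy <- matrix(1:6, nrow = 3,
              dimnames = list(c("GeneA", "ERCC-00002", "GeneB"), c("cell1", "cell2")))

# Drop ERCC spike-ins, as in the workflow
no_ercc <- toy[grep("^ERCC-", rownames(toy), invert = TRUE), , drop = FALSE]

# "CreER" is absent here: which() returns integer(0), and -integer(0) selects no rows
dropped_all <- no_ercc[-which(rownames(no_ercc) == "CreER"), , drop = FALSE]
nrow(dropped_all)  # 0: every row was removed

# A logical filter keeps all rows when the name is absent
kept <- no_ercc[rownames(no_ercc) != "CreER", , drop = FALSE]
nrow(kept)         # 2
```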

# Extract QC metrics
qc <- as.matrix(protocolData(Cufflinks_eSet)@data)[,c(1:5, 10:18)]
qc <- cbind(qc, CreER = cre, ERCC_reads = colSums(ercc))

# Extract metadata
batch <- droplevels(pData(Cufflinks_eSet)$MD_c1_run_id)
batch
bio <- droplevels(pData(Cufflinks_eSet)$MD_expt_condition)
bio
clusterLabels <- read.table(paste0(data_dir, "oeHBCdiff_clusterLabels.txt"),
                            sep = "\t", stringsAsFactors = FALSE, quote = "")

clusterLabels
m <- match(colnames(E), clusterLabels[, 1])

# Create metadata data.frame
metadata <- data.frame("Experiment" = bio,
                       "Batch" = batch,
                       "publishedClusters" = clusterLabels[m,2],
                       qc)
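`data.frame()` needs every column to have one value per cell. A minimal sketch with hypothetical cell IDs (`cells` and `labels` are made up for illustration) shows how `match()` aligns the cluster labels to the cell order, filling unmatched cells with `NA`, whereas passing a label vector of a different length produces the "arguments imply differing number of rows" error.

```r
# Hypothetical cell IDs (stand-in for colnames(E)) and a two-column label table
cells <- c("c1", "c2", "c3", "c4")
labels <- data.frame(V1 = c("c3", "c1", "c9"), V2 = c(5L, 1L, 7L),
                     stringsAsFactors = FALSE)

# For each cell, match() gives the row of `labels` holding that ID (NA if absent)
m <- match(cells, labels[, 1])
aligned <- labels[m, 2]   # length 4, same order as `cells`

meta <- data.frame(cell = cells, publishedClusters = aligned)
print(meta)

# An unaligned vector of length 3 cannot be recycled to 4 rows, so data.frame() fails
res <- tryCatch(data.frame(cell = cells, publishedClusters = labels[, 2]),
                error = function(e) conditionMessage(e))
print(res)
```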

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8

attached base packages:
 [1] splines   parallel  stats4    stats     graphics  grDevices
 [7] utils     datasets  methods   base     

other attached packages:
 [1] GEOquery_2.46.15           RColorBrewer_1.1-2        
 [3] gam_1.15                   doParallel_1.0.11         
 [5] iterators_1.0.9            foreach_1.4.4             
 [7] slingshot_0.1.2-3          princurve_1.1-12          
 [9] zinbwave_1.0.0             SingleCellExperiment_1.0.0
[11] scone_1.2.0                clusterExperiment_1.4.0   
[13] SummarizedExperiment_1.8.1 DelayedArray_0.4.1        
[15] matrixStats_0.53.1         Biobase_2.38.0            
[17] GenomicRanges_1.30.3       GenomeInfoDb_1.14.0       
[19] IRanges_2.12.0             S4Vectors_0.16.0          
[21] BiocGenerics_0.24.0        BiocParallel_1.12.0       
[23] R.utils_2.6.0              R.oo_1.21.0               
[25] R.methodsS3_1.7.1         

loaded via a namespace (and not attached):
  [1] copula_0.999-18          uuid_0.1-2              
  [3] aroma.light_3.8.0        NMF_0.21.0              
  [5] igraph_1.2.1             plyr_1.8.4              
  [7] lazyeval_0.2.1           pspline_1.0-18          
  [9] rncl_0.8.2               ggplot2_2.2.1           
 [11] gridBase_0.4-7           digest_0.6.15           
 [13] viridis_0.5.0            gdata_2.18.0            
 [15] magrittr_1.5             memoise_1.1.0           
 [17] cluster_2.0.6            mixtools_1.1.0          
 [19] limma_3.34.9             readr_1.1.1             
 [21] Biostrings_2.46.0        annotate_1.56.2         
 [23] bayesm_3.1-0.1           stabledist_0.7-1        
 [25] rARPACK_0.11-0           prettyunits_1.0.2       
 [27] colorspace_1.3-2         blob_1.1.0              
 [29] dplyr_0.7.4              hexbin_1.27.2           
 [31] RCurl_1.95-4.10          jsonlite_1.5            
 [33] genefilter_1.60.0        bindr_0.1.1             
 [35] phylobase_0.8.4          survival_2.41-3         
 [37] zoo_1.8-1                ape_5.0                 
 [39] glue_1.2.0               registry_0.5            
 [41] gtable_0.2.0             zlibbioc_1.24.0         
 [43] XVector_0.18.0           compositions_1.40-1     
 [45] kernlab_0.9-25           prabclus_2.2-6          
 [47] DEoptimR_1.0-8           scales_0.5.0            
 [49] DESeq_1.30.0             mvtnorm_1.0-7           
 [51] DBI_0.8                  edgeR_3.20.9            
 [53] rngtools_1.2.4           Rcpp_0.12.16            
 [55] viridisLite_0.3.0        xtable_1.8-2            
 [57] progress_1.1.2           bit_1.1-12              
 [59] bold_0.5.0               mclust_5.4              
 [61] glmnet_2.0-13            httr_1.3.1              
 [63] gplots_3.0.1             fpc_2.1-11              
 [65] modeltools_0.2-21        pkgconfig_2.0.1         
 [67] reshape_0.8.7            XML_3.98-1.10           
 [69] flexmix_2.3-14           nnet_7.3-12             
 [71] locfit_1.5-9.1           crul_0.5.2              
 [73] softImpute_1.4           howmany_0.3-1           
 [75] rlang_0.2.0              reshape2_1.4.3          
 [77] AnnotationDbi_1.40.0     munsell_0.4.3           
 [79] tools_3.4.3              RSQLite_2.0             
 [81] ade4_1.7-10              stringr_1.3.0           
 [83] bit64_0.9-7              robustbase_0.92-8       
 [85] caTools_1.17.1           purrr_0.2.4             
 [87] dendextend_1.7.0         bindrcpp_0.2            
 [89] EDASeq_2.12.0            nlme_3.1-131.1          
 [91] whisker_0.3-2            taxize_0.9.3            
 [93] xml2_1.2.0               biomaRt_2.34.2          
 [95] compiler_3.4.3           curl_3.1                
 [97] tibble_1.4.2             geneplotter_1.56.0      
 [99] pcaPP_1.9-73             gsl_1.9-10.3            
[101] RNeXML_2.0.8             stringi_1.1.7           
[103] GenomicFeatures_1.30.3   RSpectra_0.12-0         
[105] lattice_0.20-35          trimcluster_0.1-2       
[107] Matrix_1.2-12            tensorA_0.36            
[109] pillar_1.2.1             ADGofTest_0.3           
[111] data.table_1.10.4-3      bitops_1.0-6            
[113] rtracklayer_1.38.3       R6_2.2.2                
[115] latticeExtra_0.6-28      hwriter_1.3.2           
[117] RMySQL_0.10.14           ShortRead_1.36.1        
[119] KernSmooth_2.23-15       gridExtra_2.3           
[121] codetools_0.2-15         energy_1.7-2            
[123] boot_1.3-20              MASS_7.3-49             
[125] gtools_3.5.0             assertthat_0.2.0        
[127] rhdf5_2.22.0             pkgmaker_0.22           
[129] RUVSeq_1.12.0            GenomicAlignments_1.14.1
[131] Rsamtools_1.30.0         GenomeInfoDbData_1.0.0  
[133] locfdr_1.1-8             hms_0.4.2               
[135] diptest_0.75-7           grid_3.4.3              
[137] tidyr_0.8.0              class_7.3-14            
[139] segmented_0.5-3.0        numDeriv_2016.8-1
davide risso
@davide-risso-5075
University of Padova

Hi Viviana,

You have replaced this line of the workflow:

urls = c("https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE95601&format=file&file=GSE95601%5FoeHBCdiff%5FCufflinks%5FeSet%2ERda%2Egz",
         "https://raw.githubusercontent.com/rufletch/p63-HBC-diff/master/ref/oeHBCdiff_clusterLabels.txt")

With this line in your code:

urls = c("https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE95601&format=file&file=GSE95601%5FoeHBCdiff%5FCufflinks%5FeSet%2ERda%2Egz",
         "https://github.com/rufletch/p63-HBC-diff/tree/master/ref/oeHBCdiff_clusterLabels.txt")

Because of this, you are downloading the GitHub HTML page instead of the raw text file with the cluster labels, which is why read.table() has trouble parsing it.
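GitHub serves the HTML page for a file at `github.com/<user>/<repo>/tree|blob/<branch>/<path>` and the plain file at `raw.githubusercontent.com/<user>/<repo>/<branch>/<path>`. A minimal sketch of a hypothetical helper `to_raw_github()` (not part of the workflow or any package) that rewrites the first form into the second:

```r
# Hypothetical helper: turn a github.com "tree"/"blob" file URL into its raw URL
to_raw_github <- function(url) {
  url <- sub("^https://github\\.com/", "https://raw.githubusercontent.com/", url)
  # Drop the "/tree/" or "/blob/" segment that sits between the repo and the branch
  sub("^(https://raw\\.githubusercontent\\.com/[^/]+/[^/]+)/(tree|blob)/", "\\1/", url)
}

to_raw_github("https://github.com/rufletch/p63-HBC-diff/tree/master/ref/oeHBCdiff_clusterLabels.txt")
```

Applied to the URL from your script, this gives the raw URL used in the workflow.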

Note that the HTML version of the workflow has a typo that prevents you from copying and pasting this line correctly (I will email F1000 to make sure they correct it). The PDF version of the workflow is fine.
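Since a wrong URL can still download "successfully", it may help to look at the start of the file before parsing it. A minimal sketch, assuming a hypothetical helper `looks_like_html()` and a rough heuristic (the first non-empty line starts with `<!DOCTYPE` or `<html`); it is only a sanity check, not a real format validator.

```r
# Hypothetical helper: rough heuristic flagging files whose first non-empty line looks like HTML
looks_like_html <- function(path) {
  first <- readLines(path, n = 20, warn = FALSE)
  first <- trimws(first[nzchar(trimws(first))])
  length(first) > 0 && grepl("^<(!doctype|html)", first[1], ignore.case = TRUE)
}

# Demonstration with two temporary files
html_file <- tempfile(fileext = ".txt")
writeLines(c("", "<!DOCTYPE html>", "<html>"), html_file)
tsv_file <- tempfile(fileext = ".txt")
writeLines(c("cellA\t1", "cellB\t2"), tsv_file)

looks_like_html(html_file)  # TRUE
looks_like_html(tsv_file)   # FALSE
```

In the workflow, this could be run on paste0(data_dir, "oeHBCdiff_clusterLabels.txt") before read.table(); if it returns TRUE, delete the file with file.remove() so the download step fetches it again from the raw URL.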
