Question

Workflow Perraudeau F1000 - Bioconductor workflow for scRNA-seq

viviana.marin-esteban • 0

Hi everybody,

I am learning bioinformatics and am trying to run the workflow from Perraudeau (https://f1000research.com/articles/6-1158/v1). I already run into trouble in the first steps (the read.table and data.frame calls in the script at the bottom):

The functions and the messages they produce:

read.table(...)

Warning: EOF within quoted string

To work around this I added quote = "" to the call.

data.frame(...)

Error: arguments imply differing number of rows

I would appreciate any help with these points.

Thanks

Viviana

###SCRIPT

# library() attaches one package per call
library(BiocParallel)
library(clusterExperiment)
library(scone)
library(zinbwave)
library(slingshot)
library(doParallel)
library(gam)
library(RColorBrewer)
set.seed(20)

# Parallel computing
NCORES <- 2
mysystem = Sys.info()[["sysname"]]
if (mysystem == "Darwin"){
  registerDoParallel(NCORES)
  register(DoparParam())
}else if (mysystem == "Linux"){
  register(bpstart(MulticoreParam(workers=NCORES)))
}else{
  print("Please change this to allow parallel computing on your computer.")
  register(SerialParam())
}

# Pre-processing
data_dir <- "/Users/vivi/"
urls = c("https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE95601&format=file&file=GSE95601%5FoeHBCdiff%5FCufflinks%5FeSet%2ERda%2Egz",
"https://github.com/rufletch/p63-HBC-diff/tree/master/ref/oeHBCdiff_clusterLabels.txt")

if(!file.exists(paste0(data_dir, "GSE95601_oeHBCdiff_Cufflinks_eSet.Rda"))) {
  download.file(urls[1], paste0(data_dir, "GSE95601_oeHBCdiff_Cufflinks_eSet.Rda.gz"))
  R.utils::gunzip(paste0(data_dir, "GSE95601_oeHBCdiff_Cufflinks_eSet.Rda.gz"))
}
if(!file.exists(paste0(data_dir, "oeHBCdiff_clusterLabels.txt"))) {
  download.file(urls[2], paste0(data_dir, "oeHBCdiff_clusterLabels.txt"))
}
load(paste0(data_dir, "GSE95601_oeHBCdiff_Cufflinks_eSet.Rda"))

# Count matrix
E <- assayData(Cufflinks_eSet)$counts_table

# Remove undetected genes
E <- na.omit(E)
E <- E[rowSums(E)>0,]
dim(E)
## [1] 28361 849

# Remove ERCC and CreER genes
cre <- E["CreER",]
ercc <- E[grep("^ERCC-", rownames(E)),]
E <- E[grep("^ERCC-", rownames(E), invert = TRUE), ]
E <- E[-which(rownames(E)=="CreER"), ]
dim(E)

# Extract QC metrics
qc <- as.matrix(protocolData(Cufflinks_eSet)@data)[,c(1:5, 10:18)]
qc <- cbind(qc, CreER = cre, ERCC_reads = colSums(ercc))

# Extract metadata
batch <- droplevels(pData(Cufflinks_eSet)$MD_c1_run_id)
batch
bio <- droplevels(pData(Cufflinks_eSet)$MD_expt_condition)
bio
clusterLabels <- read.table(paste0(data_dir, "oeHBCdiff_clusterLabels.txt"),
                            sep = "\t", stringsAsFactors = FALSE, quote = "")

clusterLabels
m <- match(colnames(E), clusterLabels[, 1])

# Create metadata data.frame
metadata <- data.frame("Experiment" = bio,
                       "Batch" = batch,
                       "publishedClusters" = clusterLabels[m,2],
                       qc)

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8

attached base packages:
[1] splines parallel stats4 stats graphics grDevices
[7] utils datasets methods base

other attached packages:
[1] GEOquery_2.46.15 RColorBrewer_1.1-2
[3] gam_1.15 doParallel_1.0.11
[5] iterators_1.0.9 foreach_1.4.4
[7] slingshot_0.1.2-3 princurve_1.1-12
[9] zinbwave_1.0.0 SingleCellExperiment_1.0.0
[11] scone_1.2.0 clusterExperiment_1.4.0
[13] SummarizedExperiment_1.8.1 DelayedArray_0.4.1
[15] matrixStats_0.53.1 Biobase_2.38.0
[17] GenomicRanges_1.30.3 GenomeInfoDb_1.14.0
[19] IRanges_2.12.0 S4Vectors_0.16.0
[21] BiocGenerics_0.24.0 BiocParallel_1.12.0
[23] R.utils_2.6.0 R.oo_1.21.0
[25] R.methodsS3_1.7.1

loaded via a namespace (and not attached):
[1] copula_0.999-18 uuid_0.1-2
[3] aroma.light_3.8.0 NMF_0.21.0
[5] igraph_1.2.1 plyr_1.8.4
[7] lazyeval_0.2.1 pspline_1.0-18
[9] rncl_0.8.2 ggplot2_2.2.1
[11] gridBase_0.4-7 digest_0.6.15
[13] viridis_0.5.0 gdata_2.18.0
[15] magrittr_1.5 memoise_1.1.0
[17] cluster_2.0.6 mixtools_1.1.0
[19] limma_3.34.9 readr_1.1.1
[21] Biostrings_2.46.0 annotate_1.56.2
[23] bayesm_3.1-0.1 stabledist_0.7-1
[25] rARPACK_0.11-0 prettyunits_1.0.2
[27] colorspace_1.3-2 blob_1.1.0
[29] dplyr_0.7.4 hexbin_1.27.2
[31] RCurl_1.95-4.10 jsonlite_1.5
[33] genefilter_1.60.0 bindr_0.1.1
[35] phylobase_0.8.4 survival_2.41-3
[37] zoo_1.8-1 ape_5.0
[39] glue_1.2.0 registry_0.5
[41] gtable_0.2.0 zlibbioc_1.24.0
[43] XVector_0.18.0 compositions_1.40-1
[45] kernlab_0.9-25 prabclus_2.2-6
[47] DEoptimR_1.0-8 scales_0.5.0
[49] DESeq_1.30.0 mvtnorm_1.0-7
[51] DBI_0.8 edgeR_3.20.9
[53] rngtools_1.2.4 Rcpp_0.12.16
[55] viridisLite_0.3.0 xtable_1.8-2
[57] progress_1.1.2 bit_1.1-12
[59] bold_0.5.0 mclust_5.4
[61] glmnet_2.0-13 httr_1.3.1
[63] gplots_3.0.1 fpc_2.1-11
[65] modeltools_0.2-21 pkgconfig_2.0.1
[67] reshape_0.8.7 XML_3.98-1.10
[69] flexmix_2.3-14 nnet_7.3-12
[71] locfit_1.5-9.1 crul_0.5.2
[73] softImpute_1.4 howmany_0.3-1
[75] rlang_0.2.0 reshape2_1.4.3
[77] AnnotationDbi_1.40.0 munsell_0.4.3
[79] tools_3.4.3 RSQLite_2.0
[81] ade4_1.7-10 stringr_1.3.0
[83] bit64_0.9-7 robustbase_0.92-8
[85] caTools_1.17.1 purrr_0.2.4
[87] dendextend_1.7.0 bindrcpp_0.2
[89] EDASeq_2.12.0 nlme_3.1-131.1
[91] whisker_0.3-2 taxize_0.9.3
[93] xml2_1.2.0 biomaRt_2.34.2
[95] compiler_3.4.3 curl_3.1
[97] tibble_1.4.2 geneplotter_1.56.0
[99] pcaPP_1.9-73 gsl_1.9-10.3
[101] RNeXML_2.0.8 stringi_1.1.7
[103] GenomicFeatures_1.30.3 RSpectra_0.12-0
[105] lattice_0.20-35 trimcluster_0.1-2
[107] Matrix_1.2-12 tensorA_0.36
[109] pillar_1.2.1 ADGofTest_0.3
[111] data.table_1.10.4-3 bitops_1.0-6
[113] rtracklayer_1.38.3 R6_2.2.2
[115] latticeExtra_0.6-28 hwriter_1.3.2
[117] RMySQL_0.10.14 ShortRead_1.36.1
[119] KernSmooth_2.23-15 gridExtra_2.3
[121] codetools_0.2-15 energy_1.7-2
[123] boot_1.3-20 MASS_7.3-49
[125] gtools_3.5.0 assertthat_0.2.0
[127] rhdf5_2.22.0 pkgmaker_0.22
[129] RUVSeq_1.12.0 GenomicAlignments_1.14.1
[131] Rsamtools_1.30.0 GenomeInfoDbData_1.0.0
[133] locfdr_1.1-8 hms_0.4.2
[135] diptest_0.75-7 grid_3.4.3
[137] tidyr_0.8.0 class_7.3-14
[139] segmented_0.5-3.0 numDeriv_2016.8-1

data.frame read.table • 1.4k views

updated 7.9 years ago by davide risso • written 7.9 years ago by viviana.marin-esteban

score 0 · Answer 1 · 2018-03-30

Hi Viviana,

You have changed this line of the workflow:

urls = c("https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE95601&format=file&file=GSE95601%5FoeHBCdiff%5FCufflinks%5FeSet%2ERda%2Egz",
         "https://raw.githubusercontent.com/rufletch/p63-HBC-diff/master/ref/oeHBCdiff_clusterLabels.txt")

to this line in your code:

urls = c("https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE95601&format=file&file=GSE95601%5FoeHBCdiff%5FCufflinks%5FeSet%2ERda%2Egz",
         "https://github.com/rufletch/p63-HBC-diff/tree/master/ref/oeHBCdiff_clusterLabels.txt")

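For this reason, you are downloading the HTML page from GitHub rather than the raw text file with the cluster labels. read.table() is then parsing HTML markup instead of a tab-delimited table of cell names and cluster labels, which would explain both of the messages you see.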
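A quick way to confirm this is to look at the first lines of the downloaded file. The sketch below is only an illustration: looks_like_html() is a hypothetical helper, not part of the workflow or of any package, and the two temporary files are made-up stand-ins for a GitHub web page and for the real label file.

```r
# Hypothetical helper (not part of the workflow): TRUE if the first
# non-empty line of the file looks like HTML markup.
looks_like_html <- function(path, n = 20) {
  first_lines <- readLines(path, n = n, warn = FALSE)
  first_lines <- trimws(first_lines)
  first_lines <- first_lines[nzchar(first_lines)]
  length(first_lines) > 0 &&
    grepl("^<(!doctype|html)", first_lines[1], ignore.case = TRUE)
}

# Toy stand-in for what a web-page URL returns (contents are made up).
html_file <- tempfile(fileext = ".txt")
writeLines(c("<!DOCTYPE html>", "<html>", "<head><title>page</title></head>"),
           html_file)

# Toy stand-in for a tab-delimited label file (contents are made up).
tsv_file <- tempfile(fileext = ".txt")
writeLines(c("cell_1\t1", "cell_2\t4"), tsv_file)

looks_like_html(html_file)  # TRUE
looks_like_html(tsv_file)   # FALSE
```

If the same check returns TRUE for your oeHBCdiff_clusterLabels.txt, delete the file before re-running the script with the corrected URL, because the file.exists() test will otherwise skip the download.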

Note that the HTML version of the workflow has a typo that prevents you from copying and pasting this line correctly (I will email F1000 to make sure they correct it). The PDF version of the workflow is fine.
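As a safeguard against a bad download going unnoticed, a check along these lines could sit right before the data.frame() call. This is a minimal sketch on toy data: cell_names and labels are made-up stand-ins for colnames(E) and clusterLabels, and the check itself is a suggestion rather than part of the published workflow.

```r
# Toy stand-ins (made-up names and values, not the real data).
cell_names <- c("cell_1", "cell_2", "cell_3")
labels <- data.frame(V1 = c("cell_3", "cell_1", "cell_2"),
                     V2 = c(4L, 1L, 1L),
                     stringsAsFactors = FALSE)

# Same matching step as in the workflow script.
m <- match(cell_names, labels[, 1])

# Stop early if the label table lacks a second column
# or if any cell name has no match in it.
stopifnot(ncol(labels) >= 2, !anyNA(m))

# One label per cell, in the order of cell_names.
published <- labels[m, 2]
length(published) == length(cell_names)
```

With a correctly downloaded label file the stopifnot() call passes silently; with an HTML page it should stop at this point, before data.frame() is reached.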