Hi,
I'm doing an RNA-Seq analysis of blood samples from 28 lyme infected patients and 13 control samples. The reads were aligned with TopHat2 and read were summarized using featureCounts. When estimating the dispersion, I get a value of 3.29, and the following BCV plot:
This looks like something went horribly wrong.
The MDS plot does show a lack of separation between the control and infected group. However, I don't think that explains the BCV plot - or could it?.
Code:
##Load gene lengths exonic.gene.sizes.df<-read.table("hg19_gene_Lengths.txt",header=TRUE) ##Load read counts from csv counts_csv<-read.csv("counts.csv", header=TRUE) ##Make into matrix ##Function to create a matrix and set the right names matrix.please<-function(x) { m<-as.matrix(x[,-1]) rownames(m)<-x[,1] m } ##Run on counts counts <- matrix.please(counts_csv) ##assign groups group<-c(rep("patient", times=28),rep("control", times=13)) ##set up DGEList RNA_data<-DGEList(counts=counts,group=group) #add in Gene Length RNA_data$genes<-exonic.gene.sizes.df # filter low expressing genes #Keeping Genes with greater than 8 count per million (cpm) in at least 13 sample keep<-rowSums(cpm(RNA_data)>8)>=13 RNA_data_filtered<-RNA_data[keep,,keep.lib.sizes=F] # normalization RNA_data_filtered<- calcNormFactors(RNA_data_filtered) RNA_data_filtered<-estimateDisp(RNA_data_filtered,design=design) sqrt(RNA_data_filtered$common.disp) ###[1] 3.285567 ##plot BCV plotBCV(RNA_data_filtered)
sessionInfo:
R version 3.3.2 (2016-10-31) Platform: x86_64-apple-darwin13.4.0 (64-bit) Running under: OS X El Capitan 10.11.6 locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods base other attached packages: [1] DESeq2_1.12.4 SummarizedExperiment_1.2.3 Biobase_2.32.0 [4] GenomicRanges_1.24.3 GenomeInfoDb_1.8.7 IRanges_2.6.1 [7] S4Vectors_0.10.3 BiocGenerics_0.18.0 edgeR_3.14.0 [10] limma_3.28.21 loaded via a namespace (and not attached): [1] genefilter_1.54.2 locfit_1.5-9.1 splines_3.3.2 lattice_0.20-34 [5] colorspace_1.3-0 htmltools_0.3.5 rtracklayer_1.32.2 GenomicFeatures_1.24.5 [9] chron_2.3-47 survival_2.40-1 XML_3.98-1.5 foreign_0.8-67 [13] DBI_0.5-1 BiocParallel_1.6.6 RColorBrewer_1.1-2 plyr_1.8.4 [17] stringr_1.1.0 zlibbioc_1.18.0 Biostrings_2.40.2 munsell_0.4.3 [21] gtable_0.2.0 labeling_0.3 latticeExtra_0.6-28 knitr_1.15 [25] geneplotter_1.50.0 biomaRt_2.28.0 AnnotationDbi_1.34.4 htmlTable_1.7 [29] Rcpp_0.12.7 acepack_1.4.1 xtable_1.8-2 scales_0.4.1 [33] Hmisc_4.0-0 annotate_1.50.1 XVector_0.12.1 Rsamtools_1.24.0 [37] gridExtra_2.2.1 ggplot2_2.2.0 digest_0.6.10 stringi_1.1.2 [41] grid_3.3.2 tools_3.3.2 bitops_1.0-6 magrittr_1.5 [45] lazyeval_0.2.0 RCurl_1.95-4.8 tibble_1.2 RSQLite_1.0.0 [49] Formula_1.2-1 cluster_2.0.5 Matrix_1.2-7.1 data.table_1.9.6 [53] assertthat_0.1 rpart_4.1-10 GenomicAlignments_1.8.4 nnet_7.3-12
The links to your plots seem to be inaccessible.
Fixed the links.