Dear Experts,
I am trying to find out DEGs between wild type and double knock out mutant lines of Arabidopsis. Following the available tutorial from the internet, I have written the following R script.
Everything works well initially except "summarizeOvelaps" function. After around 30 minutes of almost freezing of the desktop, it shows the following message every time.
> se=summarizeOverlaps(features=ebg, reads=bamfiles, mode="Union", singleEnd=FALSE, ignore.strand=FALSE)
Error in names(res) <- nms :
'names' attribute [2] must be the same length as the vector [1]
In addition: Warning messages:
1: In serialize(data, node$con) :
'package:stats' may not be available when loading
2: In serialize(data, node$con) :
'package:stats' may not be available when loading
3: stop worker failed:
'clear_cluster' receive data failed:
reached elapsed time limit
It is worth mentioning*, my two Bam files are around 2.7GB each. And my desktop is quite old, core i5 3.4 GHz with 8GB RAM only. I am posting here with the script that I am using for my analysis.
Eagerly waiting for experts' response.
##########################################################################################
############## INSTALL BIOCONDUCTOR ##############
##########################################################################################
install.packages("BiocManager")
############## Install requried packages of bioconductor for RNAseq ##############
BiocManager::install("Rsamtools", version = "3.8")
BiocManager::install("GenomicFeatures", version = "3.8")
BiocManager::install("GenomicAlignments", version = "3.8")
BiocManager::install("DESeq2", version = "3.8")
BiocManager::install("GOstats", version = "3.8")
BiocManager::install("GO.db", version = "3.8")
BiocManager::install("Category", version = "3.8")
BiocManager::install("org.At.tair.db", version = "3.8")
##########################################################################################
############## LOAD LIBRARY ##############
##########################################################################################
library(Rsamtools)
##########################################################################################
############## SET WORKING DIRECTORY AND INPUT BAM FILES ##############
##########################################################################################
setwd("D:/My RNASeq")
(bamfiles = BamFileList(dir(pattern=".bam")))
##########################################################################################
############## LOAD ANNOTATION FILES ##############
##########################################################################################
install.packages("rlang")
suppressPackageStartupMessages(library('GenomicFeatures'))
txdb = makeTxDbFromGFF("Arabidopsis.gtf", format="gtf")
(ebg = exonsBy(txdb, by="gene"))
##########################################################################################
############## Experimental Design Inforamtion ##############
##########################################################################################
expdesign = read.csv("expdesign.txt", row.names=1, sep=",")
##########################################################################################
############## Counting Reads ##############
##########################################################################################
library("GenomicAlignments")
se = summarizeOverlaps(features=ebg,
reads=bamfiles, mode="Union", singleEnd=FALSE,
ignore.strand=TRUE )
counts = assay(se)
countgenenames = gsub("[.][1234567890]", "", row.names(counts))
rownames(counts)=countgenenames
Dear J. W. MacDonald,
Thank you so much for your prompt response. I really do appreciate it.
As per your suggestion, I added asMates = TRUE and yieldSize=1000000 in my BamFileList like this ( I am posting the text here because my level of R script could be compared with "Kindergarten")
bamfiles = BamFileList(dir(pattern=".bam")), yieldSize=1000000, asMates=TRUE)
However, still following warning message appeared.
Warning messages: 1: In serialize(data, node$con) : 'package:stats' may not be available when loading 2: In serialize(data, node$con) : 'package:stats' may not be available when loading
Although, I can proceed to downstream command. But seems like all of the counts become zero.
Please suggest me where I am making mistakes.
Thanks