Error in summarizeOverlaps
1
0
Entering edit mode
@mahmudornob-19533
Last seen 5.6 years ago

Dear Experts,

I am trying to find out DEGs between wild type and double knock out mutant lines of Arabidopsis. Following the available tutorial from the internet, I have written the following R script.

Everything works well initially except "summarizeOvelaps" function. After around 30 minutes of almost freezing of the desktop, it shows the following message every time.

> se=summarizeOverlaps(features=ebg, reads=bamfiles, mode="Union", singleEnd=FALSE, ignore.strand=FALSE)
Error in names(res) <- nms : 
  'names' attribute [2] must be the same length as the vector [1]
In addition: Warning messages:
1: In serialize(data, node$con) :
  'package:stats' may not be available when loading
2: In serialize(data, node$con) :
  'package:stats' may not be available when loading
3: stop worker failed:
  'clear_cluster' receive data failed:
  reached elapsed time limit

It is worth mentioning*, my two Bam files are around 2.7GB each. And my desktop is quite old, core i5 3.4 GHz with 8GB RAM only. I am posting here with the script that I am using for my analysis.

Eagerly waiting for experts' response.

##########################################################################################
############## INSTALL BIOCONDUCTOR ##############
##########################################################################################

install.packages("BiocManager")

############## Install requried packages of bioconductor for RNAseq ##############
BiocManager::install("Rsamtools", version = "3.8")
BiocManager::install("GenomicFeatures", version = "3.8")
BiocManager::install("GenomicAlignments", version = "3.8")
BiocManager::install("DESeq2", version = "3.8")
BiocManager::install("GOstats", version = "3.8")
BiocManager::install("GO.db", version = "3.8")
BiocManager::install("Category", version = "3.8")
BiocManager::install("org.At.tair.db", version = "3.8")


##########################################################################################
############## LOAD LIBRARY ##############
##########################################################################################

library(Rsamtools)

##########################################################################################
############## SET WORKING DIRECTORY AND INPUT BAM FILES ##############
##########################################################################################

setwd("D:/My RNASeq")
(bamfiles = BamFileList(dir(pattern=".bam")))


##########################################################################################
############## LOAD ANNOTATION FILES ##############
##########################################################################################
install.packages("rlang")
suppressPackageStartupMessages(library('GenomicFeatures'))


txdb = makeTxDbFromGFF("Arabidopsis.gtf", format="gtf")
(ebg = exonsBy(txdb, by="gene"))


##########################################################################################
############## Experimental Design Inforamtion ##############
##########################################################################################

expdesign = read.csv("expdesign.txt", row.names=1, sep=",")


##########################################################################################
############## Counting Reads ##############
##########################################################################################

library("GenomicAlignments")

se = summarizeOverlaps(features=ebg, 
                       reads=bamfiles, mode="Union", singleEnd=FALSE, 
                       ignore.strand=TRUE ) 

counts = assay(se) 
countgenenames = gsub("[.][1234567890]", "", row.names(counts))
rownames(counts)=countgenenames
GenomicAlignments • 1.3k views
ADD COMMENT
3
Entering edit mode
@james-w-macdonald-5106
Last seen 3 hours ago
United States

If your BAM files are PE reads, then you should use asMates = TRUE when setting up your BamFileList. If you think you might be having memory constraints, you can also try adding yieldSize = 1e6 in order to read in the data in chunks rather than all at once.

ADD COMMENT
0
Entering edit mode

Dear J. W. MacDonald,

Thank you so much for your prompt response. I really do appreciate it.

As per your suggestion, I added asMates = TRUE and yieldSize=1000000 in my BamFileList like this ( I am posting the text here because my level of R script could be compared with "Kindergarten")

bamfiles = BamFileList(dir(pattern=".bam")), yieldSize=1000000, asMates=TRUE)

However, still following warning message appeared.

Warning messages: 1: In serialize(data, node$con) : 'package:stats' may not be available when loading 2: In serialize(data, node$con) : 'package:stats' may not be available when loading

Although, I can proceed to downstream command. But seems like all of the counts become zero.

Please suggest me where I am making mistakes.

Thanks

ADD REPLY

Login before adding your answer.

Traffic: 630 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6