Question

Error in summarizeOverlaps

0

Entering edit mode

mahmudornob • 0

@mahmudornob-19533

Last seen 6.7 years ago

Dear Experts,

I am trying to find out DEGs between wild type and double knock out mutant lines of Arabidopsis. Following the available tutorial from the internet, I have written the following R script.

Everything works well initially except "summarizeOvelaps" function. After around 30 minutes of almost freezing of the desktop, it shows the following message every time.

> se=summarizeOverlaps(features=ebg, reads=bamfiles, mode="Union", singleEnd=FALSE, ignore.strand=FALSE)
Error in names(res) <- nms : 
  'names' attribute [2] must be the same length as the vector [1]
In addition: Warning messages:
1: In serialize(data, node$con) :
  'package:stats' may not be available when loading
2: In serialize(data, node$con) :
  'package:stats' may not be available when loading
3: stop worker failed:
  'clear_cluster' receive data failed:
  reached elapsed time limit

It is worth mentioning*, my two Bam files are around 2.7GB each. And my desktop is quite old, core i5 3.4 GHz with 8GB RAM only. I am posting here with the script that I am using for my analysis.

Eagerly waiting for experts' response.

##########################################################################################
############## INSTALL BIOCONDUCTOR ##############
##########################################################################################

install.packages("BiocManager")

############## Install requried packages of bioconductor for RNAseq ##############
BiocManager::install("Rsamtools", version = "3.8")
BiocManager::install("GenomicFeatures", version = "3.8")
BiocManager::install("GenomicAlignments", version = "3.8")
BiocManager::install("DESeq2", version = "3.8")
BiocManager::install("GOstats", version = "3.8")
BiocManager::install("GO.db", version = "3.8")
BiocManager::install("Category", version = "3.8")
BiocManager::install("org.At.tair.db", version = "3.8")


##########################################################################################
############## LOAD LIBRARY ##############
##########################################################################################

library(Rsamtools)

##########################################################################################
############## SET WORKING DIRECTORY AND INPUT BAM FILES ##############
##########################################################################################

setwd("D:/My RNASeq")
(bamfiles = BamFileList(dir(pattern=".bam")))


##########################################################################################
############## LOAD ANNOTATION FILES ##############
##########################################################################################
install.packages("rlang")
suppressPackageStartupMessages(library('GenomicFeatures'))


txdb = makeTxDbFromGFF("Arabidopsis.gtf", format="gtf")
(ebg = exonsBy(txdb, by="gene"))


##########################################################################################
############## Experimental Design Inforamtion ##############
##########################################################################################

expdesign = read.csv("expdesign.txt", row.names=1, sep=",")


##########################################################################################
############## Counting Reads ##############
##########################################################################################

library("GenomicAlignments")

se = summarizeOverlaps(features=ebg, 
                       reads=bamfiles, mode="Union", singleEnd=FALSE, 
                       ignore.strand=TRUE ) 

counts = assay(se) 
countgenenames = gsub("[.][1234567890]", "", row.names(counts))
rownames(counts)=countgenenames

GenomicAlignments • 1.5k views

ADD COMMENT • link updated 6.7 years ago by James W. MacDonald 68k • written 6.7 years ago by mahmudornob • 0

score 3 · Accepted Answer · 2019-04-03

3

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 10 hours ago

United States

If your BAM files are PE reads, then you should use asMates = TRUE when setting up your BamFileList. If you think you might be having memory constraints, you can also try adding yieldSize = 1e6 in order to read in the data in chunks rather than all at once.

ADD COMMENT • link 6.7 years ago James W. MacDonald 68k

0

Entering edit mode

Dear J. W. MacDonald,

Thank you so much for your prompt response. I really do appreciate it.

As per your suggestion, I added asMates = TRUE and yieldSize=1000000 in my BamFileList like this ( I am posting the text here because my level of R script could be compared with "Kindergarten")

bamfiles = BamFileList(dir(pattern=".bam")), yieldSize=1000000, asMates=TRUE)

However, still following warning message appeared.

Warning messages: 1: In serialize(data, node$con) : 'package:stats' may not be available when loading 2: In serialize(data, node$con) : 'package:stats' may not be available when loading

Although, I can proceed to downstream command. But seems like all of the counts become zero.

Please suggest me where I am making mistakes.

Thanks

ADD REPLY • link 6.7 years ago mahmudornob • 0