Search
Question: Problem with summarizeOverlaps function
1
gravatar for kgorczak
2.2 years ago by
kgorczak10
Belgium
kgorczak10 wrote:

I am following RNA-seq workflow (http://www.bioconductor.org/help/workflows/rnaseqGene/) using my data and I am trying to get read counts using summarizeOverlaps function. Unfortunately, it is impossible because of this error: 

error 'validObject(.Object)': invalid class “SummarizedExperiment” object: 'rowRanges' length differs from 'assays' nrow

I am able to prepare read counts table only for one chromosome (e.g. seqlevels(txdb,force=TRUE)<-seqlevels(txdb)[1]), but when I want to do this for more chromosomes (1, 2, ..., 22, X and Y) seqlevels(txdb,force=TRUE)<-seqlevels(txdb)[1:24] I get mentioned error. 

Any idea why?

My second problem is with function register(MulticoreParam(workers = 2)) (BiocParallel package). Not always, but very often I get error 'unserialize(socklist[[n]])'. Do you know why or what I can do to avoid this error?

 

ADD COMMENTlink written 2.2 years ago by kgorczak10

Hi,

Would be much easier to help you if you could provide a reproducible example and include the output of your sessionInfo(). Please see our Posting Guide:

http://www.bioconductor.org/help/support/posting-guide/

In addition to the above, here are  couple of things you could try that would help us diagnose the problem:

  1. Try to run your code again without parallelization (i.e. don't load the BiocParallel package) and let us know what happens.
  2. Instead of trying to select the seqlevels on the TxDb object, could you please try to select them on the GRangesList object returned by exonsBy(txdb, by="gene") instead? Let us know if that helps.

Thanks,

H.

ADD REPLYlink written 2.2 years ago by Hervé Pagès ♦♦ 13k

I followed your ideas:

1. I didn’t load the BiocParallel package. I got the same error:

error 'validObject(.Object)': invalid class “SummarizedExperiment” object: 'rowRanges' length differs from 'assays' nrow

2. I selected chromosomes on the GRangesList object:

library("Rsamtools")

bamfile <- BamFileList("accepted_hits.bam")

seqinfo(bamfiles)

library("GenomicFeatures")

txdb <- makeTxDbFromGFF("file.gtf", format="gtf", circ_seqs=character())

ebg <- exonsBy(txdb, by="gene")

seqlevels(ebg,force=TRUE)<-seqlevels(ebg)[1:24]

seqlevels(ebg)

library("GenomicAlignments")

se <- summarizeOverlaps(features=ebg, reads=bamfiles,mode="Union",singleEnd=TRUE,ignore.strand=TRUE)

I got the same error.

traceback()

14: stop(msg, ": ", errors, domain = NA)

13: validObject(.Object)

12: initialize(value, ...)

11: initialize(value, ...)

10: (function (Class, ...)

    {

        ClassDef <- getClass(Class, where = topenv(parent.frame()))

        value <- .Call(C_new_object, ClassDef)

        initialize(value, ...)

    })("SummarizedExperiment", exptData = <S4 object of class "SimpleList">,

        rowData = <S4 object of class "GRangesList">, colData = <S4 object of class "DataFrame">,

        assays = <S4 object of class "ShallowSimpleListAssays">)

9: do.call(new, c(list("SummarizedExperiment", exptData = exptData,

       rowData = rowRanges, colData = colData, assays = assays),

       extra_args))

8: do.call(new, c(list("SummarizedExperiment", exptData = exptData,

       rowData = rowRanges, colData = colData, assays = assays),

       extra_args))

7: .local(assays, ...)

6: SummarizedExperiment(assays = SimpleList(counts = counts), rowRanges = features,

       colData = colData)

5: SummarizedExperiment(assays = SimpleList(counts = counts), rowRanges = features,

       colData = colData)

4: .dispatchBamFiles(features, reads, mode, match.arg(algorithm),

       ignore.strand, inter.feature = inter.feature, singleEnd = singleEnd,

       fragments = fragments, param = param, preprocess.reads = preprocess.reads,

       ...)

3: .local(features, reads, mode, algorithm, ignore.strand, ...)

2: summarizeOverlaps(features = ebg, reads = bamfiles, mode = "Union",

       singleEnd = TRUE, ignore.strand = TRUE)

1: summarizeOverlaps(features = ebg, reads = bamfiles, mode = "Union",

       singleEnd = TRUE, ignore.strand = TRUE)

 

I have one more issue. After getting mentioned error, I tried to get sessionInfo() and I got such error:

Error in system(paste(which, shQuote(names[i])), intern = TRUE, ignore.stderr = TRUE) :

cannot open '/usr/bin/which 'uname' 2>/dev/null', probable reason 'Cannot allocate memory'

 

traceback()

5: system(paste(which, shQuote(names[i])), intern = TRUE, ignore.stderr = TRUE)

4: withCallingHandlers(expr, warning = function(w) invokeRestart("muffleWarning"))

3: suppressWarnings(system(paste(which, shQuote(names[i])), intern = TRUE,

       ignore.stderr = TRUE))

2: Sys.which("uname")

1: sessionInfo()

 

Below sessionInfo() when I run this code only for chromosome Y (then summarizeOverlaps function works):

R version 3.2.2 (2015-08-14)

Platform: x86_64-pc-linux-gnu (64-bit)

Running under: Ubuntu 15.04

 

locale:

 [1] LC_CTYPE=pl_PL.UTF-8       LC_NUMERIC=C             

 [3] LC_TIME=pl_PL.UTF-8        LC_COLLATE=pl_PL.UTF-8   

 [5] LC_MONETARY=pl_PL.UTF-8    LC_MESSAGES=pl_PL.UTF-8  

 [7] LC_PAPER=pl_PL.UTF-8       LC_NAME=C                

 [9] LC_ADDRESS=C               LC_TELEPHONE=C           

[11] LC_MEASUREMENT=pl_PL.UTF-8 LC_IDENTIFICATION=C      

 

attached base packages:

[1] parallel  stats4    stats     graphics  grDevices utils     datasets

[8] methods   base    

 

other attached packages:

 [1] GenomicAlignments_1.4.1 GenomicFeatures_1.20.3  AnnotationDbi_1.30.1  

 [4] Biobase_2.28.0          Rsamtools_1.20.4        Biostrings_2.36.4     

 [7] XVector_0.8.0           GenomicRanges_1.20.6    GenomeInfoDb_1.4.2    

[10] IRanges_2.2.7           S4Vectors_0.6.4         BiocGenerics_0.14.0   

 

loaded via a namespace (and not attached):

 [1] XML_3.98-1.3         bitops_1.0-6         futile.options_1.0.0

 [4] DBI_0.3.1            RSQLite_1.0.0        zlibbioc_1.14.0    

 [7] futile.logger_1.4.1  lambda.r_1.1.7       BiocParallel_1.2.20

[10] tools_3.2.2          biomaRt_2.24.0       RCurl_1.95-4.7     

[13] rtracklayer_1.28.9  

ADD REPLYlink written 2.2 years ago by kgorczak10

Mmmh... loading or not the BiocParallel package doesn't change anything, sorry (summarizeOverlaps() still uses it behind the scene). To really disable parallel evaluation you actually need to do:

library(BiocParallel)
se <- summarizeOverlaps(features=ebg, reads=bamfiles,
                        ignore.strand=TRUE, BPPARAM=SerialParam())

So please try this and let us know if that helps.

Otherwise I guess it would still be very useful if you could provide a reproducible example (the code you show above is not: it won't run for me if I try to copy-and-paste it in my session because I don't have access to your BAM and GTF files).

Thanks,

H.

ADD REPLYlink written 2.2 years ago by Hervé Pagès ♦♦ 13k

Is it possible to get your e-mail? I will send you a link to my data on dropbox.

Thank you.

ADD REPLYlink written 2.2 years ago by kgorczak10

Sure: hpages@fhcrc.org

It's no secret, it's in my profile:

Hervé Pagès

;-)

Thanks.

H.

ADD REPLYlink written 2.2 years ago by Hervé Pagès ♦♦ 13k

I even checked your profile ;-) but unfortunately I can see there only star signs instead of your email. Anyway, thank you and you will get my invitation to dropbox soon 

ADD REPLYlink written 2.2 years ago by kgorczak10

I sent to you a link to dropbox (by accident maybe even twice). And here is the code I use:

library("Rsamtools")

bamfiles <- BamFileList("accepted_hits.bam")
seqinfo(bamfiles)

library("GenomicFeatures")

txdb <- makeTxDbFromGFF("exon_havana.gtf", format="gtf", circ_seqs=character())
columns(txdb)
seqlevels(txdb,force=TRUE)<-seqlevels(txdb)[1:24]
seqlevels(txdb)

ebg <- exonsBy(txdb, by="gene")

#seqlevels(ebg,force=TRUE)<-seqlevels(ebg)[1:24]
#seqlevels(ebg)

#g_ids<-names(ebg)
#head(select(txdb,keys=g_ids,columns="EXONNAME",keytype="GENEID"))

library("GenomicAlignments")
library("BiocParallel")
#register(SerialParam())
#register(MulticoreParam(workers = 2))

ptm=proc.time() # check how long the summarization takes
se <- summarizeOverlaps(features=ebg, reads=bamfiles,mode="Union",singleEnd=TRUE,ignore.strand=TRUE,BPPARAM=SerialParam())
proc.time()-ptm

head(assay(se))

mydata <- assay(se)
write.table(mydata, "mydata.txt", sep="\t")

ADD REPLYlink written 2.2 years ago by kgorczak10

Thanks for sending this. I'll look at it and will let you know.

H.

PS: You're right, one can only see the email address in his/her own profile but not the email address in other's profiles.

ADD REPLYlink written 2.2 years ago by Hervé Pagès ♦♦ 13k

Thank you a lot! if it's necessary, here is my e-mail: kgorczak@up.poznan.pl 

ADD REPLYlink written 2.2 years ago by kgorczak10

Hi all,

 

Can I please know whether this issue was solved? If so how? I have the same issue with summarizeOverlaps() function.

 

Thank you.

ADD REPLYlink written 2.0 years ago by shanika.amarasinghe0

Hi,

Unfortunately I was never able to reproduce this. I tried to run Katarzyna's code above on the data sets she sent me but didn't get any error or warning. I tried this with BioC 3.2 (current release) and with BioC 3.1 (previous release, not supported anymore), the latter being the version Katarzyna was using when she first reported this issue.

Can you give some minimal reproducible example? Please make sure you're using the latest BioC (3.2) and that all your packages are up-to-date (run biocLite() with no arguments to update them). Maybe I'll have more luck this time...

Thanks,

H.

ADD REPLYlink written 2.0 years ago by Hervé Pagès ♦♦ 13k

Let me first try it with BioC 3.2. I only tried it with the BioC 3.1. It might be the problem. But the thing is, I ran it in BioC 3.1 in August 2015 though. But, please let me try the upgrade first. Thank you for your time.

ADD REPLYlink written 2.0 years ago by shanika.amarasinghe0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 149 users visited in the last hour