Problem with summarizeOverlaps function
0
1
Entering edit mode
kgorczak ▴ 10
@kgorczak-8529
Last seen 6.8 years ago
Belgium

I am following RNA-seq workflow (http://www.bioconductor.org/help/workflows/rnaseqGene/) using my data and I am trying to get read counts using summarizeOverlaps function. Unfortunately, it is impossible because of this error: 

error 'validObject(.Object)': invalid class “SummarizedExperiment” object: 'rowRanges' length differs from 'assays' nrow

I am able to prepare read counts table only for one chromosome (e.g. seqlevels(txdb,force=TRUE)<-seqlevels(txdb)[1]), but when I want to do this for more chromosomes (1, 2, ..., 22, X and Y) seqlevels(txdb,force=TRUE)<-seqlevels(txdb)[1:24] I get mentioned error. 

Any idea why?

My second problem is with function register(MulticoreParam(workers = 2)) (BiocParallel package). Not always, but very often I get error 'unserialize(socklist[[n]])'. Do you know why or what I can do to avoid this error?

 

genomicalignments read counting • 4.2k views
ADD COMMENT
0
Entering edit mode

Hi,

Would be much easier to help you if you could provide a reproducible example and include the output of your sessionInfo(). Please see our Posting Guide:

http://www.bioconductor.org/help/support/posting-guide/

In addition to the above, here are  couple of things you could try that would help us diagnose the problem:

  1. Try to run your code again without parallelization (i.e. don't load the BiocParallel package) and let us know what happens.
  2. Instead of trying to select the seqlevels on the TxDb object, could you please try to select them on the GRangesList object returned by exonsBy(txdb, by="gene") instead? Let us know if that helps.

Thanks,

H.

ADD REPLY
0
Entering edit mode

I followed your ideas:

1. I didn’t load the BiocParallel package. I got the same error:

error 'validObject(.Object)': invalid class “SummarizedExperiment” object: 'rowRanges' length differs from 'assays' nrow

2. I selected chromosomes on the GRangesList object:

library("Rsamtools")

bamfile <- BamFileList("accepted_hits.bam")

seqinfo(bamfiles)

library("GenomicFeatures")

txdb <- makeTxDbFromGFF("file.gtf", format="gtf", circ_seqs=character())

ebg <- exonsBy(txdb, by="gene")

seqlevels(ebg,force=TRUE)<-seqlevels(ebg)[1:24]

seqlevels(ebg)

library("GenomicAlignments")

se <- summarizeOverlaps(features=ebg, reads=bamfiles,mode="Union",singleEnd=TRUE,ignore.strand=TRUE)

I got the same error.

traceback()

14: stop(msg, ": ", errors, domain = NA)

13: validObject(.Object)

12: initialize(value, ...)

11: initialize(value, ...)

10: (function (Class, ...)

    {

        ClassDef <- getClass(Class, where = topenv(parent.frame()))

        value <- .Call(C_new_object, ClassDef)

        initialize(value, ...)

    })("SummarizedExperiment", exptData = <S4 object of class "SimpleList">,

        rowData = <S4 object of class "GRangesList">, colData = <S4 object of class "DataFrame">,

        assays = <S4 object of class "ShallowSimpleListAssays">)

9: do.call(new, c(list("SummarizedExperiment", exptData = exptData,

       rowData = rowRanges, colData = colData, assays = assays),

       extra_args))

8: do.call(new, c(list("SummarizedExperiment", exptData = exptData,

       rowData = rowRanges, colData = colData, assays = assays),

       extra_args))

7: .local(assays, ...)

6: SummarizedExperiment(assays = SimpleList(counts = counts), rowRanges = features,

       colData = colData)

5: SummarizedExperiment(assays = SimpleList(counts = counts), rowRanges = features,

       colData = colData)

4: .dispatchBamFiles(features, reads, mode, match.arg(algorithm),

       ignore.strand, inter.feature = inter.feature, singleEnd = singleEnd,

       fragments = fragments, param = param, preprocess.reads = preprocess.reads,

       ...)

3: .local(features, reads, mode, algorithm, ignore.strand, ...)

2: summarizeOverlaps(features = ebg, reads = bamfiles, mode = "Union",

       singleEnd = TRUE, ignore.strand = TRUE)

1: summarizeOverlaps(features = ebg, reads = bamfiles, mode = "Union",

       singleEnd = TRUE, ignore.strand = TRUE)

 

I have one more issue. After getting mentioned error, I tried to get sessionInfo() and I got such error:

Error in system(paste(which, shQuote(names[i])), intern = TRUE, ignore.stderr = TRUE) :

cannot open '/usr/bin/which 'uname' 2>/dev/null', probable reason 'Cannot allocate memory'

 

traceback()

5: system(paste(which, shQuote(names[i])), intern = TRUE, ignore.stderr = TRUE)

4: withCallingHandlers(expr, warning = function(w) invokeRestart("muffleWarning"))

3: suppressWarnings(system(paste(which, shQuote(names[i])), intern = TRUE,

       ignore.stderr = TRUE))

2: Sys.which("uname")

1: sessionInfo()

 

Below sessionInfo() when I run this code only for chromosome Y (then summarizeOverlaps function works):

R version 3.2.2 (2015-08-14)

Platform: x86_64-pc-linux-gnu (64-bit)

Running under: Ubuntu 15.04

 

locale:

 [1] LC_CTYPE=pl_PL.UTF-8       LC_NUMERIC=C             

 [3] LC_TIME=pl_PL.UTF-8        LC_COLLATE=pl_PL.UTF-8   

 [5] LC_MONETARY=pl_PL.UTF-8    LC_MESSAGES=pl_PL.UTF-8  

 [7] LC_PAPER=pl_PL.UTF-8       LC_NAME=C                

 [9] LC_ADDRESS=C               LC_TELEPHONE=C           

[11] LC_MEASUREMENT=pl_PL.UTF-8 LC_IDENTIFICATION=C      

 

attached base packages:

[1] parallel  stats4    stats     graphics  grDevices utils     datasets

[8] methods   base    

 

other attached packages:

 [1] GenomicAlignments_1.4.1 GenomicFeatures_1.20.3  AnnotationDbi_1.30.1  

 [4] Biobase_2.28.0          Rsamtools_1.20.4        Biostrings_2.36.4     

 [7] XVector_0.8.0           GenomicRanges_1.20.6    GenomeInfoDb_1.4.2    

[10] IRanges_2.2.7           S4Vectors_0.6.4         BiocGenerics_0.14.0   

 

loaded via a namespace (and not attached):

 [1] XML_3.98-1.3         bitops_1.0-6         futile.options_1.0.0

 [4] DBI_0.3.1            RSQLite_1.0.0        zlibbioc_1.14.0    

 [7] futile.logger_1.4.1  lambda.r_1.1.7       BiocParallel_1.2.20

[10] tools_3.2.2          biomaRt_2.24.0       RCurl_1.95-4.7     

[13] rtracklayer_1.28.9  

ADD REPLY
0
Entering edit mode

Mmmh... loading or not the BiocParallel package doesn't change anything, sorry (summarizeOverlaps() still uses it behind the scene). To really disable parallel evaluation you actually need to do:

library(BiocParallel)
se <- summarizeOverlaps(features=ebg, reads=bamfiles,
                        ignore.strand=TRUE, BPPARAM=SerialParam())

So please try this and let us know if that helps.

Otherwise I guess it would still be very useful if you could provide a reproducible example (the code you show above is not: it won't run for me if I try to copy-and-paste it in my session because I don't have access to your BAM and GTF files).

Thanks,

H.

ADD REPLY
0
Entering edit mode

Is it possible to get your e-mail? I will send you a link to my data on dropbox.

Thank you.

ADD REPLY
0
Entering edit mode

Sure: hpages@fhcrc.org

It's no secret, it's in my profile:

Hervé Pagès

;-)

Thanks.

H.

ADD REPLY
0
Entering edit mode

I even checked your profile ;-) but unfortunately I can see there only star signs instead of your email. Anyway, thank you and you will get my invitation to dropbox soon 

ADD REPLY
0
Entering edit mode

I sent to you a link to dropbox (by accident maybe even twice). And here is the code I use:

library("Rsamtools")

bamfiles <- BamFileList("accepted_hits.bam")
seqinfo(bamfiles)

library("GenomicFeatures")

txdb <- makeTxDbFromGFF("exon_havana.gtf", format="gtf", circ_seqs=character())
columns(txdb)
seqlevels(txdb,force=TRUE)<-seqlevels(txdb)[1:24]
seqlevels(txdb)

ebg <- exonsBy(txdb, by="gene")

#seqlevels(ebg,force=TRUE)<-seqlevels(ebg)[1:24]
#seqlevels(ebg)

#g_ids<-names(ebg)
#head(select(txdb,keys=g_ids,columns="EXONNAME",keytype="GENEID"))

library("GenomicAlignments")
library("BiocParallel")
#register(SerialParam())
#register(MulticoreParam(workers = 2))

ptm=proc.time() # check how long the summarization takes
se <- summarizeOverlaps(features=ebg, reads=bamfiles,mode="Union",singleEnd=TRUE,ignore.strand=TRUE,BPPARAM=SerialParam())
proc.time()-ptm

head(assay(se))

mydata <- assay(se)
write.table(mydata, "mydata.txt", sep="\t")

ADD REPLY
0
Entering edit mode

Thanks for sending this. I'll look at it and will let you know.

H.

PS: You're right, one can only see the email address in his/her own profile but not the email address in other's profiles.

ADD REPLY
0
Entering edit mode

Thank you a lot! if it's necessary, here is my e-mail: kgorczak@up.poznan.pl 

ADD REPLY
0
Entering edit mode

Hi all,

 

Can I please know whether this issue was solved? If so how? I have the same issue with summarizeOverlaps() function.

 

Thank you.

ADD REPLY
0
Entering edit mode

Hi,

Unfortunately I was never able to reproduce this. I tried to run Katarzyna's code above on the data sets she sent me but didn't get any error or warning. I tried this with BioC 3.2 (current release) and with BioC 3.1 (previous release, not supported anymore), the latter being the version Katarzyna was using when she first reported this issue.

Can you give some minimal reproducible example? Please make sure you're using the latest BioC (3.2) and that all your packages are up-to-date (run biocLite() with no arguments to update them). Maybe I'll have more luck this time...

Thanks,

H.

ADD REPLY
0
Entering edit mode

Let me first try it with BioC 3.2. I only tried it with the BioC 3.1. It might be the problem. But the thing is, I ran it in BioC 3.1 in August 2015 though. But, please let me try the upgrade first. Thank you for your time.

ADD REPLY

Login before adding your answer.

Traffic: 882 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6