Devel version of easyRNASeq: using the simpleRNASeq method gives an error

0

Entering edit mode

Nicolas Delhomme ▴ 320

@nicolas-delhomme-6252

Last seen 7.1 years ago

Sweden

Hej Sylvain! I just brought it back to the list so that everyone benefits from the answer :-) ; see inline. --------------------------------------------------------------- Nicolas Delhomme Nathaniel Street Lab Department of Plant Physiology Ume? Plant Science Center Tel: +46 90 786 5478 Email: nicolas.delhomme at plantphys.umu.se SLU - Ume? universitet Ume? S-901 87 Sweden --------------------------------------------------------------- On 27 Mar 2014, at 15:24, Sylvain Foisy Ph. D. <sylvain.foisy at="" diploide.net=""> wrote: > Hi Nicolas, > > Thanks for the input ;-) I am running this: > > [1] easyRNASeq_1.9.5 OK the latest is 1.9.7 but that?s not going to solve your problem yet? hopefully by the end of the week, I?ll have fixed the simpleRNASeq function. > > I actually started to use the R-devel branch to try to see if I could overcome a problem I am having with the current version (1.8.6). Actually 1.8.7 is the latest and fixes a bug that addresses your last question below. > I realized that my first attempts at DE analysis were using tophat2 alignments unfiltered for badly mapped reads. That?s a good idea since easyRNASeq was not paying attention to that previously apart from mentioning it in the vignette. The new function which you tried is meant to fix this among other things. > I therefore used samtools to keep only the properly paired reads (-f 2) and tried to run easyRNASeq again on the new BAM files. Whenever I used ?exons" or "transcripts" as counts, it would work ok but I would get lots (about half of my BAM files) of these: > > 20: In fetchCoverage(rnaSeq, format = format, filename = filename, ... : > You enforce UCSC chromosome conventions, however the provided alignments are not compliant. Correcting it. > 21: In fetchCoverage(rnaSeq, format = format, filename = filename, ... : > The read length stored in the object (probably provided as argument): 100 > is not the same as the ones: 100, 99, 98, 97, 96, 95, 94, 93, 92, 91, 90, 89, 88, 87, 86, 85, 84, 83, 82, 81, 80, 79, 78, 77, 76, 75, 74, 73, 72, 71, 7 > 0, 69, 68, 67, 66, 65, 64, 63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50 determined from the file: /shares/new- data/htseq_datastore/RNASeq_dat > aster/123456_CELLTYPE_accepted_hits_properly_paired.bam > > For the UCSC warning, I do believe that it stands from the fact that I am using the genes.gtf file from Illumina's iGenome archive for the NCBI annotation source. Yes. This is also a remnant of early coding, when RNA-Seq analyses were not so standardised and that?s also going to disappear in the new function. > The second one leaves me clueless :-( Again, a remnant of the time when aligner could not deal with variable length reads. easyRNASeq used to check for the read size as a sanity check, which he does not anymore obviously, but the warning stayed. I have to admit that I simply got used to see that warning but could have removed it eons ago. It will be gone in the next function also. > > if I tried using "genes" and summeraization=?geneModels?, i would get these but I am also unable to write the count data to file, complaining that the count.table object cannot be written to file using the write.table function? Try 1.8.7 then. And instead of the genes / geneModels paradigm, have a look at the section 7.1 of the vignettes for a more accurate (IMO) and faster approach at looking for geneModels (what I now call "synthetic transcripts? rather than "gene models" as that last one was confusing because of its many meanings, e.g. if you think of a genic locus in an assembly project. > All of these were not observed on the unfiltered data, which flags me as something that samtools did? This looks more like some quality-based read trimming was done on the data rather than having anything to do with samtools. Anyway, using 1.8.7 should help getting your count.table written out properly. Let me know if not. Cheers, Nico > > Any ideas? > > Best regards > > S > > On Mar 27, 2014, at 10:01 AM, Nicolas Delhomme <delhomme at="" embl.de=""> wrote: > >> Hej Sylvain! >> >> Indeed that function is under very active development and I?ve broken it times and again :-) >> >> I would suggest you to keep using the easyRNASeq function until the next Bioc release (planned in ~10 days). I?m rushing to get the simpleRNASeq in the next release as bug-free as possible and I expect it to be fairly unstable by then. >> >> Anyway, can you tell me which easyRNASeq version you are using (run sessionInfo() or package.version(?easyRNASeq?) in your R session)? Just so that I make sure I?ve got that bug squashed already. >> >> Thanks, >> >> Nico >> >> --------------------------------------------------------------- >> Nicolas Delhomme >> >> Genome Biology Computational Support >> >> European Molecular Biology Laboratory >> >> Tel: +49 6221 387 8310 >> Email: nicolas.delhomme at embl.de >> Meyerhofstrasse 1 - Postfach 10.2209 >> 69102 Heidelberg, Germany >> --------------------------------------------------------------- >> >> >> >> >> >> On 27 Mar 2014, at 14:40, Sylvain Foisy Ph. D. <sylvain.foisy at="" diploide.net=""> wrote: >> >>> Hi to all, >>> >>> I am trying to use the R-devel branch for easyRNASeq to get gene counts from filtered BAM files and I get the following error: >>> >>>> exp<-simpleRNASeq(bamFiles = bfl, param = rParam, nnodes = 1, verbose = TRUE) >>> Error in (function (classes, fdef, mtable) : >>> unable to find an inherited method for function ?simpleRNASeq? for signature ?"BamFileList?? >>> >>> According to the help material, simpleRNASeq should need this: >>> >>> simpleRNASeq(bamFiles = BamFileList(), >>> param = RnaSeqParam(), nnodes = 1, verbose = FALSE) >>> >>> When I check the nature of my bfl object, I get this: >>> >>>> bfl >>> BamFileList of length 103 >>> names(103): 123456_CELLTYPE_B_accepted_hits_properly_paired.bam ? >>> >>> So bfl is the right type? Any ideas or is this a bug? It?s the devel version after all ;-) >>> >>> Best regards >>> >>> S >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >> Email secured by Check Point >

Annotation easyRNASeq Annotation easyRNASeq • 1.7k views

ADD COMMENT • link updated 11.8 years ago by Sylvain Foisy ▴ 80 • written 11.8 years ago by Nicolas Delhomme ▴ 320

0

Entering edit mode

Sylvain Foisy ▴ 80

@sylvain-foisy-4051

Last seen 11.3 years ago

Hi Nicolas, On Mar 27, 2014, at 10:40 AM, Nicolas Delhomme <nicolas.delhomme at="" umu.se=""> wrote: >> I realized that my first attempts at DE analysis were using tophat2 alignments unfiltered for badly mapped reads. > > That?s a good idea since easyRNASeq was not paying attention to that previously apart from mentioning it in the vignette. The new function which you tried is meant to fix this among other things. Am I to understand that easyRNASeq::simpleRNASeq will do the filtering? :-) >> >> if I tried using "genes" and summeraization=?geneModels?, i would get these but I am also unable to write the count data to file, complaining that the count.table object cannot be written to file using the write.table function? > > Try 1.8.7 then. And instead of the genes / geneModels paradigm, have a look at the section 7.1 of the vignettes for a more accurate (IMO) and faster approach at looking for geneModels (what I now call "synthetic transcripts? rather than "gene models" as that last one was confusing because of its many meanings, e.g. if you think of a genic locus in an assembly project. I actually read that section ;-) I have to admit that the process to create the synthetic transcripts looks a bit daunting? I have used BioC/R mostly with canned methods so some of the high-end operations are still bizarre to me. Well, let?s dive! > >> All of these were not observed on the unfiltered data, which flags me as something that samtools did? > > This looks more like some quality-based read trimming was done on the data rather than having anything to do with samtools. Anyway, using 1.8.7 should help getting your count.table written out properly. Let me know if not. I have used Trimmomatic to remove/trim low-quality reads and nucleotides prior to Tophat 2 alignments and that was all. I am upgrading as I write this and I?ll start the runs anew; I?ll let you know how it worked out. Thanks for the inputs Best regards S

ADD COMMENT • link 11.8 years ago Sylvain Foisy ▴ 80

0

Entering edit mode

On 27 Mar 2014, at 15:54, Sylvain Foisy Ph. D. <sylvain.foisy at="" diploide.net=""> wrote: > Hi Nicolas, > > On Mar 27, 2014, at 10:40 AM, Nicolas Delhomme <nicolas.delhomme at="" umu.se=""> wrote: > >>> I realized that my first attempts at DE analysis were using tophat2 alignments unfiltered for badly mapped reads. >> >> That?s a good idea since easyRNASeq was not paying attention to that previously apart from mentioning it in the vignette. The new function which you tried is meant to fix this among other things. > > Am I to understand that easyRNASeq::simpleRNASeq will do the filtering? :-) Well, it?s going to do what nowadays is a standard, i.e. dropping multimapping and/or ambiguously mapping reads. Adding a mapped quality cutoff should not be too difficult either though, is that what you had in mind by ?filtering?? If you meant something like what ?trimmomatic? does, then the answer is no; we?re very happy using trimmomatic :-) No need to re-invent the wheel, right ;-) > >>> >>> if I tried using "genes" and summeraization=?geneModels?, i would get these but I am also unable to write the count data to file, complaining that the count.table object cannot be written to file using the write.table function? >> >> Try 1.8.7 then. And instead of the genes / geneModels paradigm, have a look at the section 7.1 of the vignettes for a more accurate (IMO) and faster approach at looking for geneModels (what I now call "synthetic transcripts? rather than "gene models" as that last one was confusing because of its many meanings, e.g. if you think of a genic locus in an assembly project. > > I actually read that section ;-) I have to admit that the process to create the synthetic transcripts looks a bit daunting? I have used BioC/R mostly with canned methods so some of the high-end operations are still bizarre to me. Well, let?s dive! > I?ll try to ?can" that function asap ;-), but annotation format can be a challenge as they are so lenient. Luckily there are some Bioc resources for these already. >> >>> All of these were not observed on the unfiltered data, which flags me as something that samtools did? >> >> This looks more like some quality-based read trimming was done on the data rather than having anything to do with samtools. Anyway, using 1.8.7 should help getting your count.table written out properly. Let me know if not. > > I have used Trimmomatic to remove/trim low-quality reads and nucleotides prior to Tophat 2 alignments and that was all. I am upgrading as I write this and I?ll start the runs anew; I?ll let you know how it worked out. Great. Thanks for the feedback! Nico > > Thanks for the inputs > > Best regards > > S >

ADD REPLY • link 11.8 years ago Nicolas Delhomme ▴ 320

Login before adding your answer.