RinGO problem
Jianping Jin ▴ 890
@jianping-jin-1212
Last seen 9.6 years ago
Dear list,

I ran "runExonerate.sh" and had the per-chromosome output files condensed into one single file (see below; I read in the file just for viewing):

    > str(allchranno)
    'data.frame': 390388 obs. of 6 variables:
     $ SEQ_ID    : Factor w/ 388250 levels "chr1:10000013-10000070",..: 31687 28404 33240 34681 29011 26915 ...
     $ PROBE_ID  : Factor w/ 373478 levels "5313_0001_0001",..: 138251 325230 15265 268671 45500 270116 ...
     $ CHROMOSOME: Factor w/ 21 levels "1","10","11",..: 2 2 2 2 2 2 2 2 2 2 ...
     $ POSITION  : int 75573476 4540877 79390517 80647222 5734395 30338873 82085749 7386228 61247293 ...
     $ LENGTH    : int 50 50 50 60 50 60 50 50 50 52 ...
     $ MISMATCHES: int 0 0 0 0 0 0 0 0 0 0 ...

But when I tried to map probes to the genome I got an error and a warning:

    > probeAnno <- posToProbeAnno("C:/from_DriveD/Chip- chip/Bultman/allChromExonerateOut_scott.txt")
    Creating probeAnno mapping for chromosome 1 10 11 12 13 14 15 16 17 18 19 2 3 4 5 6 7 8 9 X Y Done.
    Error in validObject(.Object) : invalid class "probeAnno" object: FALSE
    In addition: Warning message:
    In validityMethod(object) :
      Some match positions end before they actually start.
      Please check elements 1.start and 1.end .

Appreciate it if you can help!

Jianping

FYI:

    > sessionInfo()
    R version 2.8.0 (2008-10-20)
    i386-pc-mingw32

    locale:
    LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

    attached base packages:
    [1] splines tools stats graphics grDevices utils datasets methods base

    other attached packages:
     [1] Ringo_1.6.0 SparseM_0.78 RColorBrewer_1.0-2 vsn_3.8.0 affy_1.20.0
     [6] limma_2.16.3 geneplotter_1.20.0 annotate_1.20.1 xtable_1.5-4 AnnotationDbi_1.4.2
    [11] lattice_0.17-15 genefilter_1.22.0 survival_2.34-1 Biobase_2.2.1

    loaded via a namespace (and not attached):
    [1] affyio_1.10.1 DBI_0.2-4 grid_2.8.0 KernSmooth_2.22-22 preprocessCore_1.4.0
    [6] RSQLite_0.7-1

##################################
Jianping Jin Ph.D.
Bioinformatics scientist
Center for Bioinformatics
Room 3133 Bioinformatics building
CB# 7104
University of Chapel Hill
Chapel Hill, NC 27599
Phone: (919)843-6105
FAX: (919)843-3103
E-Mail: jjin at email.unc.edu
@joern-toedling-1244
Last seen 9.6 years ago
Hi Jianping,

I am not completely sure what the source of the error is yet, so bear with me as I am trying to find out.

First, something could be wrong with the merged Exonerate output file. Which version of Exonerate are you using?

Can you also please tell me what the output of

    summary(allchranno$LENGTH)

and

    summary(allchranno$POSITION)

are?

Another issue could be the use of factors for probe and chromosome identifiers. When reading in the output file using read.table, read.delim etc., please try the argument "as.is=TRUE", which will prevent the conversion of character vectors into factors. You can then directly supply the data.frame allchranno to the function posToProbeAnno:

    probeAnno <- posToProbeAnno(allchranno)

Please tell me whether the error message still persists. If so, could you please provide me with the file allChromExonerateOut_scott.txt or an excerpt thereof (please do not attach it to the mail but provide it for download on some server) so that I can look further into the issue.

Best regards,
Joern

--
Joern Toedling
EMBL - European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge CB10 1SD
United Kingdom
Phone +44(0)1223 492566
Email toedling at ebi.ac.uk
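For anyone following along, here is a minimal sketch of the as.is suggestion. It assumes the merged file is tab-delimited with a header row; the temporary file and its three rows are made-up stand-ins, not data from this thread. The posToProbeAnno call is left as a comment so the snippet runs without Ringo installed.

```r
# Toy stand-in for the merged Exonerate output (assumed tab-delimited, six columns).
toy <- tempfile(fileext = ".txt")
writeLines(c(
  "SEQ_ID\tPROBE_ID\tCHROMOSOME\tPOSITION\tLENGTH\tMISMATCHES",
  "toy_seq_1\ttoy_0001\t10\t75573476\t50\t0",
  "toy_seq_2\ttoy_0002\t10\t4540877\t50\t0",
  "toy_seq_3\ttoy_0003\tX\t80647222\t60\t0"
), toy)

# as.is = TRUE keeps the identifier columns as character vectors
# instead of converting them to factors.
allchranno <- read.delim(toy, header = TRUE, as.is = TRUE)
str(allchranno)

# With Ringo attached, the data.frame can then be supplied directly:
# probeAnno <- posToProbeAnno(allchranno)
```

Note that a chromosome column holding only numbers would be read as integer; the "X" entry is what makes it character here.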
Hi Joern,

Please see below.

--On Monday, January 12, 2009 5:44 PM +0000 Joern Toedling <toedling at ebi.ac.uk> wrote:

> First, something could be wrong with the merged Exonerate output file.
> Which version of Exonerate are you using?

exonerate-2.2.0-x86_64

> Can you also please tell me what the output of
>
> summary(allchranno$LENGTH)

    > summary(allchranno$LENGTH)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     -72.00   50.00   50.00   50.13   51.00   75.00

> and
>
> summary(allchranno$POSITION)
>
> are?

    > summary(allchranno$POSITION)
         Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
       133400  37680000  75400000  77490000 112700000 197100000

> You can then directly supply the data.frame allchranno to the function
> posToProbeAnno.
>
> probeAnno <- posToProbeAnno(allchranno)
>
> Please tell me whether the error message still persists. If so could you
> please provide me with the file allChromExonerateOut_scott.txt
> or an excerpt thereof

Yes. The problem is the same. I put up the data file on <http://seattle.med.unc.edu/jjin/>. You can check that out.

Thanks,
Jianping
Hello,

Well, the culprits are the matches with a negative entry in LENGTH, as these are not supposed to happen. I am not sure how these came about, but it might have to do with changes in Exonerate (the scripts were written for Exonerate version 2.0.0). I shall investigate this further and get back to you.

For the moment, I am afraid you will have to either discard the lines with negative-length matches before calling posToProbeAnno (how many are there?) or find a way to correct them in the Exonerate output file.

Regards,
Joern
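A minimal sketch of the first workaround (discarding the negative-length matches before building the mapping). The data.frame below is a made-up toy with the same six columns; with real data you would apply the same filter to allchranno as read from the merged file. The posToProbeAnno call is commented out so the snippet runs without Ringo.

```r
# Toy stand-in for allchranno; all values are illustrative only.
allchranno <- data.frame(
  SEQ_ID     = c("toy_seq_1", "toy_seq_2", "toy_seq_3", "toy_seq_4"),
  PROBE_ID   = c("toy_0001", "toy_0002", "toy_0003", "toy_0004"),
  CHROMOSOME = c("1", "1", "2", "X"),
  POSITION   = c(1000L, 2050L, 3000L, 4072L),
  LENGTH     = c(50L, -50L, 60L, -72L),
  MISMATCHES = 0L,
  stringsAsFactors = FALSE
)

# Keep only matches whose LENGTH is positive.
keep <- allchranno$LENGTH > 0
cleaned <- allchranno[keep, , drop = FALSE]

# With Ringo attached:
# probeAnno <- posToProbeAnno(cleaned)
```

This drops the affected probes entirely, so it trades completeness for a valid probeAnno object.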
Hi Joern,

There are 6247 probes with LENGTH <= 0 (actually <= -50). The minus sign may refer to the genome strand. I checked the demo data set, which also contains negative values for LENGTH.

The only difference I can tell is the format of PROBE_ID. In the demo file it is something beginning with MM, e.g. MM5000P01479955, while in my file it is something like 5313_0514_0052.

Jianping
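A quick way to get numbers like these is sketched below. The toy data.frame and its values are made up for illustration; only the column names follow the str() output shown at the top of the thread.

```r
# Toy stand-in; only the columns needed for the check are included.
allchranno <- data.frame(
  PROBE_ID   = c("p1", "p2", "p3", "p4", "p5"),
  CHROMOSOME = c("1", "1", "2", "X", "X"),
  LENGTH     = c(50L, -50L, 60L, -72L, 52L),
  stringsAsFactors = FALSE
)

nonpos    <- allchranno$LENGTH <= 0
n_nonpos  <- sum(nonpos)                          # how many suspicious matches
len_range <- range(allchranno$LENGTH[nonpos])     # how negative they get
per_chrom <- table(allchranno$CHROMOSOME[nonpos]) # where they fall

print(n_nonpos)
print(len_range)
print(per_chrom)
```

Tabulating by chromosome helps to see whether the negative lengths are spread over the genome or concentrated in one per-chromosome output file.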
Hi Jianping,

The problem indeed turned out to be a change in the Exonerate output file format. Match start coordinates can now be higher than the end coordinate if the match is on the minus strand, while previously in this case the output file would only contain an indication that the reverse complement of the query had matched.

I have modified the utility Perl script "condenseExonerateOutput.pl" accordingly, and the modified script will be in the new development version of Ringo (>= 1.7.3). This new script should resolve your problem; please let me know if it does not.

The entries in PROBE_ID are the unique identifiers of the probes on the array, and in the example data these were for probes designed on the assembly mm5 of the mouse genome. It is no cause for concern that your probes have completely different identifiers.

Best regards,
Joern
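For readers who cannot update the script right away, the following is a rough sketch of correcting an already-condensed table in R. It rests on assumptions not confirmed in this thread: that POSITION holds the reported match start and that a negative LENGTH equals end minus start for a minus-strand match, with no off-by-one adjustment. The helper flip_minus_strand is hypothetical and not part of Ringo; the updated condenseExonerateOutput.pl remains the proper fix.

```r
# Hypothetical helper (not part of Ringo): make start <= end for rows
# whose LENGTH is negative, ASSUMING LENGTH = end - start in those rows.
flip_minus_strand <- function(anno) {
  neg <- anno$LENGTH < 0
  # Move POSITION to the lower coordinate, using the still-negative LENGTH.
  anno$POSITION[neg] <- anno$POSITION[neg] + anno$LENGTH[neg]
  # Then store the match length as a positive number.
  anno$LENGTH[neg] <- abs(anno$LENGTH[neg])
  anno
}

# Toy input; values are illustrative only.
toy <- data.frame(
  PROBE_ID = c("toy_0001", "toy_0002"),
  POSITION = c(1000L, 2050L),
  LENGTH   = c(50L, -50L),
  stringsAsFactors = FALSE
)

fixed <- flip_minus_strand(toy)
print(fixed)
```

Before relying on this, it is worth comparing a few corrected rows against the raw Exonerate output to confirm the coordinate convention.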