NucleR: processReads/solveUserSEW0

NucleR: processReads/solveUserSEW0 - Error

0

Entering edit mode

Stefanie Ververs ▴ 70

@stefanie-ververs-4940

Last seen 9.6 years ago

Hi everybody, first at all: I'm quite new to R and Bioconductor (and the whole field of Bio-Informatics). I'm trying to get some nuclesome positioning/peak calling data for further analyses. Therefor I tried the NucleR package. The underlying data (solexa reads groomed into sanger reads) is mapped with Bowtie and results in a SAM-file. I converted that file to BAM because BAM-files can be imported with ShortRead into a format, that NucleR can handle. alignedReads <- readAligned(dir, pattern=filename, type="BAM") The imports seems to run without errors. My script (I'm using a file containing the R script, started from command line with R --slave --vanilla --file=<r-file> --args <inputfile> <more arguments=""> <outputfile>) stops at this line: reads_pair = processReads(nucleosome_htseq, type="paired", fragmentLen=fragment_len) with the following error: / Fehler in solveUserSEW0(start = start, end = end, width = width) : solving row 16512893: range cannot be determined from the supplied arguments (too many NAs) Calls: processReads ... RangedData -> is -> IRanges -> solveUserSEW0 -> .Call/ fragmentLen is 200 in my example. The problem is, that it's impossible to view row /16512893. /I read both help texts for processReads and solveUserSWE0, but I think I do not knew enough that it will help me much. So, maybe someone can give me a hint to the right way or has any idea what maybe wrong... Thanks in advance! Regards, Steffi [[alternative HTML version deleted]]

IRanges ShortRead nucleR IRanges ShortRead nucleR • 2.3k views

ADD COMMENT • link 12.5 years ago Stefanie Ververs ▴ 70

0

Entering edit mode

Oscar Flores ▴ 50

@oscar-flores-4943

Last seen 9.6 years ago

Hi Stefanie, I'm the developer of nucleR, so let's see if I can help you ;) After the processing, processReads converts the input data to a RangedData object for a easier manipulation later, so this error is occurs at the last step of the call, but data can be messed in previous steps. It's hard to tell what is happening without having a look to the input data, which I guess is huge... I would like to have a look to the raw data, but I know it is difficult to send it if it's not in a public repository. Maybe you can contact me directly about that (oflores at mmb.pcb.ub.es)... Meanwhile, if you want to try a workaround, you can directly convert the imported reads to RangedData format (which is the other format supported by processReads): (being "ar" your imported AlignedReads object) res = RangedData(IRanges(start=position(ar),width=width(ar)), strand=strand(ar),space=ar at chromosome) reads_pair = processReads(res, type="paired", fragmentLen=fragment_len) This should work, but will be nice to have a look to your data to fix a possible problem in the AlignedReads method. Regards, Oscar

ADD COMMENT • link 12.5 years ago Oscar Flores ▴ 50

0

Entering edit mode

Hi Oscar, thanks for your quick answer - I think I would have contacted you, if there were no answers on the bioconductor-mailinglist. I just tried the workaround as you suggested, but I got the same error again: Fehler in solveUserSEW0(start = start, end = end, width = width) : solving row 16512893: range cannot be determined from the supplied arguments (too many NAs) Calls: RangedData -> is -> IRanges -> solveUserSEW0 -> .Call I'll think about how to show you the data (it's hosted and processed with Galaxy, so it might be possible to share it.) Regards, Steffi On 03.11.2011 13:16, Oscar Flores wrote: > Hi Stefanie, > > I'm the developer of nucleR, so let's see if I can help you ;) > > After the processing, processReads converts the input data to a RangedData > object for a easier manipulation later, so this error is occurs at the last > step of the call, but data can be messed in previous steps. It's > hard to tell what is happening without having a look to the input data, > which I guess is huge... > > I would like to have a look to the raw data, but I know it is difficult > to send it if it's not in a public repository. Maybe you can contact me > directly about that (oflores at mmb.pcb.ub.es)... > > Meanwhile, if you want to try a workaround, you can directly convert the > imported reads to RangedData format (which is the other format supported > by processReads): > > (being "ar" your imported AlignedReads object) > > res = RangedData(IRanges(start=position(ar),width=width(ar)), > strand=strand(ar),space=ar at chromosome) > > reads_pair = processReads(res, type="paired", fragmentLen=fragment_len) > > This should work, but will be nice to have a look to your data > to fix a possible problem in the AlignedReads method. > > Regards, > > Oscar > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 12.5 years ago Stefanie Ververs ▴ 70

0

Entering edit mode

So this error happens here, no? res = RangedData(IRanges(start=position(ar),width=width(ar)), strand=strand(ar),space=ar at chromosome) If this is the case the problem is not in nucleR, maybe there are some rows in a strange format in the AlignedRead (could be due the multiple format changes) that may avoid the conversion to the RangedData. Maybe I can detect them and skip those cases, but I would need to see what's happening in that odd case. Let me know if there's something I can do. Regards, Oscar El 03/11/2011 13:58, Stefanie Ververs escribi?: > Hi Oscar, > > thanks for your quick answer - I think I would have contacted you, if > there were no answers on the bioconductor-mailinglist. > > I just tried the workaround as you suggested, but I got the same error > again: > Fehler in solveUserSEW0(start = start, end = end, width = width) : > solving row 16512893: range cannot be determined from the supplied > arguments (too many NAs) > Calls: RangedData -> is -> IRanges -> solveUserSEW0 -> .Call > > I'll think about how to show you the data (it's hosted and processed > with Galaxy, so it might be possible to share it.) > > Regards, > > Steffi > > On 03.11.2011 13:16, Oscar Flores wrote: >> Hi Stefanie, >> >> I'm the developer of nucleR, so let's see if I can help you ;) >> >> After the processing, processReads converts the input data to a >> RangedData >> object for a easier manipulation later, so this error is occurs at >> the last >> step of the call, but data can be messed in previous steps. It's >> hard to tell what is happening without having a look to the input data, >> which I guess is huge... >> >> I would like to have a look to the raw data, but I know it is difficult >> to send it if it's not in a public repository. Maybe you can contact me >> directly about that (oflores at mmb.pcb.ub.es)... >> >> Meanwhile, if you want to try a workaround, you can directly convert the >> imported reads to RangedData format (which is the other format supported >> by processReads): >> >> (being "ar" your imported AlignedReads object) >> >> res = RangedData(IRanges(start=position(ar),width=width(ar)), >> strand=strand(ar),space=ar at chromosome) >> >> reads_pair = processReads(res, type="paired", fragmentLen=fragment_len) >> >> This should work, but will be nice to have a look to your data >> to fix a possible problem in the AlignedReads method. >> >> Regards, >> >> Oscar >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Oscar Flores Computer Engineer - PhD Student Molecular Modelling& Bioinformatics Group Institute for Research in Biomedicine (IRB Barcelona) Parc Cient?fic de Barcelona

ADD REPLY • link 12.5 years ago Oscar Flores ▴ 50

0

Entering edit mode

On 11/03/2011 06:14 AM, Oscar Flores wrote: > So this error happens here, no? > > res = RangedData(IRanges(start=position(ar),width=width(ar)), > strand=strand(ar),space=ar at chromosome) better to use the accessor chromosome(ar). The error > Fehler in solveUserSEW0(start = start, end = end, width = width) : > solving row 16512893: range cannot be determined from the supplied > arguments (too many NAs) suggests that position(ar)[16512893] and / or width(ar)[16512893] is NA. You could filter these out, e.g., ar[!is.na(position(ar)) & !is.na(position(ar))] or identify why these are read in in the first place using the 'param' argument as described on ?readAligned. Martin > > If this is the case the problem is not in nucleR, maybe there are some > rows in a strange format in the AlignedRead (could be due the multiple > format changes) that may avoid the conversion to the RangedData. Maybe I > can detect them and skip those cases, but I would need to see what's > happening in that odd case. > > Let me know if there's something I can do. > > Regards, > > Oscar > > > El 03/11/2011 13:58, Stefanie Ververs escribi?: >> Hi Oscar, >> >> thanks for your quick answer - I think I would have contacted you, if >> there were no answers on the bioconductor-mailinglist. >> >> I just tried the workaround as you suggested, but I got the same error >> again: >> Fehler in solveUserSEW0(start = start, end = end, width = width) : >> solving row 16512893: range cannot be determined from the supplied >> arguments (too many NAs) >> Calls: RangedData -> is -> IRanges -> solveUserSEW0 -> .Call >> >> I'll think about how to show you the data (it's hosted and processed >> with Galaxy, so it might be possible to share it.) >> >> Regards, >> >> Steffi >> >> On 03.11.2011 13:16, Oscar Flores wrote: >>> Hi Stefanie, >>> >>> I'm the developer of nucleR, so let's see if I can help you ;) >>> >>> After the processing, processReads converts the input data to a >>> RangedData >>> object for a easier manipulation later, so this error is occurs at >>> the last >>> step of the call, but data can be messed in previous steps. It's >>> hard to tell what is happening without having a look to the input data, >>> which I guess is huge... >>> >>> I would like to have a look to the raw data, but I know it is difficult >>> to send it if it's not in a public repository. Maybe you can contact me >>> directly about that (oflores at mmb.pcb.ub.es)... >>> >>> Meanwhile, if you want to try a workaround, you can directly convert the >>> imported reads to RangedData format (which is the other format supported >>> by processReads): >>> >>> (being "ar" your imported AlignedReads object) >>> >>> res = RangedData(IRanges(start=position(ar),width=width(ar)), >>> strand=strand(ar),space=ar at chromosome) >>> >>> reads_pair = processReads(res, type="paired", fragmentLen=fragment_len) >>> >>> This should work, but will be nice to have a look to your data >>> to fix a possible problem in the AlignedReads method. >>> >>> Regards, >>> >>> Oscar >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > -- Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: M1-B861 Telephone: 206 667-2793

ADD REPLY • link 12.5 years ago Martin Morgan 25k

0

Entering edit mode

Stefanie Ververs ▴ 70

@stefanie-ververs-4940

Last seen 9.6 years ago

I got some new information, recently: A statistical report on the converted SAM-file says: 16512892 in total 0 QC failure 0 duplicates 16512892 mapped (100.00%) 16512892 paired in sequencing 8256446 read1 8256446 read2 16512892 properly paired (100.00%) 16512892 with itself and mate mapped 0 singletons (0.00%) 0 with mate mapped to a different chr 0 with mate mapped to a different chr (mapQ>=5) Seems, as if line 16512893 is the last one or somehow corrupt. Additionally, I need to use some "cleaning" tools on our data, so I'll do that and try to run NucleR/RangedData later again. Thanks for your help so far! Steffi On 03.11.2011 14:39, Martin Morgan wrote: > On 11/03/2011 06:14 AM, Oscar Flores wrote: >> So this error happens here, no? >> >> res = RangedData(IRanges(start=position(ar),width=width(ar)), >> strand=strand(ar),space=ar at chromosome) > > better to use the accessor chromosome(ar). The error > > > Fehler in solveUserSEW0(start = start, end = end, width = width) : > > solving row 16512893: range cannot be determined from the supplied > > arguments (too many NAs) > > suggests that position(ar)[16512893] and / or width(ar)[16512893] is > NA. You could filter these out, e.g., ar[!is.na(position(ar)) & > !is.na(position(ar))] or identify why these are read in in the first > place using the 'param' argument as described on ?readAligned. > > Martin > >> >> If this is the case the problem is not in nucleR, maybe there are some >> rows in a strange format in the AlignedRead (could be due the multiple >> format changes) that may avoid the conversion to the RangedData. Maybe I >> can detect them and skip those cases, but I would need to see what's >> happening in that odd case. >> >> Let me know if there's something I can do. >> >> Regards, >> >> Oscar >> >> >> El 03/11/2011 13:58, Stefanie Ververs escribi?: >>> Hi Oscar, >>> >>> thanks for your quick answer - I think I would have contacted you, if >>> there were no answers on the bioconductor-mailinglist. >>> >>> I just tried the workaround as you suggested, but I got the same error >>> again: >>> Fehler in solveUserSEW0(start = start, end = end, width = width) : >>> solving row 16512893: range cannot be determined from the supplied >>> arguments (too many NAs) >>> Calls: RangedData -> is -> IRanges -> solveUserSEW0 -> .Call >>> >>> I'll think about how to show you the data (it's hosted and processed >>> with Galaxy, so it might be possible to share it.) >>> >>> Regards, >>> >>> Steffi >>> >>> On 03.11.2011 13:16, Oscar Flores wrote: >>>> Hi Stefanie, >>>> >>>> I'm the developer of nucleR, so let's see if I can help you ;) >>>> >>>> After the processing, processReads converts the input data to a >>>> RangedData >>>> object for a easier manipulation later, so this error is occurs at >>>> the last >>>> step of the call, but data can be messed in previous steps. It's >>>> hard to tell what is happening without having a look to the input >>>> data, >>>> which I guess is huge... >>>> >>>> I would like to have a look to the raw data, but I know it is >>>> difficult >>>> to send it if it's not in a public repository. Maybe you can >>>> contact me >>>> directly about that (oflores at mmb.pcb.ub.es)... >>>> >>>> Meanwhile, if you want to try a workaround, you can directly >>>> convert the >>>> imported reads to RangedData format (which is the other format >>>> supported >>>> by processReads): >>>> >>>> (being "ar" your imported AlignedReads object) >>>> >>>> res = RangedData(IRanges(start=position(ar),width=width(ar)), >>>> strand=strand(ar),space=ar at chromosome) >>>> >>>> reads_pair = processReads(res, type="paired", >>>> fragmentLen=fragment_len) >>>> >>>> This should work, but will be nice to have a look to your data >>>> to fix a possible problem in the AlignedReads method. >>>> >>>> Regards, >>>> >>>> Oscar >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> > >

ADD COMMENT • link 12.5 years ago Stefanie Ververs ▴ 70

0

Entering edit mode

Stefanie Ververs ▴ 70

@stefanie-ververs-4940

Last seen 9.6 years ago

Hi Martin, I just ran my script with another dataset but got the same error (even if on a different line). I read the description of readAligned and ScanBamParam, but I still don't get how to provide this parameters to readAligned. (I see that I can create an object/instance of ScanBamParam, but then? Is it something like "alignedReads <- readAligned(dir, pattern=filename, type="BAM", param=myScanBamParamObject)? There is no example and I am *really* new to R, sorry ;) Then I would try parsing the file with more specified arguments.. I got some more information about my file/dataset: 17709538 + 0 in total (QC-passed reads + QC-failed reads) 0 + 0 duplicates 120774 + 0 mapped (0.68%:-nan%) 17709538 + 0 paired in sequencing 8854769 + 0 read1 8854769 + 0 read2 120774 + 0 properly paired (0.68%:-nan%) 120774 + 0 with itself and mate mapped 0 + 0 singletons (0.00%:-nan%) 0 + 0 with mate mapped to a different chr 0 + 0 with mate mapped to a different chr (mapQ>=5) And the error is (this time) in row /120775./ /(Fehler in solveUserSEW0(start = start, end = end, width = width) : solving row 120775: range cannot be determined from the supplied arguments (too many NAs) Calls: RangedData -> is -> IRanges -> solveUserSEW0 -> .Call/) On 03.11.2011 14:39, Martin Morgan wrote: > On 11/03/2011 06:14 AM, Oscar Flores wrote: >> So this error happens here, no? >> >> res = RangedData(IRanges(start=position(ar),width=width(ar)), >> strand=strand(ar),space=ar@chromosome) > > better to use the accessor chromosome(ar). The error > > > Fehler in solveUserSEW0(start = start, end = end, width = width) : > > solving row 16512893: range cannot be determined from the supplied > > arguments (too many NAs) > > suggests that position(ar)[16512893] and / or width(ar)[16512893] is > NA. You could filter these out, e.g., ar[!is.na(position(ar)) & > !is.na(position(ar))] or identify why these are read in in the first > place using the 'param' argument as described on ?readAligned. > > Martin > >> >> If this is the case the problem is not in nucleR, maybe there are some >> rows in a strange format in the AlignedRead (could be due the multiple >> format changes) that may avoid the conversion to the RangedData. Maybe I >> can detect them and skip those cases, but I would need to see what's >> happening in that odd case. >> >> Let me know if there's something I can do. >> >> Regards, >> >> Oscar >> >> >> El 03/11/2011 13:58, Stefanie Ververs escribió: >>> Hi Oscar, >>> >>> thanks for your quick answer - I think I would have contacted you, if >>> there were no answers on the bioconductor-mailinglist. >>> >>> I just tried the workaround as you suggested, but I got the same error >>> again: >>> Fehler in solveUserSEW0(start = start, end = end, width = width) : >>> solving row 16512893: range cannot be determined from the supplied >>> arguments (too many NAs) >>> Calls: RangedData -> is -> IRanges -> solveUserSEW0 -> .Call >>> >>> I'll think about how to show you the data (it's hosted and processed >>> with Galaxy, so it might be possible to share it.) >>> >>> Regards, >>> >>> Steffi >>> >>> On 03.11.2011 13:16, Oscar Flores wrote: >>>> Hi Stefanie, >>>> >>>> I'm the developer of nucleR, so let's see if I can help you ;) >>>> >>>> After the processing, processReads converts the input data to a >>>> RangedData >>>> object for a easier manipulation later, so this error is occurs at >>>> the last >>>> step of the call, but data can be messed in previous steps. It's >>>> hard to tell what is happening without having a look to the input >>>> data, >>>> which I guess is huge... >>>> >>>> I would like to have a look to the raw data, but I know it is >>>> difficult >>>> to send it if it's not in a public repository. Maybe you can >>>> contact me >>>> directly about that (oflores@mmb.pcb.ub.es)... >>>> >>>> Meanwhile, if you want to try a workaround, you can directly >>>> convert the >>>> imported reads to RangedData format (which is the other format >>>> supported >>>> by processReads): >>>> >>>> (being "ar" your imported AlignedReads object) >>>> >>>> res = RangedData(IRanges(start=position(ar),width=width(ar)), >>>> strand=strand(ar),space=ar@chromosome) >>>> >>>> reads_pair = processReads(res, type="paired", >>>> fragmentLen=fragment_len) >>>> >>>> This should work, but will be nice to have a look to your data >>>> to fix a possible problem in the AlignedReads method. >>>> >>>> Regards, >>>> >>>> Oscar >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor@r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> > > [[alternative HTML version deleted]]

ADD COMMENT • link 12.5 years ago Stefanie Ververs ▴ 70

0

Entering edit mode

On 11/07/2011 02:53 AM, Stefanie Ververs wrote: > Hi Martin, > > I just ran my script with another dataset but got the same error (even > if on a different line). > I read the description of readAligned and ScanBamParam, but I still > don't get how to provide this parameters to readAligned. (I see that I > can create an object/instance of ScanBamParam, but then? Is it something > like "alignedReads <- readAligned(dir, pattern=filename, type="BAM", > param=myScanBamParamObject)? There is no example and I am *really* new yes, for example dir <- system.file("extdata", package="Rsamtools") param <- ScanBamParam(simpleCigar=TRUE,, reverseComplement=TRUE) aln <- readAligned(dir, "ex1.bam$", type="BAM", param=param) > to R, sorry ;) Then I would try parsing the file with more specified > arguments.. > > I got some more information about my file/dataset: > > 17709538 + 0 in total (QC-passed reads + QC-failed reads) > 0 + 0 duplicates > 120774 + 0 mapped (0.68%:-nan%) > 17709538 + 0 paired in sequencing > 8854769 + 0 read1 > 8854769 + 0 read2 > 120774 + 0 properly paired (0.68%:-nan%) > 120774 + 0 with itself and mate mapped > 0 + 0 singletons (0.00%:-nan%) > 0 + 0 with mate mapped to a different chr > 0 + 0 with mate mapped to a different chr (mapQ>=5) > > > And the error is (this time) in row /120775./ > /(Fehler in solveUserSEW0(start = start, end = end, width = width) : > solving row 120775: range cannot be determined from the supplied > arguments (too many NAs) > Calls: RangedData -> is -> IRanges -> solveUserSEW0 -> .Call/) Again it appears to be the last record. Can you provide a simpler example? For instance, from your original message you said that you input the data as alignedReads <- readAligned(dir, pattern=filename, type="BAM") but that the error occurred at reads_pair = processReads(nucleosome_htseq, type="paired", fragmentLen=fragment_len) but there is no obvious connection between 'alignedReads' and the arguments to 'processReads'. Also, what is the output of idx = is.na(position(alignedReads)) & is.na(width(alignedReads)) sum(idx) ? If sum(idx) != 0, then try using alignedReads[!idx]. And finally, after library(ShortRead) library(nucleR) what is the result of sessionInfo() ? > > > > On 03.11.2011 14:39, Martin Morgan wrote: >> On 11/03/2011 06:14 AM, Oscar Flores wrote: >>> So this error happens here, no? >>> >>> res = RangedData(IRanges(start=position(ar),width=width(ar)), >>> strand=strand(ar),space=ar at chromosome) >> >> better to use the accessor chromosome(ar). The error >> >> > Fehler in solveUserSEW0(start = start, end = end, width = width) : >> > solving row 16512893: range cannot be determined from the supplied >> > arguments (too many NAs) >> >> suggests that position(ar)[16512893] and / or width(ar)[16512893] is >> NA. You could filter these out, e.g., ar[!is.na(position(ar)) & >> !is.na(position(ar))] or identify why these are read in in the first >> place using the 'param' argument as described on ?readAligned. >> >> Martin >> >>> >>> If this is the case the problem is not in nucleR, maybe there are some >>> rows in a strange format in the AlignedRead (could be due the multiple >>> format changes) that may avoid the conversion to the RangedData. Maybe I >>> can detect them and skip those cases, but I would need to see what's >>> happening in that odd case. >>> >>> Let me know if there's something I can do. >>> >>> Regards, >>> >>> Oscar >>> >>> >>> El 03/11/2011 13:58, Stefanie Ververs escribi?: >>>> Hi Oscar, >>>> >>>> thanks for your quick answer - I think I would have contacted you, if >>>> there were no answers on the bioconductor-mailinglist. >>>> >>>> I just tried the workaround as you suggested, but I got the same error >>>> again: >>>> Fehler in solveUserSEW0(start = start, end = end, width = width) : >>>> solving row 16512893: range cannot be determined from the supplied >>>> arguments (too many NAs) >>>> Calls: RangedData -> is -> IRanges -> solveUserSEW0 -> .Call >>>> >>>> I'll think about how to show you the data (it's hosted and processed >>>> with Galaxy, so it might be possible to share it.) >>>> >>>> Regards, >>>> >>>> Steffi >>>> >>>> On 03.11.2011 13:16, Oscar Flores wrote: >>>>> Hi Stefanie, >>>>> >>>>> I'm the developer of nucleR, so let's see if I can help you ;) >>>>> >>>>> After the processing, processReads converts the input data to a >>>>> RangedData >>>>> object for a easier manipulation later, so this error is occurs at >>>>> the last >>>>> step of the call, but data can be messed in previous steps. It's >>>>> hard to tell what is happening without having a look to the input >>>>> data, >>>>> which I guess is huge... >>>>> >>>>> I would like to have a look to the raw data, but I know it is >>>>> difficult >>>>> to send it if it's not in a public repository. Maybe you can >>>>> contact me >>>>> directly about that (oflores at mmb.pcb.ub.es)... >>>>> >>>>> Meanwhile, if you want to try a workaround, you can directly >>>>> convert the >>>>> imported reads to RangedData format (which is the other format >>>>> supported >>>>> by processReads): >>>>> >>>>> (being "ar" your imported AlignedReads object) >>>>> >>>>> res = RangedData(IRanges(start=position(ar),width=width(ar)), >>>>> strand=strand(ar),space=ar at chromosome) >>>>> >>>>> reads_pair = processReads(res, type="paired", >>>>> fragmentLen=fragment_len) >>>>> >>>>> This should work, but will be nice to have a look to your data >>>>> to fix a possible problem in the AlignedReads method. >>>>> >>>>> Regards, >>>>> >>>>> Oscar >>>>> >>>>> _______________________________________________ >>>>> Bioconductor mailing list >>>>> Bioconductor at r-project.org >>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>> Search the archives: >>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>> >> >> > -- Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: M1-B861 Telephone: 206 667-2793

ADD REPLY • link 12.5 years ago Martin Morgan 25k

0

Entering edit mode

On 07.11.2011 19:52, Martin Morgan wrote: > On 11/07/2011 02:53 AM, Stefanie Ververs wrote: >> Hi Martin, >> >> I just ran my script with another dataset but got the same error (even >> if on a different line). >> I read the description of readAligned and ScanBamParam, but I still >> don't get how to provide this parameters to readAligned. (I see that I >> can create an object/instance of ScanBamParam, but then? Is it something >> like "alignedReads <- readAligned(dir, pattern=filename, type="BAM", >> param=myScanBamParamObject)? There is no example and I am *really* new > > yes, for example > > dir <- system.file("extdata", package="Rsamtools") > param <- ScanBamParam(simpleCigar=TRUE,, reverseComplement=TRUE) > aln <- readAligned(dir, "ex1.bam$", type="BAM", param=param) I tried this: /param <- ScanBamParam(simpleCigar=TRUE, reverseComplement=TRUE) alignedReads <- readAligned(dir, pattern=filename, type="BAM", param=param)/ and got a warning: /Warnmeldung: UserArgumentMismatch using 'qname' 'flag' 'rname' 'strand' 'pos' 'mapq' 'seq' 'qual' for 'bamWhat(param)' / (I did not test with more arguments by now.) > >> to R, sorry ;) Then I would try parsing the file with more specified >> arguments.. >> >> I got some more information about my file/dataset: >> >> 17709538 + 0 in total (QC-passed reads + QC-failed reads) >> 0 + 0 duplicates >> 120774 + 0 mapped (0.68%:-nan%) >> 17709538 + 0 paired in sequencing >> 8854769 + 0 read1 >> 8854769 + 0 read2 >> 120774 + 0 properly paired (0.68%:-nan%) >> 120774 + 0 with itself and mate mapped >> 0 + 0 singletons (0.00%:-nan%) >> 0 + 0 with mate mapped to a different chr >> 0 + 0 with mate mapped to a different chr (mapQ>=5) >> >> >> And the error is (this time) in row /120775./ >> /(Fehler in solveUserSEW0(start = start, end = end, width = width) : >> solving row 120775: range cannot be determined from the supplied >> arguments (too many NAs) >> Calls: RangedData -> is -> IRanges -> solveUserSEW0 -> .Call/) > > Again it appears to be the last record. Can you provide a simpler > example? For instance, from your original message you said that you > input the data as I got some smaller input file and will try to use it for NucleR, too. By now: > > alignedReads <- readAligned(dir, pattern=filename, type="BAM") > > but that the error occurred at > > reads_pair = processReads(nucleosome_htseq, type="paired", > fragmentLen=fragment_len) > > but there is no obvious connection between 'alignedReads' and the > arguments to 'processReads'. Also, what is the output of This was because of copying and changing some names, alignedReads and nucleosome_htseq are the same. (I checked and ran again, still the same error.) > > idx = is.na(position(alignedReads)) & is.na(width(alignedReads)) > sum(idx) > > ? If sum(idx) != 0, then try using alignedReads[!idx]. And finally, after sum(idx) is 0. > > library(ShortRead) > library(nucleR) > > what is the result of > > sessionInfo() > > ? /The sessionInfo-Output: R version 2.13.2 (2011-09-30) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=de_DE.UTF-8 [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] nucleR_0.99.4 Biobase_2.12.2 ShortRead_1.10.4 [4] Rsamtools_1.4.3 lattice_0.19-33 Biostrings_2.20.4 [7] GenomicRanges_1.4.8 IRanges_1.10.6 loaded via a namespace (and not attached): [1] grid_2.13.2 hwriter_1.3/ > >> >> >> >> On 03.11.2011 14:39, Martin Morgan wrote: >>> On 11/03/2011 06:14 AM, Oscar Flores wrote: >>>> So this error happens here, no? >>>> >>>> res = RangedData(IRanges(start=position(ar),width=width(ar)), >>>> strand=strand(ar),space=ar@chromosome) >>> >>> better to use the accessor chromosome(ar). The error >>> >>> > Fehler in solveUserSEW0(start = start, end = end, width = width) : >>> > solving row 16512893: range cannot be determined from the supplied >>> > arguments (too many NAs) >>> >>> suggests that position(ar)[16512893] and / or width(ar)[16512893] is >>> NA. You could filter these out, e.g., ar[!is.na(position(ar)) & >>> !is.na(position(ar))] or identify why these are read in in the first >>> place using the 'param' argument as described on ?readAligned. >>> >>> Martin >>> >>>> >>>> If this is the case the problem is not in nucleR, maybe there are some >>>> rows in a strange format in the AlignedRead (could be due the multiple >>>> format changes) that may avoid the conversion to the RangedData. >>>> Maybe I >>>> can detect them and skip those cases, but I would need to see what's >>>> happening in that odd case. >>>> >>>> Let me know if there's something I can do. >>>> >>>> Regards, >>>> >>>> Oscar >>>> >>>> >>>> El 03/11/2011 13:58, Stefanie Ververs escribió: >>>>> Hi Oscar, >>>>> >>>>> thanks for your quick answer - I think I would have contacted you, if >>>>> there were no answers on the bioconductor-mailinglist. >>>>> >>>>> I just tried the workaround as you suggested, but I got the same >>>>> error >>>>> again: >>>>> Fehler in solveUserSEW0(start = start, end = end, width = width) : >>>>> solving row 16512893: range cannot be determined from the supplied >>>>> arguments (too many NAs) >>>>> Calls: RangedData -> is -> IRanges -> solveUserSEW0 -> .Call >>>>> >>>>> I'll think about how to show you the data (it's hosted and processed >>>>> with Galaxy, so it might be possible to share it.) >>>>> >>>>> Regards, >>>>> >>>>> Steffi >>>>> >>>>> On 03.11.2011 13:16, Oscar Flores wrote: >>>>>> Hi Stefanie, >>>>>> >>>>>> I'm the developer of nucleR, so let's see if I can help you ;) >>>>>> >>>>>> After the processing, processReads converts the input data to a >>>>>> RangedData >>>>>> object for a easier manipulation later, so this error is occurs at >>>>>> the last >>>>>> step of the call, but data can be messed in previous steps. It's >>>>>> hard to tell what is happening without having a look to the input >>>>>> data, >>>>>> which I guess is huge... >>>>>> >>>>>> I would like to have a look to the raw data, but I know it is >>>>>> difficult >>>>>> to send it if it's not in a public repository. Maybe you can >>>>>> contact me >>>>>> directly about that (oflores@mmb.pcb.ub.es)... >>>>>> >>>>>> Meanwhile, if you want to try a workaround, you can directly >>>>>> convert the >>>>>> imported reads to RangedData format (which is the other format >>>>>> supported >>>>>> by processReads): >>>>>> >>>>>> (being "ar" your imported AlignedReads object) >>>>>> >>>>>> res = RangedData(IRanges(start=position(ar),width=width(ar)), >>>>>> strand=strand(ar),space=ar@chromosome) >>>>>> >>>>>> reads_pair = processReads(res, type="paired", >>>>>> fragmentLen=fragment_len) >>>>>> >>>>>> This should work, but will be nice to have a look to your data >>>>>> to fix a possible problem in the AlignedReads method. >>>>>> >>>>>> Regards, >>>>>> >>>>>> Oscar >>>>>> >>>>>> _______________________________________________ >>>>>> Bioconductor mailing list >>>>>> Bioconductor@r-project.org >>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>> Search the archives: >>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>> >>>> >>> >>> >> > > [[alternative HTML version deleted]]

ADD REPLY • link 12.5 years ago Stefanie Ververs ▴ 70

0

Entering edit mode

El 09/11/2011 13:38, Stefanie Ververs escribi?: > I got some smaller input file and will try to use it for NucleR, too. > By now: The problem you have with nucleR is in the last part of processReads, but seems it is not related a malfunction of nucleR but is due the creation of a RangedData object with <na> values. See: > RangedData(IRanges(NA)) Error in solveUserSEW0(start = start, end = end, width = width) : solving row 1: range cannot be determined from the supplied arguments (too many NAs) I don't have experience working with BAM files, but could be that you have an extra blank line at the end of your file and this creates a NA record? Which is the output of: > alignedReads <- readAligned(dir, pattern=filename, type="BAM") > tail(alignedReads) Also, FYI, if you type "options(error=recover)" you will be able to see the calling stack of the different functions when an error ocurres. You can then look at the variables in that namespace (with "ls") and take a look at them, maybe this could help...

ADD REPLY • link 12.5 years ago Oscar Flores ▴ 50

0

Entering edit mode

First: Thanks for your input and ideas. I'm some steps further now: I get peak calling-results (at least - I think so ;)). I filtered the NA-lines with "alignedReads <- alignedReads[!is.na(position(alignedReads)) & !is.na(width(alignedReads))]" (I wasn't sure if it is correct in this way, but printing the last line (tail) of my alignedReads-object doesn't show the NA-content anymore) and added "options(error=recover)" - then I'm able to go through the whole script and print the peak-results (on command line). But there are still errors - /Fehler: Unerwartete(s) 'else' in "else"/ /Fehler in solveUserSEW0(start = start, end = end, width = width) : solving row 2: range cannot be determined from the supplied arguments (too many NAs) Calls: processReads ... RangedData -> is -> IRanges -> solveUserSEW0 -> .Call/ ---> Now it seems to by line 2.. / Fehler: Unerwartete(s) '}' in "}"/ These are my peaks: /RangedData with 7944 rows and 1 value column across 1 space/ / space ranges | score/ /<factor> <iranges> | <numeric>/ /1 1 [ 54252, 54391] | 0.6153529/ /2 1 [ 54401, 54540] | 0.6046273/ /3 1 [150147, 150286] | 0.5669448/ /4 1 [150278, 150417] | 0.5664235/ /5 1 [346518, 346657] | 0.5689336/ /6 1 [346651, 346790] | 0.5701618/ /7 1 [411677, 411816] | 0.5939812/ /8 1 [411820, 411959] | 0.5840657/ /9 1 [530270, 530409] | 0.6358737/ /... ... ... ... .../ /7936 1 [246900789, 246900928] | 0.5669448/ /7937 1 [246946659, 246946798] | 0.5665375/ /7938 1 [246946789, 246946928] | 0.5650955/ /7939 1 [247125296, 247125435] | 0.5689336/ /7940 1 [247125429, 247125568] | 0.5701618/ /7941 1 [247154735, 247154874] | 0.5803391/ /7942 1 [247154849, 247154988] | 0.5904050/ /7943 1 [247160188, 247160327] | 0.5664235/ /7944 1 [247160313, 247160452] | 0.5669448/ And, at last, there is an error when exporting to bed: /Fehler in as.vector(x, "character") : cannot coerce type 'closure' to vector of type 'character' Calls: export.bed ... eval -> .Method -> as.character -> as.character.default/ Current code: args <- commandArgs() inputfile <- args[6] filename <- basename(inputfile) dir <- dirname(inputfile) outputfile <- args[10] type <- args[7] fragment_len <- as.numeric(args[8]) sel_trim <- as.numeric(args[9]) param <- ScanBamParam(simpleCigar=TRUE, reverseComplement=TRUE) alignedReads <- readAligned(dir, pattern=filename, type="BAM", param=param) alignedReads <- alignedReads[!is.na(position(alignedReads)) & !is.na(width(alignedReads))] if(type == 'p') { #Process the paired end reads, but discard those with length > 200 reads_pair = processReads(alignedReads, type="paired", fragmentLen=fragment_len) #Process the reads, but now trim each read to 40bp around the dyad reads_trim = processReads(alignedReads, type="paired", fragmentLen=fragment_len, trim=sel_trim) } else { reads_pair = processReads(alignedReads, type="single", fragmentLen=fragment_len) reads_trim = processReads(alignedReads, type="single", fragmentLen=fragment_len, trim=sel_trim) } ################################################### ### code chunk number 4: coverage ################################################### #Calculate the coverage, directly in reads per million (r.p.m) cover_pair = coverage.rpm(reads_pair) cover_trim = coverage.rpm(reads_trim) ################################################### ### code chunk number 11: fft ################################################### htseq_raw = as.vector(cover_trim[[1]]) htseq_fft = filterFFT(htseq_raw, pcKeepComp=0.02) ################################################### ### code chunk number 20: peaks2 ################################################### peaks = peakDetection(htseq_fft, threshold="25%", score=TRUE, width=140) export.bed(peaks, score=NULL, name="Peaks", description=name, filepath=outputfile, splitByChrom=FALSE) I'm still waiting for some shorter files (the ones I have cannot be imported with ShortRead) and will try to split the big files and try with less data. On 09.11.2011 15:35, Oscar Flores wrote: > El 09/11/2011 13:38, Stefanie Ververs escribió: >> I got some smaller input file and will try to use it for NucleR, too. >> By now: > > The problem you have with nucleR is in the last part of processReads, > but seems it is not related a malfunction of nucleR but is due the > creation of a RangedData object with <na> values. See: > > > RangedData(IRanges(NA)) > Error in solveUserSEW0(start = start, end = end, width = width) : > solving row 1: range cannot be determined from the supplied > arguments (too many NAs) > > I don't have experience working with BAM files, but could be that you > have an extra blank line at the end of your file and this creates a NA > record? Which is the output of: > > > alignedReads <- readAligned(dir, pattern=filename, type="BAM") > > tail(alignedReads) > > Also, FYI, if you type "options(error=recover)" you will be able to > see the calling stack of the different functions when an error > ocurres. You can then look at the variables in that namespace (with > "ls") and take a look at them, maybe this could help... > > [[alternative HTML version deleted]]

ADD REPLY • link 12.4 years ago Stefanie Ververs ▴ 70

0

Entering edit mode

This are general coding errors. For else issue, R expects else brackets in the same line. Try changing: } else { for }else{ And the problem with exporting the bed file, I thing that the problem is that you don't have the variable "name" defined in the description attribute export.bed(peaks, score=NULL, name="Peaks", *description=name*, filepath=outputfile, splitByChrom=FALSE) I hope this fix your problems Regards Oscar [[alternative HTML version deleted]]

ADD REPLY • link 12.4 years ago Oscar Flores ▴ 50

Login before adding your answer.