parameter "refSeqName" in function "getReadCountsFromBAM"
2
0
Entering edit mode
@mohammad-alkhamis-10862
Last seen 4.7 years ago
Canada/ Victoria, BC / University of Vi…

Hello everyone,

I think that the argument "refSeqName" in the function "getReadCountsFromBAM" is a bit misleading. While looking at the source code in "getReadCountsFromBAM.R", I noticed the following code line:

for (i in 1:length(refSeqName)) 

It turned out that I am able to use the function with multiple reference sequence names like this, for example:

bamDataRanges <- getReadCountsFromBAM(BAMFiles,
                                      sampleNames=paste("Sample",1:2),
                                      refSeqName=c("1", "2"),
                                      WL=100,
                                      mode="paired", 
                                      parallel=12)

which will give an output in the console as following:

Identified the following reference sequences:  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y,MT,GL000207.1,GL000226.1 [...]
Using 1, 2 as reference.
Using indexed BAM files.

 

Though, the documentation (quoted below) suggests that the function is only capable of handing a single reference sequence name. This is also the case with the examples that come with the cn.mops package.

"refSeqName:    Name (Names?) of the reference sequence (sequences?) that should be analyzed. The name (names?) must appear in the header of the BAM file. If it is not given the function will select the first reference sequence that appears in the header of the BAM files."

 

Thanks for accepting my comment.

cn.mops • 792 views
ADD COMMENT
2
Entering edit mode
@gunter-klambauer-5426
Last seen 7 months ago
Austria
Hello Malkhamis, Thanks for bringing this to my attention. Indeed users are often struggling with that parameter. I will make changes to the documentation as suggested. Regards, Günter On 2016-08-26 21:31, malkhamis [bioc] wrote: > Activity on a post you are following on support.bioconductor.org > <https: support.bioconductor.org=""> > > User malkhamis <https: support.bioconductor.org="" u="" 10862=""/> wrote > Question: parameter "refSeqName" in function "getReadCountsFromBAM" > <https: support.bioconductor.org="" p="" 86536=""/>: > > Hello everyone, > > I think that the argument "|refSeqName|" in the function > "|getReadCountsFromBAM|" is a bit misleading. While looking at the > source code in "getReadCountsFromBAM.R", I noticed the following code > line: > > for (i in 1:length(refSeqName)) > > It turned out that I am able to use the function with multiple > reference sequence names like this, for example: > > bamDataRanges <- getReadCountsFromBAM(BAMFiles, > sampleNames=paste("Sample",1:2), > *refSeqName=c("1", "2"),* > WL=100, > mode="paired", > parallel=12) > > which will give an output in the console as following: > > /Identified the following reference sequences: > 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y,MT,GL000207.1,GL000226.1 > [...] Using 1, 2 as reference. Using indexed BAM files./ > > Though, the documentation (quoted below) suggests that the function is > only capable of handing a single reference sequence name. This is also > the case with the examples that come with the cn.mops package. > > "refSeqName: Name *(Names?)* of the reference sequence > *(sequences?)* that should be analyzed. The name *(names?)* must > appear in the header of the BAM file. If it is not given the function > will select the first reference sequence that appears in the header of > the BAM files." > > Thanks for accepting my comment. > > ------------------------------------------------------------------------ > > Post tags: cn.mops > > You may reply via email or visit > parameter "refSeqName" in function "getReadCountsFromBAM" > -- Günter Klambauer, PhD Institute of Bioinformatics Johannes Kepler University, Linz, Austria http://www.bioinf.jku.at/people/klambauer/
ADD COMMENT
2
Entering edit mode
@gunter-klambauer-5426
Last seen 7 months ago
Austria

Just to make it visible also here... Something like the following typically works for human genomes and runs the function for chr1-22:

X <- getReadCountsFromBAM(BAMFiles,refSeqName=as.character(1:22),WL=1000, mode="paired")

If your sequence names are "chr1", "chr2" rather than "1", "2", then:

X <- getReadCountsFromBAM(BAMFiles,refSeqName=paste0("chr",1:22),WL=1000, mode="paired")

should work.

ADD COMMENT
0
Entering edit mode

Thanks a lot Gunter, that helps tremendously ! especially when running cnMOPS on our computer cluster ;)

ADD REPLY

Login before adding your answer.

Traffic: 254 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6