I followed the following steps...

Question

Convertion of indexed bam file to tsv file format

0

Entering edit mode

malikbashirkhan2 • 0

@malikbashirkhan2-8918

Last seen 9.1 years ago

Pakistan

I followed the following steps...

bowtie2-build Danio_rerio.Zv9.ncrna.fa zebra_virus

bowtie2 -p 4 -q --phred33 -x zebra_virus -1 file1.fastq -2 file2.fastq -S 0.sam

samtools view -bS 0.sam > 0.bam

samtools sort 0.bam 0_sorted

samtools index 0_sorted.bam

Now we have sorted indexed file. i want to convert this bam file into tsv file.

python samtools • 6.8k views

ADD COMMENT • link updated 9.0 years ago by Hervé Pagès 16k • written 9.0 years ago by malikbashirkhan2 • 0

0

Entering edit mode

sir my problem is that i have .BAI file and i wants to convert this into TSV file format.

for this we need a python script and i have that python script.

now plz sir help me how to convert this file into tsv.

ADD REPLY • link 9.0 years ago malikbashirkhan2 • 0

2

Entering edit mode

Like Aaron told you, this site is for R bioconductor support, not for analysis with Python or Bash. If you need help with your Python scripts or with Bash use other sites such as https://www.biostars.org/

ADD REPLY • link 9.0 years ago b.nota ▴ 370

score 0 · Answer 1 · 2015-10-12

0

Entering edit mode

Aaron Lun ★ 28k

@alun

Last seen 7 hours ago

The city by the bay

Depending on what fields you want to extract, you could do:

require(Rsamtools)
all.fields <- scanBam("0_sorted.bam")[[1]]

Running names(all.fields) should give you:

 [1] "qname"  "flag"   "rname"  "strand" "pos"    "qwidth" "mapq"   "cigar"
 [9] "mrnm"   "mpos"   "isize"  "seq"    "qual"

You can then put any number of these into a *.tsv file, e.g.:

write.table(file="out.tsv", data.frame(all.fields[c("qname", "flag", "rname", "pos")]),
    quote=FALSE, sep="\t", row.names=FALSE)

Be aware that some of the fields may need coercion to a simpler data type, e.g., as.character(all.fields$seq). Also, if the file is large, you should look at restricting the number of fields you extract with the what argument in ScanBamParam; or, limiting yourself to extracting only one chromosome at a time with the which argument.

ADD COMMENT • link 9.0 years ago Aaron Lun ★ 28k

0

Entering edit mode

Thanks for the detailed reply i will try it

ADD REPLY • link 9.0 years ago malikbashirkhan2 • 0

0

Entering edit mode

i tried but it gives syntax error

bash: syntax error near unexpected token `('

ADD REPLY • link 9.0 years ago malikbashirkhan2 • 0

1

Entering edit mode

This is the Bioconductor support site, and these are R commands. You should be running them at the R prompt, after installing Rsamtools. If you want a non-R answer, then this forum is not the right place to ask.

ADD REPLY • link 9.0 years ago Aaron Lun ★ 28k

0

Entering edit mode

sir i installed the Rsamtools Package again and i had run them at the R prompt.

At R prompt it gives this error.

bash: syntax error near unexpected token `('

ADD REPLY • link 9.0 years ago malikbashirkhan2 • 0

0

Entering edit mode

sir i tried these commands in another way it runs but it hangs my system for unlimited time then i restarts my system. i tried it multiple times but it hangs the system all the time.

plz sir help me what to do.

ADD REPLY • link 9.0 years ago malikbashirkhan2 • 0

0

Entering edit mode

bioinformatics@bioinformatics-desktop:~/bowtie2/SRR630463$ R

R version 3.2.0 (2015-04-16) -- "Full of Ingredients"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> require(Rsamtools)
Loading required package: Rsamtools
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following object is masked from ‘package:stats’:

xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, as.vector, cbind, colnames,
    do.call, duplicated, eval, evalq, Filter, Find, get, intersect,
    is.unsorted, lapply, Map, mapply, match, mget, order, paste, pmax,
    pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rep.int,
    rownames, sapply, setdiff, sort, table, tapply, union, unique,
    unlist, unsplit

Creating a generic function for ‘nchar’ from package ‘base’ in package ‘S4Vectors’
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: XVector
Loading required package: Biostrings
> all.fields <- scanBam("0_sorted.bam")[[1]]

after this the system will become hang and it will become hang for unlimited time

ADD REPLY • link 9.0 years ago malikbashirkhan2 • 0

0

Entering edit mode

Your machine probably doesn't have enough memory to read the entire BAM file in at once. Try using that what or which arguments as I've described in my original answer, to restrict the amount of data it loads in at a time.

ADD REPLY • link 9.0 years ago Aaron Lun ★ 28k

0

Entering edit mode

sir i followed your instruction and limited the parameters and perform the following steps

p2 <- ScanBamParam(what=c("rname", "strand", "pos", "qwidth"))

then run this command

res2 <- scanBam("0_sorted.bam", param=p2)

then i wrote this command

write.table(file="out.tsv",data.frame(c("rname","strand", "pos","qwidth")),quote=FALSE,sep="\t", row.names=FALSE)

but the out.tsv has no values..correct me if i did something wrong

ADD REPLY • link 9.0 years ago malikbashirkhan2 • 0

0

Entering edit mode

actually we want to tabulate the count of genes ..which parameter will do it ?

ADD REPLY • link 9.0 years ago malikbashirkhan2 • 0

1

Entering edit mode

AFAIK the "count of genes" is never in the BAM/SAM file, even after you've sorted/indexed the file. So you're on the wrong path if you expect to get this information by converting the file to tsv. More generally speaking opening a BAM/SAM file in Excel and perform analysis on it won't get you anywhere, trust me.

To get the read counts for each gene you can use the featureCounts() function from the Rsubread package as Aaron suggested, or the summarizeOverlaps() function from the GenomicAlignments package, or the HTSeq tool (Python package) from Simon Anders:

http://www-huber.embl.de/HTSeq/doc/overview.html

If you have further questions about read counting, please start a new thread and be clear and concise about what you're trying to do and also tell us what you've done so far.

Thanks,

H.

ADD REPLY • link 9.0 years ago Hervé Pagès 16k

0

Entering edit mode

If you want to get read counts for each gene, I'd suggest using the featureCounts function in the Rsubread package. This function can be directly applied to the SAM/BAM files, so you don't need to do any conversion. It's also a tried-and-tested method that is preferable to manually writing your own code.

ADD REPLY • link 9.0 years ago Aaron Lun ★ 28k

0

Entering edit mode

Read my original post, and you'll notice that you've made an error in your command.

ADD REPLY • link 9.0 years ago Aaron Lun ★ 28k

0

Entering edit mode

Sir this is the output of feature count.How to tabulate read count through it,,Sir actually there is no concerned Professor here to help us out so may be our questions are not explained properly .kindly let me know if you didnt get what i said.i need those read count in a table form which is further used in DSEQ

[ reached getOption("max.print") -- omitted 9635 rows ]

$targets
[1] "X0_sorted.bam"

$stat
                       Status X0_sorted.bam
1                    Assigned             0
2        Unassigned_Ambiguity             0
3     Unassigned_MultiMapping             0
4       Unassigned_NoFeatures        531165
5         Unassigned_Unmapped      30268995
6   Unassigned_MappingQuality             0
7 Unassigned_FragementLength             0
8          Unassigned_Chimera             0
9        Unassigned_Secondary             0
10     Unassigned_Nonjunction             0
11       Unassigned_Duplicate             0

ADD REPLY • link 9.0 years ago malikbashirkhan2 • 0

0

Entering edit mode

You have zero reads 'Assigned'.

If you provide your code, maybe someone can see why you have zero reads assigned.

ADD REPLY • link 9.0 years ago b.nota ▴ 370

0

Entering edit mode

i run the following commands

ann <- data.frame(

+ GeneID=c("gene1","gene1","gene2","gene2"),

+ Chr="chr_dummy",

+ Start=c(100,1000,3000,5000),

+ End=c(500,1800,4000,5500),

+ Strand=c("+","+","-","-"),

+ stringsAsFactors=FALSE)

> fc_SE <- featureCounts("0_sorted.bam",annot.ext=ann)

> fc_SE

ADD REPLY • link 9.0 years ago malikbashirkhan2 • 0

0

Entering edit mode

With this code you summarized the reads which are mapped to 4 genes named "gene1" (2 x) and "gene2" (2 x) which are located on chromosome "chr_dummy".

Why do you make this data.frame called ann?

You need to get the right gtf file from UCSC or ENSEMBL instead of your "dummy" ann data.frame.

ADD REPLY • link 9.0 years ago b.nota ▴ 370

0

Entering edit mode

It looks like the vast majority of your reads are unmapped. You should check that you've aligned your reads to the correct genome build. If you haven't, then you should realign your reads - see the Rsubread documentation for more details. However, if the genome is nominally correct, then the only conclusion I can make is that you've got contaminating DNA in your sample. This is indicative of poor-quality data that cannot be rescued at the analysis stage - the best you could do is to map the foreign DNA, but that's not of interest.

ADD REPLY • link 9.0 years ago Aaron Lun ★ 28k

score 0 · Answer 2 · 2015-10-16

0

Entering edit mode

Hervé Pagès 16k

@herve-pages-1542

Last seen 1 day ago

Seattle, WA, United States

Hi,

Maybe I'm missing something but isn't SAM a TAB-delimited text format? So converting BAM to tsv should just be a matter of going back to SAM:

samtools view -h -o 0_sorted.sam 0_sorted.bam

Right? Or is there something else to it that you didn't tell us?

Cheers,

H.

PS: I found the answer to "How Can I Convert Bam To Sam" on Biostars: https://www.biostars.org/p/1701/

ADD COMMENT • link 9.0 years ago Hervé Pagès 16k

1

Entering edit mode

Just as a heads-up, the original poster has posted the same question repeatedly on both Biostars and Bioconductor and has been told what you have just told him several times, see:

How to convert a bam file into tsv file format?

https://www.biostars.org/p/160176/

https://www.biostars.org/p/160012/

and also

How to convert bia file to tsv file

ADD REPLY • link 9.0 years ago Gordon Smyth 51k

0

Entering edit mode

sir we have tha sam file but we need tsv file so that we can open it using excel and perform analysis on it ..

ADD REPLY • link 9.0 years ago malikbashirkhan2 • 0