Convertion of indexed bam file to tsv file format
2
0
Entering edit mode
@malikbashirkhan2-8918
Last seen 6.9 years ago
Pakistan

# I followed the following steps...

bowtie2-build Danio_rerio.Zv9.ncrna.fa zebra_virus

bowtie2 -p 4 -q --phred33 -x zebra_virus -1 file1.fastq -2 file2.fastq -S 0.sam

samtools view -bS 0.sam > 0.bam

samtools sort 0.bam 0_sorted

samtools index 0_sorted.bam

Now we have sorted indexed file. i want to convert this bam file into tsv file.

python samtools • 3.9k views
0
Entering edit mode

sir my problem is that i have .BAI file and i wants to convert this into TSV file format.

for this we need a python script and i have that python script.

now plz sir help me how to convert this file into tsv.

2
Entering edit mode

Like Aaron told you, this site is for R bioconductor support, not for analysis with Python or Bash. If you need help with your Python scripts or with Bash use other sites such as https://www.biostars.org/

0
Entering edit mode
Aaron Lun ★ 27k
@alun
Last seen 4 hours ago
The city by the bay

Depending on what fields you want to extract, you could do:

require(Rsamtools)
all.fields <- scanBam("0_sorted.bam")[[1]]

Running names(all.fields) should give you:

 [1] "qname"  "flag"   "rname"  "strand" "pos"    "qwidth" "mapq"   "cigar"
[9] "mrnm"   "mpos"   "isize"  "seq"    "qual"  

You can then put any number of these into a *.tsv file, e.g.:

write.table(file="out.tsv", data.frame(all.fields[c("qname", "flag", "rname", "pos")]),
quote=FALSE, sep="\t", row.names=FALSE)

Be aware that some of the fields may need coercion to a simpler data type, e.g., as.character(all.fields$seq). Also, if the file is large, you should look at restricting the number of fields you extract with the what argument in ScanBamParam; or, limiting yourself to extracting only one chromosome at a time with the which argument. ADD COMMENT 0 Entering edit mode Thanks for the detailed reply i will try it ADD REPLY 0 Entering edit mode i tried but it gives syntax error bash: syntax error near unexpected token (' ADD REPLY 1 Entering edit mode This is the Bioconductor support site, and these are R commands. You should be running them at the R prompt, after installing Rsamtools. If you want a non-R answer, then this forum is not the right place to ask. ADD REPLY 0 Entering edit mode sir i installed the Rsamtools Package again and i had run them at the R prompt. At R prompt it gives this error. bash: syntax error near unexpected token (' ADD REPLY 0 Entering edit mode sir i tried these commands in another way it runs but it hangs my system for unlimited time then i restarts my system. i tried it multiple times but it hangs the system all the time. plz sir help me what to do. ADD REPLY 0 Entering edit mode bioinformatics@bioinformatics-desktop:~/bowtie2/SRR630463$ R

R version 3.2.0 (2015-04-16) -- "Full of Ingredients"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> require(Rsamtools)

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB

The following object is masked from ‘package:stats’:

xtabs

The following objects are masked from ‘package:base’:

anyDuplicated, append, as.data.frame, as.vector, cbind, colnames,
do.call, duplicated, eval, evalq, Filter, Find, get, intersect,
is.unsorted, lapply, Map, mapply, match, mget, order, paste, pmax,
pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rep.int,
rownames, sapply, setdiff, sort, table, tapply, union, unique,
unlist, unsplit

Creating a generic function for ‘nchar’ from package ‘base’ in package ‘S4Vectors’
> all.fields <- scanBam("0_sorted.bam")[[1]]

after this the system will become hang and it will become hang for unlimited time

0
Entering edit mode

Your machine probably doesn't have enough memory to read the entire BAM file in at once. Try using that what or which arguments as I've described in my original answer, to restrict the amount of data it loads in at a time.

0
Entering edit mode

sir i followed your instruction and limited the parameters and perform the following steps

p2 <- ScanBamParam(what=c("rname", "strand", "pos", "qwidth"))

then run this command

res2 <- scanBam("0_sorted.bam", param=p2)

then i wrote this command

write.table(file="out.tsv",data.frame(c("rname","strand", "pos","qwidth")),quote=FALSE,sep="\t", row.names=FALSE)

but the out.tsv has no values..correct me if i did something wrong

0
Entering edit mode

actually we want to tabulate the count of genes ..which parameter will do it ?

1
Entering edit mode

AFAIK the "count of genes" is never in the BAM/SAM file, even after you've sorted/indexed the file. So you're on the wrong path if you expect to get this information by converting the file to tsv. More generally speaking opening a BAM/SAM file in Excel and perform analysis on it won't get you anywhere, trust me.

To get the read counts for each gene you can use the featureCounts() function from the Rsubread package as Aaron suggested, or the summarizeOverlaps() function from the GenomicAlignments package, or the HTSeq tool (Python package) from Simon Anders:

http://www-huber.embl.de/HTSeq/doc/overview.html

If you have further questions about read counting, please start a new thread and be clear and concise about what you're trying to do and also tell us what you've done so far.

Thanks,

H.

0
Entering edit mode

If you want to get read counts for each gene, I'd suggest using the featureCounts function in the Rsubread package. This function can be directly applied to the SAM/BAM files, so you don't need to do any conversion. It's also a tried-and-tested method that is preferable to manually writing your own code.

0
Entering edit mode

Read my original post, and you'll notice that you've made an error in your command.

0
Entering edit mode

Sir this is the output of feature count.How to tabulate read count through it,,Sir actually there is no concerned Professor here to help us out so may be our questions are not explained properly .kindly let me know if you didnt get what i said.i need those read count in a table form which is further used in DSEQ

[ reached getOption("max.print") -- omitted 9635 rows ]

$targets [1] "X0_sorted.bam"$stat
Status X0_sorted.bam
1                    Assigned             0
2        Unassigned_Ambiguity             0
3     Unassigned_MultiMapping             0
4       Unassigned_NoFeatures        531165
5         Unassigned_Unmapped      30268995
6   Unassigned_MappingQuality             0
7  Unassigned_FragementLength             0
8          Unassigned_Chimera             0
9        Unassigned_Secondary             0
10     Unassigned_Nonjunction             0
11       Unassigned_Duplicate             0

0
Entering edit mode

You have zero reads 'Assigned'.

If you provide your code, maybe someone can see why you have zero reads assigned.

0
Entering edit mode

i run the following commands

ann <- data.frame(

+ GeneID=c("gene1","gene1","gene2","gene2"),

+ Chr="chr_dummy",

+ Start=c(100,1000,3000,5000),

+ End=c(500,1800,4000,5500),

+ Strand=c("+","+","-","-"),

+ stringsAsFactors=FALSE)

> fc_SE <- featureCounts("0_sorted.bam",annot.ext=ann)

> fc_SE

0
Entering edit mode

With this code you summarized the reads which are mapped to 4 genes named "gene1" (2 x) and "gene2" (2 x) which are located on chromosome "chr_dummy".

Why do you make this data.frame called ann?

You need to get the right gtf file from UCSC or ENSEMBL instead of your "dummy" ann data.frame.

0
Entering edit mode

It looks like the vast majority of your reads are unmapped. You should check that you've aligned your reads to the correct genome build. If you haven't, then you should realign your reads - see the Rsubread documentation for more details. However, if the genome is nominally correct, then the only conclusion I can make is that you've got contaminating DNA in your sample. This is indicative of poor-quality data that cannot be rescued at the analysis stage - the best you could do is to map the foreign DNA, but that's not of interest.

0
Entering edit mode
@herve-pages-1542
Last seen 3 days ago
Seattle, WA, United States

Hi,

Maybe I'm missing something but isn't SAM a TAB-delimited text format? So converting BAM to tsv should just be a matter of going back to SAM:

samtools view -h -o 0_sorted.sam 0_sorted.bam

Right? Or is there something else to it that you didn't tell us?

Cheers,

H.

PS: I found the answer to "How Can I Convert Bam To Sam" on Biostars: https://www.biostars.org/p/1701/

1
Entering edit mode

Just as a heads-up, the original poster has posted the same question repeatedly on both Biostars and Bioconductor and has been told what you have just told him several times, see:

and also

0
Entering edit mode

sir we have tha sam file but we need tsv file so that we can open it using excel and perform analysis on it ..