Bioconductor Forum

Hi! I have performed the transcript abundance quantification with RSEM and then I created the gene-level count matrices for use with DESeq2 by importing the...txi.rsem <- tximport(files, type = "rsem", txIn = FALSE, txOut = FALSE) txi.rsem$length[txi.rsem$length == 0] <- 1 ddsTxi <- DESeqDataSetFromTximport(txi.rsem, colData = samples, design = ~ condition) I wa…

deseq2 tximport

updated 5.3 years ago • dequattro.concetta
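
The zero-length replacement quoted in the snippet above can be tried on a toy matrix. This is a minimal base R sketch, not a tximport run: `toy_length` and its values are invented for illustration and merely stand in for `txi.rsem$length`.

```r
# Hypothetical stand-in for txi.rsem$length (genes x samples); values are invented.
toy_length <- matrix(c(1500,   0, 820,
                       1480, 300,   0),
                     nrow = 3,
                     dimnames = list(c("geneA", "geneB", "geneC"),
                                     c("sample1", "sample2")))

# Same idiom as in the post: logical indexing replaces every zero with 1.
toy_length[toy_length == 0] <- 1

# Check that no zero lengths remain.
any(toy_length == 0)
```

Only the entries that were exactly zero change; all other lengths are left as they were.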

very much for the valuable feedback! Yes, the function expects peak.ranges to be a RangedData with a "names" field as the name of the binding site. I will add your fix to make sure the function works when "names" field is not set. Thanks...Julie Zhu, Ph.D Research Associate Professor Program Gene Function and Expression University of Massachusetts Medical School 364 Plantation Street, Room 613 …

ChIPSeq chipseq

updated 16.0 years ago • Julie Zhu

I'm trying to create a set with my .BAM file that was generated from reduced "paired end" sequencing (using restriction enzyme), I can easily run it as "single end", but when I add "paired = T" in the command, this message appears

medips R input files

updated 9.9 years ago • pertille

sample' (mc.cores = 32, mc.preschedule = FALSE) [BSmooth] smoothing done in 17656.1 sec Error in names(object) <- nm : 'names' attribute [25] must be the same length as the vector [2] Here is the session info: > sessionInfo

bsseq bsmooth

updated 8.2 years ago • ravi.tharakan

Dear all, I am facing a technical issue with miRNA. I want to retrieve the sequences of 500 duck miRNAs from the Ensembl database using their coordinates. For example I have a miRNA genomic location...named as "KB742382_1_145970_145992", and I want to retrieve the sequence and the name of this miRNA. Note: duck genome on Ensembl...I think two ways exist: 1. To download a fasta file from ense…

ensembl

updated 6.0 years ago • mohsamir2016

the hypothesis that genomic windows within an organism's genome might have higher read mapping abundances than the same region for a different organism. I am wondering if edgeR, or any other differential expression software...would be applicable for testing this hypothesis. I understand that read mapping and differential gene and transcript expression have different - and likely harder - challenge…

edger differential expression

updated 5.8 years ago • hollandademello

Am I missing something? Probably something major (like, say, the relationship of GC content or read length to variance)... Is the idea that features with similar sequence properties/size and abundance will have their mean-variance...better) is as follows: align with Rsubread, run subjunc and splicegrapher, and count against exon/gene/feature models: alignedToRPKM <- function(readcounts) …

Normalization GO graph Rsubread

updated 13.7 years ago • Tim Triche

I decided to use the Dada2 basic pipeline for 16S in order to detect variants in my viral amplicon sequences. I have PCR amplified sequences from a 302 base-pair region of the viral genome. I used the DADA2 pipeline to detect...over time. I of course just skip the phylogeny step. Is there any bias towards microbiome sequencing in the pipeline that I used? At a late time-point, a clear tropism…

VariantDetection dada2 RNASeq

updated 2.2 years ago • Sara

seq data and to make heatmaps. However, I can't figure out how to label the rows in the heatmap with gene names as opposed to Ensembl IDs. I've tried to add gene names/HGNC symbols to the rlog-transformed values using biomaRt, but...255)) Is there an easy way to do this? I would greatly appreciate any advice on how to get the gene names onto the heatmaps. Thanks

deseq2 gplot

updated 10.8 years ago • alchemist4au

Hello, Can anyone tell me how to display a heatmap with gene names instead of Affy probes when displaying a heatmap from an Affy exprsSet. I looked at the heatmap function and exprs class...but I am at a loss as how to change the name on the heatmap to Gene name instead of Affy ID. I managed to get the geneids as follows : geneid <- mget(geneNames(eset), en…

affy

updated 18.5 years ago • Ruppert Valentino

I want to calculate the total length of genes (including introns) from the start codon to the stop codon.  I have loaded a GFF file into R using the GenomicFeatures...package.  Using genes(txdb) I get the output below that shows the positions of the first nucleotide of the start codon and the last nucleotide...of the stop codon.  Is there a function in GenomicFeatures t…

genomicfeatures gene size gene length

updated 8.3 years ago • ehimelbl
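
The start-to-stop span asked about above is an inclusive interval length, `end - start + 1`. The sketch below shows that arithmetic in base R on invented coordinates; the data frame `genes_df` and its values are hypothetical and are not taken from the poster's GFF file. For a GRanges object such as the one returned by `genes(txdb)`, `width()` should report the same quantity.

```r
# Hypothetical gene coordinates (1-based, inclusive); invented for illustration.
genes_df <- data.frame(
  gene_id = c("gene1", "gene2"),
  start   = c(1001, 5200),
  end     = c(3500, 5899)
)

# Inclusive span from first to last nucleotide, introns included.
genes_df$span <- genes_df$end - genes_df$start + 1

genes_df
```

The `+ 1` matters because both the first and the last position belong to the gene.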

mismatching positions between a read (e.g. in a "GappedAlignments" object) and the reference sequence (a "BSgenome" object)? In general, I am looking for an operation that maps the read sequence against the reference...would make it a little bit more complicated. We'll proceed in 3 steps: (a) Extract the query sequences, clip them, and flip them if needed. (b) Ex…

Infrastructure Cancer BSgenome

updated 12.8 years ago • Hervé Pagès

I have a table of FPKM values generated by DESeq2, and I'm trying to find out what DESeq2 uses as gene lengths when these are not supplied (I'm trying to assess to what extent my results are likely to change by supplying these...According to the manual, "feature length is calculated from the rowRanges of the dds object, if a column basepairs is not present in mcols(dds). The calculated...length i…

deseq2

updated 5.8 years ago • hasse.bossenbroek
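
To see why the choice of gene length matters for the question above, here is a minimal base R sketch of the textbook FPKM formula. It is an illustration under stated assumptions, not DESeq2's implementation: `fpkm_simple` is a hypothetical helper, the counts, lengths and library size are invented, and DESeq2's own `fpkm()` may normalize the library size differently.

```r
# Textbook FPKM: counts per kilobase of feature per million mapped fragments.
# fpkm_simple is a hypothetical helper written for this illustration.
fpkm_simple <- function(counts, lengths_bp, library_size) {
  counts / (lengths_bp / 1e3) / (library_size / 1e6)
}

counts       <- c(geneA = 200, geneB = 50)     # invented fragment counts
lengths_bp   <- c(geneA = 2000, geneB = 500)   # invented feature lengths in bp
library_size <- 1e6                            # assumed total mapped fragments

fpkm_values <- fpkm_simple(counts, lengths_bp, library_size)

# Doubling the assumed length of a gene halves its FPKM.
fpkm_longer <- fpkm_simple(counts, lengths_bp * 2, library_size)
```

Because the length sits in the denominator, any change in the length assigned to a gene rescales that gene's FPKM by the inverse factor.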

Z70218" "L17328" "S81916" "U63332" "M77235" and I would like to find ( in a repository for ex.) the names as: [1] "hypothetical protein LOC221823" [2] "meningioma (disrupted in balanced translocation) 1" [3] "fasciculation and elongation...annotate package, and I found the way to extract almost everything, from locuslink to PMID to FASTA sequences, but the Descriptions (or names)... I've looked…

annotate

updated 19.8 years ago • Giulio Di Giovanni

but I am running into the error that the data contains ambiguity characters in sequences. I used Biostrings::replaceAmbiguities() but I am not sure how to save the updated version and I don't know what to do...GCF_000001735.4_TAIR10.1/GCF_000001735.4_TAIR10.1_genomic.fna.gz' Content type 'application/x-gzip' length 37482399 bytes (35.7 MB) ================================================== down…

BSgenomeForge

updated 2.4 years ago • Bruno

edgeR for some sRNA datasets I have received. I have 3 sRNA datasets, and I have calculated all abundances (just read counts) of every sequence in each dataset. Unfortunately, there are no replicates. The goal is to find specific...sRNA sequences that are higher in abundance in dataset1 and dataset2 compared to dataset3. As there are no replicates, I understand...with confidence can be done on t…

Normalization edgeR

updated 12.5 years ago • Kenlee Nakasugi

How could one produce a set of faceted-by-gene plots of UMAPs, each one showing a gene's abundance as the colour scale? `colour_by` accepts a single gene symbol but not a...gene set

scater

updated 23 months ago • Dario Strbenac

Dear Sir/Madam, I have a general question: Is it possible to combine count gene expression data coming from one sequencing technology (RNA seq) with intensity gene expression data coming from another...sequencing technology (e.g., illuminex), of course for the same set of genes? If this is possible, would you please explain the methods

DESeq2 RNASeqData edgeR limma DifferentialExpression

updated 2.4 years ago • Sep

I am trying to obtain the promoter sequences of several genes using the GRCh38 genome and the information about transcripts locations from the TxDb package...> txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene # get transcript locations by gene: > chr_loc <- transcriptsBy(txdb, by = "gene") # get sequence 300 bp upstream of TSS for 10 first genes. > …

bsgenome genomeinfodb transcriptdb

updated 11.1 years ago • Diego Diez

pathway2only_DMSO) ``` I get a dotplot, and the first annotation shows that 130 genes are involved. I want to know which genes these are; how can I extract them? If I write: ```r pathway2only_DMSO[1]$geneID gene_list...3084" "23189" "6907" "22866" "55869" "5519" "7314" "471" so how can I get the full gene names from these IDs? Thank you very much for any ans…

chipseq ChIPseeker

updated 2.9 years ago • chavaleab

Series GSE138260 Agilent-034879 ADchip_1.0 033934 (Probe name version) In the above GEO Series, I have extracted top genes for differential expression analysis. Some of the IDs are as...ACUST_4167_PI426418842 Is there any Bioconductor package to convert these probe IDs to gene names or symbols? DAVID and g:Profiler gave no results

AgilentChip

updated 3.8 years ago • Prateek

rownames( counts ) <- raw.data[ , 1 ] # gene names colnames( counts ) <- paste(c(rep("C_R",4),rep("T_R",3)),c(1:4,1:3),sep="") # sample names...tmp*``, value = value) : invalid 'row.names' length In …

edger tibble

updated 8.4 years ago • jattnicole29
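
The post above is truncated, so the cause of the `invalid 'row.names' length` error cannot be confirmed from the snippet. One possibility, suggested only by the `tibble` tag, is that `raw.data[ , 1 ]` returned a one-column table rather than a vector. The base R sketch below mimics that situation with `drop = FALSE`; `raw_df` and its values are invented for illustration.

```r
# Invented toy data standing in for raw.data; the real file is not shown in the post.
raw_df <- data.frame(gene = c("g1", "g2", "g3"),
                     s1   = c(5, 0, 12),
                     s2   = c(7, 1, 9))

# A tibble keeps `[ , 1]` as a one-column table; drop = FALSE mimics that in base R.
one_col <- raw_df[, 1, drop = FALSE]
length(one_col)   # number of columns, not number of genes

# Extracting with [[ ]] yields a plain vector with one entry per row.
gene_ids <- as.character(raw_df[[1]])

# Build the count matrix and label its rows.
counts <- as.matrix(raw_df[, -1])
rownames(counts) <- gene_ids
```

Comparing `length(gene_ids)` with `nrow(counts)` before the assignment is a quick check that the names and the matrix agree.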

Dear all, I observed this problem regarding the maximal length of a Rle vector: > rle = Rle(rep(0, 1000000000)) > length(rle) [1] 1000000000 > length(c(rle, rle, rle)) [1] -1294967296 Probably, it...is no warning message. I noticed this problem when I wanted to calculate the average coverage of a sequencing project across the human genome. I…

Sequencing

updated 14.1 years ago • Hans-Ulrich Klein
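
The negative length reported above is consistent with a signed 32-bit wraparound: 3 × 10^9 exceeds the largest R integer, and 3 × 10^9 − 2^32 = −1294967296. The base R sketch below reproduces that arithmetic with doubles; `wrap_int32` is a hypothetical helper written for this illustration, not part of IRanges or S4Vectors.

```r
int_max      <- .Machine$integer.max   # 2147483647, i.e. 2^31 - 1
total_length <- 3 * 1e9                # held as a double, so nothing overflows here

# Emulate signed 32-bit wraparound using double arithmetic.
wrap_int32 <- function(x) ((x + 2^31) %% 2^32) - 2^31

wrapped <- wrap_int32(total_length)
format(wrapped, scientific = FALSE)
```

Values that fit in 32 bits pass through `wrap_int32` unchanged, so only lengths above `int_max` are affected.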

We are performing larval zebrafish RNAseq using STAR to determine abundances and we can successfully run this data through DESeq2. We would like to know whether it is possible to export a table...the dispersion corrections on the original input data (i.e. pre vs post DESeq2 effects on the actual abundance values). Thank you

DESeq2

updated 4.4 years ago • michael.morash

I have created an object of 3'-UTRs and can export them as gff. But I would like to also include the gene names in the gff-file. They are there in the column Name, but I don't know how to include this in the gff-file. My script: txdb...asGFF(utr) export(utr, "3_UTR.gff", format = "GFF") I would like to also have the gene names under the column Name to appe…

rtracklayer gff

updated 8.8 years ago • Jon Bråte

I have performed a differential gene expression (DEG) analysis using the Clariom D human microarray and pd.clariomd.human (affymetrix-provided) annotation...package. In the top 100 DEG list, there are many gene names in lower-case letters that I am unable to identify. This includes names such as "shasmar", "tusweyb", "flybler.1" and "nugo

microarray pd.clariom.d.human

updated 8.0 years ago • Antonio Ahn

Hi, I need some help with the annotation package: To get the gene name for a given probe I perform something like this: > zebrafishGENENAME$"Dr.19073.2.S1_a_at" [1] "annexin A11a" My question...is: How can I do it the other way around, i.e. get all probes with gene names containing "annexin"? Any hint will be highly appreciated. Cheers, Georg

Annotation probe

updated 19.5 years ago • Georg Otto
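
The reverse lookup asked about above (from a gene-name pattern back to probe IDs) can be sketched in base R once the mapping is available as a named character vector. This is a minimal illustration: only the first probe/name pair comes from the post, the other two entries are hypothetical, and how to flatten a real annotation object into such a vector depends on the annotation package.

```r
# Toy probe -> gene name map. Only the first pair comes from the post;
# the other two probe IDs and names are hypothetical.
gene_names <- c(
  "Dr.19073.2.S1_a_at" = "annexin A11a",
  "probe_hypo_1_at"    = "annexin A2a",
  "probe_hypo_2_at"    = "actin, beta 1"
)

# Keep the probe IDs whose gene name contains the literal string "annexin".
hits <- names(gene_names)[grepl("annexin", gene_names, fixed = TRUE)]
hits
```

With `fixed = TRUE` the pattern is matched literally; dropping it would allow a regular expression instead.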

enrichment analysis, using a table with KO numbers. Is it possible to retrieve information about the abundance of the differential pathways? Because I have only KO numbers in my table (without all the kegg identification levels...so I don't know how to collapse them to pathways and find out how abundant is a specific pathway in each sample. Thanks Francesca

gage pathways kegg

updated 10.9 years ago • francesca.defilippis

asked me to repeat my question on this mailing list considering the statistical analysis of my RNA sequencing data. I am rather new to Bioconductor and RNA sequencing analysis (molecular biologist) and tried to read myself...3 contigs only the 3' most gets read counts assigned. This means I am testing against 3 features of which only 1 is capable of giving me informative...asexually. Also the 4 …

Sequencing Organism qvalue DESeq

updated 14.1 years ago • Markus Grohme

Dear all, I have the following probe names selected from a dataset (GDS592), taken from GEO (SOFT format). gnf1m29878_a_at gnf1m13556_at gnf1m09610_a_at gnf1m11352_a_at...gnf1m26036_at ...more... And I want to get the GO term and Official Gene Name from these probe names. Is there a way to do it? I tried these two Bioconductor commands but they fail. > library(…

GO probe

updated 17.5 years ago • Gundala Viswanath

I am new to R and am trying to use the Variant Annotation package to annotate my list of SNPs to genes within a distance of 50kb. For the intergenic SNPs, there are several gene values/IDs listed under the PRECEDEID and FOLLOWID...if anyone could suggest a solution to this? The intronic SNPs were successfully converted to gene symbols, only the intergenic ones with more than 1 gene were not succe…

annotation variantannotation rstudio

updated 9.9 years ago • noor.suaini

Sleuth, Salmon+tximport+deseq2 or Tophat+StringTie+Ballgown. However I just want to detect the most affected (highest fold change or highest absolute expression change) promoter/TSS per gene between two conditions...groups (such as transcripts that share the same transcription start site (TSS)), Cuffdiff identifies genes that are differentially regulated at the transcriptional or post-transcript…

tximport differential isoform usage salmon tss rna-seq

updated 8.6 years ago • Steffen Heyne

I generated a metagenomics dataset (16S + ITS2) from soil samples, from where I also collected the abundance of an insect (counts of that insect per soil sample). Would it be statistically correct to include my insect count...analysis? My objective is to detect any possible correlations among my 16S/ITS2 taxa and insect abundance? Thank you

sparcc Microbiome microbiomeExplorer MicrobiomeData

updated 4.6 years ago • GlycineMax

I find that unlisting the GenomicRanges returned from a call to `transcriptsBy` returns a list with names that are gene names... only they are incorrect! Look: > txdb <- makeTranscriptDbFromBiomart(biomart="ensembl", dataset...dmelanogaster_gene_ensembl") ... > transcriptsBy(txdb,'gene')[2] GRangesList of length 1: $FBgn0000008 GRanges with 3 ranges and 2 elementMetadata cols…

GenomicRanges

updated 13.4 years ago • Malcolm Cook

Hi, I recently used DECIPHER on nine 16S rRNA and eight functional genes to do microarray probe design. First I combined these target genes as a DNAStringSet, and I used the DECIPHER function named...OrientNucleotides" on my DNAStringSet to check the orientation of my target genes. Second, I used the DECIPHER function named "AlignSeqs" on my DNAStringSet to align my target genes because I think their sequen…

DECIPHER

updated 5.2 years ago • henry1995910343

I am attempting to do differential gene expression analysis on kallisto aligned data from the TOIL project. I want to use `tximport` to summarize the transcript...level data to the gene level. The format of the abundance and count files is a matrix with ENST transcript IDs as rows and sample names as columns...I am wondering how I can use `tximport` to summarize these transcripts to the gene leve…

kallisto tximport TOIL

updated 10 months ago • Nicholas

23 > query(hub, c("cannabis","orgdb")) AnnotationHub with 1 record # snapshotDate(): 2023-10-23 # names(): AH114845 # $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ # $species: Cannabis sativa # $rdataclass: OrgDb # $rdatadateadded...2023-10-20 # $title: org.Cannabis_sativa.eg.sqlite # $description: NCBI gene ID based annotations about Cannabis sativa # $taxonomyid: 3483 # $genome:…

KEGGgraph org.Cs.eg.db clusterProfiler KEGG AnnotationHubSoftware

updated 21 months ago • fernanda.backsouza

What does this plot mean? Should I be using as my background *all* gene lengths, including those not tested by limma, for example because of insufficient read counts /isexpr <- rowSums(cpm(y...expression), and a named vector with gene lengths. 2. I am then running the following code (as a first step before testing GO enrichment of genes...in contrast 1…

goseq

updated 9.5 years ago • Darya Vanichkina

x <- findChromPeaks(raw, param = cwp) Error in names(res) <- nms : 'names' attribute [12] must be the same length as the vector [6] I don't understand the error above. I can't seem to

R xcms vector

updated 8.2 years ago • bhgyu

Dear all, I have noticed that BSgenome.Mmusculus.UCSC.mm10 does not contain entries for upstream sequences (upstream1000, upstream2000, upstream5000) like for example BSgenome.Mmusculus.UCSC.mm9 does (see below). Is...Mus musculus (Mouse) | provider: UCSC | provider version: mm9 | release date: Jul. 2007 | release name: NCBI Build 37 | | single sequences (see '?seqnames'): | chr1 chr…

Mus musculus BSgenome

updated 12.2 years ago • Diego Diez

Completed DESeq2, still seeing a bimodal distribution in my RNA sequencing data after normalization on a histogram. I know you always say that one of the "bumps... I am not sure if I am expected to see a unimodal distribution. Due to the nature of DESeq2 I am not expecting a perfect bell-shaped curve, as it uses raw RNA counts...however, jus…

DESeq2

updated 20 months ago • kcarey

of my RNA seq data: NBp <- p.vector(matrix_counts, design, counts=TRUE) NBt <- T.fit(NBp) names(NBt) get <- get.siggenes(NBt, vars="groups") s <- get$summary "s" is a dataframe which contains the significant genes for...each comparison, each experimental group vs. the control group. I got around 6000 genes for each comparison but I need to sel…

masigpro rnaseq

updated 8.6 years ago • itspilipineiro

Hello, I have a list of gene IDs, which are Ensembl IDs, and I want to do GO analysis for this list of genes. How can I do this with goseq? I went through the vignette...which runs goseq on the DE genes output, but my gene list is not from a DE analysis; it was obtained according to my objective. Can anyone advise? Gene IDs list...ENSMUSG00000074182 ENSMUSG00000078453 code What I hav…

goseq

updated 4.8 years ago • Lucky

Hey, I am using DESeq2 in conjunction with phyloseq to check for differentially abundant OTUs along a natural gradient. My samples come from different habitats and I know that the community composition...Habitat + continuous_predictor + Habitat:continuous_predictor `` followed by `` results ( ... name = Habitat_1.continuous_predictor) ``. Yet as is stated in Example 3 in the ?results documentatio…

deseq2 r

updated 8.4 years ago • fabian.roger

Hi, after I added gene description to `` y <- DGEList(counts=rawCountTable, group=group, genes = merged.descriptions) `` the gene names have replaced...by numbers > logCPM <- cpm(y, prior.count=2, log=TRUE) > head(y$genes) gene_name gene_description 3 sp0000003 …

heatmap.2 edger

updated 8.3 years ago • mictadlo

complexity produced by ATACseqQC. I think that we have to optimize the ratio of cell number and Tn5 enzyme concentration. However, I don't know whether I have to increase or decrease Tn5 enzyme concentration. Also, does the library...complexity result mean that we have to increase the sequencing depth? Any suggestions are very welcome. Best, Gary [Figure: Fragment size distribution] …

ATACseqQC ATAC-Seq quality analysis fragment size distribution library complexity

updated 6.9 years ago • Gary

Hi all! I have a question about subsetting with Biostrings that I hope I can get some insight on. I've read in a fasta file with multiple sequences as a DNAStringSet and translated that to an AAStringSet. From here, I would like to extract a subsequence from those AA sequences according to position within the sequence. For example, from start point 65-75 or 14-38. When I try to use the subseq fu…

biostrings aastringset subsetting

updated 8.7 years ago • bri.isabella

RSEM sample.isoforms.results files with tximport and have transcript level information summarized to gene level (Gene Name). The endpoint is to perform differential expression analysis with either DESeq2 or edgeR. To this end...I am planning to provide a tx2gene file in which each transcript points to the corresponding Gene Name. Is the code below correct? Please accept my apologies for this b…

tximport

updated 4.5 years ago • luca.s

I have BSgenome hg19, a GTF file containing gene coordinates, and a BAM file. This is what I want to do: 1. Search for all occurrences of my sequence, and save the coordinates...findPattern <- function(pat="CG", mydnaseqset) { spat <- DNAString(pat) sapply(seq_len(length(dnaset)), function(i) { y <- dnaset[[i]] matchPattern(spat, y) …

genomicfeatures rsamtools

updated 9.0 years ago • Vang Le

> files <- file.path(dir, "salmon", samples$treatment, "quant.sf") > names(files) <- paste0("sample", 1:6) > all(file.exists(files)) [1] TRUE > edb <- EnsDb.Hsapiens.v86 > txs <- transcripts(edb, return.type...reading in files with read_tsv 1 2 3 4 5 6 transcripts missing from tx2gene: 17113 summarizing abundance summarizing coun…

salmon EnsDb.Hsapiens.v86 tximport

updated 5.0 years ago • TJ

I prepared some of my Hi-C libraries with the Arima Genomics Hi-C prep kit that uses a restriction enzyme cocktail. If anyone works with similar libraries, do you know what option I should use for --sig when running presplit_map.py

diffhic Arima Genomics presplit_map.py

updated 6.5 years ago • shopnil99

not sure the best way to proceed. I have a non-model organism for which we would like to know the sequence for a set of genes. It's a bird species, and I can for example align to the zebra finch genome, then run bam_tally over a...region to get variants where my species differs from zebra finch, and then infer the sequence based on the reference and the variants. But that seems harder than it …

Organism

updated 11.4 years ago • James W. MacDonald

in the database will be used as background", or I can use every gene that I have kept to do my differential gene expression analysis, since I have filtered them to get rid of the low counts...r keep <- rowSums(counts(dds) >= 10) > 3 dds <- dds[keep,] ``` I get around 18k genes left. Then, I used the enrichGO function in two different ways, as follows : ```r e…

DESeq2 clusterProfiler Pathways RNASeq org.Mm.eg.db

updated 19 months ago • adripel

I am relatively new to Bioconductor, and am struggling to find the genome coordinates for a few genes, say for instance "APC". I believe I managed to obtain the transcripts associated with the gene: library(GenomicFeatures...txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene Then I used the select method to get the gene id for "APC", and then the TXNAME of…

genomicfeatures txdb.hsapiens.ucsc.hg19.knowngene cds

updated 9.4 years ago • madsheilskov

I just found a problem with rtracklayer's import function when importing amino acid fasta (also the default way of storing amino acid sequences, e.g. the "Protein-coding transcript translation sequences" at [Gencode genes official site][1]). Here is an easy to reproduce example: ### Define local output path (here I use ~/Downloads as a tmp dir) outputPath <- '~/Downloads/'…

rtracklayer fasta import Biostrings

updated 7.0 years ago • k.vitting.seerup

Hi I have been trying to use IsocorrectoR for natural background correction but looks like it is unable to handle some molecular formulas with more than 99 hydrogen atoms...doesn't match the required syntax. Please let me k…

IsocorrectoR Error with molecular formula syntax

updated 6.0 years ago • ffarheen

Dear all, I have a set of short DNA sequences extracted from a Fastq into a data.frame that need to be translated to amino acids. After the extraction, they are...characters, need to transfer to DNA string, then translate. So I wrote something like for (n in 1:length(seqs.frame$DNA_seqs)) { translate(DNAString(seqs.frame$DNA_seqs[n])) } The translation seems to be …

translation

updated 8.9 years ago • XIA.PAN

I have an RNA-seq data set that has many zeros due to insufficient sequencing depth and low abundance for certain genes. I want to use DESeq2 to analyse my data, but not sure if DESeq2 can deal

deseq2

updated 10.2 years ago • KELVINLEE

egg develops into a complex multicellular organism is one of the most fascinating topics in biology. The current challenge is to understand how genes function as part of gene regulatory...the mouse limb bud as a paradigm to study the transcriptional and epigenetic mechanisms controlling gene expression during organogenesis. To this aim we combine genetic, molecular and cellular analysis with bioi…

PostDoc JobPosting Basel Switzerland

updated 4.1 years ago • Robert Ivanek

sure there are other ways to do this, so this is just a rough skeleton. There are different groups of abundance values for the facet_plot 'barploth' function, but they will not plot using this function, even though the data is...scales::rescale(col,to=c(1.3,2))) #this will demonstrate that one can make a ggtree object with abundance data (other examples show this too) g <- gt + geom_t…

ggtree ggplot2 phyloseq geom_boxplot facet_plot

updated 8.9 years ago • matthew.wipperman