Bioconductor Forum

The tximport vignette suggests just using RSEM’s summary from transcripts to genes (i.e., rsem.genes.results.txt) instead of using tximport to do the summary from the transcripts file (i.e., using rsem.isoforms.results.txt...set `type="none"`, and supplying the appropriate column names as arguments). Playing around, I noticed that the main difference between the two approaches is the ef…

tximport

updated 9.6 years ago • ty.thomson

Sorry for the horrible formatting; I am not used to the markdown on this site. I'm quite new to RNA-sequencing and am playing around with data to get a handle on it. I have quantified with `Kallisto` and am using `tximport` to summarize...transcript counts for differential gene expression analysis. I am running into a problem associating gene IDs with my transcripts for the summarization porti…

tximport Kallisto rna-seq

updated 8.9 years ago • cguzman.bioinformatics

to be less variation among depths than among treatments. The goal is to assess the differences in abundance among treatments and among depths. Ultimately, I would like to create relative abundance heat maps and bar charts based on simple counts/library size proportions. Later, to do differential abundance tests...exptData(0): assays(1): counts rownames(1064):…

DESeq2

updated 12.0 years ago • Guest User
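For the counts/library-size proportions described above, here is a minimal base-R sketch. It assumes the counts are available as a plain feature-by-sample matrix; for a SummarizedExperiment-style object like the one printed, `assay(x)` would extract it. The feature names, sample names, and values are hypothetical.

```r
# Hypothetical feature-by-sample count matrix.
# matrix() fills column by column, so each line below is one sample.
counts <- matrix(c(10, 30,  60,    # depth1: library size 100
                   20, 20, 160,    # depth2: library size 200
                    0, 25,  25),   # depth3: library size 50
                 nrow = 3,
                 dimnames = list(c("taxonA", "taxonB", "taxonC"),
                                 c("depth1", "depth2", "depth3")))

# Divide every column by its library size (the column sum).
# prop.table(counts, 2) gives the same result.
rel_abund <- sweep(counts, 2, colSums(counts), "/")

# Each column now sums to 1.
print(round(rel_abund, 2))
```

From here, `barplot(rel_abund)` draws one stacked bar per sample and `heatmap(rel_abund)` draws a heat map, both without extra packages.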

cannot find a way to do it: `exons <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, by = "gene")` followed by `unlist(exons) # this will lose the gene name information`. How do I add the name of the gene to the exons? I have read

granges grangeslist txdb.hsapiens.ucsc.hg19.knowngene

updated 8.2 years ago • tangming2005

To get started analyzing my m6A sequencing data, I used the Guitar R package to generate a feature distribution map, but I got an error: Error in stop_if_wrong_length...seqnames'", ans_len) : 'seqnames' must have the length of the object to construct (1) or length 1. Here is my code: library(BayesPeak) rm(list=ls()) options(stringsAsFactors = F) library

Guitar

updated 3.9 years ago • 建国

x, suggest.trim = TRUE) : GRanges object contains 1 out-of-bound range located on sequence MT. Note that ranges located on a sequence whose length is unknown (NA) or on a circular sequence are not considered...out-of-bound (use seqlengths() and isCircular() to get the lengths and circularity flags of the underlying sequences). You can use trim() to trim these ranges. See ?`trim,Genomi…

GenomicPlot

updated 15 months ago • Lucky

While mapping genes from Affymetrix IDs, I found that there are multiple Affymetrix IDs corresponding to a single gene name for...a sample. My work revolves around gene names only. Please suggest the appropriate step to deal with this situation. The code is as follows: require("biomaRt") mart

biomaRt

updated 21 months ago • Reeya
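One common way to reduce several probe sets per gene to a single row is to keep, for each gene, the probe set with the highest average expression. Averaging them, for example with limma's `avereps()`, is another option. Below is a minimal base-R sketch of the first option. It assumes a probe-by-sample log-expression matrix and a probe-to-symbol vector, such as one built from the biomaRt result. All probe IDs, gene names, and values are hypothetical.

```r
# Hypothetical log-expression matrix (rows = probe sets, columns = samples).
expr <- matrix(c(5, 7, 6,
                 8, 9, 7,
                 2, 3, 2,
                 4, 4, 5),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("p1", "p2", "p3", "p4"),
                               c("s1", "s2", "s3")))

# Hypothetical probe-to-gene mapping, aligned with rownames(expr).
gene <- c(p1 = "GENEA", p2 = "GENEA", p3 = "GENEB", p4 = "GENEB")
stopifnot(identical(names(gene), rownames(expr)))

# Sort probes by gene, and within each gene by decreasing mean expression,
# then keep the first (highest-mean) probe of each gene.
avg  <- rowMeans(expr)
ord  <- order(gene, -avg)
keep <- ord[!duplicated(gene[ord])]

collapsed <- expr[keep, , drop = FALSE]
rownames(collapsed) <- unname(gene[keep])
print(collapsed)
```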

Hello friends, I am working on some SMD two-color data for Toxoplasma gondii and have done differential expression analysis. Now I want to annotate these genes. The problem is: how can I convert the probe IDs to gene names using an annotation package, and with what database, since there is no Bioconductor database for T. gondii? If I have to pick…

Annotation GO probe annotate convert

updated 15.0 years ago • Budhayash Gautam

Hello to the list, I have recently started mapping next-generation sequencing data to the human genome and would now like to map this to genes and other annotated genomic features. Whilst I have...found that I can map and sort genes with the biomaRt package, I have completely failed to extract genome sequence using simple chromosome coordinates...tried to use the getSe…

Sequencing biomaRt

updated 17.0 years ago • RS Illingworth

Hi, is it normal not to have the same number of sequences in the FASTQ file and in the object generated by readFastq? `grep @ SRR062641.filt.fastq | wc -l` gives 187786, and `data <- readFastq`…

updated 12.8 years ago • carol white
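One explanation worth ruling out is that `grep @` counts every line that contains `@` anywhere. In Phred+33 encoded FASTQ files, `@` is also a valid quality character (quality score 31), so quality lines can be counted as extra records. For standard four-line records, dividing the number of lines by four gives the record count. Here is a base-R illustration on a made-up two-record file:

```r
# Two made-up FASTQ records; the first quality line happens to contain '@'.
fq_lines <- c("@read1", "ACGT", "+", "I@II",
              "@read2", "GGCC", "+", "IIII")
fq_file <- tempfile(fileext = ".fastq")
writeLines(fq_lines, fq_file)

lines <- readLines(fq_file)

# What `grep @ file | wc -l` counts: every line containing '@'.
n_grep <- sum(grepl("@", lines, fixed = TRUE))

# Records in a four-line-per-record FASTQ file.
n_records <- length(lines) / 4

cat("lines containing '@':", n_grep, "\n")
cat("FASTQ records:", n_records, "\n")
```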

would be the best way to correct for GC content bias and transcript length bias? I have previously used CQN after quantification with HTSeq to normalise for these biases, but I am now quantifying...gene expression using Salmon and was wondering if that step was necessary if using the option --gcBias in Salmon, and then using...DESeqDataSetFromTximport, which does a gene length correction, in ord…

salmon cqn tximport deseq2 rnaseq

updated 8.9 years ago • dmr210

get information on which feature is overlapping with another feature. But instead of using "feature names", it just indicates 1, 2, 3, etc. Is there a way I can make it output gene names? (My GRanges objects are read from GFF files using...rtracklayer and later have names (gene IDs) assigned to them using the names() function.) > findOverlaps(au, am) Hits of length 1740 queryLength: 790…

updated 13.6 years ago • gowtham

I'm trying to get text descriptions of PFAM family names from the PFAM package. I've tried to run the example code from the pfamAC2PDB help. > AC2DE <- pfamAC2DE() > head(AC2DE) $PF00244 [1] "14-3-3 protein" > pfamAC2DE(ac=sample(names(AC2DE), 3)) Error in sample(names(AC2DE), 3) : cannot take a sample larger than the population…

updated 18.0 years ago • Daniel Gatti

mode = "onDisk") > mzs <- mz(raw_data) Error in names(res) <- nms : 'names' attribute [50] must be the same length as the vector [25] In addition: Warning message: stop worker

xcms xcms3 mzxml

updated 7.4 years ago • goh

Background: I have a FASTA file with reference sequences for strains of bacteria, and I want to align sequences I have from another source, in a data frame, to their corresponding...elegant way to do it in order to keep the reference sequences separate? Specific questions in bullet points: My FASTA file with reference sequences was read in using > allemmDNA...Just a small question here, i…

R decipher biostrings

updated 8.0 years ago • reubenmcgregor88

I've done plenty of preprocessing, just for the sake of statistical methods, but right now I find myself being rather illiterate when it comes to finding out about specific genes. I gather I have to use something like get(<myfavoritegene_name>, revmap(<description_of_array>)) but unfortunately I'm…

Preprocessing

updated 13.1 years ago • Guest User

mm10 as my annotation file. What I currently have are all the NM_#### and NR_#### accessions for the genes, but I need the gene names. I was able to get a .csv that had quite a few, so I just used R to search (grep) and match the names for...task. I'm hoping there is a function that allows me to input, for example, NM_001001130, and have the gene name Zfp85 returned. Thank you in advance for…

ucsc gene set analysis genome annotation

updated 10.7 years ago • n_bormann1
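Instead of grep, an exact lookup with `match()` returns one name per accession, in input order, and gives `NA` for accessions that are not in the table. This is a minimal sketch, assuming a two-column table like the .csv described in the question. Only the NM_001001130 to Zfp85 pair comes from the question; the other entries are hypothetical placeholders. With Bioconductor annotation packages, `AnnotationDbi::mapIds(org.Mm.eg.db, keys = ids, column = "SYMBOL", keytype = "REFSEQ")` is the usual route for mouse RefSeq accessions.

```r
# Lookup table, e.g. from read.csv(); the second row is a hypothetical placeholder.
lookup <- data.frame(
  accession = c("NM_001001130", "NR_PLACEHOLDER"),
  symbol    = c("Zfp85", "PlaceholderGene"),
  stringsAsFactors = FALSE
)

# Accessions to annotate; the second one is deliberately absent from the table.
query <- c("NM_001001130.2", "NM_NOT_IN_TABLE")

# Drop a trailing version suffix (".2") so versioned IDs still match.
query_unversioned <- sub("\\.[0-9]+$", "", query)

# match() gives the row index of each exact hit, or NA when there is none.
symbols <- lookup$symbol[match(query_unversioned, lookup$accession)]
print(data.frame(query = query, symbol = symbols))
```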

been experimenting with the very nice biomaRt package and noticed in the vignette (section 5) that sequence retrieval appears to be restricted to the cDNA (possibly only UTR) or peptide sequences. From an earlier posting on...the mailing list, I saw a way to retrieve upstream sequences: library(biomaRt) ens<-useMart("ensembl",dataset="hsapiens_gene_ensembl", mysql=TRUE) entrez <- c…

biomaRt

updated 18.9 years ago • peter robinson

Hi, can anyone please guide me on plotting a graph showing the gene names of the top differentially expressed genes in edgeR?

edgeR plot differential gene expression

updated 8.2 years ago • fawazfebin

Hi, I have been using the pre-ranked cameraPR function with a custom ranking of genes. If I use the function as is, I get no hit genes at all: cameraPR(statistic=the_stat,index=Hs.c5.gene_name...0 Up NaN NaN. I think this might be due to the gene names of the statistic getting stripped away inside the function by as.n…

limma camera

updated 8.4 years ago • sarah.williams1

Hi Everyone at Bioconductor, I've been working with large multiple sequence alignments, in which I find the polymorphic sites, keeping them and their base positions, and rejecting all the non-polymorphic sites. I usually use a package like ape for this, which implements the DNAbin class. This time the MSA I have is simply too large and can't be represented as a DNAbin …

sequence alignment biostrings

updated 11.1 years ago • ben.ward
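The polymorphic-site step itself fits in a few lines of base R. Split the aligned sequences into a character matrix and keep the columns with more than one distinct character. This is a toy sketch on three made-up sequences. It treats gap characters as ordinary states, which is an assumption you may want to change. For alignments too large for a character matrix, Biostrings' `consensusMatrix()` computes per-column letter counts directly from a `DNAStringSet` or `DNAMultipleAlignment`.

```r
# Three made-up aligned sequences of equal width.
aln <- c(seq1 = "ACGTACGT",
         seq2 = "ACGAACGT",
         seq3 = "ACGTACCT")
stopifnot(length(unique(nchar(aln))) == 1)   # aligned sequences share one width

# One row per sequence, one column per alignment position.
site_mat <- do.call(rbind, strsplit(aln, ""))

# A site is polymorphic if its column holds more than one distinct character
# (gaps, if present, count as a character here).
is_poly  <- apply(site_mat, 2, function(col) length(unique(col)) > 1)
poly_pos <- which(is_poly)

# Keep only polymorphic sites, labelled by their alignment position.
poly_sites <- site_mat[, poly_pos, drop = FALSE]
colnames(poly_sites) <- poly_pos
print(poly_sites)
```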

with KEGGgraph are the entry IDs 12801 and 13489, but what I would like to get are the associated gene names (Cnr1 and Drd2). I did the following: library(KEGGgraph) tmp <- tempfile() retrieveKGML(pathwayid='mmu04015' , organism...mmu' , destfile=tmp, method="wget") pathway <- parseKGML(tmp) nodes[[60]]@name[[1]] ---> retrieves mmu:12801 nodes[[6…

kegggraph

updated 10.0 years ago • Osvaldo
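For mouse, KEGGgraph node names such as `mmu:12801` are NCBI Entrez Gene IDs prefixed with the KEGG organism code. Stripping the prefix therefore yields IDs that an annotation package can translate. Below is a minimal base-R sketch of that step. The small lookup vector is a hypothetical stand-in built from the two ID/name pairs given in the question.

```r
# KEGG node names as returned for the nodes in the question.
node_ids <- c("mmu:12801", "mmu:13489")

# Drop the organism prefix ("mmu:") to get the bare gene identifiers.
entrez <- sub("^[a-z]+:", "", node_ids)

# Hypothetical stand-in for an annotation lookup, built from the two pairs
# quoted in the question. In practice, something like
# AnnotationDbi::mapIds(org.Mm.eg.db, keys = entrez,
#                       column = "SYMBOL", keytype = "ENTREZID")
entrez_to_symbol <- c("12801" = "Cnr1", "13489" = "Drd2")

symbols <- unname(entrez_to_symbol[entrez])
print(data.frame(node = node_ids, entrez = entrez, symbol = symbols))
```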

Dear list, I have gene expression data with probeset IDs and gene set data with gene names, so the gene sets and the expression data are not using...the same gene ID system. Both gene set and expression data should use the same gene ID system, which is a requirement of the GAGE analysis...So the problem is that if I convert the probeset IDs to gene names, I get a singl…

convert gage

updated 13.8 years ago • Javerjung Sandhu

<- UniProt.ws(taxId = 9606) select(x = database, keys = c("P01613", "P01861"), columns = "GENES", keytype = "UNIPROTKB") Getting extra data for P01861 'select()' returned 1:1 mapping between keys and columns UNIPROTKB GENES...1 P01613 <NA> 2 P01861 IGHG4 If you check the UniProt website, both P01613 and P01861 have a gene symbol. Why do I ge…

UniProt.ws

updated 4.5 years ago • Dario Strbenac

www.ensembl.org/index.html). If you are using biomaRt, you can change your host to access our most recent data: ensembl_mart_84 <- useEnsembl(biomart="ensembl") * Ensembl Genes 84 * Renamed "phase" attribute to "start...phase" in the structure and sequence sections * Renamed homolog, paralog and ortholog to homologue, paralogue and orthologue …

ensembl mart release84 biomart ensembl release News

updated 9.7 years ago • Thomas Maurel

1:6],fit=fit,eb=eb,adjust="fdr")? Can I replace genelist=genelist[,1:6] with genelist=RG$genes or something similar? It seems to be necessary to get the names of the genes in the output from the Bayes analysis. I am using...non-model organism with some cDNA libs of my own construction, so only a fraction of the clones are sequenced and the rest have clone IDs. By the way, most of the image link…

limma

updated 21.9 years ago • Dennis Hazelett

probes but have no clue which packages I need for the following: (1) probe SNPs, SNP10; (2) UCSC refGene name, group and accession number. I've come across SNPlocs.Hsapiens.dbSNP.20101109, GenomicRanges, GenomicFeatures as...any related packages, but cannot see how I can use these. Any help is most appreciated. Best Regards, Hajja

SNPlocs GenomicRanges

updated 14.5 years ago • khadeeja ismail

simply do not know enough about what is going on. I have seen data where, if you find significant genes on one platform and then select significant genes on another (same sample run on multiple platforms), and you draw a Venn...said it was just a logical guess. > > > > What I meant was that if you had 2 homologous genes, obviously it is going to be h…

SNP Cancer affy gcrma

updated 21.5 years ago • Peter Wilkinson

of multi-omics data, (2) integrative analysis of multi-omics datasets for a better understanding of gene regulation and cancer etiology, and for biomarker discovery, and (3) development of algorithms and tools for designing...effective gRNAs with minimal off-target effects. Most recent relevant publications include Nature Methods 6(6):453-454. 2019. PMID: 31133757; Genome Res. 2019 PMID: 31201210; Nat..…

single cell sequencing multi-omics data mining and integration Job

updated 6.4 years ago • Julie Zhu

conditions. My first issue is that I would really like the transcript IDs to be replaced by the gene names. I've used BioMart in the past to convert transcript IDs to Ensembl gene names, but when I do this, I end up with...multiple transcripts for the same gene. I will look at the fold changes for Gene A, only to find that there are three different entries. Is there any way to generate..…

DESeq2 DESEQ2 tximeta SummarizedExperiment heatmaps

updated 3.3 years ago • uhlkatie
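Getting one row per gene means summarizing transcript-level values to genes. tximport does this with a `tx2gene` table, and tximeta does it with `summarizeToGene()`. For counts, the summarization is a sum over the transcripts of each gene. Here is a base-R sketch with `rowsum()`, assuming a hypothetical `tx2gene` data frame of transcript and gene identifiers. Note that tximport also tracks per-gene average transcript lengths, which this sketch leaves out.

```r
# Hypothetical transcript-level estimated counts (rows = transcripts).
tx_counts <- matrix(c(10, 5, 3, 20,     # ctrl
                      12, 4, 6, 18),    # treat
                    ncol = 2,
                    dimnames = list(c("tx1", "tx2", "tx3", "tx4"),
                                    c("ctrl", "treat")))

# Hypothetical transcript-to-gene table (column 1 transcripts, column 2 genes).
tx2gene <- data.frame(TXNAME = c("tx1", "tx2", "tx3", "tx4"),
                      GENEID = c("GeneA", "GeneA", "GeneA", "GeneB"),
                      stringsAsFactors = FALSE)

# Gene for each row of tx_counts, in row order.
gene_ids <- tx2gene$GENEID[match(rownames(tx_counts), tx2gene$TXNAME)]
stopifnot(!anyNA(gene_ids))   # every transcript must be in tx2gene

# Sum the transcript counts within each gene: one row per gene.
gene_counts <- rowsum(tx_counts, group = gene_ids)
print(gene_counts)
```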

>From: Srinivas Iyyer <srini_iyyer_bio at yahoo.com> >Subject: [BioC] Limma: How to read gene list, coordinates of spot > when NO GAL file available >To: bioconductor at stat.math.ethz.ch > >Dear group, >...is an excellent module for gene expression data >preprocessing and analysis. >however, I looked into many places…

limma ArrayExpress

updated 19.8 years ago • Gordon Smyth

Biostrings::readAAStringSet on the latest UniProtKB/Swiss-Prot FASTA file, many sequences are read in incorrectly. Only random sub-sequences remain after reading in the FASTA file. I confirmed this manually and...filexp_list, nrec, skip, seek.first.rec, : reading FASTA file /opt/share/blastdb/uniprotkb/FASTA/uniprot_sprot.fasta: ignored 68720968 invalid one-letter se…

bug biostrings readAAStringSet

updated 9.3 years ago • asis.hallab

Using the msa package, I did a multiple alignment of 3 highly similar sequences, each of length ~8 kb. I then tried to print them to an image using the msaPrettyPrint function. LaTeX

msa msaprettyprint() multiple sequence alignment

updated 8.9 years ago • map2085

Dear professor, when I used clusterProfiler for KEGG annotation, the results only provided gene IDs for each pathway, since the enrichKEGG function does not accept the argument "readable = TRUE" as the enrichGO function does. I…

annotation

updated 7.0 years ago • huangzhiguang2016

of translated transcripts that contain the amino acid patterns I am interested in. Here are their names: > names(cds_seqs[i]) [1] "uc001ack.2" "uc001acv.3" "uc001adm.3" "uc001ado.3" "uc001adp.3" "uc001adq.3" "uc001adr.3" "uc001aee.1...uc009vle.1" "uc001ajj.1" "uc001ajk.1" [19] "uc001ajy.2" The question now is how to go from these names to conventional protein names and/or ENTREZ id…

GO

updated 13.7 years ago • Zybaylov, Boris L

Dear all, I have a dataset of around 150 RNA-seq samples from different patients, on which I performed transcript quantification with Salmon. I'm interested in finding immunologic signatures that correlate with the expression of a specific gene isoform. So I've retrieved the abundance (TPM) of that isoform across my 150 RNA-seq samples, and I'm using it as a continuous phenotype in GSEA. Then I've performed gene su…

deseq2 gsea tpm rnaseq salmon rna seq

updated 7.2 years ago • leo_CD

Hello, lately I have been working on counting sequence fragments in larger sets of sequences. I am searching for thousands of fragments of 30 to 130 bases in hundreds of thousands of sequences between 1200 and 1600 bases long. Currently I am using the following method to count the number of "hits": #### start #### library(Biostrings) fragments <- DNAStringSet…

updated 15.5 years ago • Erik Wright

Dear Pablo, It is great that you got the annotated peaks now. To add additional annotation such as gene symbols to the annotatedPeak, I would suggest doing the following: Ann.peaks = as.data.frame(annotatedPeak). Then merge Ann.peaks...with the annotation data containing gene symbols, Ensembl IDs, etc. obtained from the getBM function in the biomaRt package. If you encounter problems using getBM, please...so…

Annotation AnnotationData annotate convert biomaRt

updated 15.0 years ago • Julie Zhu
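The merge step Julie describes can be sketched with base R's `merge()`. This sketch assumes the data frame from `as.data.frame(annotatedPeak)` has a column holding Ensembl gene IDs, called `feature` here (an assumption). It also assumes the getBM result has `ensembl_gene_id` and `hgnc_symbol` columns. The values below are placeholders, not real annotation.

```r
# Hypothetical stand-in for as.data.frame(annotatedPeak); the column holding
# Ensembl gene IDs is assumed to be called "feature".
ann_peaks <- data.frame(peak    = c("peak1", "peak2", "peak3"),
                        feature = c("ENSG_EXAMPLE_1", "ENSG_EXAMPLE_2",
                                    "ENSG_EXAMPLE_1"),
                        stringsAsFactors = FALSE)

# Hypothetical stand-in for a getBM() result with Ensembl IDs and symbols.
bm <- data.frame(ensembl_gene_id = c("ENSG_EXAMPLE_1", "ENSG_EXAMPLE_2"),
                 hgnc_symbol     = c("GENE1", "GENE2"),
                 stringsAsFactors = FALSE)

# Left join: keep every peak, add the symbol where the ID is found.
# If getBM returns several rows per ID, peaks are duplicated accordingly.
ann_with_symbol <- merge(ann_peaks, bm,
                         by.x = "feature", by.y = "ensembl_gene_id",
                         all.x = TRUE)

# merge() sorts rows by the key; restore peak order for readability.
ann_with_symbol <- ann_with_symbol[order(ann_with_symbol$peak), ]
print(ann_with_symbol)
```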

between the two groups, I visualize them in a plotMA graph. To my surprise, this shows that most genes are not around 0 on the y-axis (log2 fold change) but around -1. I added a link to the plot I made. What am I doing wrong? https://imgur.com/a...C1ZRSAH library(DESeq2) # g1 = cell names group 1 # g2 = cell names group 2 # x = data frame with unique reads (raw) of all cells x <- …

deseq2

updated 7.0 years ago • mdroog

Hi everyone, I'm finding that bitr returns different gene ID lengths depending on which keytype I am using. Could anyone shed some light? For example... sample_list <- c("A2M","ABL1","ADCYS...fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db") # Sample_list returns a vector of length 4, but the resulting conversions are both of length 3. Missing the ADCYS gene. …

clusterProfiler

updated 3.3 years ago • quoc.t.nguyen96

We have created a new experimental data package called 'seqc'. It includes gene-level read count data generated by the SEQC (SEquencing Quality Control) project, which is the third stage of the well-known...initiative). The SEQC/MAQC-III Consortium produced benchmark RNA-seq data for the assessment of RNA sequencing technologies and data analysis methods (published recently in Nature Biotechnolog…

seqc rsubread featurecounts subjunc ercc News

updated 11.2 years ago • Wei Shi

also Arabidopsis. Here is a short example using Gramene: > library(biomaRt) > listMarts() name version 1 ensembl ENSEMBL 42 GENE (SANGER) 2 compara_mart_homology_42 ENSEMBL 42 HOMOLOGY (SANGER) 3 compara_mart_pairwise_ga_42...42 VARIATION (SANGER) 5 …

SNP Transcription biomaRt Vega

updated 9.2 years ago • Steffen Durinck

certificate and I am tutoring and helping colleagues. Right now, I am trying to analyze an RNA-seq gene expression data set of genes related to intramuscular fat in black and white pigs to assess meat quality and see which...genes are most crucial for meat quality. The genes are all over the place, expressed in different tissues, ATPases, and yes, some...need to be too complex because right now, …

edgeR RNA-Seq log2 foldchange q_value

updated 4.0 years ago • johndavidgriner74

Hi, I am trying to add gene names matching Ensembl gene IDs; below are the code and the error. Any suggestions would be welcome. Thanks a lot! genemap=getBM(attributes...query to the BioMart webservice returned an invalid result: biomaRt expected a character string of length 1. Please report this on the support site at http://support.bioconductor.org

biomart

updated 5.8 years ago • xiaolei.zhou

BioMart database will fix its current Ensembl 47 release with the following attribute/filter name changes involving gene symbols. * For the hsapiens_gene_ensembl (human) dataset you'll need hgnc_symbol as attribute...and filter name to use/retrieve gene symbols. * For the mmusculus_gene_ensembl (mouse) dataset you'll need the mgi_symbol as attribute...and filter name (this was markersymbol in p…

biomaRt biomaRt

updated 18.2 years ago • Steffen

trinityrnaseq/wiki/Trinity-Differential-Expression **How well does it work without RSEM abundance estimation?** We were thinking of feeding normalized gene counts (e.g., by TMM logCPM) into the Trinity protocol. Thank

Trinity DifferentialExpression RSEM RNAseq edgeR

updated 5.1 years ago • harelarik

v1.34) to analyze RNA-seq data in R (4.13), but I ran into two problems: (1) DESeq2 renames all the genes, and (2) there is a lack of downregulated genes. My data look like this (top and bottom 5 entries): "normal1" "normal2" "normal3" "tumor1" "tumor2...1440 845 1145 2097 The sequence of commands that I am following is: #read in the data for gene expression and disease conditions coldata …

DESeq DESeq2

updated 3.5 years ago • manwar

chipqc) When I ran the function ChIPQC, I got the following error: "Error in names(res) <- c("Reads", "Map%", "Filt%", "Dup%", "ReadL", "FragL", : 'names' attribute [9] must be the same length as the vector [7]" I continued to run the ChIPQCreport

ChIPQC

updated 6.6 years ago • analeigh.gui

am editing/filtering NGS data using ShortRead, and I want to trim the reads outside of my primer sequences. The reads are not the same length, though. I know I could trim the primers and what flanks them, but including the primer...sequences is important to me. Here is an example of what I'm looking for with 3 sequences in a DNAStringSet: start with:

sequencing biostrings shortread

updated 10.2 years ago • steven.everman
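To keep the primers while dropping whatever flanks them, find where the forward primer starts and where the reverse primer ends in each read, then take the substring in between. This is a base-R sketch on made-up reads. It assumes exact primer matches and a reverse primer written as it appears in the read (that is, already reverse-complemented). On a DNAStringSet, Biostrings' `vmatchPattern()` and `subseq()` support the same approach, with mismatch tolerance if needed.

```r
# Made-up reads; the last one contains neither primer.
reads <- c("TTACGTAAAGGCCTT",
           "GACGTCCCCGGCCA",
           "ACGTTGGCCGGG",
           "TTTTTTTT")
fwd_primer <- "ACGT"   # hypothetical forward primer
rev_primer <- "GGCC"   # hypothetical reverse primer, as it appears in the read

# Start of the first exact match of each primer (-1 when absent).
# as.vector() drops the match attributes that regexpr() attaches.
fwd_start <- as.vector(regexpr(fwd_primer, reads, fixed = TRUE))
rev_start <- as.vector(regexpr(rev_primer, reads, fixed = TRUE))
rev_end   <- rev_start + nchar(rev_primer) - 1

# Keep primer-to-primer, primers included; NA if a primer is missing
# or the reverse primer does not come after the forward one.
found   <- fwd_start > 0 & rev_start > fwd_start
trimmed <- ifelse(found, substr(reads, fwd_start, rev_end), NA_character_)
print(trimmed)
```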

I recently ran into the problem that the gene annotation in MSigDB is not up to date (for example, the alias C14orf129 is used as a gene name in various gene sets, while...symbol # DUX4L DUX4 # DUX4L LOC112268343 indices <- NULL for (gene in setdiff(duplicated.aliases, aliasSymbol$symbol)) { tmp.index <- which(aliasSymbol$alias_symbol == gene) ali…

gseabase msigdb genesetcollection Tutorial

updated 7.6 years ago • t.kuilman

Hi, when I tried to get the exon sequences, I got an error: library("BSgenome.Hsapiens.UCSC.hg19") txdb<-TxDb.Hsapiens.UCSC.hg19.knownGene tx_Exons<...exonsBy(…

updated 11.5 years ago • Asma rabe

professionals, and specialists. We are now preparing materials for the February issue of our monthly journal "Protein, Nucleic Acid and Enzyme (PNE)", Vol. 48 (2003), No. 2. In this connection, we would like to introduce...so that they could go to this site easily: http://www.bioconductor.org/ We should be most grateful if you would grant us your permission to reprint and …

GO

updated 23.0 years ago • Momoko Takahashi

Hi, I'm trying to retrieve the sequences of a certain transcript represented as a `GenomicRanges` object. Here's the `data.frame` of that transcript: ...attr(,"package") …

BSgenome getSeq

updated 8.5 years ago • rubi

Hi, I am trying to convert gene names to Entrez IDs for further analysis. I am using: BiocManager::install("AnnotationHub") library(AnnotationHub) hub <...columns(SL_OrgDb) BiocManager::install("clusterProfiler") library(clusterProfiler) genes <- read.table("genes.txt", quote="\"", comment.char="") geneList<-genes geneList1<-(gene…

Bioconductor

updated 3.2 years ago • gk13102603

I am trying to get the DNA sequence of the upstream regulatory region of a number of genes using the biomaRt package. I start with a list of EntrezGene...and export the data in FASTA format. I have found that this works well when I search for one gene at a time. But when I input a list of Entrez gene IDs to the getSequence function, it gives me back sequences, but the sequen…

biomaRt biomaRt

updated 14.1 years ago • mpg33@drexel.edu

missing from tx2gene when using the Homo_sapiens.GRCh38.113.chr_patch_hapl_scaff.gtf file. **Most of what I have seen documented uses the Homo_sapiens.GRCh38.113.gtf file, but is it not better to use the chr_patch_hapl_scaff...in files with read_tsv 1 2 3 4 5 6 7 8 9 10 transcripts missing from tx2gene: 25650 summarizing abundance summarizing counts summarizing length > …

tximport

updated 7 months ago • Nicholas
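Before running tximport, it can help to list the quantified transcript IDs that are absent from tx2gene. Check whether they differ only by a version suffix, a case tximport's `ignoreTxVersion = TRUE` argument handles. If many IDs remain missing after stripping versions, the quantification index and the GTF used to build tx2gene probably contain different transcript sets. This is a base-R sketch with made-up IDs. In practice, `quant_ids` would come from the transcript column of the quantification files, and `tx2gene` from the GTF.

```r
# Hypothetical transcript IDs as they appear in the quantification files
# (versioned) and in a tx2gene table built from a GTF (unversioned).
quant_ids <- c("txA.1", "txB.3", "txC.1")
tx2gene <- data.frame(TXNAME = c("txA", "txB"),
                      GENEID = c("geneA", "geneB"),
                      stringsAsFactors = FALSE)

# Remove a trailing ".<digits>" version suffix.
strip_version <- function(x) sub("\\.[0-9]+$", "", x)

# Missing when IDs are compared exactly, as written.
missing_exact <- setdiff(quant_ids, tx2gene$TXNAME)

# Still missing after removing version suffixes on both sides.
missing_unversioned <- setdiff(strip_version(quant_ids),
                               strip_version(tx2gene$TXNAME))

cat("missing (exact IDs):", length(missing_exact), "\n")
cat("missing (versions stripped):", length(missing_unversioned), "\n")
print(missing_unversioned)
```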

no" (default), "scaledTPM", or "lengthScaledTPM", for whether to generate estimated counts using abundance estimates scaled up to library size (scaledTPM) or additionally scaled using the average transcript length over...using scaledTPM or lengthScaledTPM, then the counts are no longer correlated with average transcript length, and so the length offset matrix should not be used.

tximport deseq2 TPM RPKM rnaseq

updated 7.7 years ago • tangming2005
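Read literally, the documentation excerpt above describes two rescalings of the abundance (TPM) matrix. The sketch below implements that arithmetic in base R, written from the description rather than taken from tximport's source. `scaledTPM` rescales each sample's TPM so that its column sum equals that sample's total estimated count. `lengthScaledTPM` first multiplies TPM by each gene's average transcript length across samples, then applies the same rescaling. All matrices are toy values.

```r
# Toy gene-by-sample inputs in the shape tximport works with (values made up).
ids <- list(c("g1", "g2", "g3"), c("s1", "s2"))
abundance <- matrix(c(100, 300, 600,     # s1 TPM (sums to 1000)
                      200, 200, 600),    # s2 TPM (sums to 1000)
                    ncol = 2, dimnames = ids)
counts <- matrix(c(50, 150, 800,         # s1 estimated counts (1000)
                   80, 120, 1000),       # s2 estimated counts (1200)
                 ncol = 2, dimnames = ids)
tx_len <- matrix(c(1000, 2000, 1500,     # average transcript lengths per sample
                   1100, 1900, 1500),
                 ncol = 2, dimnames = ids)

# Rescale each column of x so that it sums to the library size
# (column sum of the estimated counts) of the same sample.
rescale_to_libsize <- function(x, counts) {
  t(t(x) * (colSums(counts) / colSums(x)))
}

# "scaledTPM": TPM scaled up to library size.
scaled_tpm <- rescale_to_libsize(abundance, counts)

# "lengthScaledTPM": TPM times each gene's average length across samples,
# then scaled up to library size.
length_scaled_tpm <- rescale_to_libsize(abundance * rowMeans(tx_len), counts)

print(scaled_tpm)
print(round(length_scaled_tpm, 1))
```

In both cases, the column sums match the original library sizes, while the within-sample proportions come from the abundance estimates. This is why the excerpt says such counts no longer need a length offset.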

by=6)) m.dil <- new.env() m.dil$match <- list(ind[1]) m.dil$match <- c(m.dil$match, ind[2:length(ind)]) m.dil <- as.list(m.dil) length(m.dil$match) # [1] 146637 id.dil <- hgu95av2probe$Probe.Set.Name[ind] dil.cdf <- buildCdfEnv.matchprobes...arrays=640x640 features (6405 kb) cdf=new.dil.cdfenv (12453 affyids) number of samples=2 number of genes=1245…

ath1121501 cdf probe

updated 21.2 years ago • Hee Siew Wan

I have transcript abundance data (obtained with kallisto), which I converted into non-normalized gene-level counts with tximport. In my resulting...count matrix, each gene has an Ensembl ID. Before running DESeq2, I converted the gene Ensembl IDs into gene symbols. However, it so happens that...a few different Ensembl IDs map to more than one gene symbol. Thus, in my count matrix, 16 genes are du…

DESeq2 RNAseq

updated 5.4 years ago • Nikolay Ivanov
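When several Ensembl IDs end up sharing a symbol, one option avoids merging or discarding rows. Keep the Ensembl IDs as row names for DESeq2, and carry the symbols as annotation, made unique only where a unique label is needed (plots, exported tables). This is a base-R sketch with hypothetical IDs and symbols. Summing duplicated rows with `rowsum()` is an alternative, at the cost of merging features that may be genuinely distinct.

```r
# Hypothetical gene-level counts with Ensembl-style IDs as row names.
counts <- matrix(c(10, 20, 5,
                   12, 18, 7),
                 ncol = 2,
                 dimnames = list(c("ENSG_A", "ENSG_B", "ENSG_C"),
                                 c("s1", "s2")))

# Hypothetical ID-to-symbol mapping in which two IDs share a symbol.
id_to_symbol <- c(ENSG_A = "GENE1", ENSG_B = "GENE1", ENSG_C = "GENE2")

# Keep the unique IDs as row names and store the symbols alongside;
# make.unique() builds display labels without merging any rows.
annot <- data.frame(ensembl_id = rownames(counts),
                    symbol     = unname(id_to_symbol[rownames(counts)]),
                    stringsAsFactors = FALSE)
annot$label <- make.unique(annot$symbol)
print(annot)
```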

I'm using the gometh() function of missMethyl and am trying to pull out the names of the differentially methylated genes from the output table, which looks like this: go <- missMethyl::gometh(sig.cpg...metabolic process 146 5 1.082826e-07 0.0003628179 The *DE* column is the number of genes that are differentially methylated. How do I find out, for example, the names of the…

missMethyl

updated 5.4 years ago • stewart999