Bioconductor Forum

Hi, I have a list of putative transcription factor binding sites, and in order to continue with the most relevant ones for further analyses I would like to filter on 'conservation score' (the assumption is that conserved sequences...are more likely to be functional than less/non conserved sequences). I have read on this, and found out that both ENSEMBL (GERP score) and UCSC Browser (multiz align…

Transcription biomaRt

updated 17.1 years ago • Guido Hooiveld

to do comparative RNA-seq analysis with DESeq2. My purposes are: **1. combine transcripts into genes 2. detect gene expression difference under different conditions 3. obtain a summarized gene expression table** 1. combine...way? I found the sum of the gene expression of each sample is largely diverse.** 2. detect gene expression difference under different conditions I followed...c…

deseq2 tximport

updated 6.9 years ago • marie

myself using Views and viewMeans, viewSums, etc. to calculate summary statistics by tiles on sequences. I have a GRanges object with 1-width ranges (but this should apply more generally), and metadata columns have some...this with something like: data <- Rle(NA, length=seqlengths(txdb)['chr1']) data[start(my_rngs)] <- my_rngs$variable # simple, since my features are 1-wid…

GO convert

updated 11.6 years ago • Vince S. Buffalo

bioinformatics.org] Sent: Thursday, July 29, 2004 2:34 PM To: Lapointe, David Subject: [BiO News] Gene name errors introduced by Excel A research article at BMC Bioinformatics: BACKGROUND: When processing microarray data...sets, we recently noticed that some gene names were being changed inadvertently to non-gene names. RESULTS: A little detective work traced the problem to default...format co…

Microarray

updated 21.5 years ago • David Lapointe
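
A hedged illustration of the Excel problem described in the post above: assuming the corruption takes the form of date-style conversions (for example, a symbol such as SEPT2 turning into 2-Sep), a quick base R check could flag suspicious entries. The `symbols` vector is made up for illustration, and the pattern only covers the day-month form assumed here.

```r
# Made-up symbol column, as it might look after a round trip through a spreadsheet
symbols <- c("TP53", "2-Sep", "1-Mar", "BRCA1", "1-Dec")

# Assumption: corrupted entries look like "<day>-<three-letter month>"
month_pattern <- "^[0-9]{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$"
date_like <- grepl(month_pattern, symbols)

# Entries that deserve a manual look before any downstream analysis
suspicious <- symbols[date_like]
print(suspicious)
```

A check like this only detects the damage; the original symbols still have to be recovered from the unmodified source file.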

Hi forum, I am trying to batch blast several protein sequences stored in an R data frame, and then filter the results based on homology, e.g. more than 80%. I can see that there are several...packages that seemingly would be applicable, but would most like to receive feedback from anyone who has had a similar challenge and overcome it successfully. Thanks, Andres

R blast blastsequences protein

updated 8.0 years ago • andres.susrud

Hi all, Is there a way to fetch genomic sequences via Bioconductor directly? (Using galaxy, but I would like to automate) I tried rtracklayer and biomaRt - rtracklayer...doesn't seem to have an interface for fetching sequences, and biomaRt only seems to fetch sequences from a subset of gene IDs, while I just need to fetch sequence from a genomic...range. fetchSe…

biomaRt rtracklayer

updated 16.0 years ago • Johannes Waage

at normalization data step and I obtained a normalized counts matrix. How can I change Ensembl gene names into common gene names?

tcgabiolinks grch38 annotation

updated 8.2 years ago • salvocomplicazioni1

Hello, My name is Ruba. I am a PhD student and I am analyzing bulk RNA-seq data for my project. I wanted to use the heatplot to visualize KEGG...enrichment analysis and see the connection between genes and related pathways. At first the plot showed all the genes overlapping, so I couldn't differentiate any of it similar...support.bioconductor.org/p/9148486/#9148486 I tried the proposed solution but…

enrichplot stringr heatplot

updated 2.4 years ago • ruba-mahmoud

> refgene = ahub[['AH5040']] > table(table(mcols(refgene)['name'])) 1 2 3 4 5 6 7 8 9 10 12 13 19 45178 568 102 27 74 52 171 129 13 19 1 1 5 The vast majority of genes have only one transcript, but 1,162...width(.) ) %>% subset(., select=c(name, chrom, start, end, transcript.len…

refseq annotationhub

updated 10.3 years ago • dalloliogm

from Ensembl (v72 hg19) for one of those genes and I do not understand why there should be one of length 0. Here you are a histogram of all exons showing very short exons...006"; gene_id "ENSG00000001460" Now the gtf used as input for dexseq_prepare_annotation.py for that gene searching for the start and end coordinate: grep 'ENSG00000001460' hg19.ensGene_withGeneName.gtf | grep '24683489...resu…

Annotation DEXSeq

updated 11.9 years ago • Jose M Garcia Manteiga

Hello: I am using DESeq2 to analyze gene expression data (counts from Illumina sequencing).  I have samples from two different populations (A & B) reared in...I set my reference population to A and my reference rearing condition to X, but because of the nature of populations and treatments (which occur in the wild and are not lab/drug manipulations) these are somewhat arbitrary…

deseq2 deseq

updated 11.2 years ago • evakfisch

Hello, I have three sequenced (Illumina) genomic libraries (L1,L2,L3) and three biological replicates for each (A,B,C). I'd like to compare those libraries, and find which are the most similar (e.g. is "L1" more similar to "L2" or to "L3", based on the profile of mapped short reads in the libraries). I've generated windows…

updated 12.2 years ago • Guest User

the top 20 categories using topGO. My question is, is there a way to generate a list of the gene names in the returned "DE" column, for example the 71 genes reported in the below example? Term

missmethyl

updated 8.8 years ago • r.clifford

> hg19_tx <- extractTranscriptsFromGenome(Hsapiens, hg19txdb) #Create DNAStringSet with names associated with each probe > probeset <- DNAStringSet(probelist$sequence) > names(probeset)<-probelist$probenames...then the names appear to be lost: > ps_pdict1<-PDict(probeset, max.mismatch=1) > txmatches1 <- matchPDict(ps_pdict1, h…

probe

updated 14.5 years ago • Ian Henry

Dear All, Does anyone know how to extract intragenic sequences from the genome? Like in the genescan, it is mentioned that the predictions are based on transcriptional, translational...and donor/acceptor splicing signals as well as the length and compositional distributions of exons, introns and intergenic regions. But I am not sure which function I should

updated 14.1 years ago • Yating Cheng

My project's goal is to understand how DNA sequence specifies gene expression changes in a fungus under stress. There are 2 datasets, count data contains rna_count and metadata contains data information. The metadata consists of temp…

rnaseqGene

updated 7 months ago • Ferdinand David

Dear all, If I understand correctly, TMM-normalisation has two assumptions: 1. Most genes are not DE 2. Comparable number of DE genes that are up/down-regulated (i.e. symmetry) According to my analysis (I am using

limma edger normalization

updated 6.5 years ago • mikhael.manurung

<- 10 # remove read tails with quality lower than this seqs <- sread(reads) # get sequence list qual <- PhredQuality(quality(quality(reads))) # get quality score list as PhredQuality > length(qual) #my length...is positive [1] 39145018 > myqual_mat <- matrix(charToRaw(as.character(unlist(qual))), nrow=length(qual), byrow=TRUE) Error in .Cal…

charm

updated 12.7 years ago • Sam McInturf

Steve Lianoglou; bioconductor List Subject: Re: [BioC] R: R: how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence) On Sat, Jun 27, 2009 at 1:42 AM, <mauede@alice.it> wrote: > What is the attribute correspondent to the miR name...to see what attributes are available if you are ever in doubt. > > > I have to link the gene information (actually right now I …

miRNA Biophysics Homo sapiens biomaRt PING

updated 16.6 years ago • mauede@alice.it

Hi, I am trying to tximport Salmon output from the Galaxy, which is a quantification file in tabular format (instead of .sf format?). I have followed the Users manual of DESeq2 to tximport but got the error at txi step. > files1 <- list.files( pattern = ".txt",full.names = TRUE) > names(files1) <- paste0("samples", 1:4) > all(file.exists(files1)) […

deseq2 tximport

updated 7.0 years ago • saddamhusain77

can offer some advice about where I am going wrong with a loop I'm trying to write for cleaning up UNIPROT data names. Basically, the name I have from proteomics analysis is something like tr|A0A02DLI66|A0A02DLI66\_MYTGA

Proteomics Transcriptomics Loops batch-processing R

updated 7.3 years ago • laural710
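
For identifiers of the form quoted in the post above, a loop may not be needed at all. The following is a minimal base R sketch, assuming every name follows the three-field `db|accession|entry_name` layout and that the goal is to pull out the individual fields; the second identifier and the column names are added purely for illustration.

```r
# Example identifiers: the first is the one quoted in the post,
# the second is an extra illustrative entry in the same layout
ids <- c("tr|A0A02DLI66|A0A02DLI66_MYTGA", "sp|P12345|AATM_RABIT")

# Split on the literal "|" (fixed = TRUE avoids regex interpretation)
# and stack the pieces into a character matrix, one row per identifier
parts <- do.call(rbind, strsplit(ids, "|", fixed = TRUE))
colnames(parts) <- c("db", "accession", "entry_name")

# Vectorised extraction of the accession column
accession <- parts[, "accession"]
print(accession)
```

If some names have a different number of fields, `rbind` would recycle values silently, so checking `lengths(strsplit(ids, "|", fixed = TRUE))` first is a sensible precaution.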

I have a gene list of 10,000 DNA sequences. I need to translate the DNA sequences to protein sequences using the Biostrings package. I am able...the function: > translate(DNAString("ATG")) But the problem is that I have to do one sequence at a time. I want to run the entire FASTA file at once and get the output as a protein sequence file. Please help me to find

Biostrings Bioconductor R

updated 6.0 years ago • mahasish.shome

FALSE, stringsAsFactors=FALSE) > rn<-paste(data[,1], sep="") > P_values=data[,-1] > names(P_values)<-rn > myGOdata=P_values > relevant.genes <- factor(as.integer(all.genes %in% myGOdata) + ) > names(relevant.genes...annFUN.org, geneSel = topDiffGenes, nodeSize=10, mapping = "org.Hs.eg.db",ID = "symbol") Building most specifi…

topGO

updated 9.0 years ago • hollinew

nLinesToRead <- NULL if (!is.null(n)) { nLinesToRead <- n - length(txt) } dat3 <- fastTabRead(con, n = nLinesToRead, quote = "") geoDataTable <- new("GEODataTable", columns = cols, table = dat3[1:(nrow...1] 0 17 Browse[3]> dat3 [1] ID ORF [3] SPOT_ID Species Scientific Name [5] Anno…

Annotation PROcess

updated 9.8 years ago • James W. MacDonald

I have a bulk RNASeq dataset which has already been TMM normalised, and further normalised by gene length. Raw counts are not available for this dataset. I want to perform DGE analysis with DESeq2, or edgeR. Since, edgeR internally

edger deseq2 tmm normalised values

updated 8.2 years ago • Saumya001

but readDNAStringSet will not process it. I've tried it with other data and with different kinds of sequences (amino acid) and received the same error message -- I'm sure I must be missing something. My R output is below. Thanks so...much for any help! -- output of sessionInfo(): > genes<-keggLink("ath00906") > head(genes) [,1] [,2] [,3] [1,] "p…

PROcess KEGGREST

updated 12.4 years ago • Guest User

I’m working on RNAseq data using the DESeq2 R package. Most of the analysis will be differential expression between 2 (or 3) groups of samples with at least 3 biological replicates...Before displaying the differential expression of certain groups of genes, I would like to plot a heatmap of the, for example 50, most expressed genes in group 1 and showing the expression of those...genes in the othe…

deseq2 R rnaseq

updated 10.0 years ago • Bstn132

Dear colleagues, I am working with CGH data. I have the names of the BAC clones and their chromosome locations. They correspond to OncoBAC arrays (?). Is there any way in Bioconductor...to determine which genes belong to each BAC clone? Thank you in advance, Best, Federico PS: I am a newbie in R and Bioconductor

CGH BAC

updated 19.0 years ago • Federico Abascal

`lcpm <- cpm(y, log=T); head(lcpm)` gives first row with unique numeric ID which denotes gene names from data file. How can I export this to a \*.csv file with the gene names included? `write.csv(highly_variable_lcpm...row.names=y2$genes, file = "highly_variable_lcpm_genes.csv")` I get the error message below: Error in write.table(hig…

edgeR lcpm bioconductor

updated 7.8 years ago • Björn

Hi, Is there a way in SingleR to know for each cell in the test data which genes (in the test cell) are most correlated with the genes in the predicted cell from the reference dataset? For example, for...X in test data that is predicted to be cell Y in the reference data - what are the highly correlated genes in X with Y? Thanks! Liron

singlecell singleR

updated 5.4 years ago • lirongrossmann

map ~50,000 loci from an RRBS experiment back to the fragments which resulted from the restriction enzyme digestion, but I am not sure how to do it. My goal is to determine if the loci are localized to a particular fragment size...each fragment (ends, middle). I have found many software packages that do in silico restriction enzyme digestions on the whole genome, but so far I am only able to e…

in silico digestion

updated 6.8 years ago • emb13

Hello dear List, when I save a cpm table (ordered by p-value for example), I do not have the names of the genes but only cpms. How can I add them? I used: > o <- order(lrt$table$PValue) > tab <- cpm(y) > write.table(tab, file="CPM.txt

updated 13.2 years ago • Guest User

Hi, I am planning to import the transcript abundances from tsv files that were calculated by kallisto using Quantseq 3' mRNA seq data. But I don't know what is a correct...Tx, txOut = FALSE, #determines whether your data represented at transcript or gene level countsFromAbundance = "no", ignoreTxVersion = TRUE

tximport tximportData Quantseq kallisto

updated 3.2 years ago • duhwa.lee

I am looking for, and would like to compare, algorithms that can calculate "distance" or "similarity" between two gene lists with different lengths. Any paper, any implementation in R and any suggestion is welcome! Thanks, -- Weiwei Shi, Ph.D

updated 17.4 years ago • Weiwei Shi

the process correctly): when I am exporting the csv file, there are duplicate entries for some gene names (i.e. ESR1). I am under the impression that RMA and the process I am using (target = 'core') summarizes at the gene level, so...I am not sure why I am getting duplicate entries for certain (not all) genes after writing the expression file. I have gone through this process with some mouse ar…

Annotation PROcess

updated 11.8 years ago • Guest User

the Affymetrix web site (Version 29 according to my notes). It occasionally contains more than one gene per probeset separated by 3 slashes. For example for the probeset 1415691_at the genes listed are Dlg1 /// LOC100047603...Presumably this is because the probeset is found to have specificity for more than one gene. When I get the annotation from mouse4302.db (version 2.2.11) using the…

Annotation Cancer mouse4302 annaffy ASSIGN

updated 15.7 years ago • Richard Friedman

Hello BioC users, My question is pretty vague, so please bear with me. I want to do Gene set enrichment analysis (GSEA) on zebrafish agilent array data. I read the user guide and vignettes but still it is not quite…

Annotation GO zebrafish probe

updated 15.7 years ago • Neel Aluru

analysis is usually performed by aligning trimmed miRNA reads against the genome or miRBase mature sequences with optimized parameters for aligners such as `bwa aln -n 0`, `bowtie --best --strata`, `bowtie2 --very-sensitive-local` or...with mature miRNA coordinates from miRBase to assign features and when aligned to miRBase sequences directly one can use `samtools idxstats`. Established pipelines t…

tximport DESeq2 edgeR swish miRNA

updated 3.6 years ago • BG

Is there a way to get latest gene Symbol or Entrez ID in R? I am using `AnnotationDbi` and org.Hs.eg.db but it seems to give old gene name. For entrez ID 64755...the new gene name is [RUSF1][1] but it gives C16orf58 ```r library(AnnotationDbi) library(org.Hs.eg.db) AnnotationDbi::select(org.Hs.eg.db...pkgconfig_2.0.3 [13] memoise_2.0.0 ``` [1]: https://www.genecards.org/cgi-bin/carddisp.pl…

AnnotationDbi org.Hs.eg.db

updated 4.8 years ago • sgupt46

googling I have found that the bedtools 'nuc' command will give me the GC content with ranges and the length. Providing that I have a bed file of the hg38 genome. What I need to make sure of is that I am calculating the length and the...makeTranscriptDbFromUCSC(genome='hg38',tablename='refGene') tr_by_gene <- transcriptsBy(txdb,'gene') library(Rsamtools) r_AE89_m <- readGAlignmen…

RNASeq edgeR cqn

updated 11.8 years ago • Matthew Thornton

Hey everyone, I've been using biomaRt to access the sequences of genes for downstream analysis. Last night, the below snippet of code was functioning just fine, but today it's no longer working. Does anyone have …

biomaRt

updated 6.0 years ago • tabbott

Hi all, I ran into an edge case situation of kallisto not processing GENCODE transcript identifiers correctly, and this currently propagates into tximport. Ideally this should be fixed upstream in kallisto, but we should harden tximport against this situation. Here's an example kallisto run aligned against GENCODE that is problematic: https://share.steinbaugh.com/kallisto-gencode-dmso.tar.gz C…

tximport

updated 2.9 years ago • Michael Steinbaugh

Hi everyone in the list!!! I have a list of 3'UTR sequences obtained from a zebrafish embryo gene list (500 cases) and we want to predict miRNAs targeting each UTR in the mentioned list. I have already used the miRWalk webpage, which permits predicting miRNAs against a short gene list (no more than 200 cases) but the result was not good as this tool has not been cr…

miRNA

updated 12.5 years ago • Martin Leonardo

I usually use biomaRt to convert gene ids to symbols. However, this time the ensembl IDs I have (dog) do not match the ensembl ids of biomart dataset "clfamiliaris_gene_ensembl...tried to use the ensembl web portal, the dog dataset is called ROS_Cfam_1.0 there. Looks like my genes do not match the genes from their dataset. My genes look like this: "ENSCAFG00000045440" "ENSCAFG00000000001" "ENS…

biomaRt ensembldb

updated 4.1 years ago • 4214811

I had a question regarding analysis of differential abundance of OTUs using DESeq2. As an example, I have 3 samples with 3 replicates each, and want to compare whether selected OTUs…

deseq2

updated 11.1 years ago • jport

Thu 02/07/2009 8:11 To: Miichael Watson; Sean Davis; Steve Lianoglou Subject: Help with symbol names mapping between miRecords and BioMart I extracted some VALIDATED miRNAs and *hopefully* I paired them with their respective...VALIDATED genes 3utr sequence. I am NOT sure about my mapping between BioMart and miRecords objects name. Clearly the output of my algorithm...ensembl_gene_id','exter…

GO Homo sapiens biomaRt

updated 16.5 years ago • mauede@alice.it

I find that I have various uses for a function that generates "gene model" GRanges on the basis of a symbol. Even more detailed gene model computations are performed in Gviz. Do other...that should be available? Should the GenomicFeatures package include a function of this nature? > genemodel function (sym, genome = "hg19", annoResource = Homo.sapiens, …

granges gene models GenomicFeatures

updated 10.4 years ago • Vincent J. Carey, Jr.

pathway hsa01100, however on trying to plot this with pathview it does not seem to highlight the genes. I found a similar post previously but it's not exactly what I wanted since I'm interested in the actual gene --> perhaps...to enzyme showing up on the map. https://support.bioconductor.org/p/54410/ Is this possible? Perhaps converting my 50 or so genes...to enzyme first? Here is …

kegg pathway pathview

updated 6.9 years ago • Ahdee

warning Warning message: The vector of geneIds used to create the GOHyperGParams object was not a named vector. If you want to know the probesets that contributed to this result either use a named vector for gene Ids, or pass...a vector of probeset IDs via sigProbesets. And then most GO terms go missing in the result. What do I do in such a case?

GO

updated 12.2 years ago • Sandy

everyone, I am trying to generate the RPKM values using the rpkm() function. But it is asking for gene lengths. How can I get the gene lengths of mm10 ? Apoorva

rpkm gene length count matrix edgeR rsubread

updated 10.2 years ago • AB
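
To show why the question above runs into a request for gene lengths, here is a minimal base R sketch of the RPKM arithmetic itself: reads divided by gene length in kilobases and by library size in millions. The counts and lengths are made-up toy values, and the library size is taken as the plain column sum, which is an assumption; edgeR's `rpkm()` may additionally apply normalization factors.

```r
# Toy count matrix: rows are genes, columns are samples (made-up numbers)
counts <- matrix(c(100, 200, 300, 400), nrow = 2,
                 dimnames = list(c("geneA", "geneB"), c("s1", "s2")))

# Hypothetical gene lengths in base pairs, in the same order as the rows
gene_length_bp <- c(geneA = 2000, geneB = 1000)

# Library size per sample, here simply the column totals
lib_size <- colSums(counts)

# RPKM = reads / (gene length in kb) / (library size in millions)
per_kb <- counts / (gene_length_bp / 1e3)           # divides each row by its length
rpkm_manual <- sweep(per_kb, 2, lib_size / 1e6, "/")  # divides each column by its depth
print(rpkm_manual)
```

The lengths have to come from the same annotation that was used for counting. If the counts were produced with Rsubread's `featureCounts`, its annotation table typically carries a `Length` column that can be reused for this purpose.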

we want to build receiver operating characteristics for patients from two separate cohorts (using gene expression from sequencing as predictor), and I was thinking about how to compare them in this context. I did vst() (`varianceStabilizingTransformation...glm_response_scores <- predict(fit_glm, test_data, type="response") I chose the most highly DE gene in both datasets as the first e…

deseq2 pROC ROCR glm

updated 6.6 years ago • sebastian.lobentanzer

`kg.mouse = kegg.gsets("mouse")` I get the following error: `Error in names(kg.sets) = paste(species, ks.names, sep = "") : 'names' attribute [1] must be the same length as the vector [0]` or

gage pathview

updated 10.3 years ago • martin.hoelzer

not running the same code for a month or so, I am getting a new error. I am now getting NAs as probe names for datasets using getGEO(). It seems that datasets obtained with platform GPL4372 are having issues but platform GPL2700...issue. ```r #This is returning NAs gset <- getGEO("GSE33000", GSEMatrix =TRUE, AnnotGPL=TRUE) if (length(gset) > 1) idx <- grep("GPL4372", attr(gs…

GEOquery

updated 4.0 years ago • John

I am using getBM to get start/end positions for my genes, however biomart ensembl hgnc\_names only contain 19869 out of my 29581 genes. In my dataset, gene names are HUGO, too. It...seems that around 10,000 of my genes are not included in the total of 36,713 genes in biomart ensembl. I checked to see if there is a potential naming difference

biomart ensemblbiomart

updated 8.3 years ago • spr

Bioconductor, I am working on an RNA sequencing experiment on Arabidopsis (Illumina, 100bp, single end), using tophat to map the reads, and R for the rest. Many (9 of 12…

Sequencing convert

updated 12.6 years ago • Sam McInturf

Goseq doesn't support it so I perform the GO Analysis 'by hand' like this: >de\_data=read.table("gene\_exp.diff", header=T) >diff\_express=de\_data$dea >names(diff\_express)=de\_data$gene >length=de\_data$length >cat\_map...read.csv("godata.txt", header=FALSE, sep="\\t") >pwf=nullp(diff\_express, bias.data=length, plot.fit=TRUE) >…

r goseq go

updated 9.5 years ago • tobal.92

Hi, I wanted a table that provides a mapping from Entrez Ids to Gene Symbols for Homo sapiens. To do this, I used the following code: library(org.Hs.eg.db) x <- org.Hs.egSYMBOL # Get the gene...symbol that are mapped to an entrez gene ide…

Cancer Homo sapiens convert

updated 16.9 years ago • Tim Smith

for your post. To answer your questions: #1: The UPC probability represents the "probability the gene is expressed at a level above the background." So it really depends on how confident you want to be. If being 50% confident...that the gene is active in the sample (i.e. the gene is most likely expressed) is good enough confidence for you, then 50% is fine. If you are...okay with "the gene might…

hgu133a2 safe SCAN.UPC

updated 13.2 years ago • W. Evan Johnson
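
The thresholding logic in the reply above can be sketched in base R. The UPC values and the 0.5 cut-off below are made up for illustration and are not output from SCAN.UPC; they only show how a chosen confidence level translates into a call of "expressed" or not.

```r
# Made-up UPC values (probability of expression above background) for five genes
upc <- c(g1 = 0.02, g2 = 0.48, g3 = 0.51, g4 = 0.90, g5 = 0.99)

# Cut-off chosen by the analyst; 0.5 corresponds to "more likely expressed than not"
threshold <- 0.5

# Genes called expressed at this confidence level
expressed <- names(upc)[upc > threshold]
print(expressed)
```

Raising `threshold` trades sensitivity for confidence, which is the choice the reply leaves to the reader.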

of affy vignette and ?ReadAffy) indicates that I need to set up a character array with both the file-names and the desired sample- names. This is what I have used (limited data-set for testing): > samples.files <- c( "RAE230A_043003_IM01T_LH.CEL...ignore.case, extended) : invalid argument In addition: Warning message: sampleNames not same length as filenames. Using …

affy

updated 22.5 years ago • Paul Boutros

Hello! I am analyzing a proteomic dataset of Solanum lycopersicum that is organized as follows: Treated vs Control - in a time series: TP1; TP2; TP3... On this dataset I need to perform a Gene Ontology enrichment analysis in R, and I was planning to do it with clusterProfiler. I want to perform both ORA, (enrichment analysis on over represented tomato proteins in treated vs control on eac…

clusterProfiler AnnotationHub GO uniprot

updated 4.9 years ago • Alberto