Bioconductor Forum

Hello, The data is related to my previous post. We decided to remove 3 genes from the sample count matrix as they were also present in the negative controls at very high counts. When running ```DESeq(dds...size factors Error in estimateSizeFactorsForMatrix(counts(object), locfunc = locfunc, : every gene contains at least one zero, cannot compute log geometric means ``` It is worth no…

RNASeq DESeq2

updated 2.1 years ago • Karthik

Hi! I'm using tximport to summarize transcript-level abundance to get gene-level abundance. I didn't specify a value to 'abundanceCol', and I got an abundance matrix in the result. I

tximportData

updated 5 months ago • Jinghua

gives the error: Error in library(pdn, character.only = TRUE) : 'package' must be of length 1 Also, when printing rawData gives this error: ExpressionFeatureSet (storageMode: lockedEnvironment) assayData...48803 features, 13 samples element names: exprs protocolData: none phenoData: none featureData: none experimentData: use 'experimentDat…

oligo biobase GEOquery

updated 6.2 years ago • salamandra

conversions. 2. this solution was the fastest one in my case: writeFASTA <- function(dna, desc=names(dna), file=stdout()) { if (is.null(desc)) desc <- paste(seq(along=dna)) fasta = character(2 * length(dna)) fasta[c(TRUE, FALSE)] <- paste(">...dna) writeLines(fasta, file) } The downside of this function is it does not wrap long sequence lines. So I came…

Cancer BSgenome PROcess GLAD BSgenome

updated 15.5 years ago • Michael Dondrup
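
The line-wrapping limitation mentioned in the writeFASTA snippet above could be handled along the following lines. This is a minimal base-R sketch and not the poster's own solution, which is cut off in the snippet; `wrap_seq`, `write_fasta_wrapped` and the `width` argument are hypothetical names introduced for illustration.

```r
# Hypothetical helper: split one sequence string into chunks of at most `width` characters.
wrap_seq <- function(s, width = 60L) {
  n <- nchar(s)
  if (n == 0L) return("")
  starts <- seq(1L, n, by = width)
  substring(s, starts, pmin(starts + width - 1L, n))
}

# Hypothetical writer: one ">" header line per record, followed by wrapped sequence lines.
write_fasta_wrapped <- function(dna, desc = names(dna), file = stdout(), width = 60L) {
  # Resolve the headers before converting, because as.character() drops names.
  if (is.null(desc)) desc <- as.character(seq_along(dna))
  seqs <- as.character(dna)
  records <- mapply(
    function(d, s) c(paste0(">", d), wrap_seq(s, width)),
    desc, seqs, SIMPLIFY = FALSE, USE.NAMES = FALSE
  )
  writeLines(unlist(records), file)
}
```

Calling `write_fasta_wrapped(c(seq1 = "ACGTACGTAC"), width = 4L)` prints the header followed by the sequence in 4-character lines. For large inputs a dedicated writer from a sequence package is likely to be faster than this per-record loop.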

statistics or bioinformatics. All three positions are related to projects applying single-cell sequencing techniques to questions of human biology, namely vascular biology and haematological cancer, for which we need

single-cell job computational biology Job

updated 7.1 years ago • Simon Anders

since I didn't find a satisfactory response when searching the mailing lists. I want mismatch sequences for a chip, say human (95/133). Is there a nifty function which can be easily applied on the entire chip and outputs mismatch...sequences in the same format as PM sequences! Any pointers/ideas?! Thanks, Hrishi

updated 20.9 years ago • hrishikesh deshmukh

After finishing the DESeq2 analysis I generated a volcano plot and saw that almost all the genes with the lowest padj actually have very low log2foldChange values. All replicates in the DPS condition have zero counts...for those genes while DVM replicates mostly show quite high values. I would think that comparing between zeros in the DPS condition

DESeq2

updated 4.3 years ago • RitaB

Hello All, Problem: I would like to obtain the genomic sequence that is upstream (~500 bp) of a specific bacterial gene. I want to get this sequence for all bacteria genomes that have the gene. On EcoCyc I see that many (> 100) bacteria have the gene but I do not know how to get all of the sequence in a high-throughput manner so I was going to use biomaR…

Alignment biomaRt genomes

updated 15.1 years ago • Noah Dowell

mskInd, 0)), collapse = "") + mskCh <- Biostrings:::.insertSpaces(mskCh) + } + names <- names(ch) + ch <- sapply(ch, Biostrings:::.strChop, chopsize = 55, simplify = FALSE) + if (hasMask) { + mskCh <- .strChop(mskCh, chopsize...55) + ch <- c(list(Mask = mskCh), ch) + } + …

Biostrings

updated 3.9 years ago • Charles Plessy

object (a bunch of them, in a list), and I'd like to use alphabetFrequency on their RepeatMasked sequences. I've found an inefficient way to do this, but I wondered (a) whether I'm missing a better way to do it, or (b) whether it'd be...possible for you to implement some version of Views that returns masked sequences, rather than dropping the masks. My inefficient (in memory) version is to create…

updated 10.2 years ago • Janet Young

Hello, I want to do heteroduplex on each exon of around 50 genes. Getting the exon structure for each gene from Ensembl and manually identifying the exon sequence seems very laborious...Is there a way using Bioconductor package to get the exon sequences for all the transcripts of a gene, if so how can I do this, would biomaRt do it, if so how? Anyway examples of a s…

updated 16.0 years ago • Ruppert Valentino

based mode (using the -eB arguments, bypassing the merging and assembling steps, simply quantifying genes found in the reference). We then supply to prepDE.py the abundances output from this to get counts. Over a series of analyses...etc. I have consistently found the results to show very few, if any, differentially expressed genes, and those that I do identify are of very small magnitude fold-cha…

prepDE STRINGTIE deseq2

updated 7.3 years ago • Nancy

genome. It seems like bigwig saved with the official NCBI/Ensembl/flybase mitochondrion chromosome name dmel_mitochondrion_genome breaks the import of previously saved files (which does not break when loaded in...nt reads along the chromosomes in either orientation density <- 0.05 nreads <- sapply(chr.info$length, function(length) round(rnorm(1,length*density,(length*density)*0…

Coverage Annotation rtracklayer

updated 11.6 years ago • Marco Blanchette

function from GenomicFeatures package for hg19 reference genome, and no matter which gene database I use, I always get the following error: hg19.refseq.db <- makeTxDbFromUCSC(genome="hg19", table="knownGene...https://genome.ucsc.edu/cgi-bin/") Error in names(trackIds) <- sub("^ ", "", nms[nms != …

software error genomicfeatures

updated 7.7 years ago • rmendez

```r features <- binGenome( genomeTxDb ) 64 genes were dropped because they have exons located on both strands of the same reference sequence or on more than one reference sequence, so cannot be represented by a single genomic range. Use 'single.strand.genes.only=FALSE' to get all the g…

ASpli

updated 21 months ago • Sebashish

using the getURL method of the annaffy package: The method works fine when my aafGO object is of length > 1: > ann.affy.go[[4]] An object of class "aafGO" [[1]] An object of class "aafGOItem" @id "GO:0000139" @name "Golgi membrane" @type "Cellular...Component" @evid "IEA" [[2]] An object of class "aafGOItem" @id "GO:0003827" @name "alpha-1,3-mannosylglycoprotein 2-beta-N-acet…

annaffy

updated 18.0 years ago • Ana Rodrigues

Hello, I have been using tximport to add up all the transcripts to get one TPM value per gene. When I add up genes manually in Excel, it doesn't always match tximport's TPM value. Why are tximport's TPMs sometimes less than...file.path("C:", "Users", "cathe", "Desktop", "GSM3106294_MEF_1_quant2.sf.txt", fsep="/") #get gene ID's and TX names library(TxDb.Mmusculus.UCSC.mm10.knownGene) txdb <…

tximport R

updated 3.7 years ago • cthangav

problem with the array. But later I was corrected by the biologists that it was expected as many genes are not expressed in blood macrophages. Thus most of the 95% Absent were due to genes that are not expressed ... Apparently this is...question is how does one normalize this kind of data? The assumption in two-colour cDNA data that "most of the genes are not differentially expressed" does not hold here. Med…

Normalization

updated 22.6 years ago • Adaikalavan Ramasamy

Dear all, I have been using DESeq2 for RNA sequencing-based gene expression analysis, and have recently been trying to use ARACNe for regulatory network inference. Are rlog...as the input for ARACNe analysis? Considering that rlog does not normalize expression values by gene lengths, should I simply use RPKM/FPKM values for that kind of analysis? Thanks, Tom

deseq2

updated 6.4 years ago • Tom

I am trying to find ORFs in a file of fasta sequences. I get the error " Iter Models Start Motif Fold Init UpsNt Term RBS Auto Stop Genes 1 1Error in lenScr[w] <- (lenScr[w[1L...1] - lenScr[w[1L] - 2]) * seq_along(w) + : replacement has length zero" ```r > S_wt_orfs <- FindGenes(S_wt_fa…

ORF DECIPHER ORFhunteR

updated 4.6 years ago • tkirkland

331 332 333 334 335 336 337 338 339 340 341 transcripts missing from tx2gene: 36704 summarizing abundance summarizing counts summarizing length Error: all(names(aveLengthSampGene) == rownames(lengthMat)) is not TRUE In...3: In rowsum.default(abundanceMatTx * lengthMatTx, geneId) : missing values for 'group' 4: In names(aveLengthSampGene) == rownames(lengthMat) : longer object length…

tximport salmon R bioconductor

updated 8.5 years ago • macmanes

for other lab members and for publication. We have deposited our RNA-Seq reads, as obtained from the sequencing machine, directly in GEO. Most of the tutorials I've found start from a table with counts and explain differential...the very beginning of reading the data from GEO/SRA reads and processing them to get to gene counts? Ideally using entirely Bioconductor packages 2. If not, …

Bioconductor Workflow

updated 4.0 years ago • sarastew1994

HGU133PLUS2 package (which corresponds to my data). While some probesets are still associated with genes in NetAffx (online and when I download database) and in hgu133plus2.db, I can't see them associated with gene names...For instance, I can use two methods to get gene names: biocLite(hgu133plus2.db) biocLite(annotate) r=rownames(df_rma) head(r) [1] "1053_at"…

Annotation hgu133plus2

updated 8.9 years ago • benoit.tessoulin

Hello bioC users, as you can see below, this was posted over a year ago. Unfortunately I tried the same today and for some mysterious reason it is not working correctly any more. What I have is the same data.frame: > dat id flybasename_gene flybase_gene_id entrezgene 1 1616608_a_at Gpdh FBgn0001128 33824 2 1622892_s_at CG3…

GO

updated 13.4 years ago • Assa Yeroslaviz

the bsseq package' after going through the protocol with the example. I realise this may not be the most suitable for my data as I have used a targeted approach but I thought the general procedure would be very similar...Warning messages: 1: In .Seqinfo.mergexy(x, y) : Each of the 2 combined objects has sequence levels not in the other: - in 'x': chrX - in 'y': chrMT …

methylation DMR analysis

updated 10.8 years ago • parker

RNA-seq data from TCGA using edgeR. The results of the differential expression analysis have NAs under Gene names and Gene symbols. The EntrezID corresponding to it doesn't give a valid Gene name. What could be wrong? The following...command was run for annotating the gene expression data with Entrez ID. ``` > gnsOXP <- select(org.Hs.eg.db, keys=rownames(matrix_OXP),columns=c("SYMBO…

edger differential expression TCGA Gene ID

updated 5.8 years ago • fawazfebin

Dear goseq developers: Hello, my name is Xu, ZhengZheng, a student at the Beijing Institute of Genomics, Chinese Academy of Sciences. After I used edgeR to do difference...gene expression, I was happy to learn that I can use the R package (goseq) to do GO analysis. Thank you for your good work on this package...Question 2: when I use goseq(pwf,"hg38","refGene"),i…

goseq

updated 10.4 years ago • xdzperfect

Hi, I was wondering if I'm going about this in the correct way. I need to test if there are coding sequences or exons in hg19 which match a string of 100bp "D" i.e. [A,G or T]. However I'm getting a strange result. I get a hit on chr7, using...the 100bp search however when I search with 60bp sequence of "D" I don't get any hits. library("BSgenome") library("Biostrings") library("BSgenome.Hsapiens.…

Alignment BSgenome BSgenome

updated 14.8 years ago • Amos Folarin

I have a big data frame, dim(df) = 29 20664. It contains mutation and copy number variation data for genes. The column names are actually gene names. The column names currently are gene1, gene2 .... gene20664. How can I...change the column names from 1 to 19862 as gene1_mut, gene2_mut ... gene19862_mut and from 19863 to 20664 as gene19863_CNV, gene19864..._CN…

annotation

updated 10.1 years ago • qurrat.ulain
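
One way to approach the column-renaming question above is to build a vector of suffixes whose length matches the number of columns and paste it onto the existing names. The sketch below is base R on a toy stand-in for the data frame; the object names `df`, `n_mut` and `n_total` are assumptions for illustration, and the split point (19862) is taken from the post.

```r
# Toy stand-in for the 29 x 20664 data frame described in the post
n_mut   <- 19862   # columns 1..19862 hold mutation data
n_total <- 20664   # columns 19863..20664 hold copy number data
df <- as.data.frame(matrix(0, nrow = 29, ncol = n_total))
colnames(df) <- paste0("gene", seq_len(n_total))

# One suffix per column: "_mut" for the first block, "_CNV" for the rest
suffix <- rep(c("_mut", "_CNV"), times = c(n_mut, n_total - n_mut))
colnames(df) <- paste0(colnames(df), suffix)

head(colnames(df), 2)
tail(colnames(df), 2)
```

Because `paste0` is vectorised, no loop over columns is needed, and the same pattern extends to more than two blocks by adding entries to the `rep` call.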

has the NM_transcript identifiers, they don't appear to have unique delimiters for the corresponding gene IDs that I need to create the 2 column data.frame. Therefore, I ran the following from the EquCab3.0 gtf file: library(rtracklayer...LOC111772506 For reference, each of my 10 quant.sf files looks like this: head quant.sf Name Length EffectiveLength TPM NumReads NM_001…

Salmon ignoreTxVersion tximport

updated 4.4 years ago • robeaumont

Dear Steve, The genes that contribute most to the roast result are the same genes that are ranked at the top in a standard genewise DE analysis...exact equivalent for romer, but obviously looking at the genewise DE results will still reveal the most important genes. Best wishes Gordon > Stephen Hoang stephen.a.hoang at gmail.com > Fri Jan 31 21:58:26 C…

updated 12.0 years ago • Gordon Smyth

Hi All, For a while I have been using these lines below to normalize a large Affy data set with customCDFs and gcrma. I can't seem to make it work anymore. For some reason, in the affinity adjustment of gcrma, there is a switch to the affy CDF which gives an error later. It's a bit puzzling. I am confident the cdf name is correct and I even updated the custom CDFs …

cdf affy gcrma

updated 14.8 years ago • Yair Benita

from a common ancestor about 50 million years ago. We added a line of code that takes into account gene length differences between species. Since DESeq2 was not initially intended for such analyses, we wonder about some...general considerations - 1. Is there an assumption in DESeq2 that most gene expression is not different in the two samples? Can this be overcome if the mean and variance of …

deseq2

updated 5.5 years ago • adi.saxena

the same query has worked with a smaller number of ENST...many times for me too. But if the length of the filtering vector is really the problem then I feel confused. Some months ago I was extracting one UTR sequence...opened for several hours. At that time biomaRt was cutting me off after some time (of variable length). I posted my question to Bioconductor asking why biomaRt was kicking…

biomaRt

updated 15.6 years ago • mauede@alice.it

Hi, I am trying to feed the mouse Gene 1.0 ST array into our R pipeline (affy -> gcrma -> limma). At first I got an error during the gcrma step: Adjusting for optical...for optical effect....Done. Computing affinitiesLoading required package: AnnotationDbi Error: length(prlen) == 1 is not TRUE I googled the error and it seems there is some problem …

probe gcrma

updated 14.3 years ago • Yupu Liang

forum is more appropriate. I have a 16S dataset from gut mucosa and want to analyse differential abundance according to a factor. I have 200 samples: 10 cases and 190 controls. Q1: is it valid to use DESeq2 to compare differential...abundance with DESeq2 with such a large imbalance between the numbers of cases and controls? I know that DESeq2 is designed

deseq2

updated 6.6 years ago • dr.aj.scott

we can do is to add another parameter to featureCounts to let the function reduce the read to its 5' most base or its 3' most base, before carrying out read counting. With this parameter, you can get the start or end position of a...read and then use 'readExtension5' and 'readExtension3' to extend the read to an arbitrary length (the readExtension options will be applied to the single base p…

SubRead featureCounts

updated 3.2 years ago • Leon

Hi, I would like to retrieve the exon sequences (i.e. 5'UTR + CDS + 3'UTR) for a gene, along with the start and end positions for each exon. My short script is: ========= library(biomaRt) ## Example gene: MTOR; ensembl id "ENSG00000198793" mySequence <- getSequence(id="ENSG00000198793",type="ensembl_gene_id", seqType="gene_exon",mart=ensembl) gb <…

updated 12.0 years ago • Tim Smith

releases of databases. danRer6" is available and is not working for me. I am using GOSeq for gene ontology analyses and had the error as below: Can't find danRer10/ensGene length data in genLenDataBase... Loading...rtracklayer Trying to download from UCSC. This might take a couple of minutes. Error in getlength(names(DEgenes), genome, id) : The gene names specifie…

genelendatabase goseq

updated 9.9 years ago • Mehmet Ilyas Cosacak

Hello all - We've recently been looking at some of the abundant RNASeq data from the NIH's cancer genome atlas to find differentially expressed genes in biological replicates...sample number has been giving low FDRs/q values. We're only interested in whether a small set (~30) of genes of interest vary between conditions. I was wondering, then, what are the statistical pitfalls of excluding ot…

deseq2 SAMseq differential expression

updated 9.9 years ago • aylj

Hi, Do you have any ideas? I have converted gene names many times, but still does not fit, same problem... -- output of sessionInfo(): Loading hg19 length data... Error in getlength...names(DEgenes), genome, id) : The gene names specified do not match the gene names for genome hg19 and ID knownGene. Gene names given...uc021rmd.1, uc001qvk.1, uc010sgm.2, …

updated 13.3 years ago • Guest User

FASTA files (for which I cannot trace the origin). When comparing the FASTA index files, the sequence/chromosome names as well as the lengths of the individual sequences agree, although in a different order. Next...I'd like to compare the actual sequences. I could read these in using the FaFile class of Rsamtools and do pairwise comparison in me…

Biostrings FASTA sequencing

updated 9.9 years ago • Henrik Bengtsson

I have been studying the papers: I. Rocke and Durbin, (2001), `A model for measurement error for gene expression arrays', J. Comput. Biol, 8, 557--569 and II. Durbin, Hardin, Hawkins, and Rocke, (2002), `A variance-stabilizing transformation...for gene-expression microarray data', Bioinformatics, 18, S105-S110. Their variance stabilization seems to work well for most of...the genes I replicat…

Microarray

updated 20.7 years ago • E Motakis, Mathematics

pub/geo/DATA/SeriesMatrix/GSE462/GSE462_series_matrix.txt.gz' > ftp data connection made, file length 171874 bytes > ???URL > downloaded 167 Kb > File stored at: > C:\DOCUME~1\happy\LOCALS~1\Temp\RtmpCu25Ed/GPL5.soft gse462...is a list of length 1. class(gse462) length(gse462) And each member of the list is an ExpressionSet. class(gse462[[1]]) >&…

Biobase GEOquery

updated 16.1 years ago • Sean Davis

you could build your color scale based on that. Li > Hi, > > I am making DAG graphs for gene ontologies using GOGraph. > I color the nodes (elliptic) with different colors depending on the p-values > that I get...<- list() > node_color <- c(rep(NA,length(nodes(basic_graph)))) node_color[1:length(goid)] <- heat.col…

graph

updated 19.0 years ago • Li.Long@isb-sib.ch

Dear Members, I am trying to download the 3'UTR sequences of all human genes from Ensembl Biomart using the package biomaRt. Ideally, after retrieving I want to save these...in FASTA format. When I am using the code given below to get 3'UTRs of genes in chromosome 1, 2 and 3 (not sure if this is the best way to achieve what I want), I am getting an error: "Error in …

biomaRt

updated 13.5 years ago • Karthik K N

```r features <- binGenome( genomeTxDb ) features <- binGenome( genomeTxDb ) 64 genes were dropped because they have exons located on both strands of the same reference sequence or on more than one reference se…

ASpli

updated 21 months ago • Sebashish

line of reasoning is correct. I'm using DESeq2 to test for log2 fold differences in microbial gene abundances across two habitats sampled using metagenomics. I have reason to believe that the average genome sizes in...the two habitats are different. Average genome size differences influence differential abundance tests where gene counts have been converted into the ratio of t…

deseq2 normalization metagenomics

updated 10.3 years ago • jessawbryant

I am exploring DEGs using a portion of the TCGA dataset of 151 patients, 7 of which contain a fusion gene. A part of this fusion gene is not normally expressed (in the cancer of interest) and therefore we expect to see it as one of...if not the most differentially expressed gene. I ran the 7 fusion-gene patients vs the other 144 and was surprised to find that the padj...value for this gene was…

deseq2 tcga

updated 6.2 years ago • ndjayne

mRNA data. I use the function `PlotsPositiveHousekeeping` to check the variation of housekeeping genes (hkg). According to the code of this function (see below), counts for hkg are scaled using a size factor calculated from the...is located # # header: a logical value(TRUE or FALSE) indicating whether the file contains the names of the variables as its first line. # # designs: data…

NanoStringDiff

updated 4.8 years ago • YM

Sorry for the delay. On 01/19/2012 07:33 AM, Yuval Itan wrote: > Dear Herve, > > My name is Yuval, I am a postdoc at the Rockefeller University. I am trying to use Bioconductor for analyzing my RNA-seq data, and...your advice as my R level is a bit basic and I got stuck. I need to count the number of reads per gene and my fastq data was aligned to chromosomes named "1", "2" …

Cancer TranscriptDb convert

updated 13.9 years ago • Hervé Pagès

Will Rsubread's Subjunc perform correctly on genes, such as KIT, which have variable-length exons? I would like to count the reads supporting the Q...Similarly, there is a GNNK inclusion or exclusion elsewhere in the gene.

Rsubread

updated 21 months ago • Dario Strbenac

listInputBam, genomeName="hg19") Get UCSC ensGene annotations. Error in names(trackIds) <- sub("^ ", "", nms[nms != "new"]) : 'names' attribute [210] must be the same length as the vector [209] In addition: Warning...genome(session) <- "hg38" > track_names <- trackNames(sessio…

rtracklayer ucsc ensgene

updated 7.7 years ago • chrisamiller

Dear All, We have been using DESeq2 on our RNA-seq data to look for differential expression of genes and it works well. One issue that keeps on cropping up is the allocation of the EnsemblIDs per row in results(dds), frequently...Obviously this interferes with annotation so I have split it by + and annotated both for gene names etc. However with many of them per dataset I wondered how be…

rnaseq deseq2 ensembl

updated 8.7 years ago • Nicholas Owen

Hello, In R, I previously used this piece of code to look up Ensembl IDs for lists of genes beginning with ENSG000.... In this example, my_df is a dataframe where the rownames are the gene IDs (e.g. ENSG...): ...hgnc_symbol[ idx ] I'd now like to use this on a dataframe where the input row names are transcript IDs (e.g. ENST000...). I'm no…

rnaseq r biomart

updated 10.2 years ago • kmuench

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070911/f28ef9ca/attachment.pl

updated 18.4 years ago • Bennie Bruno

Hello Everyone, I need to get the most specific common ancestor for several GO terms. I get the ancestors for the GO-terms (using mget(goTerms, GOBPANCESTOR) ) and...problem, b…

GO

updated 20.3 years ago • Lourdes Peña Castillo

troubleshooting options. Any ideas or help would be much appreciated! ``` ### Generate vector with names of all genes detected in our dataset ALL.vector <- c(filtered_Pverr.annot$gene_id) ### Generate length vector for all...genes LENGTH.vector <- as.integer(filtered_Pverr.annot$length) ### Generate vector with names in just the contrast we are...set names #weight gene v…

goseq

updated 18 months ago • Danielle

Hi, I am trying to import a bunch of SMD files (64 files) which are from a single microarray chip. The following is what I did and what I got: 1. I started R in the directory that has all the .xls files 2. library("marray") 3. read.SMD("18195.xls"), returned: > read.SMD("18195.xls") Generating target sample info from all files Reading ... ./18195.xls An o…

updated 20.0 years ago • Wu, Zhuang

_by default, `runPCA` performs PCA on the log-counts using the 500 features with the most variable expression across all cells_. I am wondering how the most variable expression is determined, and how the names...of features (genes) can be extracted. Thanks

scater pca features

updated 7.1 years ago • jws
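
As a rough illustration of what "most variable" could mean in the question above, the base-R sketch below assumes that features are ranked by the variance of their log-expression values across cells and that the top `ntop` are kept before PCA. It does not call scater; the toy matrix and the names `logcounts_mat`, `gene_var` and `top_genes` are hypothetical, and the package documentation remains the authority on what `runPCA` actually does.

```r
set.seed(1)
# Toy log-expression matrix: 2000 genes (rows) x 50 cells (columns)
logcounts_mat <- matrix(
  rnorm(2000 * 50), nrow = 2000,
  dimnames = list(paste0("gene", 1:2000), paste0("cell", 1:50))
)

ntop <- 500
# Per-gene variance across cells (assumed ranking criterion)
gene_var <- apply(logcounts_mat, 1, var)
# Names of the ntop genes with the largest variance
top_genes <- names(sort(gene_var, decreasing = TRUE))[seq_len(min(ntop, length(gene_var)))]

# PCA on the selected genes, with cells as observations
pca <- prcomp(t(logcounts_mat[top_genes, ]), center = TRUE, scale. = FALSE)
head(top_genes)
```

Here `top_genes` holds the feature names the question asks about, under the stated assumption about the ranking criterion.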