The source file associated with this post can be downloaded here. The last time I talked about how to read a GOA gene_associations file into a Python dictionary data structure. Our goal was to find all genes that are annotated as hydrolases in the GOA gene_associations file. The tricky part is, most enzymes are not […]
After hearing Jonathan Eisen and Nikos Kyrpides talk about GEBA in various meetings, it is great to see the paper finally come out, and under a CC license too. Good move for everyone. GEBA is the Genomic Encyclopedia of Bacteria and Archaea. The idea is simple: we have >1000 prokaryotic genomes in GenBank as of […]
Sequencing centers keep pumping large amounts of sequence data into the omics-sphere (will I get a New Worst omics Word Award for this?). There is no way we can annotate even a small fraction of those experimentally, and indeed most annotations are automatic, done bioinformatically. Typically function is inferred by homology: if the protein sequence […]
GOA, the Gene Ontology Annotation, provides Gene Ontology annotations to proteins in UniProt. It also provides GO annotations to several genome projects: Chicken, Arabidopsis, Fly, Human, Mouse, Rat and Cow. Anyone who works on any of those genomes or on UniProt, and is interested in annotation, would most likely need to query GOA once in a […]
Warren DeLano passed away suddenly and at a young age at his home on Nov 3, 2009. He was the author of PyMOL, a very popular molecular visualization program, and a strong advocate of open source software. The family of Warren Lyford DeLano has created an “In Memoriam” page and blog. Also, a memorial award is […]
The first bioinformatics meeting I went to was in 1996 at the Nachsholim resort, north of Tel Aviv. I received a fellowship for the duration, and shared a room with the brilliant Golan Yona, then a grad student at the Hebrew University. I was doing biochemistry at the time and knew next to nothing about […]
Glimmer is a program that predicts ORFs in bacterial and archaeal genomes. The input is the assembled genome FASTA file; the output is several files of predictions at different stages. The terminal output file is the .predict file, which looks something like this: >NODE_1_length_38001_cov_935.551880 orf00001 481 362 -2 1.45 orf00002 451 567 +1 0.59 […]
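As an illustration of the .predict layout quoted in the excerpt above, here is a minimal Python sketch that reads such a file into a list of records. The column meanings (ORF id, start, end, reading frame, score) are my reading of the sample lines rather than something the excerpt states, and the names `Orf` and `parse_predict` are hypothetical helpers made up for this example.

```python
from collections import namedtuple

# Hypothetical record type; field meanings are assumed from the sample lines.
Orf = namedtuple("Orf", "contig orf_id start end frame score")


def parse_predict(lines):
    """Parse Glimmer .predict-style lines into a list of Orf records."""
    orfs = []
    contig = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: remember which contig the following ORFs belong to.
            contig = line[1:].split()[0]
            continue
        fields = line.split()
        if len(fields) < 5:
            continue  # skip anything that does not look like an ORF line
        orf_id, start, end, frame, score = fields[:5]
        orfs.append(Orf(contig, orf_id, int(start), int(end), int(frame), float(score)))
    return orfs


# The two ORF lines shown in the excerpt, one record per line.
sample = """>NODE_1_length_38001_cov_935.551880
orf00001 481 362 -2 1.45
orf00002 451 567 +1 0.59
"""

orfs = parse_predict(sample.splitlines())
for orf in orfs:
    print(orf.orf_id, orf.start, orf.end, orf.frame, orf.score)
```

For a real file, passing an open file handle works the same way, e.g. `parse_predict(open("genome.predict"))`, where the filename is only a placeholder.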
As resident bioinformatician in many places over the years, I have received many requests for help: anything from a short BLAST run to a full-fledged collaboration. I love that. I always like learning about new problems, and those requests may blossom into full research collaborations. So yes, drop me an email or step into my […]
First, a short glossary. Homologous genes are descended from a common ancestral gene. There are two types of homology: Orthology is homology due to a speciation event. So if there is a gene A’ in humans and A” in mice, and they are obviously similar in sequence, we infer that they are homologous. We usually also […]
CLARIFICATION: the events described here have not happened. Yet. We are a few years into the future. Whole human genomes can be sequenced relatively cheaply and accurately. Direct to Consumer Genomics companies offer true genomic analyses now, not just marker analyses. They BLAST* your sequence against known genotype & disease databases, looking for known genotypic […]
I will try to maintain a weekly poll on BsB, for matters biologick, bioinformatick, generally scientifick or otherick. As in any poll, if you read too much into its questions or answers, you should seriously chill. That being said, comments are most welcome. The poll is on the sidebar that’a’way.—> (Scroll a bit down if you […]
Before the 20th century, biology was, to a large extent, “Natural History”. It was an observational science rather than the experimental science it is considered to be today. At that time, the typical biologist, a natural historian, was going about the (European colonized) world, collecting specimens of new and fossilized species, classifying and recording them for […]