Posts Tagged ‘stupid’

Real programmers use…

January 3rd, 2010 3 comments

A nice take on the vi / emacs wars

vim rules, emacs drools

Also, real programmers browse the web using Vimperator.

Total waste of time, ep. 1

May 14th, 2009 2 comments

Warning: frivolously geeky and technical post, best described as “science methodology esoterica”, from which you can learn absolutely nothing useful. If you don’t get what’s going on, it’s probably for the best, because this is a complete waste of time.

Specific Aim 1: Find the longest English word that can be spelled using only the 20-letter protein (amino acid) alphabet.

Method: I like gawk for quick & dirty text processing:

gawk 'BEGIN {daword="a"} \
/[BbJjOoUuXx]/ {next} \
length($1) > length(daword) {daword=$1} \
END  {print daword}' /usr/share/dict/web2

acetylphenylhydrazine

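OK, this kinda sucks. I want a real word in English, not a chemical portmanteau. (It also contains a Z, which is not one of the 20 amino acid letters; the filter above never excludes it.) Let’s see what a top 10 list looks like: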

gawk 'BEGIN {for(i=1;i<=10;i++) daword[i]="a"} \
/[BbJjOoUuXx]/ {next} \
{for (i in daword) {if (length($1) > length(daword[i])) {daword[i]=$1;break}}} \
END  {for (i=1;i<=10;i++) print length(daword[i]), daword[i]}' \
/usr/share/dict/web2 | sort -nr

And the result:

21 pentamethylenediamine
21 acetylphenylhydrazine
20 paraphenylenediamine
20 metaphenylenediamine
20 interparenthetically
19 transcendentalistic
19 semiantiministerial
19 platymesaticephalic
19 peripachymeningitis
19 misapprehensiveness

Interparenthetically. How lovely if you do your bioinformatics in Lisp.
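For anyone who wants to repeat Aim 1 outside of gawk, here is a rough Python sketch of the same search. It assumes the intended alphabet is the 20 standard one-letter amino acid codes (ACDEFGHIKLMNPQRSTVWY), so it whitelists those letters instead of blacklisting the rest. The gawk loop above overwrites the first shorter slot it finds and throws the displaced word away, so its list is not guaranteed to be a strict top 10; the sketch uses heapq.nlargest instead. The names AMINO_ACIDS, is_protein_word and longest_protein_words are made up for this example.

```python
import heapq

# Assumption: the 20 standard one-letter amino acid codes.
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def is_protein_word(word):
    """True if every character of word is one of the 20 amino acid letters."""
    return bool(word) and set(word.upper()) <= AMINO_ACIDS


def longest_protein_words(words, n=10):
    """Return the n longest protein-spellable words, longest first."""
    candidates = {w.strip() for w in words}  # drop newlines and duplicates
    valid = (w for w in candidates if is_protein_word(w))
    # Sort key: length first, then the word itself so ties are deterministic.
    return heapq.nlargest(n, valid, key=lambda w: (len(w), w))


# Tiny in-memory demo using words that appear in this post.
sample = ["interparenthetically", "acetylphenylhydrazine", "bioinformatics", "Lisp"]
print(longest_protein_words(sample, n=2))
```

To run it on the real word list, pass an open file handle, for example longest_protein_words(open("/usr/share/dict/web2")). The output should differ from the gawk list, if only because words containing Z are dropped.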

Specific Aim 2: Let’s BLAST this

Method: NCBI TBLASTN:

>emb|CAK04910.1| novel protein similar to vertebrate Hermansky-Pudlak syndrome
3 (HPS3) [Danio rerio]
Length=1041

 GENE ID: 563666 LOC563666 | similar to LOC398456 protein [Danio rerio]

 Score = 30.3 bits (64),  Expect =    22
 Identities = 9/10 (90%), Positives = 10/10 (100%), Gaps = 0/10 (0%)

Query  2    NTERPARENT  11
            NTERPAR+NT
Sbjct  505  NTERPARKNT  514

>ref|XP_664219.1| hypothetical protein AN6615.2 [Aspergillus nidulans FGSC A4]
 sp|Q5AYL5.1|SEC16_EMENI  RecName: Full=COPII coat assembly protein sec16; AltName: Full=Protein
transport protein sec16
 gb|EAA58144.1| hypothetical protein AN6615.2 [Aspergillus nidulans FGSC A4]
Length=1947

 GENE ID: 2870538 AN6615.2 | hypothetical protein [Aspergillus nidulans FGSC A4]
(10 or fewer PubMed links)

 Score = 30.3 bits (64),  Expect =    22
 Identities = 10/13 (76%), Positives = 10/13 (76%), Gaps = 0/13 (0%)

Query  1   INTERPARENTHE  13
           INTE PARE T E
Sbjct  61  INTESPAREETAE  73

>ref|XP_001707965.1| hypothetical protein [Giardia lamblia ATCC 50803]
 gb|EDO80291.1| Hypothetical protein GL50803_14341 [Giardia lamblia ATCC 50803]
Length=247

 GENE ID: 5700874 GL50803_14341 | hypothetical protein
[Giardia lamblia ATCC 50803] (10 or fewer PubMed links)

 Score = 30.3 bits (64),  Expect =    22
 Identities = 9/12 (75%), Positives = 10/12 (83%), Gaps = 0/12 (0%)

Query  4    ERPARENTHETI  15
            ER ARE THE+I
Sbjct  221  EREAREKTHESI  232

>ref|YP_002191813.1| conserved hypothetical protein [Streptomyces clavuligerus ATCC
27064]
 gb|EDY50943.1| conserved hypothetical protein [Streptomyces clavuligerus ATCC
27064]
Length=565

 GENE ID: 6836469 SSCG_04068 | hypothetical protein
[Streptomyces clavuligerus ATCC 27064]

 Score = 29.5 bits (62),  Expect =    39
 Identities = 12/20 (60%), Positives = 13/20 (65%), Gaps = 1/20 (5%)

Query  1    INTERPARENTHETICALLY  20
            I  ERP R +T E I ALLY
Sbjct  219  ITAERPQRTDT-EAIGALLY  237
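The Identities counts can be double-checked with a few lines of Python. This is only a sketch for reading the alignments above, not how BLAST computes anything internally; count_identities is a hypothetical helper, and it assumes the two aligned strings have equal length with '-' marking a gap.

```python
def count_identities(query, subject):
    """Count aligned positions where both sequences carry the same residue.

    Assumes both strings come from one BLAST alignment block, so they have
    equal length and use '-' for gaps.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")


# Query and Sbjct strings copied from the first and last hits above.
print(count_identities("NTERPARENT", "NTERPARKNT"))                      # 9
print(count_identities("INTERPARENTHETICALLY", "ITAERPQRTDT-EAIGALLY"))  # 12
```

Both counts agree with the reported 9/10 and 12/20.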

Interesting, but with E-values of 22 to 39 the hits are not significant. PSI-BLAST and BLASTP against metagenomic sequences in CAMERA also came up with zip.
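To put "insignificant" into numbers: the Expect value is the number of hits scoring at least this well that would be expected by chance. For a bit score S, the standard Karlin-Altschul relation is E = m * n * 2^(-S), where m * n is the effective search space (query length times database length). A minimal sketch, assuming that formula and ignoring the length corrections BLAST applies; expect_value and implied_search_space are hypothetical helper names.

```python
def expect_value(bit_score, search_space):
    """E = search_space * 2**(-bit_score), with search_space = m * n."""
    return search_space * 2.0 ** (-bit_score)


def implied_search_space(bit_score, e_value):
    """Invert the relation: the m * n that yields e_value at bit_score."""
    return e_value * 2.0 ** bit_score


# Values reported for the top hit above: 30.3 bits, Expect = 22.
space = implied_search_space(30.3, 22)
print(f"implied search space: {space:.2e}")

# Each additional bit of score halves the expected number of chance hits.
print(expect_value(31.3, space) / expect_value(30.3, space))
```

An Expect of 22 means roughly 22 equally good hits would turn up by chance in a search space of that size; the usual rule of thumb is to look for values far below 1.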

Conclusion: I totally wasted my time doing this, and yours reading this. Therefore, I need more funding to check the other words on the list.

Categories: funny