This afternoon, an article entitled “Automatic concept recognition using the Human Phenotype Ontology reference and test suite corpora” showed up in my RSS reader. It describes a new gold-standard corpus for named entity recognition of Human Phenotype Ontology (HPO) terms. The article also presents results from evaluating three automatic HPO term recognizers, namely NCBO Annotator, OBO Annotator and Bio-LarK CR.
I thought it would be a fun challenge to see how good an HPO tagger I could produce in one afternoon. Long story short, here is what I did in five hours:
- Downloaded the HPO ontology file and converted it to dictionary files for the tagger.
- Generated orthographic variants of terms by changing the order of subterms, converting between Arabic and Roman numerals, and constructing plural forms.
- Used the tagger to match the resulting dictionary against all of Medline to identify frequently occurring matches.
- Constructed a list of stop words by manually inspecting all matching strings with more than 25,000 occurrences in PubMed.
- Tagged the gold-standard corpus using the dictionary and stop-word list, and compared the results to the manual reference annotations.
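To give a feel for the variant-generation step, here is a minimal Python sketch. It is not the code I actually ran: the function names, the small numeral table, the naive pluralization rules and the example strings are all assumptions made for illustration.

```python
import re

# Assumed numeral table; only small numbers, as in "type II"-style names.
ARABIC_TO_ROMAN = {"1": "I", "2": "II", "3": "III", "4": "IV", "5": "V"}
ROMAN_TO_ARABIC = {roman: arabic for arabic, roman in ARABIC_TO_ROMAN.items()}


def swap_numerals(term):
    """Return copies of the term with one numeral token converted each."""
    tokens = term.split()
    result = set()
    for i, token in enumerate(tokens):
        swapped = ARABIC_TO_ROMAN.get(token) or ROMAN_TO_ARABIC.get(token)
        if swapped:
            result.add(" ".join(tokens[:i] + [swapped] + tokens[i + 1:]))
    return result


def reorder_subterms(term):
    """Turn a comma-inverted name 'X, Y' into 'Y X'."""
    parts = [part.strip() for part in term.split(",")]
    if len(parts) == 2 and all(parts):
        return {f"{parts[1]} {parts[0]}"}
    return set()


def pluralize(term):
    """Naive English plural of the last word; skips commas and numerals."""
    last = term.split()[-1]
    if "," in term or last in ARABIC_TO_ROMAN or last in ROMAN_TO_ARABIC:
        return set()
    if term.endswith(("s", "x", "ch", "sh")):
        return {term + "es"}
    if re.search(r"[^aeiou]y$", term):
        return {term[:-1] + "ies"}
    return {term + "s"}


def variants(term):
    """Collect the term plus its reordered, numeral-swapped and plural forms."""
    forms = {term} | reorder_subterms(term)
    for form in list(forms):
        forms |= swap_numerals(form)
    for form in list(forms):
        forms |= pluralize(form)
    return forms


# Illustrative strings, not necessarily actual HPO labels.
print(sorted(variants("seizure, febrile")))
print(sorted(variants("syndrome type II")))
```

Rules this crude will overgenerate, which is one reason the frequency-based stop-word pass over the literature matters.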
My tagger produced 1183 annotations on the corpus, 786 of which correspond to the 1933 human annotations (requiring exact coordinate matches and HPO term normalization). This amounts to a precision of 66%, a recall of 41%, and an F1 score of 50%. This places my system about midway between NCBO Annotator (precision=54%, recall=39%, F1=45%) and the best-performing system, Bio-LarK CR (precision=65%, recall=49%, F1=56%).
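For anyone who wants to check the arithmetic, the scores follow directly from the three counts above. The sketch below assumes each annotation is represented as a (document, start, end, HPO identifier) tuple, so that an exact match is simply set membership. That representation and the function names are my illustration, not the corpus format.

```python
def count_exact_matches(predicted, reference):
    """Count annotations that agree on document, coordinates and HPO term."""
    return len(set(predicted) & set(reference))


def precision_recall_f1(n_predicted, n_reference, n_matched):
    """Compute precision, recall and F1 from raw annotation counts."""
    precision = n_matched / n_predicted
    recall = n_matched / n_reference
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts reported above: 1183 predicted, 1933 reference, 786 matched.
p, r, f = precision_recall_f1(1183, 1933, 786)
print(f"precision={p:.0%} recall={r:.0%} F1={f:.0%}")
# prints: precision=66% recall=41% F1=50%
```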
Not too shabby for five hours of work — if I may say so myself — and a good reminder of how much can be achieved in very limited time by taking a simple, pragmatic approach.