In this practical, you will be using the well-known tool ABNER, which relies on a statistical machine-learning method called conditional random fields (CRFs) to recognize entities in text.
To install ABNER, simply download abner.jar from the website. To run it, either double-click the jar file or type java -jar abner.jar on the command line. Because ABNER is a Java program, Java must be installed on your computer; if it is not, download the Java installer and install it first. If you use a Mac, you will likely need to go to System Preferences and then Security & Privacy to allow ABNER to run.
Retrieve the title and abstract of the publication “Novel ZEB2-BCL11B Fusion Gene Identified by RNA-Sequencing in Acute Myeloid Leukemia with t(2;14)(q22;q32)” from PubMed. Use ABNER to annotate named entities according to both the NLPBA and BioCreative probabilistic models.
- Do the two models (NLPBA and BioCreative) annotate the same proteins in the text?
- Does ABNER distinguish between genes and proteins?
- If so, how can it tell when a name refers to a gene and when it refers to its protein product?
- Can ABNER identify which gene/protein a given name refers to?
- Does ABNER identify any named entities other than genes and proteins in this abstract?
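
If you want to answer the first question systematically rather than by eye, one option is to copy the mentions highlighted under each model into two lists and compare them as sets. The following is only a minimal sketch in Python: the function name compare_annotations and the mention strings are placeholders made up for illustration, not ABNER output and not part of ABNER itself.

```python
def compare_annotations(nlpba, biocreative):
    """Group mentions by whether both models or only one model tagged them."""
    a = {mention.strip() for mention in nlpba}
    b = {mention.strip() for mention in biocreative}
    return {
        "both": sorted(a & b),
        "nlpba_only": sorted(a - b),
        "biocreative_only": sorted(b - a),
    }


# Placeholder mentions for illustration only; replace them with the
# entities ABNER actually highlights under each model.
nlpba_mentions = ["mention A", "mention B"]
biocreative_mentions = ["mention B", "mention C"]

result = compare_annotations(nlpba_mentions, biocreative_mentions)
for group, mentions in result.items():
    print(f"{group}: {mentions}")
```

The comparison is an exact string match after trimming whitespace, so a name tagged with slightly different boundaries by the two models will show up as two different mentions. That is worth checking by hand before drawing conclusions.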