Exercise: The machine-learning approach to named entity recognition

In this practical, you will be using the well-known tool ABNER, which relies on a statistical machine-learning method called conditional random fields (CRFs) to recognize entities in text.

To install ABNER, simply download abner.jar from the website. To run it, either double-click on the jar file or type java -jar abner.jar on the command line). As ABNER is a Java program, it requires that you have Java installed on your computer. If not, you need to also download the installer and install it. If you use a Mac, you likely need to go to System Preferences and then Security & Privacy to allow ABNER to be run.

Retrieve the title and abstract of the publication “Novel ZEB2-BCL11B Fusion Gene Identified by RNA-Sequencing in Acute Myeloid Leukemia with t(2;14)(q22;q32)” from PubMed. Use ABNER to annotate named entities according to both the NLPBA and BioCreative probabilistic models models.

Questions

  • Do the two models (NLPBA and BioCreative) annotate the same proteins in the text?
  • Does ABNER distinguish between genes and proteins?
  • If so, how can it tell when a name to a gene and when it refers to its protein product?
  • Can ABNER identify which gene/protein a given name refers to?
  • Does ABNER identify any other named entities than genes and proteins in this abstract?

One thought on “Exercise: The machine-learning approach to named entity recognition

  1. Pingback: Exercise: Web services | Buried Treasure

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s