Exercise: The dictionary-based approach to named entity recognition

In this practical, you will be using a newly developed tool, EXTRACT, which shares many parts with the better-known tool Reflect. Unlike ABNER, these tools identify named entities by matching a large dictionary of names against the text.
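To make the idea of dictionary matching concrete, here is a minimal Python sketch. It is not how EXTRACT or Reflect is implemented; the tiny dictionary, the `EXAMPLE:` identifiers and the `tag` function are all made up for illustration. It assumes simple word tokenization, case-insensitive lookup and a preference for the longest matching name.

```python
import re

# Toy dictionary: lowercased name -> (entity type, identifier).
# Both the entries and the "EXAMPLE:" identifiers are placeholders,
# not taken from the real EXTRACT dictionary.
DICTIONARY = {
    "zeb2": ("gene/protein", "EXAMPLE:ZEB2"),
    "bcl11b": ("gene/protein", "EXAMPLE:BCL11B"),
    "acute myeloid leukemia": ("disease", "EXAMPLE:AML"),
}

# Longest dictionary name, counted in words; bounds the lookup window.
MAX_WORDS = max(len(name.split()) for name in DICTIONARY)


def tag(text):
    """Return (start, end, matched text, type, identifier) for each hit."""
    # Character offsets of every word-like token in the text.
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    hits = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            span = tokens[i:i + n]
            # Join tokens with single spaces, so punctuation between
            # words (e.g. a hyphen) is ignored during lookup.
            key = " ".join(text[s:e] for s, e in span).lower()
            if key in DICTIONARY:
                start, end = span[0][0], span[-1][1]
                hits.append((start, end, text[start:end], *DICTIONARY[key]))
                i += n  # continue after the matched name
                break
        else:
            i += 1  # no name starts at this token
    return hits


if __name__ == "__main__":
    sentence = "Novel ZEB2-BCL11B fusion gene in acute myeloid leukemia"
    for hit in tag(sentence):
        print(hit)
```

Note that this sketch assigns a type and an identifier purely by looking up the name. It has no way of using the surrounding context, which is worth keeping in mind when you answer the questions below.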

For this exercise, please install the pre-release version of EXTRACT2 into your web browser’s bookmark bar. If you run into problems, e.g. with enabling the bookmark bar in your browser, please check the FAQ.

Open both the abstract and the full-text version of the article “Novel ZEB2-BCL11B Fusion Gene Identified by RNA-Sequencing in Acute Myeloid Leukemia with t(2;14)(q22;q32)” in your web browser. Run EXTRACT on both and inspect the results. You can find more details by either clicking on an annotation or selecting a text region and clicking the bookmarklet again.


  • Does EXTRACT distinguish between genes and proteins?
  • If so, how can it tell when a name refers to a gene and when it refers to its protein product?
  • Can EXTRACT identify which gene/protein a given name refers to?
  • Does it identify any named entities other than genes and proteins in this abstract?
  • Which, if any, additional types of named entities does EXTRACT find in the full-text article?
  • Where in the full-text article do you find most cases of wrong EXTRACT annotations?
