In this practical, you will use a newly developed tool, EXTRACT, which shares many components with the better-known tool Reflect. Unlike ABNER, these tools identify named entities by matching a large dictionary against the text.
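To make the idea of dictionary matching concrete, here is a minimal sketch of a dictionary-based tagger. It is not how EXTRACT or Reflect is implemented. The function name `tag`, the toy dictionary, the entity type labels, and the identifiers are all hypothetical and chosen only for illustration. Real tools use far larger dictionaries and more careful handling of spelling variants.

```python
import re

# Hypothetical toy dictionary: lower-cased name -> (entity type, identifier).
# The identifiers are made-up placeholders, not real database accessions.
DICTIONARY = {
    "zeb2": ("gene/protein", "ID:0001"),
    "bcl11b": ("gene/protein", "ID:0002"),
    "acute myeloid leukemia": ("disease", "ID:0003"),
}


def tag(text, dictionary=DICTIONARY, max_words=3):
    """Return (start, end, matched text, type, identifier) for each dictionary hit.

    At every token position the longest candidate (up to max_words tokens)
    is tried first, so multi-word names win over their single-word parts.
    """
    # Character spans of all word-like tokens in the text.
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    matches = []
    i = 0
    while i < len(tokens):
        hit = None
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            start, end = tokens[i][0], tokens[i + n - 1][1]
            # Normalise the candidate: lower-case tokens joined by single spaces.
            key = " ".join(text[s:e].lower() for s, e in tokens[i:i + n])
            if key in dictionary:
                hit = (start, end, text[start:end], *dictionary[key])
                i += n  # continue after the matched name
                break
        if hit:
            matches.append(hit)
        else:
            i += 1
    return matches


if __name__ == "__main__":
    title = "Novel ZEB2-BCL11B fusion gene in acute myeloid leukemia"
    for match in tag(title):
        print(match)
```

Note that a tagger of this kind only knows what is in its dictionary: the type and identifier attached to a match come from the dictionary entry, not from the surrounding sentence. Keep this in mind when answering the questions below.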
Open both the abstract and the full-text version of the article “Novel ZEB2-BCL11B Fusion Gene Identified by RNA-Sequencing in Acute Myeloid Leukemia with t(2;14)(q22;q32)” in your web browser. Run EXTRACT on both and inspect the results. You can find more details by either clicking on an annotation or selecting a text region and clicking the bookmarklet again.
- Does EXTRACT distinguish between genes and proteins?
- If so, how can it tell when a name refers to a gene and when it refers to its protein product?
- Can EXTRACT identify which gene/protein a given name refers to?
- Does it identify any named entities other than genes and proteins in the abstract?
- Which additional types of named entities, if any, does EXTRACT find in the full-text article?
- Where in the full-text article do you find most cases of wrong EXTRACT annotations?