We are currently releasing the second version of Dexter, so this documentation may be incorrect. We are working to update it! In the meantime you can browse (and play with) the new REST API documentation :)
You can use Dexter in two different ways:
Click on this link to download Dexter.
The archive requires around 2 gigabytes and contains the Dexter binary code (dexter.jar) and the model used by Dexter for annotating.
The current model is generated from the 04/03/2013 English Wikipedia dump, available here (we plan to release updated models for English and other languages).
Once the download is finished, untar the package and, from the directory dexter, run:
java -Xmx3000m -jar dexter.jar
(you will need at least 3 GB of RAM and Java 7). The framework should be available within a few seconds at the address:
http://localhost:8080/
The first query will take a while because Dexter has to load the whole model into main memory.
Currently Dexter supports these functions:
Method | Params | Description | Example
---|---|---|---
rest/annotate | text: the text to annotate; n (optional, default=5): the maximum number of entities to annotate | Performs entity linking on the given text, annotating at most n entities | example
rest/get-desc | id: the wiki-id of an entity; title-only (optional, default=false): if set to true, returns only the label of the entity | Given the wiki-id of an entity, returns the label and a snippet containing some sentences that describe the entity (the snippet is retrieved from the Lucene index if present, otherwise by calling the Wikipedia API) | example
rest/spot | text: the text to spot | Performs only the first step of the entity linking process, i.e., finds all the mentions that could refer to an entity | example
rest/graph/ | wid: the id of the entity; asWikiNames (optional): true or false | Returns all the entities whose corresponding Wikipedia page contains a link to the given entity. By default it returns the wikiIds; if asWikiNames is set to true, it returns the titles of the pages | example for Maradona
rest/graph/ | wid: the id of the entity; asWikiNames (optional): true or false | Returns all the entities whose corresponding Wikipedia page is linked by the given entity. By default it returns the wikiIds; if asWikiNames is set to true, it returns the titles of the pages | example for Maradona
rest/graph/ | wid: the id of the entity; asWikiNames (optional): true or false | Returns all the categories of the given entity. By default it returns the wikiIds; if asWikiNames is set to true, it returns the titles of the pages | example for Maradona
rest/graph/ | wid: the id of the entity; asWikiNames (optional): true or false | Returns all the entities belonging to the given category. By default it returns the wikiIds; if asWikiNames is set to true, it returns the titles of the pages | example for Category 1982 FIFA World Cup players
rest/graph/ | wid: the id of the entity; asWikiNames (optional): true or false | Returns all the parent categories of the given category. By default it returns the wikiIds; if asWikiNames is set to true, it returns the titles of the pages | example for Category 1982 FIFA World Cup players
rest/graph/ | wid: the id of the entity; asWikiNames (optional): true or false | Returns all the child categories of the given category. By default it returns the wikiIds; if asWikiNames is set to true, it returns the titles of the pages | example for Category 1982 FIFA World Cup
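As a rough illustration of how the text and n parameters of rest/annotate might be put on a request, here is a minimal Java sketch that only builds the request URL. It assumes the parameters are passed as a GET query string, which the table above does not state explicitly; the class name DexterUrlSketch and the helper buildAnnotateUrl are made up for this example and are not part of Dexter.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class DexterUrlSketch {

    // Builds the URL for a rest/annotate request.
    // Assumption: the parameters are passed as a GET query string.
    static String buildAnnotateUrl(String baseUrl, String text, int n) {
        try {
            // Form-encode the text so spaces and punctuation are safe in the query string.
            String encodedText = URLEncoder.encode(text, "UTF-8");
            return baseUrl + "/rest/annotate?text=" + encodedText + "&n=" + n;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Base URL of a locally started Dexter instance.
        String url = buildAnnotateUrl("http://localhost:8080", "Maradona played for Napoli", 5);
        System.out.println(url);
    }
}
```

The resulting URL can be opened in a browser or fetched with any HTTP client once the server is running.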
Download the dexter source code:
git clone https://github.com/diegoceccarelli/dexter
cd dexter
git submodule init
git submodule update
The project is built using Maven, so in order to compile it you will have to go into the main folder of the project (dexter) and run the command:
mvn install
Once you have performed the installation, you will have to add the following dependency to your Maven project:
<dependency>
  <groupId>it.cnr.isti.hpc</groupId>
  <artifactId>dexter-webapp</artifactId>
  <version>1.0.0</version>
</dependency>
Then you will be able to call the REST API from your own project using the DexterRestClient, as in the following example:
// Text to process (shared by the annotate and spot calls below)
String text = "Dexter is an American television drama series which debuted on Showtime on October 1, 2006. "
    + "The series centers on Dexter Morgan (Michael C. Hall), a blood spatter pattern analyst for the fictional "
    + "Miami Metro Police Department (based on the real life Miami-Dade Police Department) who also leads a "
    + "secret life as a serial killer. Set in Miami, the show's first season was largely based on the novel "
    + "Darkly Dreaming Dexter, the first of the Dexter series novels by Jeff Lindsay. It was adapted for "
    + "television by screenwriter James Manos, Jr., who wrote the first episode. ";

DexterRestClient client = new DexterRestClient("http://dexterdemo.isti.cnr.it:8080/rest");

// Full entity linking on the text
AnnotatedDocument ad = client.annotate(text);
System.out.println(ad);

// Spotting only (first step of the entity linking process)
SpottedDocument sd = client.spot(text);
System.out.println(sd);

// Description of the entity with the given wiki-id
ArticleDescription desc = client.getDesc(5981816);
System.out.println(desc);
If you downloaded the framework and started it on your machine, you can also call your own service by changing the server URL:
DexterRestClient client = new DexterRestClient( "http://localhost:8080/rest");
You can install the Java project by checking it out from GitHub:
git clone https://github.com/diegoceccarelli/dexter
cd dexter
git submodule init
git submodule update
The project is built using Maven, so in order to compile it you will have to go into the main folder of the project (dexter) and run the command:
mvn install
The compilation should terminate with no errors.
You will still need the model 'data' folder provided in dexter.tar. You can put it wherever you want, but you will have to indicate its location in the project.properties files contained in the subfolders dexter-code and dexter-webapp.
Dexter is organized in several submodules, which are briefly described below:
(see the javadoc)
Json Wikipedia contains code to convert the Wikipedia XML dump into a [JSON][json] dump.
java -cp target/json-wikipedia-1.0.0-jar-with-dependencies.jar it.cnr.isti.hpc.wikipedia.cli.MediawikiToJsonCLI -input wikipedia-dump.xml.bz -output wikipedia-dump.json[.gz] -lang [en|it]
or
./scripts/convert-xml-dump-to-json.sh [en|it] wikipedia-dump.xml.bz wikipedia-dump.json[.gz]
This produces in `wikipedia-dump.json` the JSON version of the dump. Each line of the file contains one article of the dump encoded in JSON. Each JSON line can be deserialized into an Article object, which represents an _enriched_ version of the wikitext page. The Article object contains:
(see the javadoc)
The core implements the pipeline for generating the entity linking model from a Wikipedia dump. It also provides all the tools needed to write an entity linking method.
The most important objects to understand are: the Spot object, which represents a mention of one or more candidate entities; the Entity object, which represents an entity; the SpotMatch object, which represents a particular mention in a given text; and the EntityMatch object, which represents a particular match of an entity in a document. The core also defines some important interfaces for performing the linking, for example spot, which, given a text, returns a list of SpotMatches.
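To make the idea of spotting more concrete, here is a toy Java sketch of a function that, given a text and a small list of known mentions, returns the places where those mentions occur. It is not Dexter's actual API: the class ToySpotterSketch, the type ToySpotMatch, and the signature of spot are all hypothetical and chosen only for illustration, and the matching is a plain case-sensitive substring search.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ToySpotterSketch {

    // Hypothetical stand-in for a spot match: a mention plus its offsets in the text.
    static final class ToySpotMatch {
        final String mention;
        final int start; // inclusive
        final int end;   // exclusive

        ToySpotMatch(String mention, int start, int end) {
            this.mention = mention;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return mention + "[" + start + "," + end + ")";
        }
    }

    // Returns every occurrence of each known mention in the text (case-sensitive).
    static List<ToySpotMatch> spot(String text, List<String> knownMentions) {
        List<ToySpotMatch> matches = new ArrayList<>();
        for (String mention : knownMentions) {
            if (mention.isEmpty()) {
                continue; // an empty mention would match everywhere
            }
            int from = text.indexOf(mention);
            while (from >= 0) {
                matches.add(new ToySpotMatch(mention, from, from + mention.length()));
                // Continue searching after the end of the current occurrence.
                from = text.indexOf(mention, from + mention.length());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> mentions = Arrays.asList("Dexter Morgan", "Miami");
        System.out.println(spot("Dexter Morgan lives in Miami", mentions));
    }
}
```

A real spotter would look mentions up in a dictionary built from Wikipedia anchors rather than scanning a hand-written list.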
It is possible to write new Spotters or Disambiguators and use them in Dexter by putting the jars in the folder dexter/libs, or inside the folder dexter-webapp/src/main/webapp/WEB-INF/lib, and then selecting them in the project.properties file, e.g.,
disambiguator.class=it.cnr.isti.hpc.wikiminer.Wikiminer
By default, Dexter ships with one spotter (based on the dictionary of the anchors in Wikipedia) and one Disambiguator that implements the Occam's razor principle: it resolves the ambiguity of a spot by choosing the entity with the largest probability of being represented by the spot. This probability is called commonness, and it is computed as the ratio between the number of links that point to the entity using the spot as anchor and the total number of links that have the spot as anchor.
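The commonness computation can be sketched in a few lines of Java. This is only an illustration of the ratio described above, not Dexter's implementation: the class CommonnessSketch, its method names, and the link counts in main are invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CommonnessSketch {

    // commonness = links that use the spot as anchor and point to the entity
    //              / all links that use the spot as anchor.
    // linkCounts maps each candidate entity to its number of such links.
    static double commonness(Map<String, Integer> linkCounts, String entity) {
        int total = 0;
        for (int count : linkCounts.values()) {
            total += count;
        }
        Integer entityCount = linkCounts.get(entity);
        if (total == 0 || entityCount == null) {
            return 0.0; // unknown entity or no links for this spot
        }
        return (double) entityCount / total;
    }

    // Picks the candidate with the highest commonness, i.e. the highest link count.
    static String mostCommonEntity(Map<String, Integer> linkCounts) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> entry : linkCounts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up link counts for the anchor text "Dexter".
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("Dexter (TV series)", 60);
        counts.put("Dexter, Michigan", 30);
        counts.put("Dexter's Laboratory", 10);

        System.out.println(mostCommonEntity(counts));
        System.out.println(commonness(counts, "Dexter (TV series)"));
    }
}
```

Since the denominator is the same for every candidate of a spot, choosing the entity with the largest commonness is the same as choosing the one with the most incoming links for that anchor.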
TODO
(see the javadoc)
Finally, you will be able to start the web app, with the interface and the REST API, by going into the folder dexter-webapp and running
mvn jetty:run -DskipTests
TODO
(see the javadoc)
TODO