Hexis API (1.0.0)

Introduction

Hexis API is a RESTful API that provides access to language analysis tools built at our lab. Currently, this includes a powerful classification model for automatic moderation of text messages.

Using the API

Getting Started

If you have not yet created a user account, you can sign up here. Once you have verified your email address, you can use the API and create an access token.

The API is straightforward to use and follows common standards of data exchange. You can use the programming language of your choice to interact with the API. For testing, a tool like Postman, Insomnia, or cURL on the command line can also be useful.

To get started, this documentation provides code examples in JavaScript, Python, PHP, and Unix shell. They illustrate how to post requests to and receive responses from the API endpoint.

API Endpoint

This API uses a base path of https://api.hexis.ai. It is only available via an SSL-secured HTTP connection (HTTPS).

Exchange Format

The HTTP content type is application/json and the payloads exchanged between a client and the API endpoint are valid JSON objects. Note the following conventions for text content: double quotes (") are escaped (\") and line breaks are replaced with |LBR|.
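
As an illustration, the following Python sketch prepares a text for the payload: the |LBR| replacement is done explicitly, while a JSON serializer takes care of escaping the double quotes. The helper name is ours and not part of the API.

import json

LINE_BREAK_TOKEN = "|LBR|"  # placeholder for line breaks, as described above

def prepare_text(raw: str) -> str:
    """Replace line breaks with the |LBR| token expected by the API."""
    return raw.replace("\r\n", LINE_BREAK_TOKEN).replace("\n", LINE_BREAK_TOKEN)

raw_message = 'She said "no way!"\nAnd then left.'
payload = {"text": prepare_text(raw_message)}

# json.dumps escapes the double quotes automatically during serialization.
print(json.dumps(payload))
# {"text": "She said \"no way!\"|LBR|And then left."}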

Classification Score

A score is an indication of probability, not severity. Higher numbers represent a higher likelihood that the patterns in the text resemble patterns in comments that people — laypeople as well as subject-matter experts — have tagged as offensive. Scores are intended to let developers pick thresholds after which to automatically accept, review or reject text messages.
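
For example, a client could map scores to moderation actions like this; the threshold values below are placeholders and should be tuned for your own workflow.

# Hypothetical thresholds; tune them for your own moderation workflow.
ACCEPT_BELOW = 0.3
REJECT_ABOVE = 0.8

def moderate(score: float) -> str:
    """Map a classification score to an action based on example thresholds."""
    if score < ACCEPT_BELOW:
        return "accept"
    if score > REJECT_ABOVE:
        return "reject"
    return "review"

for s in (0.05, 0.55, 0.93):
    print(s, "->", moderate(s))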

Rate Limiting

The API allows for a maximum of 10 requests per second by default. Once this limit is reached, it will start returning errors with an HTTP status code of 429. The maximum request body size is 10 kilobytes. Requests larger than this trigger a status code of 413. Please note that the maximum input size for the classification models is 120 words. Any input longer than this will be split into 120-word parts, each of which is counted separately.
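
A client can respond to a 429 by backing off and retrying. The following Python sketch shows one possible strategy; the retry count and delays are our choices, not a requirement of the API, and the requests library is assumed to be installed.

import time
import requests  # third-party HTTP client, assumed to be installed

def post_with_backoff(url, payload, headers=None, max_retries=5):
    """POST with a simple exponential backoff whenever the API answers 429."""
    delay = 1.0
    response = None
    for _ in range(max_retries):
        response = requests.post(url, json=payload, headers=headers, timeout=10)
        if response.status_code != 429:
            break
        time.sleep(delay)
        delay *= 2
    return response

A quick client-side check such as len(text.split()) <= 120 tells you whether an input will be split into several parts that are counted separately.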

Demo Endpoint

There is a demo version of each model which is free and anonymous to use. It can be used to test API integrations or for anything else, as long as the request volume stays low. The number of requests is limited to 20 per hour. To switch to the demo version, prepend /demo to any of the paths documented below, e.g. /demo/mod-x/en/v1. Users of the demo version agree to the collection of anonymous usage information in order to improve our services.
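
For example, an anonymous test request against the demo path could look like this in Python; the requests library and the example text are our choices.

import requests  # third-party HTTP client, assumed to be installed

# Anonymous call against the demo path documented above; the example text is ours.
url = "https://api.hexis.ai/demo/mod-x/en/v1"
response = requests.post(url, json={"text": "You are a wonderful person."}, timeout=10)

print(response.status_code)
print(response.json())  # e.g. {"scores": [...]}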

Customizing the API

If you want to build a custom model with us, the following is the specification for the training data.

Annotation Scheme

Currently two types of models are implemented: Binary Classification, where training data are labeled as either OFFENSE or OTHER, and Multi-Task Classification, where data are labeled as PROFANITY, INSULT, or ABUSE in the case of offensive language and OTHER in the case of harmless texts. In the multi-task setting it is also possible to submit additional categories (e.g. data marked as SPAM), which are then included in the same classification model.

File Format

The data should be placed in a UTF-8 encoded plain text file. One training example per line, in the following format:

<TEXT> tab <LABEL-BINARY> tab <LABEL-MULTITASK>

Line breaks in the text are replaced with |LBR|.
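
The following Python sketch writes a small file in this format; the example texts, labels, and file name are illustrative and not part of the specification.

# Each row: (text, binary label, multi-task label). Example rows are ours.
rows = [
    ("You are an idiot", "OFFENSE", "INSULT"),
    ("See you tomorrow at the meeting", "OTHER", "OTHER"),
    ("Buy cheap watches now!!!", "OTHER", "SPAM"),
]

with open("training_data.txt", "w", encoding="utf-8") as f:
    for text, label_binary, label_multitask in rows:
        text = text.replace("\r\n", "|LBR|").replace("\n", "|LBR|")  # line breaks become |LBR|
        f.write(f"{text}\t{label_binary}\t{label_multitask}\n")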

Authentication

Bearer

Security Scheme Type: HTTP
HTTP Authorization Scheme: bearer
Bearer format: "JWT"
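
In practice, this means every request carries the access token as a Bearer token in the Authorization header, for example in Python (the token value is a placeholder):

# The access token created in your account is sent as a Bearer token
# in the Authorization header of every request.
ACCESS_TOKEN = "<your JWT access token>"  # placeholder

headers = {
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "Content-Type": "application/json",
}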

Binary Classification

The binary classification model is the easiest and, under some circumstances, most effective solution for filtering abusive language. It combines the data categories PROFANITY, INSULT, and ABUSE into one positive class OFFENSE, which is contrasted with the negative class OTHER. There is not much to tune, but also not much to interpret, which streamlines your moderation automation efforts. The model is reachable via the path /mod-1.

Authorizations: Bearer
path Parameters
language
required
string

Language code. Possible values are en, de

version
required
string

API version. Possible values are v1

Request Body schema: application/json
text
required
string

Input text

Responses

200

OK

Response Schema: application/json
scores
object

List of scores

400

Bad Request

401

Unauthorized

413

Request Entity Too Large

429

Too Many Requests

POST /mod-1/{language}/{version}

Request samples

Content type: application/json

{
  "text": "string"
}

Response samples

Content type: application/json

{
  "scores": []
}
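
Putting the pieces together, a minimal authenticated call to the binary model could look like this in Python; the token and input text are placeholders, and the requests library is assumed to be installed.

import requests  # third-party HTTP client, assumed to be installed

ACCESS_TOKEN = "<your JWT access token>"  # placeholder

response = requests.post(
    "https://api.hexis.ai/mod-1/en/v1",
    json={"text": "You are a wonderful person."},
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    timeout=10,
)
response.raise_for_status()
print(response.json()["scores"])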

Multi-Task Classification

The multi-task classification model allows for the creation of fine-grained filtering systems. It includes all data categories from the offensive language detection task (PROFANITY, INSULT, and ABUSE) and is jointly trained for spam detection (SPAM). Both tasks are contrasted with the negative class OTHER. The multi-task model benefits from the interaction of the tasks involved, leading to richer and more robust internal representations. This allows for a fine-grained classification process and different classification thresholds, specifically tuned for your use case. The model is reachable via the path /mod-x.

Authorizations: Bearer
path Parameters
language
required
string

Language code. Possible values are en, de

version
required
string

API version. Possible values are v1

Request Body schema: application/json
text
required
string

Input text

Responses

200

OK

Response Schema: application/json
scores
object

List of scores

400

Bad Request

401

Unauthorized

413

Request Entity Too Large

429

Too Many Requests

POST /mod-x/{language}/{version}

Request samples

Content type: application/json

{
  "text": "string"
}

Response samples

Content type: application/json

{
  "scores": []
}
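
As a sketch of how per-category thresholds could be applied to the multi-task model, the following assumes that each entry in scores pairs a category label with a score; the response sample above does not show the exact shape of these entries, so verify it against a real response. The threshold values are placeholders.

# Hypothetical per-category thresholds; the values and the assumed entry shape
# ({"label": ..., "score": ...}) are placeholders to illustrate the idea.
THRESHOLDS = {"PROFANITY": 0.7, "INSULT": 0.6, "ABUSE": 0.5, "SPAM": 0.8}

def flagged_categories(scores):
    """Return the categories whose score exceeds the configured threshold."""
    flagged = []
    for entry in scores:
        label = entry.get("label")
        score = entry.get("score", 0.0)
        if label in THRESHOLDS and score > THRESHOLDS[label]:
            flagged.append(label)
    return flagged

print(flagged_categories([{"label": "INSULT", "score": 0.74}]))  # ['INSULT']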