Hexis API (1.0.0)


Hexis API is a RESTful API that provides access to language analysis tools built at our lab. At the moment this includes a high-performance text classification model for offensive language detection.

Using the API

Getting Started

If you have not yet created a user account, you can sign up here. Once you have verified your email address, your account is unlocked for the API and you can create an access token.

The API is straightforward to use and follows common standards of data exchange. You can use the programming language of your choice to interact with the API. For testing, a tool like Postman, Insomnia, or cURL on the command line can also be useful.

To get started, this documentation provides code examples in JavaScript, Python, PHP, and Unix shell. They illustrate how to post requests to and receive responses from the API endpoint.

API Endpoint

This API uses the base path https://api.hexis.ai. It is only available over a TLS-secured (HTTPS) connection.

Exchange Format

The HTTP content type is application/json, and the payloads exchanged between a client and the API endpoint are valid JSON objects. Two things to note about the text content: double quotes (") are escaped (\") and line breaks are replaced with |LBR|.
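As an illustration, the |LBR| replacement can be applied before the payload is serialized; json.dumps then takes care of escaping double quotes. A minimal sketch (the helper name is our own):

```python
import json

def encode_text(text: str) -> str:
    """Prepare raw text for the API payload: replace line breaks
    with the |LBR| token. Double-quote escaping is handled by the
    JSON serializer."""
    return text.replace("\r\n", "|LBR|").replace("\n", "|LBR|")

# Example payload for the classification endpoint.
payload = json.dumps({"text": encode_text('He said "hi"\nand left')})
```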

Classification Score

The score is an indication of probability, not severity. Higher numbers represent a higher likelihood that the patterns in the text resemble patterns in comments that people have tagged as offensive. Scores are intended to let developers pick thresholds and automatically accept, review, or reject text messages based on those thresholds. Although the numbers are not a measure of how offensive a particular comment is, the threshold can be set according to the use case at hand.

For a high-recall use case, where it is preferable to deal with false positives rather than false negatives, one can choose a threshold around 0.5 or above. While this may mistakenly flag a message that is only mildly toxic (ideally there is a manual review process in place), it will catch the less salient, implicit cases.

On the other hand, there are high-precision use cases where the priority is to automatically filter only definitive cases of offensive language. Here it's safe to choose a point around 0.9 or above.

As a simple heuristic: with manual moderation in place, a sensible threshold value is 0.5. Alternatively, the raw classification scores can be used to rank items for review. It is also possible to operate the system without manual moderation, using a threshold value of 0.9 (potentially overblocking in a few cases) or 0.99 (potentially underblocking in a few cases).
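The heuristic above can be sketched as a small decision helper. The function name is our own, and the cutoffs are the illustrative values from this section, to be tuned per use case:

```python
def moderation_decision(score: float, manual_review: bool = True) -> str:
    """Map a classification score to an action following the
    heuristic described above (thresholds are assumptions)."""
    if manual_review:
        # With moderators available, flag everything above 0.5 for review.
        return "review" if score >= 0.5 else "accept"
    # Fully automatic: 0.9 leans toward overblocking,
    # a stricter 0.99 toward underblocking.
    return "reject" if score >= 0.9 else "accept"
```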

Rate Limiting

Currently, the API allows a maximum of 10 requests per second by default. Once this limit is reached, it starts returning errors with an HTTP status code of 429. The maximum request body size is 10 kilobytes; requests larger than this trigger a status code of 413. Please note that the maximum input size for the classification model is 120 words. Any input longer than this is split into 120-word parts, each counted as a separate request. If service for a given account is temporarily suspended (e.g. no trial credits left and no payment information on file), a status code of 401 is issued.
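Since inputs longer than 120 words are split and counted separately, it can be useful to chunk long texts client-side to keep request accounting predictable. A minimal sketch (the helper name is our own):

```python
def split_into_chunks(text: str, max_words: int = 120) -> list[str]:
    """Split input into parts of at most max_words words,
    mirroring how the API counts oversized inputs."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
```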

In practice, one often does not need to think about the exact timing between requests, as small bursts of excess requests are still processed by the endpoint. The exception is bulk processing, where a 100 ms pause between requests should be implemented.
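The suggested 100 ms pause for bulk processing could be implemented with a small pacing generator (a sketch; the names are our own, and each yielded item would be sent as one request):

```python
import time

def paced(items, delay: float = 0.1):
    """Yield items with a pause between them (100 ms by default),
    so downstream request code stays under the rate limit."""
    for i, item in enumerate(items):
        if i:  # no pause before the first item
            time.sleep(delay)
        yield item
```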

Customizing the API

If you want to build a custom model with us, this is the specification for the training data.

Annotation Scheme

The annotation scheme largely follows the guidelines of the GermEval Shared Task on the Identification of Offensive Language as described e.g. in the Overview of the GermEval 2018 Shared Task on the Identification of Offensive Language. The primary task is Task 1: Coarse-grained Binary Classification, where postings are labeled as either OFFENSE or OTHER. Optionally, it is possible to also annotate for Task 2: Fine-grained 4-way Classification, where postings are labeled as either PROFANITY, INSULT, ABUSE, or OTHER. We extend the second task by another class, namely THREAT.

File Format

The data should be placed in a UTF-8 encoded plain text file. One training example per line, in the following format:


Line breaks in the text are replaced with |LBR|.



Security Scheme Type: HTTP
HTTP Authorization Scheme: bearer
Bearer format: "JWT"

Offensive Language Detection

path Parameters

Language code. Possible values: en, de


API version. Possible values: v1

Request Body schema: application/json

Input text




Response Schema: application/json

List of scores


400 Bad Request


413 Request Entity Too Large


429 Too Many Requests


Request samples

Content type: application/json

{
  "text": "string"
}

Response samples

Content type: application/json

{
  "scores": [ … ]
}