Semantria is an asynchronous API. This means:
- You submit content to us and retrieve the results separately.
- You can scale your submission rate, because you are not waiting for us to hand data back before you submit more.
- You may receive data back in a different order than you submitted it. Batches of content are not necessarily preserved.
- If you use the callback retrieval mechanism, the batches remain the same, but the order might be different.
- If you use auto response or polling, the batch membership may also change.
- If you have multiple machines sending and receiving content, one machine may receive a processed document that was submitted by another.
- Every piece of content is processed by a Semantria configuration. If you don't specify one, your default configuration will be used.
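Because results can arrive in any order, in different batches, or on a different machine, it can help to keep a local record keyed by document id. The following is a minimal sketch of that idea, not part of any Semantria SDK: the `SubmissionTracker` class, the local `SUBMITTED` state, and the `id` and `status` field names are assumptions made for illustration.

```python
from typing import Dict, Iterable, List


class SubmissionTracker:
    """Local record of document ids, independent of batch or arrival order."""

    def __init__(self) -> None:
        self._status: Dict[str, str] = {}

    def mark_submitted(self, doc_ids: Iterable[str]) -> None:
        for doc_id in doc_ids:
            # "SUBMITTED" is a local-only state, not a Semantria status.
            self._status[doc_id] = "SUBMITTED"

    def mark_received(self, results: Iterable[dict]) -> List[str]:
        """Record returned documents; return ids this tracker never submitted."""
        unknown: List[str] = []
        for doc in results:
            doc_id = doc["id"]  # assumed field name
            if doc_id in self._status:
                self._status[doc_id] = doc.get("status", "PROCESSED")
            else:
                # Probably submitted by another machine sharing the account.
                unknown.append(doc_id)
        return unknown

    def outstanding(self) -> List[str]:
        """Ids submitted here that have not come back yet."""
        return sorted(i for i, s in self._status.items() if s == "SUBMITTED")
```

With several machines, the ids returned by `mark_received` would need to be forwarded to whichever machine submitted them, or the tracker state kept in a shared store.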
Two types of failure can occur during the submission and processing of content.
- The submission itself is invalid in some way, for example because it contains invalid JSON. In this case no documents are queued, and no API credits are deducted. You need to correct the errors and resubmit. You will know the submission is invalid if you receive anything other than a 200-series HTTP status response.
- The submission is valid but the content itself fails. In this case you will receive the document back with a FAILED status and an error message stating why it failed. Credits are deducted for this. Do not resubmit the failed content, as it will simply fail again. The most common causes of document failure are submitting content in the wrong language for the configuration (sending Arabic content to an English config, for instance) and content that does not have enough text to analyze (such as ASCII art).
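The two failure cases call for different handling on the client side, which can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the `status` field name on returned documents is an assumption.

```python
from typing import List, Tuple


def submission_accepted(http_status: int) -> bool:
    """True only for a 200-series response; anything else means nothing was queued."""
    return 200 <= http_status < 300


def split_results(documents: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Separate processed documents from failed ones.

    Failed documents should be logged and dropped, not resubmitted,
    because they would simply fail again.
    """
    processed = [d for d in documents if d.get("status") == "PROCESSED"]
    failed = [d for d in documents if d.get("status") == "FAILED"]
    return processed, failed
```

A rejected submission can be corrected and sent again at no cost, whereas a failed document has already consumed credits, so the sketch keeps the two checks separate.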
Because order and batch are not preserved on the Semantria side, it is up to you to keep track of what you submitted and received. There are several ways for you to identify your content.
- Each document can have a unique id associated with it. This is returned to you by Semantria when you receive the processed data. You can use this id to update the status on your side. Additionally, you can request the status of a document via its id.
- Each document can also have a tag field. This is a string field you can fill in with additional information you might use to keep track of documents, such as a project ID. You can check on your side to see that you submitted 1,000 documents for tag "my_project" and received 1,000 documents back with that tag. You cannot request status on a tag.
- You can submit and retrieve by job_id. This is a string value you can set when you submit and retrieve documents. It is intended to allow you to separate out processing streams of content for routing or failover purposes on your side, not as a unique ID per batch of content.
- If you submit by a job_id, you must retrieve via that same job_id. Retrieving by config_id will not retrieve documents submitted with a job_id. This does not prevent you from setting the config_id when submitting with a job_id; you just cannot retrieve by that config_id.
- The total number of unique job_ids you use during a 24-hour period must not exceed 100.
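The tag check and the job_id limit described above can be sketched on the client side as follows. This is a rough illustration under assumptions: the `tag` field name, the `tag_mismatches` and `job_id_for` helpers, and the example job_id values are all hypothetical. Drawing job_ids from a small fixed pool keeps the number of unique values well under the limit of 100.

```python
import zlib
from collections import Counter
from typing import Dict, List, Tuple

# A fixed pool of processing streams (example values, not real job_ids).
JOB_IDS = ["stream-a", "stream-b", "stream-c"]


def tag_mismatches(submitted: List[dict], received: List[dict]) -> Dict[str, Tuple[int, int]]:
    """Return {tag: (sent_count, received_count)} for tags whose counts differ."""
    sent = Counter(d.get("tag") for d in submitted)
    got = Counter(d.get("tag") for d in received)
    return {tag: (sent[tag], got[tag]) for tag in sent if sent[tag] != got[tag]}


def job_id_for(doc_id: str) -> str:
    """Map a document id to one of the fixed streams, deterministically."""
    return JOB_IDS[zlib.crc32(doc_id.encode("utf-8")) % len(JOB_IDS)]
```

Because a job_id is meant for routing rather than per-batch identification, the sketch never generates a new job_id per submission.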
Duplicate Document ID
It is possible to send two documents with the same document id, as long as you send them to different configurations. If you send two documents with the same document id to the same configuration, the latter document sent will overwrite the former. Data loss may occur.
If a user submits a document that has already been sent and processed but has not yet been retrieved, the server will overwrite the previous analysis, change its status to QUEUED, and process the newer document.
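One way to avoid accidental overwrites is to refuse to reuse a document id within the same configuration until the earlier result has been retrieved. The sketch below shows that idea on the client side; the `DuplicateGuard` class and its method names are hypothetical and not part of the Semantria API.

```python
from typing import Optional, Set, Tuple


class DuplicateGuard:
    """Tracks (config_id, document id) pairs that are still awaiting retrieval."""

    def __init__(self) -> None:
        self._pending: Set[Tuple[Optional[str], str]] = set()

    def try_reserve(self, config_id: Optional[str], doc_id: str) -> bool:
        """Return True if the id is free in this configuration, else False."""
        key = (config_id, doc_id)
        if key in self._pending:
            return False  # submitting now would overwrite the earlier document
        self._pending.add(key)
        return True

    def release(self, config_id: Optional[str], doc_id: str) -> None:
        """Call once the processed document has been retrieved."""
        self._pending.discard((config_id, doc_id))
```

The key includes the configuration because the same document id may be used safely across different configurations.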
There are four operations for processing a document or group of documents in a Semantria analysis.
Queue: submit a document or batch of documents for Detailed analysis or submit a collection for Discovery analysis.
- Queue with a POST request, and the server will respond with an HTTP status.
- For example, queuing documents for analysis is like lining up bottles to be filled by the milkman.
Request: determine the status of a certain document with its document ID.
- Request with a GET request and the server will respond with either the processed results or the current status: QUEUED, PROCESSED, or FAILED.
- For example, requesting a document is like asking about the status of a specific bottle: the milkman will either give you the filled bottle or tell you why you can't have it yet.
- In a request call, if the server responds with a "PROCESSED" status, it will also return the corresponding processed data.
- In a request call, if the server responds with a "QUEUED" or "FAILED" status, a corresponding reason or error will accompany it.
Retrieve: return all processed documents.
- Retrieve with a GET request and the server will return the results of all documents that have been processed. It will return nothing if no documents have been processed.
- For example, retrieving documents is like asking the milkman for any and all full bottles.
Cancel: delete a queued document if Semantria has not processed it yet.
- This is a DELETE request.
- For example, cancelling a document is like removing the empty bottle before it has been filled by the milkman.
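The four operations and the three statuses above can be summarized in a small client-side sketch. The HTTP methods and status names come from the description above; the `next_step` helper, its return values, and the `status` field name are assumptions made for illustration.

```python
# HTTP method used by each operation, as described above.
HTTP_METHOD = {
    "queue": "POST",
    "request": "GET",
    "retrieve": "GET",
    "cancel": "DELETE",
}


def next_step(document: dict) -> str:
    """Decide what a client might do after a Request call for one document."""
    status = document["status"]  # assumed field name
    if status == "PROCESSED":
        return "store"  # processed data accompanies the response
    if status == "QUEUED":
        return "wait"   # not processed yet; it can still be cancelled
    if status == "FAILED":
        return "drop"   # an error accompanies the response; do not resubmit
    raise ValueError(f"unexpected status: {status!r}")
```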
Queuing: submitting a document or batch of documents for Detailed analysis.
Users must queue documents into the API for processing. A document can be processed with a specific configuration (by using a particular config_id) or with the default configuration (by passing nothing in the config_id field). Single documents under 2KB in size should come back in a few seconds.
For individual documents, the URL is https://api.semantria.com/document.json or https://api.semantria.com/document.xml.
The config_id parameter should be submitted as part of the URL.
The body of the POST request should contain an XML or JSON object with three fields: the document ID, the text to be analyzed, and an optional tag.
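As a rough sketch, a JSON queue request for a single document could be assembled as shown below. The example only builds the URL and body and does not send anything. Several details are assumptions rather than confirmed API facts: the `id`, `text`, and `tag` field names, passing config_id as a query-string parameter, and the `build_queue_request` helper itself. Authentication is omitted entirely.

```python
import json
from typing import Optional, Tuple
from urllib.parse import urlencode

BASE_URL = "https://api.semantria.com/document.json"


def build_queue_request(
    doc_id: str,
    text: str,
    tag: Optional[str] = None,
    config_id: Optional[str] = None,
) -> Tuple[str, bytes]:
    """Return the (url, body) pair for queuing one document via POST."""
    url = BASE_URL
    if config_id:
        # Assumed form; omit config_id to use the default configuration.
        url += "?" + urlencode({"config_id": config_id})
    payload = {"id": doc_id, "text": text}  # assumed field names
    if tag is not None:
        payload["tag"] = tag  # the tag is optional
    return url, json.dumps(payload).encode("utf-8")
```

Leaving out `config_id` yields the bare URL, which corresponds to processing with the default configuration.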
After documents are queued, the server processes each one independently of any other documents or processes, so documents do not influence each other's results. The Semantria API returns a separate analysis for each document.