Automatic resource cleanup and normalization

All resources uploaded to Globalese undergo a thorough automatic technical cleanup, so resource managers only have to deal with language- and content-related tasks.

When uploading a corpus, inline and formatting tags are stripped from the text and special characters are normalized.
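
As a rough illustration of what this kind of cleanup can look like, here is a minimal Python sketch. It is not Globalese's actual implementation: the tag pattern, the choice of Unicode NFKC normalization, the whitespace collapsing, and the function name clean_segment are all assumptions made for the example.

```python
import re
import unicodedata

# Assumption: inline and formatting tags look like XML/HTML tags, e.g. <b>, </g>, <x id="1"/>.
TAG_RE = re.compile(r"</?[A-Za-z][^<>]*>")
WHITESPACE_RE = re.compile(r"\s+")


def clean_segment(text: str) -> str:
    """Strip tag-like markup and normalize special characters (hypothetical helper)."""
    # Remove anything that matches the assumed tag pattern.
    text = TAG_RE.sub("", text)
    # Assumption: NFKC stands in for "special characters are normalized";
    # it folds compatibility forms such as non-breaking spaces and ligatures.
    text = unicodedata.normalize("NFKC", text)
    # Collapse runs of whitespace left behind by removed tags.
    return WHITESPACE_RE.sub(" ", text).strip()
```

With these assumptions, clean_segment('Click <b>Save</b>\u00a0now') returns 'Click Save now': the bold tags are dropped and the non-breaking space becomes a regular space.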

When training an engine, the content of the corpora is filtered according to various criteria: whether the source text is the same as the target text, whether there is a length mismatch, and so on.
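
The two criteria named above could be sketched in Python as follows. This is only an illustration under stated assumptions: the function name keep_pair, the character-based length ratio, and the threshold of 3.0 are hypothetical, and the empty-segment check is an extra guard added for the example rather than a documented Globalese rule.

```python
def keep_pair(source: str, target: str, max_length_ratio: float = 3.0) -> bool:
    """Decide whether a source/target segment pair passes the filters (hypothetical helper)."""
    source, target = source.strip(), target.strip()
    # Extra guard (assumption): an empty side carries no translation
    # and would cause a division by zero below.
    if not source or not target:
        return False
    # Criterion from the text: source is the same as target.
    if source == target:
        return False
    # Criterion from the text: length mismatch, measured here in characters
    # against an assumed maximum ratio.
    longer = max(len(source), len(target))
    shorter = min(len(source), len(target))
    return longer / shorter <= max_length_ratio


def filter_corpus(pairs):
    """Keep only the pairs that pass keep_pair with its default threshold."""
    return [(s, t) for s, t in pairs if keep_pair(s, t)]
```

Under these assumptions, a pair such as ("Hello", "Hallo") is kept, while ("OK", "OK") is dropped as untranslated and a two-character source paired with a twenty-character target is dropped for length mismatch.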
