Engines

Engines are trained from corpora and used to translate files. They are unidirectional, i.e. each engine has one source language and one target language.

Engine types

Domain-adapted engines

Domain-adapted engines are trained using Globalese's proprietary automated in-domain adaptation technology. When you select your most important in-domain TM(s) as “Master” training data, the engine focuses on the style and terminology of those TM(s). If the volume of in-domain data is not sufficient, you can add generic stock data to extend the engine. You also have the option to add your own auxiliary data.

Use case

The typical use case for domain-adapted engines is content where adhering to a particular terminology and style is very important. Some examples: product documentation, end-user manuals or software documentation, where the right terminology and style must be used consistently.

Required training data

The following table shows the minimum and recommended number of segments.

Includes stock corpora? | Minimum volume (segments)       | Recommended volume (segments)
Yes                     | 15,000 master                   | 100,000+ master
No                      | 15,000 master; 200,000 total    | 100,000+ master; 1,000,000+ total
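The volumes in the table above can be turned into a quick pre-flight check before you start training. The sketch below is hypothetical and not part of Globalese: the function name and result labels are illustrative, and it assumes that "total" means master plus auxiliary segments. It simply compares your segment counts against the minimum and recommended volumes.

```python
def check_training_volume(master: int, total: int, includes_stock: bool) -> str:
    """Classify segment counts against the domain-adapted engine thresholds.

    Hypothetical helper: `total` is assumed to mean master plus auxiliary segments.
    """
    if includes_stock:
        # With stock corpora, only the master volume is constrained.
        minimum_met = master >= 15_000
        recommended_met = master >= 100_000
    else:
        # Without stock corpora, both the master and the total volume are constrained.
        minimum_met = master >= 15_000 and total >= 200_000
        recommended_met = master >= 100_000 and total >= 1_000_000

    if recommended_met:
        return "recommended"
    if minimum_met:
        return "minimum"
    return "insufficient"


print(check_training_volume(20_000, 20_000, includes_stock=True))        # minimum
print(check_training_volume(20_000, 150_000, includes_stock=False))      # insufficient
print(check_training_volume(120_000, 1_200_000, includes_stock=False))   # recommended
```

Meeting the recommended volume always implies meeting the minimum, so the check tests the stricter condition first.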

Master corpora

Master corpora are the core of the engine. The engine performs best on source texts similar to this training material, so make sure the master corpora you select are closely related to the texts the engine will be used to translate.

Auxiliary corpora

Auxiliary corpora, just like stock corpora, are used to enrich the master corpora. The larger the pool of auxiliary corpora, the larger the selection base for the training process.
Only the content most closely related to the master corpora is eventually used to train the engine, so feel free to add any material of good linguistic quality.
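Globalese's actual selection method is proprietary and is not described here. To give a rough intuition of why a bigger auxiliary pool helps, the toy sketch below ranks auxiliary segments by vocabulary coverage (the share of a segment's words that also occur in the master corpus) and keeps the top fraction. The function names, the naive tokenization and the `keep_ratio` parameter are all assumptions for illustration; production systems use more sophisticated scoring.

```python
import math
import re


def tokens(text: str) -> set[str]:
    # Naive lowercase word tokenization (illustrative assumption only).
    return set(re.findall(r"\w+", text.lower()))


def select_auxiliary(master: list[str], auxiliary: list[str], keep_ratio: float = 0.5) -> list[str]:
    """Keep the auxiliary segments whose words best match the master corpus.

    Toy stand-in for in-domain data selection, not Globalese's proprietary method.
    """
    master_vocab: set[str] = set()
    for segment in master:
        master_vocab |= tokens(segment)

    def coverage(segment: str) -> float:
        # Share of the segment's distinct words that also appear in the master corpus.
        seg = tokens(segment)
        return len(seg & master_vocab) / len(seg) if seg else 0.0

    # Highest coverage first; sorted() is stable, so ties keep their input order.
    ranked = sorted(auxiliary, key=coverage, reverse=True)
    keep = math.ceil(len(ranked) * keep_ratio)
    return ranked[:keep]


master = ["Click Save to store the file.", "Open the Settings menu."]
auxiliary = [
    "The weather is nice today.",
    "Click the Settings menu to save the file.",
    "Stock prices fell sharply.",
]
print(select_auxiliary(master, auxiliary, keep_ratio=0.5))
```

With a larger auxiliary pool, the same keep ratio leaves more closely related segments to choose from, which mirrors the point made above.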

Typical training time

The typical training time for domain-adapted engines is between 10 and 24 hours.

Stock engines

You can also use pre-trained stock engines for certain language combinations.

Stock+ engines

Stock+ engines are trained by extending a pre-trained stock engine with your own master data. The selected master data becomes part of the engine: if the master corpora contain new content, the engine will learn it. However, do not expect the added master data to change the engine's terminology or style preferences.

Use case

The typical use case for stock+ engines is where you want a generic engine trained on a large data set that also incorporates your own training data. You can also use this option if your own training data is not large enough to train a domain-adapted engine. Some examples: annual reports, user-generated content, web pages.
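The guidance on choosing between domain-adapted and stock+ engines can be condensed into a rough decision sketch. Everything here is hypothetical: the function and parameter names are illustrative, `total_segments` is assumed to mean master plus auxiliary segments, and the rules are one reading of this page rather than an official Globalese recommendation. The minimum volumes come from the required training data table for domain-adapted engines; stock+ is only offered when a pre-trained stock engine exists for the language pair.

```python
def suggest_engine_type(
    master_segments: int,
    total_segments: int,          # assumption: master plus auxiliary segments
    use_stock_corpora: bool,      # whether generic stock data would be added
    stock_engine_available: bool, # a pre-trained stock engine exists for the language pair
    terminology_critical: bool,   # consistent terminology and style are essential
) -> str:
    """Hypothetical helper that suggests an engine type from the guidance on this page."""
    # Minimum volumes for a domain-adapted engine.
    if use_stock_corpora:
        can_domain_adapt = master_segments >= 15_000
    else:
        can_domain_adapt = master_segments >= 15_000 and total_segments >= 200_000

    if terminology_critical and can_domain_adapt:
        return "domain-adapted"
    if stock_engine_available:
        # Stock+ has no minimum data requirement, but needs a stock engine to extend.
        return "stock+"
    if can_domain_adapt:
        return "domain-adapted"
    return "collect more data"


print(suggest_engine_type(50_000, 50_000, True, True, True))   # domain-adapted
print(suggest_engine_type(5_000, 5_000, True, True, True))     # stock+
```

The fallback to stock+ when terminology matters but data is short follows the note that stock+ can be used when your own data is not enough for a domain-adapted engine.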

Required training data

There is no minimum requirement.

Typical training time

The typical training time for stock+ engines is between 10 minutes and 4 hours.

Creating an engine

Training