Uploading corpora

You can upload two types of files to use as corpora in Globalese: CAT tool files and delimited files (CSV/TSV).

You can upload files individually or together in a zip file.

The maximum number of uploads allowed in one go is 20, and no file can be larger than 600 MB.
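If you prepare uploads with a script, it can help to check a batch against these limits before you start. The following is a minimal sketch, not part of Globalese: the function and constant names are hypothetical. It assumes that "600 MB" means 600 × 1000 × 1000 bytes (the stricter reading) and that each file counts as one upload. The page does not say how files inside a zip file are counted.

```python
from pathlib import Path

# Limits taken from the text above.
MAX_FILES_PER_UPLOAD = 20
# Assumption: "600 MB" is read as decimal megabytes, the stricter interpretation.
MAX_FILE_SIZE_BYTES = 600 * 1000 * 1000


def check_upload_batch(sizes_by_name):
    """Return a list of problems; an empty list means the batch is within limits.

    sizes_by_name maps a file name to its size in bytes.
    """
    problems = []
    if len(sizes_by_name) > MAX_FILES_PER_UPLOAD:
        problems.append(
            f"{len(sizes_by_name)} files selected, limit is {MAX_FILES_PER_UPLOAD}"
        )
    for name, size in sizes_by_name.items():
        if size > MAX_FILE_SIZE_BYTES:
            problems.append(f"{name} is {size} bytes, limit is {MAX_FILE_SIZE_BYTES}")
    return problems


def file_sizes(paths):
    """Collect the on-disk size of each path (hypothetical helper)."""
    return {str(path): Path(path).stat().st_size for path in paths}
```

For example, `check_upload_batch(file_sizes(["memory.tmx", "project.sdlxliff"]))` returns an empty list when both files exist and are within the limits.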

Uploading CAT tool files

The source and target languages are automatically detected.

The following file formats are accepted:

  • .mqxliff/.mqxlz
  • .mxliff
  • .sdlxliff
  • .tbx
  • .tmx
  • .txlf
  • .xliff/.xlf
  • .xlz
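To sort a folder of files before uploading, the list above can be expressed as a simple extension check. This is an illustrative sketch only: the helper name is hypothetical, and matching extensions case-insensitively is an assumption, since the page does not say whether Globalese does so.

```python
from pathlib import Path

# Extensions copied from the list of accepted CAT tool formats.
CAT_TOOL_EXTENSIONS = {
    ".mqxliff", ".mqxlz", ".mxliff", ".sdlxliff", ".tbx",
    ".tmx", ".txlf", ".xliff", ".xlf", ".xlz",
}


def is_cat_tool_file(filename):
    """Return True if the file name ends in one of the listed extensions.

    Assumption: the comparison ignores case (e.g. ".TMX" is treated as ".tmx").
    """
    return Path(filename).suffix.lower() in CAT_TOOL_EXTENSIONS
```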

To upload one or more CAT tool files:

  1. Go to Corpora.
  2. Choose Upload → CAT tool files.
  3. Select at least one group to assign the uploaded file(s) to.
  4. Optionally specify any metadata.
  5. Select at least one file to upload.
  6. Click the Upload button.

Uploading delimited files

Delimited files must be bilingual text files in which each source segment and its target segment are on the same line, separated by a tab character (.bi, .tsv) or by a semicolon or a comma (.csv).

Because the languages in delimited files cannot be detected automatically, you must specify them before uploading the files.

The following file formats are accepted:

  • .bi
  • .csv (using comma or semicolon as delimiter)
  • .tsv
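The tab-delimited layout described above, with the source segment in the first column and the target segment in the second, can be produced with a few lines of Python. This is a minimal sketch under stated assumptions: the function name is hypothetical, UTF-8 encoding is assumed, and segments containing tabs or line breaks are rejected because the page does not describe any quoting or escaping rules.

```python
def write_bilingual_tsv(path, segment_pairs):
    """Write (source, target) pairs as one tab-separated line each.

    Assumptions: UTF-8 output, no header row, no quoting or escaping.
    """
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for source, target in segment_pairs:
            for segment in (source, target):
                # A tab or line break inside a segment would break the
                # one-pair-per-line, two-column layout.
                if "\t" in segment or "\n" in segment or "\r" in segment:
                    raise ValueError(f"segment contains a tab or line break: {segment!r}")
            handle.write(f"{source}\t{target}\n")
```

For example, `write_bilingual_tsv("pairs.tsv", [("Hello", "Hallo"), ("Thank you", "Danke")])` writes two lines, each holding a source segment, a tab character, and a target segment.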

To upload one or more delimited files:

  1. Go to Corpora.
  2. Choose Upload → Delimited files (CSV/TSV).
  3. Specify the source language – the language in the first column of the uploaded files.
  4. Specify the target language – the language in the second column of the uploaded files.
  5. Select at least one group to assign the uploaded file(s) to.
  6. Optionally specify any metadata.
  7. Select at least one file to upload.
  8. Click the Upload button.