Tokenization for Occitan (Gascon and Lengadocian)

2019-07-08T22:36:45Z (GMT) by Marianne Vergez-Couret

A Perl programme to tokenise texts in Occitan.

The programme is an adaptation of the Perl programme for tokenising French texts by Tanguy and Hathout (2007), in its extended version (that is, the version that uses a list of exceptions).
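
To illustrate the general idea of tokenising with a list of exceptions, here is a minimal sketch in Python. It is not the Perl script itself and does not reproduce its rules: the function name `tokenize`, the regular expression, and the two exception entries are assumptions made for this example only, and the real contents of exceptions_occitan.txt are not shown here.

```python
import re

# Hypothetical exception entries, for illustration only; these are NOT
# taken from exceptions_occitan.txt.
EXCEPTIONS = {"etc.", "M."}

# Assumed splitting rule: an elided form ending in an apostrophe ("l'"),
# or a run of word characters, or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+'|\w+|[^\w\s]")

def tokenize(text, exceptions=EXCEPTIONS):
    """Split text into tokens, keeping listed exceptions in one piece."""
    tokens = []
    for chunk in text.split():
        if chunk in exceptions:
            # A listed exception is emitted unchanged.
            tokens.append(chunk)
        else:
            # Otherwise apply the generic splitting rule.
            tokens.extend(TOKEN_RE.findall(chunk))
    return tokens

print(tokenize("l'ostal es grand, etc."))
```

In this sketch the exception lookup is done on whitespace-delimited chunks, so an abbreviation directly followed by another punctuation mark would not be recognised; a real tokeniser would need a more careful matching strategy.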

To run the programme, execute the following command:

perl segmenteur_occitan.pl exceptions_occitan.txt output

This tool was developed in the context of the RESTAURE project, funded by the French ANR.