Abstract:
In this paper the authors compare different types of stylometric features by the classification quality they yield: low-level features, which include character-based and word-based ones, and high-level rhythm features. The authors classified texts by century using each feature type separately and in combination, applying four classifiers: the Random Forest and AdaBoost meta-algorithms, an LSTM neural network, and a GRU neural network. Experiments on three text corpora, in English, Russian, and French, showed that combining rhythm features with low-level features significantly improved the quality of classification by century. In addition, the classification results made it possible to compare writing styles across the three languages in terms of sentence structure.
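The idea of combining feature types can be illustrated with a minimal sketch. It assumes that "combining" means concatenating the per-type feature vectors, which the abstract does not state. The character-based feature (letter frequencies) and word-based features (mean word length, type-token ratio) are simple illustrative choices, not necessarily the authors' features. The rhythm vector is a hypothetical precomputed input, because the abstract does not define the rhythm features. The nearest-centroid classifier is a toy stand-in and is not one of the four classifiers (Random Forest, AdaBoost, LSTM, GRU) used in the paper.

```python
import math
import re
from collections import Counter

LATIN = "abcdefghijklmnopqrstuvwxyz"


def char_features(text, alphabet=LATIN):
    """Relative frequency of each letter in `alphabet` (an illustrative character-based feature)."""
    letters = [c for c in text.lower() if c in alphabet]
    counts = Counter(letters)
    total = len(letters) or 1  # avoid division by zero for texts with no letters
    return [counts[c] / total for c in alphabet]


def word_features(text):
    """Mean word length and type-token ratio (illustrative word-based features)."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return [0.0, 0.0]
    mean_len = sum(len(w) for w in words) / len(words)
    ttr = len(set(words)) / len(words)
    return [mean_len, ttr]


def combine(*feature_groups):
    """Assumed feature-level combination: concatenate the vectors of several feature types."""
    combined = []
    for group in feature_groups:
        combined.extend(group)
    return combined


def nearest_centroid(train, query):
    """Toy stand-in classifier: return the label whose mean training vector is closest (Euclidean)."""
    by_label = {}
    for vec, label in train:
        by_label.setdefault(label, []).append(vec)
    best_label, best_dist = None, math.inf
    for label, vecs in by_label.items():
        centroid = [sum(col) / len(vecs) for col in zip(*vecs)]
        dist = math.dist(centroid, query)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


if __name__ == "__main__":
    text = "It was the best of times, it was the worst of times."
    rhythm = [0.25, 0.10]  # hypothetical precomputed rhythm features
    x = combine(char_features(text), word_features(text), rhythm)
    print(len(x))  # 26 letter frequencies + 2 word features + 2 rhythm values
```

Concatenation keeps each feature type's values intact, so any classifier that accepts a fixed-length vector can be trained on one feature type alone or on a combination. This is the kind of comparison the abstract describes.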
Date of Conference: 07-09 September 2020
Date Added to IEEE Xplore: 02 October 2020
Print on Demand (PoD) ISSN: 2305-7254