VerifyMe - Style Explorer · Seán O'Sullivan

Description

This graph contains 4,493 samples, each approximately 10K characters long, written by 995 authors from the Project Gutenberg Corpus.
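The page does not say how the ≈10K-character samples were cut from each book. The sketch below is a minimal illustration that assumes passages are consecutive, non-overlapping windows of a fixed size. The `chunk_text` helper and its 10,000-character default are hypothetical, and a trailing remainder shorter than the window is simply discarded:

```python
from __future__ import annotations


def chunk_text(text: str, size: int = 10_000) -> list[str]:
    """Split text into consecutive, non-overlapping passages of `size` characters.

    Hypothetical helper: any trailing remainder shorter than `size` is dropped,
    so every returned passage has exactly the same length.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    # Step through the text in strides of `size`, stopping before a partial window.
    return [text[i:i + size] for i in range(0, len(text) - size + 1, size)]
```

Fixed-length windows keep sample length constant across authors, which avoids passage length becoming a confounding "style" signal; a real pipeline might instead cut on sentence or paragraph boundaries.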

This plot includes samples from both the training and testing author partitions of the corpus.
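Partitioning by author, rather than by sample, keeps all of an author's passages on one side of the split, so test authors are never seen during training. A minimal sketch of such an author-disjoint split follows. The `split_by_author` helper, the 20% test fraction, and the fixed seed are assumptions for illustration, not the project's actual settings:

```python
import random
from collections import defaultdict


def split_by_author(samples, test_fraction=0.2, seed=0):
    """Partition (author, passage) pairs so that no author appears in both splits."""
    # Group passages under their author.
    by_author = defaultdict(list)
    for author, passage in samples:
        by_author[author].append(passage)

    # Sort first so the shuffle is reproducible for a given seed.
    authors = sorted(by_author)
    random.Random(seed).shuffle(authors)
    n_test = round(len(authors) * test_fraction)
    test_authors = set(authors[:n_test])

    # Route every passage to the split its author belongs to.
    train, test = [], []
    for author in authors:
        target = test if author in test_authors else train
        target.extend((author, p) for p in by_author[author])
    return train, test
```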

Each point represents a cleaned text passage that was converted into a stylometric embedding, normalized, and passed through my authorship verification model. The resulting embeddings were then reduced to three dimensions with UMAP to produce this plot.
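The normalization and UMAP steps can be sketched as below. The page does not specify the normalization; this sketch assumes L2 (unit-length) normalization per embedding and a cosine metric in UMAP. The helper names are hypothetical, and the reduction step assumes the `umap-learn` package is installed:

```python
import numpy as np


def l2_normalize(embeddings: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row (one passage embedding) to unit Euclidean length.

    Assumption: "normalized" means L2 normalization; `eps` guards against
    division by zero for an all-zero embedding.
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.maximum(norms, eps)


def reduce_to_3d(embeddings: np.ndarray, seed: int = 42) -> np.ndarray:
    """Project normalized embeddings to 3-D with UMAP (requires umap-learn)."""
    import umap  # imported lazily so l2_normalize works without umap-learn

    reducer = umap.UMAP(n_components=3, metric="cosine", random_state=seed)
    return reducer.fit_transform(l2_normalize(embeddings))
```

Given an `(n_samples, dim)` array of model embeddings, `reduce_to_3d` returns an `(n_samples, 3)` array whose columns can be plotted as x, y and z. Fixing `random_state` makes the layout reproducible between runs, although in umap-learn it also disables parallel execution.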

Project code

Note: Click the dropdown to colour samples by author.