WASHINGTON, Aug. 20, 2024 – Statistical analysis of classic literature has shown that the way punctuation breaks up text obeys certain universal mathematical relationships. James Joyce's tome "Finnegans Wake," however, famously breaks the rules of normal prose through its unusual, dreamlike stream of consciousness. New work in chaos theory, published in the journal Chaos, from AIP Publishing, takes a closer look at how Joyce's challenging novel stands out, mathematically.
Researchers have compared the distribution of punctuation marks in various experimental novels to determine the underlying order of "Finnegans Wake." By statistically analyzing the texts, Stanisz et al. found the tome exhibits an unusual but statistically identifiable structure.
"'Finnegans Wake' exhibits the type of narrative that makes it possible to continue longer strings of words without the need for punctuation breaks," said author Stanisław Drożdż. "This may indicate that this type of narrative is less taxing on the human perceptual and respiratory systems or, equivalently, that it resonates better with them."
The longer a sequence of words runs without a punctuation mark, the higher the probability that one appears next. Such a pattern is captured by a Weibull distribution. Weibull distributions apply to anything from human diseases to "The Gates of Paradise," a Polish novel written almost entirely in a single sentence spanning nearly 40,000 words.
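To make the Weibull relationship concrete, here is a minimal sketch that assumes a discretized Weibull model for the run length n, meaning the number of words between punctuation marks. The chance that a run survives past n words is taken as S(n) = exp(-(n/λ)^k), and the "hazard" h(n) = 1 - S(n)/S(n-1) is the probability that a mark follows word n given that none has appeared yet. With shape k > 1 the hazard rises with n, which is the behavior described above. With k = 1 it is constant, and with k < 1 it falls. The function names and the scale and shape values are hypothetical illustrations, not parameters fitted in the study.

```python
import math

def cumulative_hazard(n: int, scale: float, shape: float) -> float:
    """Weibull cumulative hazard H(n) = (n / scale) ** shape, so S(n) = exp(-H(n))."""
    return (n / scale) ** shape

def punctuation_hazard(n: int, scale: float, shape: float) -> float:
    """P(a punctuation mark follows word n | no mark after words 1..n-1).

    Equals 1 - S(n) / S(n - 1); expm1 keeps the subtraction accurate.
    Hypothetical helper for illustration only.
    """
    step = cumulative_hazard(n, scale, shape) - cumulative_hazard(n - 1, scale, shape)
    return -math.expm1(-step)

# Hypothetical parameters: a scale of 6 words and three different shapes.
for shape in (1.5, 1.0, 0.8):
    rates = [punctuation_hazard(n, 6.0, shape) for n in (1, 5, 10, 20)]
    print(f"shape={shape}: " + ", ".join(f"{r:.3f}" for r in rates))
```

Printing the rates for each shape shows them climbing for k = 1.5, staying flat for k = 1.0, and declining for k = 0.8.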
Enter "Finnegans Wake," which weaves puns, phrases, and portmanteaus from as many as 70 languages into a dreamlike stream of consciousness. The book typifies Joyce's later works, which are among the few known texts whose punctuation does not appear to follow the Weibull distribution.
The team broke down 10 experimental novels by counting the words between consecutive punctuation marks. From each book's sequence of counts, they computed a singularity spectrum, which describes how the variability in those lengths is organized across different scales. "Finnegans Wake" has a notoriously broad range of sentence lengths, which produces a wide spectrum.
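The first step, turning a book into a sequence of word counts between punctuation marks, can be sketched as follows. This is a minimal illustration, not the team's code. The set of marks treated as breaks, keeping apostrophe words such as "Adam's" as one word, skipping empty runs (as inside "!?"), and dropping the unfinished run after the last mark are all assumptions made here. The function name is hypothetical.

```python
import re

# Assumption: the marks treated as breaks; the release does not list the exact set.
PUNCTUATION = set(".,;:!?()")
# A token is either a word (apostrophes allowed inside) or a single mark.
TOKEN = re.compile(r"\w+(?:['’]\w+)*|[.,;:!?()]")

def inter_punctuation_lengths(text: str) -> list[int]:
    """Return the number of words between consecutive punctuation marks."""
    lengths = []
    run = 0
    for token in TOKEN.findall(text):
        if token in PUNCTUATION:
            if run > 0:  # skip empty runs, e.g. between "!" and "?"
                lengths.append(run)
            run = 0
        else:
            run += 1
    # Words after the final mark form an unfinished run and are dropped.
    return lengths

opening = ("riverrun, past Eve and Adam's, from swerve of shore to bend of bay, "
           "brings us by a commodius vicus of recirculation back to Howth Castle and Environs.")
print(inter_punctuation_lengths(opening))  # → [1, 4, 8, 14]
```

A list like this, computed over a whole novel, is the kind of series that a multifractal analysis would take as input.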
While most punctuation distributions skew toward shorter word sequences, the wide singularity spectrum of "Finnegans Wake" is perfectly symmetrical, meaning that the variability in sentence length follows an orderly curve.
This level of symmetry is rare in real-world data. It implies a well-organized, complex hierarchical structure that aligns perfectly with a phenomenon known as multifractality, in which a system is built from fractals within fractals.
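What a "symmetrical singularity spectrum" means can be shown with a textbook model rather than the study's data. The sketch below assumes a binomial multiplicative cascade, a classic "fractals within fractals" construction in which each segment's weight is split into fractions p and 1 - p. Its scaling function is τ(q) = -log2(p^q + (1 - p)^q). The spectrum follows from a Legendre transform, α = dτ/dq and f(α) = qα - τ(q). From the spectrum we report its width Δα = α_max - α_min and one asymmetry measure used in the multifractal literature, A = (Δα_L - Δα_R)/(Δα_L + Δα_R), where Δα_L and Δα_R are the left and right arms around the peak α_0. The model, p = 0.3, the q range, and all function names are illustrative assumptions. The release does not state which method or parameters the researchers used.

```python
import math

def binomial_cascade_tau(q: float, p: float = 0.3) -> float:
    """Scaling function tau(q) of a binomial cascade (illustrative model only)."""
    return -math.log2(p ** q + (1 - p) ** q)

def singularity_spectrum(tau, qs, h=1e-5):
    """Legendre transform: alpha = dtau/dq (central difference), f = q*alpha - tau(q)."""
    alphas = [(tau(q + h) - tau(q - h)) / (2 * h) for q in qs]
    fs = [q * a - tau(q) for q, a in zip(qs, alphas)]
    return alphas, fs

def width_and_asymmetry(alphas, fs):
    """Return (width, asymmetry) of a spectrum sampled as (alpha, f) pairs."""
    alpha0 = alphas[fs.index(max(fs))]  # position of the spectrum's peak
    left = alpha0 - min(alphas)         # left arm, Delta alpha_L
    right = max(alphas) - alpha0        # right arm, Delta alpha_R
    return left + right, (left - right) / (left + right)

qs = [i * 0.25 for i in range(-16, 17)]  # q from -4 to 4, including q = 0
alphas, fs = singularity_spectrum(binomial_cascade_tau, qs)
width, asym = width_and_asymmetry(alphas, fs)
print(f"width = {width:.3f}, asymmetry = {asym:.1e}")
```

For this cascade, the arms are equal over a q range symmetric about zero, so the asymmetry is zero up to rounding error. An unbalanced spectrum would give a value away from zero, and a wider spectrum signals richer multifractality.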
"'Finnegans Wake' appears to have the unique property that the probability of interrupting a sequence of words with a punctuation character decreases with the length of the sequence," Drożdż said. "This makes the narrative more flexible to create perfect, long-range correlated cascading patterns that better reflect the functioning of nature."
Drożdż hopes the work helps large language models better capture long-range correlations in text. The team next looks to apply their work in this domain.