Dataset Information

Probing the topological properties of complex networks modeling short written texts.

ABSTRACT: In recent years, graph theory has been widely employed to probe several language properties. More specifically, the so-called word adjacency model has proven useful for tackling several practical problems, especially those relying on textual stylistic analysis. The most common approach to treating texts as networks has simply considered either large pieces of text or entire books. This approach has certainly worked well (many informative discoveries have been made this way), but it raises an uncomfortable question: could there be important topological patterns in small pieces of text? To address this problem, the topological properties of subtexts sampled from entire books were probed. Statistical analyses performed on a dataset comprising 50 novels revealed that most of the traditional topological measurements are stable for short subtexts. When the performance of the authorship recognition task was analyzed, it was found that a proper sampling yields a discriminability similar to the one found with full texts. Surprisingly, the support vector machine classification based on the characterization of short texts outperformed the one performed with entire books. These findings suggest that a local topological analysis of large documents might improve their global characterization. Most importantly, it was verified, as a proof of principle, that short texts can be analyzed with the methods and concepts of complex networks. As a consequence, the techniques described here can be extended in a straightforward fashion to analyze texts as time-varying complex networks.
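
The idea in the abstract (cut a book into short subtexts, model each one as a word adjacency network, then measure its topology) can be illustrated with a minimal Python sketch. This is not the author's implementation: the function names, the regular-expression tokenizer, the non-overlapping windows, the undirected links, and the choice of average degree and clustering coefficient are all assumptions made for illustration, and the paper may use different preprocessing and measurements.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and extract word tokens (assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def split_into_subtexts(tokens, window):
    """Cut the tokens into consecutive, non-overlapping subtexts of `window` tokens.

    A trailing remainder shorter than `window` is dropped.
    """
    return [tokens[i:i + window] for i in range(0, len(tokens) - window + 1, window)]


def adjacency_network(tokens):
    """Link every pair of distinct words that occur next to each other.

    Returns an undirected network as a dict mapping each word to its neighbor set.
    """
    neighbors = defaultdict(set)
    for left, right in zip(tokens, tokens[1:]):
        if left != right:  # ignore immediate repetitions (no self-loops)
            neighbors[left].add(right)
            neighbors[right].add(left)
    return dict(neighbors)


def average_degree(network):
    """Mean number of neighbors per word; 0.0 for an empty network."""
    if not network:
        return 0.0
    return sum(len(nbrs) for nbrs in network.values()) / len(network)


def average_clustering(network):
    """Mean local clustering coefficient; words with fewer than 2 neighbors count as 0."""
    if not network:
        return 0.0
    total = 0.0
    for nbrs in network.values():
        k = len(nbrs)
        if k < 2:
            continue
        ordered = sorted(nbrs)
        # Count links that exist between pairs of this word's neighbors.
        links = sum(
            1
            for i, a in enumerate(ordered)
            for b in ordered[i + 1:]
            if b in network[a]
        )
        total += 2 * links / (k * (k - 1))
    return total / len(network)


if __name__ == "__main__":
    sample = "the cat saw the dog and the dog saw the cat near the old gate"
    for subtext in split_into_subtexts(tokenize(sample), window=5):
        net = adjacency_network(subtext)
        print(subtext, average_degree(net), average_clustering(net))
```

Comparing such measurements across subtexts of decreasing length is one way to check how stable they remain for short texts; the resulting feature vectors could then be passed to any classifier, such as a support vector machine.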

SUBMITTER: Amancio DR

PROVIDER: S-EPMC4342245 | biostudies-literature | 2015

REPOSITORIES: biostudies-literature

Publications

Probing the topological properties of complex networks modeling short written texts.

Amancio, Diego R.

PLoS ONE, 2015-02-26, issue 2

PMID: 25719799

Similar Datasets

Probing the statistical properties of unknown texts: application to the Voynich Manuscript.
| S-EPMC3699599 | biostudies-literature

Topological Properties of Neuromorphic Nanowire Networks.
| S-EPMC7069063 | biostudies-literature

Topological Strata of Weighted Complex Networks.
| S-EPMC3689815 | biostudies-literature

Scaling in topological properties of brain networks.
| S-EPMC4845066 | biostudies-literature

Co-expression networks: graph properties and topological comparisons.
| S-EPMC2804297 | biostudies-literature

Temporal-topological properties of higher-order evolving networks.
| S-EPMC10090145 | biostudies-literature

Topological motifs populate complex networks through grouped attachment.
| S-EPMC6107624 | biostudies-literature

Enabling Controlling Complex Networks with Local Topological Information.
| S-EPMC5854593 | biostudies-literature

Human-Written vs AI-Generated Texts in Orthopedic Academic Literature: Comparative Qualitative Analysis.
| S-EPMC10907945 | biostudies-literature

Topological estimation of signal flow in complex signaling networks.
| S-EPMC5869720 | biostudies-literature