Dataset Information

Human Rights Texts: Converting Human Rights Primary Source Documents into Data.

ABSTRACT: We introduce and make publicly available a large corpus of digitized primary-source human rights documents published annually by monitoring agencies, including Amnesty International, Human Rights Watch, the Lawyers Committee for Human Rights, and the United States Department of State. In addition to the digitized text, we make available and describe document-term matrices: datasets that systematically record how many times each unique term in the corpus occurs in each unique document. To contextualize the importance of this corpus, we describe the development of coding procedures in the human rights community and several existing categorical indicators that were created by human coding of the documents contained in the corpus. We then discuss how the new corpus and the existing human rights datasets can be used with a variety of statistical analyses and machine learning algorithms to help scholars understand how human rights practices and reporting have evolved over time. We close with a discussion of our plans for dataset maintenance, updating, and availability.
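The document-term matrix mentioned in the abstract can be illustrated with a short Python sketch. This is a minimal example under stated assumptions and not the authors' pipeline: the tokenizer (lowercase, alphabetic runs only), the function names, and the two toy documents are hypothetical, and the published matrices may use different preprocessing (for example, stop-word removal or stemming).

```python
import re
from collections import Counter

# Hypothetical toy documents standing in for annual country reports.
EXAMPLE_DOCS = {
    "amnesty_example": "Torture was reported. Detainees reported torture.",
    "state_dept_example": "Reports of arbitrary detention were received.",
}


def tokenize(text):
    # Assumed minimal tokenizer: lowercase, keep runs of letters a-z only.
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(docs):
    """Return (doc_ids, vocabulary, matrix) where matrix[i][j] is the number
    of times vocabulary[j] occurs in the document doc_ids[i]."""
    doc_ids = list(docs)
    # One Counter of term frequencies per document.
    counts = [Counter(tokenize(docs[d])) for d in doc_ids]
    # Vocabulary = every unique term seen anywhere in the corpus, sorted.
    vocabulary = sorted(set().union(*counts))
    # Rows are documents, columns are terms; absent terms count as 0.
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return doc_ids, vocabulary, matrix


if __name__ == "__main__":
    ids, vocab, m = document_term_matrix(EXAMPLE_DOCS)
    print(vocab)
    for doc_id, row in zip(ids, m):
        print(doc_id, row)
```

Each row corresponds to one document and each column to one term. In a real corpus most cells are zero, so a sparse matrix representation is usually preferable to the nested lists used here for readability.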

SUBMITTER: Fariss CJ

PROVIDER: S-EPMC4587949 | biostudies-literature | 2015

REPOSITORIES: biostudies-literature

Publications

Human Rights Texts: Converting Human Rights Primary Source Documents into Data.

Christopher J. Fariss, Fridolin J. Linder, Zachary M. Jones, Charles D. Crabtree, Megan A. Biek, Ana-Sophia M. Ross, Taranamol Kaur, Michael Tsai

PLoS ONE, 29 September 2015, issue 9

PMID: 26418817

Similar Datasets

Human rights' interdependence and indivisibility: a glance over the human rights to water and sanitation.
| S-EPMC6408851 | biostudies-literature

Dynamically generating T32 training documents using structured data.
| S-EPMC6579602 | biostudies-literature

Getting more out of biomedical documents with GATE's full lifecycle open source text analytics.
| S-EPMC3567135 | biostudies-literature

FlowCal: A User-Friendly, Open Source Software Tool for Automatically Converting Flow Cytometry Data from Arbitrary to Calibrated Units.
| S-EPMC5556937 | biostudies-literature

Improving global gross primary productivity estimation by fusing multi-source data products.
| S-EPMC8956891 | biostudies-literature

Mandatory COVID-19 vaccination and human rights.
| S-EPMC8700276 | biostudies-literature

Tashkeela: Novel corpus of Arabic vocalized texts, data for auto-diacritization systems.
| S-EPMC5310197 | biostudies-literature

Nationalism and human rights: A replication and extension.
| S-EPMC6707544 | biostudies-literature

Dataset on human rights awareness in Northwest Nigeria.
| S-EPMC8601984 | biostudies-literature

Health and Human Rights in Karen State, Eastern Myanmar.
| S-EPMC4550474 | biostudies-literature