Dataset Information

Unicode-8 based linguistics data set of annotated Sindhi text.


ABSTRACT: The Sindhi Unicode-8 based linguistics data set is a multi-class, multi-featured data set. It was developed to address natural language processing (NLP) and linguistic problems of the Sindhi language. The data set provides information on the grammatical and morphological structure of Sindhi text, as well as the sentiment polarity of Sindhi lexicons. It may therefore be used for information retrieval, machine translation, lexicon analysis, language modeling, grammatical and morphological analysis, and semantic and sentiment analysis.

SUBMITTER: Dootio MA 

PROVIDER: S-EPMC6139473 | biostudies-literature | 2018 Aug

REPOSITORIES: biostudies-literature

Publications

Unicode-8 based linguistics data set of annotated Sindhi text.

Dootio Mazhar Ali, Wagan Asim Imdad

Data in Brief, 2018-05-22


Sindhi Unicode-8 based linguistics data set is multi-class and multi-featured data set. It is developed to solve the natural languages processing (NLP) and linguistics problems of Sindhi language. The data set presents information on grammatical and morphological structure of Sindhi language text as well as sentiment polarity of Sindhi lexicons. Therefore, data set may be used for information retrieving, machine translation, lexicon analysis, language modeling analysis, grammatical and morpholog  ...[more]

Similar Datasets

| S-EPMC7481818 | biostudies-literature
| S-EPMC7057657 | biostudies-literature
| S-EPMC9849450 | biostudies-literature
| S-EPMC8005322 | biostudies-literature
| S-EPMC8599474 | biostudies-literature
| S-EPMC6190742 | biostudies-other
| S-EPMC6236664 | biostudies-literature
| S-EPMC6111892 | biostudies-literature
| S-EPMC3956585 | biostudies-other
| S-EPMC6811530 | biostudies-literature