ABSTRACT:
SUBMITTER: Kastrati Z
PROVIDER: S-EPMC6950834 | biostudies-literature | 2020 Feb
REPOSITORIES: biostudies-literature
Zenun Kastrati, Arianit Kurti, Ali Shariq Imran
Data in Brief, 2020-01-03
In this article, we present a dataset containing word embeddings and document topic distribution vectors generated from MOOC video lecture transcripts. Transcripts of 12,032 video lectures from 200 courses were collected from the Coursera learning platform. This large corpus of transcripts was used as input to two well-known NLP techniques, Word2Vec and Latent Dirichlet Allocation (LDA), to generate word embeddings and topic vectors, respectively. We used Word2Vec and LDA implementation in th ...
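
To make the notion of a "document topic distribution vector" concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, written from scratch in plain Python. It is not the implementation used to build the dataset (the abstract above is cut off before naming it), and the function name `lda_topic_vectors`, the hyperparameters `alpha` and `beta`, and the tiny `toy_transcripts` corpus are all assumptions made up for illustration. The Word2Vec half of the pipeline is not shown.

```python
import random


def lda_topic_vectors(docs, num_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not the authors' code).

    docs is a list of token lists. Returns one topic distribution per document,
    i.e. a list of num_topics probabilities that sum to 1.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    # Smoothed per-document topic proportions.
    vectors = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        vectors.append([(doc_topic[d][k] + alpha) / denom for k in range(num_topics)])
    return vectors


# Hypothetical, hand-made "transcripts"; the real corpus has 12,032 lectures.
toy_transcripts = [
    "gradient descent minimizes the loss of a neural network".split(),
    "a neural network learns weights with gradient descent".split(),
    "supply and demand determine the market price".split(),
    "the market price moves with supply and demand".split(),
]
vectors = lda_topic_vectors(toy_transcripts, num_topics=2)
for vec in vectors:
    print([round(p, 3) for p in vec])
```

Each printed row is one document's topic vector. For a corpus of realistic size, a library implementation would be the practical choice; the loop above only illustrates what such a vector represents.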