
Dataset Information


Computer-based coding of free-text job descriptions to efficiently identify occupations in epidemiological studies.


ABSTRACT: BACKGROUND: Mapping job titles to standardised occupation classification (SOC) codes is an important step in identifying occupational risk factors in epidemiological studies. Because manual coding is time-consuming and has only moderate reliability, we developed an algorithm called SOCcer (Standardized Occupation Coding for Computer-assisted Epidemiologic Research) to assign SOC-2010 codes based on free-text job description components. METHODS: Job title and task-based classifiers were developed by comparing job descriptions to multiple sources linking job and task descriptions to SOC codes. An industry-based classifier was developed based on the SOC prevalence within an industry. These classifiers were used in a logistic model trained using 14,983 jobs with expert-assigned SOC codes to obtain empirical weights for an algorithm that scored each SOC code/job description pair. We assigned the highest-scoring SOC code to each job. SOCcer was validated in two occupational data sources by comparing SOC codes obtained from SOCcer to expert-assigned SOC codes, and by comparing lead exposure estimates obtained by linking SOC codes to a job-exposure matrix. RESULTS: For 11,991 case-control study jobs, SOCcer-assigned codes agreed with 44.5% and 76.3% of manually assigned codes at the 6-digit and 2-digit level, respectively. Agreement increased with the score, providing a mechanism to identify assignments needing review. Good agreement was observed between lead estimates based on SOCcer and manual SOC assignments (κ 0.6-0.8). Poorer performance was observed for inspection job descriptions, which included abbreviations and worksite-specific terminology. CONCLUSIONS: Although some manual coding will remain necessary, using SOCcer may improve the efficiency of incorporating occupation into large-scale epidemiological studies.
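The scoring, selection and evaluation steps in the abstract can be illustrated with a minimal Python sketch. It is not the SOCcer implementation. The three per-candidate classifier scores (title, task, industry), the weights and the bias below are hypothetical placeholders for the empirical weights the authors fitted on 14,983 expert-coded jobs. The sketch shows how a logistic model combines classifier outputs into one score per candidate SOC code, how the highest-scoring code is chosen, and how agreement is measured at the 6-digit (detailed occupation) and 2-digit (major group) levels of SOC-2010 codes written as `XX-XXXX`.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical input: for one job description, each candidate SOC-2010 code
# maps to (title_score, task_score, industry_score) from separate classifiers.
CandidateScores = Dict[str, Tuple[float, float, float]]


def logistic(z: float) -> float:
    """Standard logistic function, mapping a linear score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def score_candidates(cands: CandidateScores,
                     weights: Tuple[float, float, float],
                     bias: float) -> Dict[str, float]:
    """Combine classifier outputs with (assumed) logistic-model weights."""
    return {
        soc: logistic(bias + sum(w * x for w, x in zip(weights, feats)))
        for soc, feats in cands.items()
    }


def assign_soc(cands: CandidateScores,
               weights: Tuple[float, float, float],
               bias: float) -> Tuple[str, float]:
    """Return the highest-scoring SOC code and its score (ties: first seen)."""
    scores = score_candidates(cands, weights, bias)
    best = max(scores, key=scores.get)
    return best, scores[best]


def agreement(pred: List[str], gold: List[str], digits: int) -> float:
    """Fraction of jobs whose predicted and manual codes share the first `digits` digits."""
    def prefix(code: str) -> str:
        return code.replace("-", "")[:digits]
    matches = sum(prefix(p) == prefix(g) for p, g in zip(pred, gold))
    return matches / len(gold)


if __name__ == "__main__":
    # Illustrative numbers only; not taken from the study.
    candidates = {"47-2061": (0.9, 0.4, 0.3), "53-7062": (0.2, 0.5, 0.6)}
    print(assign_soc(candidates, weights=(3.0, 2.0, 1.0), bias=-3.0))
    pred, gold = ["47-2061", "47-2111"], ["47-2061", "47-2031"]
    print(agreement(pred, gold, 6), agreement(pred, gold, 2))
```

In this toy example the two predicted codes share major group 47 with the manual codes but only one matches at the detailed level. This mirrors why 2-digit agreement (76.3%) exceeds 6-digit agreement (44.5%) in the reported results. Because the returned score rises with the linear combination, a cut-off on it could flag low-scoring assignments for manual review, in the spirit of the abstract's observation that agreement increased with the score.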

SUBMITTER: Russ DE 

PROVIDER: S-EPMC4871757 | biostudies-literature | 2016 Jun

REPOSITORIES: biostudies-literature


Publications

Computer-based coding of free-text job descriptions to efficiently identify occupations in epidemiological studies.

Russ DE, Ho KY, Colt JS, Armenti KR, Baris D, Chow WH, Davis F, Johnson A, Purdue MP, Karagas MR, Schwartz K, Schwenn M, Silverman DT, Johnson CA, Friesen MC

Occupational and Environmental Medicine, 2016-04-21, issue 6



Similar Datasets

| S-EPMC10324641 | biostudies-literature
| S-EPMC10625577 | biostudies-literature
| S-EPMC7272088 | biostudies-literature
| S-EPMC6829803 | biostudies-other
| S-EPMC9449451 | biostudies-literature
| S-EPMC8994771 | biostudies-literature
| S-EPMC6093321 | biostudies-literature
| S-EPMC5382319 | biostudies-literature
| S-EPMC9642882 | biostudies-literature
| S-EPMC5553787 | biostudies-literature