OpenResume: Advancing Career Trajectory Modeling with Anonymized and Synthetic Resume Datasets

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

Despite substantial advances in many fields of AI, computational research in the career and job domains has been hindered by a lack of accessible datasets. This limitation stems mainly from the proprietary nature of job platforms, which restricts the sharing of job-domain datasets with the research community. The scarcity is particularly pronounced for career trajectory and resume datasets, which constrains academic researchers in developing and evaluating new models. In this paper, we address the unavailability of resume datasets in the job domain, an issue we identified through a comprehensive comparison of existing job-domain machine learning studies. We introduce OpenResume, which is, to the best of our knowledge, the first publicly available, anonymized, and structured resume dataset designed specifically for job-domain downstream tasks. The dataset aims to catalyze advances in AI and to foster new markets for machine learning and data science within career trajectory modeling. OpenResume is processed from real-world resume data. We anonymize and substitute personal identifiers and company names, normalize job titles to titles based on ESCO (one of the most common occupation taxonomies), and apply differential privacy techniques to temporal features to ensure open accessibility and privacy protection. Additionally, we augment OpenResume with a synthetically generated resume dataset derived from the post-processed real-world data, extending its diversity and utility. To demonstrate that OpenResume retains challenges and properties similar to those of real-world job datasets, we benchmark state-of-the-art job-domain prediction models on OpenResume across four prevalent downstream tasks: (1) next job title prediction, (2) next company prediction, (3) turnover prediction, and (4) link prediction. Our experimental results show that these models perform comparably on OpenResume and on the original data across all four tasks, which supports OpenResume as a valuable career trajectory dataset for both academic research and practical applications. We also indicate OpenResume's applicability to eight other downstream tasks. Our datasets are available at: https://tinyurl.com/OpenResumeData.
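
The abstract does not say which differential privacy mechanism is applied to temporal features. As an illustration only, the following Python sketch assumes the Laplace mechanism applied to a job duration measured in months. The function names, the default sensitivity of one month, and the epsilon values are hypothetical choices for this sketch, not details taken from the paper.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_duration_months(months, epsilon, sensitivity=1.0, rng=None):
    """Return a noisy, non-negative job duration in whole months.

    Assumptions (not from the paper): Laplace mechanism with
    scale = sensitivity / epsilon, where `sensitivity` is a
    hypothetical bound on how much the value may change.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    noisy = months + laplace_noise(sensitivity / epsilon, rng)
    # Rounding and clamping are post-processing steps, so they do not
    # weaken the privacy guarantee of the noisy value.
    return max(0, round(noisy))


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed only to make the demo repeatable
    for months in (6, 24, 60):
        print(months, "->", perturb_duration_months(months, epsilon=0.5, rng=rng))
```

A smaller epsilon gives a larger noise scale and therefore stronger privacy at the cost of less accurate durations; how the authors balance this trade-off is described in the paper itself, not here.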

Original language: English (US)
Title of host publication: Proceedings - 2024 IEEE International Conference on Big Data, BigData 2024
Editors: Wei Ding, Chang-Tien Lu, Fusheng Wang, Liping Di, Kesheng Wu, Jun Huan, Raghu Nambiar, Jundong Li, Filip Ilievski, Ricardo Baeza-Yates, Xiaohua Hu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6697-6706
Number of pages: 10
ISBN (Electronic): 9798350362480
State: Published - 2024
Event: 2024 IEEE International Conference on Big Data, BigData 2024 - Washington, United States
Duration: Dec 15 2024 to Dec 18 2024

Publication series

Name: Proceedings - 2024 IEEE International Conference on Big Data, BigData 2024

Conference

Conference: 2024 IEEE International Conference on Big Data, BigData 2024
Country/Territory: United States
City: Washington
Period: 12/15/24 to 12/18/24

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality
  • Modeling and Simulation
