Inventor name disambiguation for a patent database using a random forest and DBSCAN

Kunho Kim, Madian Khabsa, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

15 Scopus citations

Abstract

Inventor name disambiguation is the task of distinguishing each unique inventor from all other inventor records in a patent database. This task is essential for processing person name queries that retrieve information about a specific inventor, e.g. a list of all that inventor's patents. We adapt earlier work on author name disambiguation to inventor name disambiguation. A random forest classifier is trained to classify whether each pair of inventor records refers to the same person. The DBSCAN algorithm is used for inventor record clustering, and its distance function is derived from the random forest classifier. For scalability, blocking functions are used to reduce the complexity of record matching and to enable parallelization, since blocks can be processed simultaneously. Tested on the USPTO patent database, the method disambiguated 12 million inventor records in 6.5 hours. Evaluation on the labeled datasets from the USPTO PatentsView competition shows that our algorithm outperforms all algorithms submitted to the competition.
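The pipeline described in the abstract (blocking, pairwise scoring, then DBSCAN over a distance derived from the pairwise score) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy records, the `block_key` blocking rule, and the `eps` and `min_pts` values are all hypothetical, and a hand-written `same_person_probability` function stands in for the trained random forest's predicted probability, since the abstract does not list the features or blocking functions used. The distance between two records is taken as one minus that probability.

```python
from collections import defaultdict

# Hypothetical toy records: (record_id, first_name, last_name, city).
RECORDS = [
    (0, "John", "Smith", "Austin"),
    (1, "John", "Smith", "Austin"),
    (2, "J.", "Smith", "Austin"),
    (3, "Jane", "Smith", "Boston"),
    (4, "Wei", "Zhang", "Newark"),
    (5, "Wei", "Zhang", "Newark"),
]


def block_key(rec):
    """Assumed blocking rule: last name plus first initial."""
    _, first, last, _ = rec
    return (last.lower(), first[0].lower())


def same_person_probability(a, b):
    """Stand-in for the classifier's P(same person); NOT a trained random forest."""
    score = 0.0
    fa, fb = a[1].lower().rstrip("."), b[1].lower().rstrip(".")
    if fa == fb:
        score += 0.5
    elif fa.startswith(fb) or fb.startswith(fa):
        score += 0.3  # an initial is compatible with a full first name
    if a[3].lower() == b[3].lower():
        score += 0.5
    return score


def dbscan(ids, dist, eps, min_pts):
    """Plain DBSCAN over a pairwise distance function; -1 marks noise."""
    labels = {}
    cluster = -1

    def neighbors(p):
        return [q for q in ids if dist(p, q) <= eps]  # includes p itself

    for p in ids:
        if p in labels:
            continue
        nbrs = neighbors(p)
        if len(nbrs) < min_pts:
            labels[p] = -1
            continue
        cluster += 1
        labels[p] = cluster
        queue = [q for q in nbrs if q != p]
        while queue:
            q = queue.pop()
            if labels.get(q) == -1:
                labels[q] = cluster  # former noise point becomes a border point
            if q in labels:
                continue
            labels[q] = cluster
            q_nbrs = neighbors(q)
            if len(q_nbrs) >= min_pts:
                queue.extend(q_nbrs)  # q is a core point, so keep expanding
    return labels


def disambiguate(records, eps=0.5, min_pts=1):
    """Group records into blocks, then cluster each block independently."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec)

    result = {}
    for key, recs in blocks.items():  # blocks share no state, so they could run in parallel
        by_id = {r[0]: r for r in recs}

        def dist(p, q):
            if p == q:
                return 0.0
            return 1.0 - same_person_probability(by_id[p], by_id[q])

        for rid, label in dbscan(list(by_id), dist, eps, min_pts).items():
            result[rid] = (key, label)  # cluster ids are only unique within a block
    return result


clusters = disambiguate(RECORDS)
```

In this sketch records 0, 1 and 2 end up in one cluster, record 3 in another cluster of the same block, and records 4 and 5 together in a separate block. Pairs in different blocks are never compared, which is where the reduction in matching cost comes from.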

Original language: English (US)
Title of host publication: JCDL 2016 - Proceedings of the 16th ACM/IEEE-CS Joint Conference on Digital Libraries
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 269-270
Number of pages: 2
ISBN (Electronic): 9781450342292
DOIs
State: Published - Sep 1 2016
Event: 16th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2016 - Newark, United States
Duration: Jun 19 2016 - Jun 23 2016

Publication series

Name: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
Volume: 2016-September
ISSN (Print): 1552-5996

Other

Other: 16th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2016
Country/Territory: United States
City: Newark
Period: 6/19/16 - 6/23/16

All Science Journal Classification (ASJC) codes

  • General Engineering
