
Wordnik Etymology Search

Partner: Erin McKean, Wordnik Society, Non-Profit

Overview

Project Description

In this project, we hope to build an etymology search tool and API for Wordnik users. We’ll digitize an out-of-copyright etymological dictionary, pull etymology data from Wiktionary, and store both in an appropriate datastore. We’ll experiment with using classifiers to link words to their etymons (the earlier forms they derive from). Users of the tool will be able to find groups of etymologically related words, including words that share a language of origin or an intermediate form.
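To make the grouping idea concrete, here is a minimal sketch of one possible in-memory index. It assumes that each word's etymology has already been extracted as an ordered chain of (language, form) links ending at the oldest recorded etymon. The names `Etymology`, `EtymologyIndex`, `share_form`, and `share_origin` are hypothetical. The sample chains are simplified for illustration and stop at Latin.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Etymology:
    """A word plus its etymological chain (hypothetical record format)."""
    word: str
    # (language, form) links, ordered from the nearest source to the oldest etymon.
    chain: list[tuple[str, str]] = field(default_factory=list)


class EtymologyIndex:
    """Hypothetical index that inverts chains: each etymon points back to its descendants."""

    def __init__(self) -> None:
        self._by_form: defaultdict[tuple[str, str], set[str]] = defaultdict(set)
        self._by_origin: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, entry: Etymology) -> None:
        # Every link in the chain, intermediate or original, is indexed.
        for link in entry.chain:
            self._by_form[link].add(entry.word)
        # The last link is treated as the word's language of origin.
        if entry.chain:
            origin_language, _ = entry.chain[-1]
            self._by_origin[origin_language].add(entry.word)

    def share_form(self, language: str, form: str) -> set[str]:
        """Words whose chain passes through this (language, form) etymon."""
        return set(self._by_form.get((language, form), set()))

    def share_origin(self, language: str) -> set[str]:
        """Words whose oldest recorded etymon is in this language."""
        return set(self._by_origin.get(language, set()))


# Simplified sample chains, for illustration only.
index = EtymologyIndex()
index.add(Etymology("chief", [("Old French", "chief"), ("Latin", "caput")]))
index.add(Etymology("chef", [("French", "chef"), ("Old French", "chief"), ("Latin", "caput")]))
index.add(Etymology("capital", [("Latin", "capitalis"), ("Latin", "caput")]))

print(sorted(index.share_form("Old French", "chief")))  # ['chef', 'chief']
print(sorted(index.share_origin("Latin")))  # ['capital', 'chef', 'chief']
```

A persistent datastore would likely keep these mappings in tables keyed by etymon rather than in memory. The core lookup pattern could stay the same: invert each chain so that every intermediate form points back to the words derived from it.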

Expected Deliverable

Digitized data, recommendations for an appropriate datastore, and models for the classifier.

What would a successful semester look like to you?

A successful project would result in clean data in a useful datastore and a beta version of the classifier. Wordnik can then build APIs from those inputs.

Additional Skills of Ideal Candidates

It would be great if students had experience cleaning OCR output and were familiar with tools like jq.

Data

Models

Conclusion