Wordnik Etymology Search
Partner: Erin McKean, Wordnik Society, Non-Profit
Overview
Project Description
In this project, we hope to build an etymology search tool and API for Wordnik users. We’ll digitize an out-of-copyright etymological dictionary, pull data from Wiktionary, and create an appropriate datastore. We’ll experiment with ways to use classifiers to connect words to their etymons. Users of the tool will be able to find groups of etymologically related words, including words that share a language of origin or an intermediate form.
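One way to picture the datastore's job is the minimal Python sketch below. It assumes each headword's history is stored as an ordered chain of (form, language) etymons. The `Etymon`, `Entry`, and `EtymologyIndex` names, the schema, and the toy entries are hypothetical illustrations, not Wordnik's design or real etymological data. Building inverted indexes from each etymon and each language back to headwords turns both queries described above (shared intermediate forms, shared languages of origin) into simple set lookups.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Etymon:
    """One ancestor form in a word's history (hypothetical schema)."""
    form: str
    language: str


@dataclass(frozen=True)
class Entry:
    """A headword plus its ancestry, ordered from nearest ancestor to earliest."""
    word: str
    chain: tuple[Etymon, ...]


class EtymologyIndex:
    """In-memory inverted indexes from etymons and languages to headwords."""

    def __init__(self, entries):
        self.entries = {e.word: e for e in entries}
        self.by_etymon = defaultdict(set)    # Etymon -> headwords descending from it
        self.by_language = defaultdict(set)  # language -> headwords with an ancestor in it
        for e in entries:
            for etymon in e.chain:
                self.by_etymon[etymon].add(e.word)
                self.by_language[etymon.language].add(e.word)

    def related(self, word):
        """Headwords sharing at least one etymon (intermediate or original) with `word`."""
        entry = self.entries.get(word)
        if entry is None:
            return set()
        out = set()
        for etymon in entry.chain:
            out |= self.by_etymon[etymon]
        out.discard(word)
        return out

    def from_language(self, language):
        """Headwords with any ancestor form in `language`."""
        return set(self.by_language.get(language, set()))


# Toy, clearly hypothetical entries; these are placeholders, not real etymologies.
demo = EtymologyIndex([
    Entry("word_a", (Etymon("form_x", "Old French"), Etymon("root_r", "Latin"))),
    Entry("word_b", (Etymon("form_y", "Old French"), Etymon("root_r", "Latin"))),
    Entry("word_c", (Etymon("form_z", "Old Norse"),)),
])
print(sorted(demo.related("word_a")))            # ['word_b']
print(sorted(demo.from_language("Old French")))  # ['word_a', 'word_b']
```

A real datastore (a graph database or document store, for example) would replace the in-memory dictionaries; the sketch only shows that the user-facing queries reduce to lookups keyed on etymon and language.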
Expected Deliverable
Digitized data, recommendations for an appropriate datastore, and models for the classifier.
What would a successful semester look like to you?
A successful project would result in clean data in a useful datastore and a beta version of the classifier. Wordnik can then build APIs from those inputs.
Additional Skills of Ideal Candidates
It would be great if students had experience cleaning OCR output and familiarity with tools like jq.
Data
Models
Conclusion