Wordnik Hyphenation Project

Partner: Erin McKean, Wordnik Society, Non-Profit

Overview

Project Description

Wordnik provides a hyphenation API backed by data licensed from traditional dictionaries. However, more than half of the unique words of English aren’t in any dictionary (see http://www.sciencemag.org/content/331/6014/176). To hyphenate these unknown words, we’d like to implement the Liang algorithm (https://www.tug.org/docs/liang/) and combine it with a model, based on known dictionary hyphenations, that provides a confidence metric for each hyphenation produced for an unknown word.
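To make the Liang approach concrete, here is a minimal Python sketch of pattern-based hyphenation. It is an illustration under assumptions, not Wordnik's implementation: the function names (`parse_pattern`, `hyphenate`), the minimum fragment lengths, and the small hard-coded `PATTERNS` list are hypothetical choices for this example. A real system would load a full pattern set, or generate one from the licensed dictionary data. In a Liang pattern, digits between letters are priority levels: at each inter-letter position the highest level from any matching pattern wins, and an odd level permits a break there.

```python
import re

# Hand-picked illustrative patterns in Liang/TeX notation (an assumption for
# this sketch; a production system would load a complete pattern file).
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]


def parse_pattern(pattern):
    """Split a pattern like 'hy3ph' into its letters and per-position levels.

    Returns ('hyph', [0, 0, 3, 0, 0]): one level for each position before,
    between, and after the letters.
    """
    letters = re.sub(r"\d", "", pattern)
    levels = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            levels[pos] = int(ch)
        else:
            pos += 1
    return letters, levels


def hyphenate(word, patterns, left_min=2, right_min=3):
    """Return the fragments of `word` split at pattern-permitted break points."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."  # '.' marks the word boundaries
    # scores[i] is the level of the position just before padded[i].
    scores = [0] * (len(padded) + 1)
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            levels = table.get(padded[start:end])
            if levels is not None:
                for offset, level in enumerate(levels):
                    idx = start + offset
                    scores[idx] = max(scores[idx], level)
    pieces, last = [], 0
    # A break before word[k] sits before padded[k + 1]; odd levels allow it.
    for k in range(left_min, len(word) - right_min + 1):
        if scores[k + 1] % 2 == 1:
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return pieces


print("-".join(hyphenate("hyphenation", PATTERNS)))  # hy-phen-ation
```

The substring scan here is quadratic in word length, which is fine for single words. Liang's thesis describes a packed trie for fast lookup over thousands of patterns. The confidence model described above would sit on top of output like this, for example by scoring each proposed break against known dictionary hyphenations.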

Expected Deliverable

Create actionable alerts for our maintenance teams, create additional Splunk dashboards for our Engineering and Maintenance partners, and figure out how to utilize our Oracle Analytics Cloud Platform. Any of these deliverables would help us out.

What would a successful semester look like to you?

Putting into production an API that can hyphenate novel words and report a high level of confidence in each hyphenation. (Wordnik will handle deployment.)

Additional Skills from ideal candidates

Curiosity! :)

Data

Models

Conclusion