Wordnik Hyphenation Project
Partner: Erin McKean, Wordnik Society, Non-Profit
Overview
Project Description
Wordnik provides a hyphenation API, with data licensed from traditional dictionaries. However, more than half of the unique words of English aren’t in any dictionary (see http://www.sciencemag.org/content/331/6014/176). To hyphenate these unknown words, we’d like to implement the Liang algorithm (https://www.tug.org/docs/liang/) and combine it with a model, built from known dictionary hyphenations, that assigns a confidence metric to each hyphenation it produces for an unknown word.
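As a rough illustration of how the Liang algorithm works, here is a minimal Python sketch. It assumes a tiny illustrative pattern set (real pattern files contain thousands of patterns) and TeX's English defaults of at least 2 letters before the first hyphen and 3 after the last. Each pattern interleaves letters with digits. Every matching pattern contributes its digits to the gaps between letters of the dot-delimited word, the maximum value at each gap wins, and odd values mark allowed hyphenation points. The helper `dictionary_agreement` and all other names here are hypothetical. It shows one simple way to compare output against known dictionary hyphenations and is not the confidence model itself.

```python
def parse_pattern(pattern):
    """Split a Liang pattern like 'hy3ph' into letters and inter-letter values."""
    letters = []
    values = [0]  # values[i] is the value in the gap before letters[i]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


# Hypothetical toy pattern set: just enough to hyphenate "hyphenation".
RAW_PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]
PATTERNS = dict(parse_pattern(p) for p in RAW_PATTERNS)


def hyphenate(word, patterns, left_min=2, right_min=3):
    """Return the word split at the hyphenation points the patterns allow."""
    w = "." + word.lower() + "."  # dots mark the word boundaries
    points = [0] * (len(w) + 1)
    # Naive scan of every substring; production code would use a trie.
    for i in range(len(w)):
        for j in range(i + 1, len(w) + 1):
            values = patterns.get(w[i:j])
            if values is not None:
                for k, v in enumerate(values):
                    points[i + k] = max(points[i + k], v)
    pieces, start = [], 0
    # The gap before word[m] sits at points[m + 1] because of the leading dot.
    for m in range(left_min, len(word) - right_min + 1):
        if points[m + 1] % 2 == 1:  # odd value: hyphen allowed
            pieces.append(word[start:m])
            start = m
    pieces.append(word[start:])
    return pieces


def dictionary_agreement(reference, patterns):
    """Fraction of dictionary entries (word -> 'hy-phen-ated') matched exactly."""
    if not reference:
        return 0.0
    hits = sum(
        1 for word, hyphenated in reference.items()
        if "-".join(hyphenate(word, patterns)) == hyphenated
    )
    return hits / len(reference)


print("-".join(hyphenate("hyphenation", PATTERNS)))  # hy-phen-ation
```

With this toy pattern set, "hyphenation" comes out as hy-phen-ation. A real deployment would load a full pattern file and replace the simple agreement score with the confidence model described above.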
Expected Deliverable
An implementation of the Liang hyphenation algorithm, combined with a model built from known dictionary hyphenations, that hyphenates words missing from licensed dictionaries and reports a confidence metric for each hyphenation.
What would a successful semester look like to you?
A production-ready API that hyphenates novel words with a high level of confidence. (Wordnik will handle deployment.)
Additional Skills from ideal candidates
Curiosity! :)
Data
Models
Conclusion