Ancient World Computational Analysis

Partner: Adam Anderson, D-Lab, DH & BIDS, Academic

Overview

Project Description

Last year we worked on building a JupyterBook workflow (with Python Jupyter Notebooks) that takes in a large collection of PDFs and OCRs the files in preparation for building language models. So far these models include Topic Modeling (LDA), Word Embedding (W2V & D2V), and Deep Learning (BERT). This workflow is mostly complete, although we may look at ways of optimizing the Deep Learning component.

In the coming year we hope to build better segmentation of the raw text, both in preparation for Deep Learning (with encoder/decoder values) and for parsing citations into Author, Title, Date, and Publication info. These fields will then serve as features for the Node List, since the language models already build Edge Lists for Network Graphs (see our AWCA webpage for more details: http://digitalhumanities.berkeley.edu/ancient-world-computational-analysis-awca). Lastly, we will work on building a Web app that uses D3.js for large network visualizations, allowing users to run these tools on any dataset they choose to upload.
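The edge-list step described above can be sketched in a few lines. This is a minimal illustration, assuming scikit-learn for the LDA topic model; the toy corpus, topic count, and similarity threshold are all placeholders, not AWCA data:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-in corpus (the real pipeline OCRs a large PDF collection).
docs = [
    "cuneiform tablet archive from the old babylonian period",
    "babylonian tablets and archival records of the period",
    "network graph visualization of citation data",
    "citation networks rendered as interactive graphs",
]

# Bag-of-words counts feed the LDA topic model.
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # per-document topic distributions

# Documents with similar topic mixtures become weighted edges,
# giving the kind of Edge List a Network Graph is built from.
sims = cosine_similarity(doc_topics)
edge_list = [
    (i, j, round(float(sims[i, j]), 3))
    for i in range(len(docs))
    for j in range(i + 1, len(docs))
    if sims[i, j] > 0.5  # illustrative similarity threshold
]
print(edge_list)
```

In the full workflow the nodes would additionally carry the parsed citation fields (Author, Title, Date, Publication) as Node List features, and the resulting graph would be handed to D3.js for visualization.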

Expected Deliverable

Jupyter Notebooks & Web app

What would a successful semester look like to you?

Updated Python Jupyter Notebooks and progress toward a Web app

Additional Skills from ideal candidates

Familiarity with Jupyter Notebooks in a Google Colab environment would be beneficial.

Data

Models

Conclusion