Ancient World Computational Analysis
Partner: Adam Anderson, D-Lab, DH & BIDS, Academic
Overview
Project Description
Last year we built a JupyterBook workflow (with Python Jupyter Notebooks) that takes in a large collection of PDFs and OCRs the files in preparation for building language models. So far these models include topic modeling (LDA), word embeddings (Word2Vec and Doc2Vec), and deep learning (BERT). This workflow is mostly complete, although we may look for ways to optimize the deep learning component. In the coming year we hope to build better segmentation into the raw text, both in preparation for deep learning (with encoder/decoder values) and for parsing citations into Author, Title, Date, and Publication fields. These fields will then serve as features for the node list, since the language models already build edge lists for network graphs (see our AWCA webpage for more details: http://digitalhumanities.berkeley.edu/ancient-world-computational-analysis-awca). Lastly, we will work on building a web app that uses D3.js for large network visualizations, allowing users to run these tools on any dataset they choose to upload.
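The citation-parsing step described above could be sketched with a regular expression over one common citation shape, "Author (Year). Title. Publication." The pattern, function name, and field names here are illustrative assumptions, not the project's actual parser, which would need to handle many more formats.

```python
import re

# Hypothetical pattern for citations shaped like:
#   "Anderson, A. (2020). Ancient Texts. Near Eastern Studies."
CITATION_RE = re.compile(
    r"^(?P<author>[^(]+?)\s*"        # everything before the parenthesized year
    r"\((?P<date>\d{4})\)\.\s*"      # four-digit year in parentheses
    r"(?P<title>[^.]+)\.\s*"         # title, up to the next period
    r"(?P<publication>[^.]+)\.?$"    # publication info, optional final period
)

def parse_citation(text):
    """Split a citation string into Author/Date/Title/Publication fields,
    suitable for use as node attributes in a network graph.
    Returns None when the string does not match the assumed pattern."""
    m = CITATION_RE.match(text.strip())
    if m is None:
        return None
    return {k: v.strip() for k, v in m.groupdict().items()}
```

A real parser would likely fall back to a trained citation segmenter for OCR'd text, where punctuation is unreliable; the regex sketch only shows how the four target fields map onto named groups.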
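The paragraph notes that the language models already produce edge lists for network graphs. As a minimal stand-in for those model-derived edges, a weighted edge list can be built from term co-occurrence within documents; this pure-Python sketch is an illustration of the data structure, not the project's actual pipeline.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(documents):
    """Build a weighted edge list from term co-occurrence.
    Each edge (term_a, term_b, weight) counts how many documents
    contain both terms; the result can feed a network graph alongside
    a node list carrying citation metadata as features."""
    weights = Counter()
    for doc in documents:
        # Deduplicate and sort terms so each unordered pair is counted once.
        terms = sorted(set(doc.lower().split()))
        for a, b in combinations(terms, 2):
            weights[(a, b)] += 1
    return [(a, b, w) for (a, b), w in weights.items()]

docs = ["tablet cuneiform scribe", "tablet scribe", "cuneiform scribe"]
edges = cooccurrence_edges(docs)
```

An edge list in this (source, target, weight) form serializes directly to the JSON links array that D3.js force-directed layouts consume.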
Expected Deliverable
Jupyter Notebooks & Web app
What would a successful semester look like to you?
Updated Python Jupyter Notebooks and progress toward a web app
Additional Skills from ideal candidates
Familiarity with Jupyter Notebooks in a Google Colab environment would be beneficial.
Data
Models
Conclusion