INCA: Infrastructure for content analysis

Abstract

We present INCA (short for INfrastructure for Content Analysis), a Python module for collecting, storing, processing, and analyzing a wide variety of media content, including but not limited to news, political debates, social media, forums, and customer reviews. Using Elasticsearch as a database backend and Celery for task management, it makes automated content analysis (ACA) scalable. INCA’s main objective is to enable and promote an integrated workflow. It focuses on the reusability of data, processors, and analyses, making all steps of ACA accessible to social scientists without requiring advanced programming skills. Here, we present the aim, implementation, and recommended workflow for INCA.

https://doi.org/10.1109/eScience.2018.00078

Publication
2018 IEEE 14th International Conference on e-Science (e-Science)