In the SEMIA project, we investigate the affordances of visual analysis and data visualization for exploring large collections of digitized moving images. In doing so, we try to accomplish a number of shifts in relation to other Digital Humanities projects of the past few years that use similar technologies. As we mention in the project description on our About page, our efforts to develop, and repurpose, concepts and tools address a wider audience than just scholars: we also target artists, and users in the creative industries. But there are other ways in which SEMIA differs from such image analysis and visualization projects, both past and on-going. Below, we list some of the most crucial ones.
First, SEMIA is distinctive in its overall objectives. The project seeks to develop concepts and tools, not in order to meet very specific film or media studies requirements, but rather to enable users to intuitively explore visual connections between database objects. In the past, and today still, image analysis was often applied – and the tools for it developed – with an eye to answering specific research questions. (For instance, in the context of digital film history, questions about developments in film style. Within our European network, such questions have been leading, among others, in the now-completed Vienna Digital Formalism project, and in the still on-going Zürich Film Colors: Technologies, Cultures, Institutions project.) While there is certainly a kinship between SEMIA and projects that take such an approach, and while we build on results obtained in the process, we simultaneously try to move on from them. Ultimately, the explorations we seek to enable may lead to concrete research questions; however, we always act on the assumption that the users’ explorations will serve to trigger, rather than to answer, such questions (compare e.g. Masson 2017, 33-34). Therefore, we also do not seek to generate tools for the analysis of specific bodies of work, but rather more robust ones that allow users to navigate within large corpora of (visually) extremely diverse materials.
(Human-made) selection of images and image series with visually similar features
Second, SEMIA does not make use of semantic descriptors in the same way, or to the same extent, as other projects do. In developing concepts and tools, we don’t rely as much either on metadata ingested with the archival objects themselves (e.g. as catalogue entries) or on the sorts of manually produced annotations that are key to other projects involving image analysis. On the one hand, this is because we see this project as an effort to find out how collections can be explored, precisely, in the absence of metadata – archival assets that are both scarce and very time-consuming to produce. In other words, we actually hope that SEMIA will generate ideas on how users can circumvent the general problem of scarcity of metadata in navigating archival collections. On the other hand, and as implied above, we do so because we want to be able to explore collections in other ways than have been tried so far, and on the basis of other connections between discrete items than those that underpin the sorts of search activities that we commonly perform in navigating them. The latter are connections, indeed, that inevitably involve some form of interpretation of the discrete objects, and that are heavily inflected by the dominance in metadata creation of either classic filmographic or technical information (often added manually or through crowd-sourcing), or identifications of semantic (often object) categories (increasingly done automatically).
A third shift our project seeks to accomplish is to rely more radically on deep-learning techniques – and so-called ‘neural networks’ – as opposed to feature engineering. We intend to release another post on this topic soon, but in essence, this means that the SEMIA Informatics team will not design task-specific algorithms in order to extract pre-defined features from the images in our database (whether image sections, single images, or strings of images/fragments) and subsequently compare them, but rather tweak, and repurpose, networks that have been trained with techniques for automatic feature-learning (that is, the learning of data representations). (The aforementioned repurposing – and retraining – is necessary, because the networks we can rely on were built for different tasks, and different corpora, than those we intend to use them for. Once again, we shall elaborate on this in a future post.) Overall, this means that in developing tools for analysis, too, we rely to a lesser extent on human intervention. But we do not seek to avoid such human intervention altogether – if only because we need to be able to minimally understand and evaluate the sorts of similarities the system relies on in making connections between discrete objects. Moreover, as it is our aim to support users in their (unique, or at the very least individual) explorations of collections, we would like our tools to adapt to their specific needs and behaviours. Here, human intervention in the form of human-machine interaction will still be desirable.
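To make the idea of connecting objects through learned representations a little more concrete, here is a minimal sketch in Python. It is an illustration only, not the SEMIA team's method: it assumes that some network has already turned each image into a feature vector, and the three tiny vectors, the names `cosine_similarity` and `most_similar`, and the choice of cosine similarity as the measure are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_id, features, k=3):
    """Rank all other items by similarity to the query item, highest first."""
    query = features[query_id]
    scored = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in features.items()
        if other_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical feature vectors; in practice these would come from a
# (retrained) neural network and have many more dimensions.
features = {
    "frame_a": [0.9, 0.1, 0.0],
    "frame_b": [0.8, 0.2, 0.1],
    "frame_c": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    for item_id, score in most_similar("frame_a", features, k=2):
        print(item_id, round(score, 3))
```

Note that nothing in this sketch names what the images depict: the ranking rests entirely on the numerical representations, which is also why a user would need some way to inspect and evaluate the similarities the system proposes.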
Finally, SEMIA also explores new principles for data visualization (but ‘new’, in this context, in relation to such practices in the Digital Humanities more broadly – rather than image analysis specifically). Considering the above-mentioned goal of enabling users to freely explore collections and make serendipitous connections between objects – yet potentially as a first step toward asking highly precise research questions – the representation of the data we extract in the analysis process need not directly serve an evidentiary purpose. (For more on this topic, see Christian Olesen’s blog post on Moving Image Data Visualization.) This means in turn that we can also take inspiration from artistic projects in image data visualization – an area of practice that is currently gaining a lot of traction. At the same time, users will need to be able to assess what it is they see, if only to be able to determine what they can or cannot use it for (for instance, in the context of their scholarly research). As this is a key area of attention for SEMIA, we will be sure to reflect on it further in later posts, as our team discussions on the topic evolve.
– Eef Masson
Masson, Eef. 2017. “Humanistic Data Research: An Encounter between Academic Traditions”. In The Datafied Society: Studying Culture through Data, ed. Mirko Tobias Schäfer and Karin van Es, 25-37. Amsterdam: Amsterdam University Press.