1. Introduction
Digital languages and instruments are not only powerful tools for simplifying and enhancing the work of humanists and social scientists; they also create new cultural representations and self-representations that transform both the epistemology and the practice of research.
In particular, digital representations can influence and shape our cultural artefacts and everyday experiences in various ways. For example, we are influenced in the way we represent everyday digital objects, such as the Windows folder serving as a metaphor for a ‘document box’. This happens at the level of the interfaces that ‘produce users through benign interactions […]. That is, as ideology creates subjects, interactive and seemingly real-time interfaces create users who believe they are the “source” of the computer’s action’ (Chun 2011: 66-68). We are also influenced by the ways in which we write software or encode a document through specific languages (e.g. Python or HTML). However, it is easy to demonstrate that such a distinction is artificial (and often damaging). From a semiotic point of view, both representations (the visible interface and the invisible coding) are ‘modelling systems’ (Uspenskij et al. 1973) that cast their influence on overlapping political, social, cognitive and epistemological domains. In the first case we are talking about an influence mainly on practices and processes (the social and cognitive dimensions), and in the second case we are dealing with the theory and interpretation of information structures (the linguistic, hermeneutical and epistemological dimensions).
In our paper we will focus on this second aspect, and show some examples of how code and encodings are shaping the way we conceive and practise the work of reconstruction, conservation and representation of information structures and cultural artefacts. To this aim, we will discuss three encoding tools widely used in the Humanities and Social Sciences communities: HTML, the de facto standard for encoding World Wide Web documents and pages, Unicode, an industry standard designed to allow text and symbols from all of the writing systems of the world to be consistently represented and manipulated by computers, and XML (eXtensible Markup Language), which defines a set of rules for encoding documents.
2. HTML War
The evolution of HTML (Hypertext Markup Language) as a structural language and application standard of the World Wide Web has been largely shaped by forces outside of the World Wide Web Consortium (W3C), the international community assembled by Tim Berners-Lee ‘to lead the World Wide Web to its full potential by developing protocols and guidelines that ensure the long-term growth of the Web.’ (http://www.w3.org/Consortium/)
Even before it became a W3C Recommendation on October 28, 2014, HTML5 had already reached, by 2009, the status of a de facto standard on the Web. This was promoted and encouraged by the Web Hypertext Application Technology Working Group (WHATWG), a community established by Apple, the Mozilla Foundation and Opera Software (later joined by Google) in open opposition to the development of the language of the Web as envisioned by the W3C, and ‘concerned about the W3C’s direction with XHTML, lack of interest in HTML and apparent disregard for the needs of real-world authors.’ (https://wiki.whatwg.org/wiki/FAQ#What_is_the_WHATWG.3F)
As in the days of the browser wars between Netscape Navigator and Microsoft Internet Explorer (Sordi 2010), tensions collide (and versions are released) around the code of the World Wide Web: choices that appear to be inspired by benefits for the authors and the users of the Web, but are in fact tied to strategies for controlling a market that Web 2.0 apps, social media and the eternal promise of the Semantic Web have once again enhanced (Zeldman – Marcotte 2010).
Controlling the development of HTML means controlling the competition between the different software used to access the Web (Ford 2014); managing the scheduling and release of mobile applications, which can now compete at the same level as desktop applications, without being forced to adopt or implement competing solutions; and determining how texts and information on the Web can be searched and interrelated, completing the definitive migration from a Web based on document search to a system for collecting user data. In other words, behind the HTML war lies a growing obsession on the part of industry and governments with our digital traces, habits and personal information (Bauman – Lyon 2013; Bowker 2013).
3. The cultural and political biases of XML
A markup language permits us to formally describe the structure of a text and to analyse its data in depth. Its utility is proportional to how much information it can set out, include and preserve. The word ‘markup’ itself reveals its original bias: ironically, like other computing languages, XML is one of the most faithful successors to the Gutenberg model; its basic aim is to imitate and preserve structured information as laid out in modern printed books. But is hierarchical and structured information an inherent and universal feature of texts (let alone writing)? The bias inherited from print has forced us to think of a text as a stable product, but if we look either at the different historical representations of a given text or at its documented writing stages, it is clear that there is not one text, but as many different texts as there are mechanisms of writing, material production, intertextual paths and methodologies of reconstruction (Fiormonte – Martiradonna – Schmidt 2010). Not only is there a potential conflict between the linear and hierarchical nature of current markup languages and the intrinsically dynamic nature of the writing process; current text encoding tools and methodologies also seem to constitute the most serious obstacle to the development of an independent theory of digital text.
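To make this conflict concrete, the following minimal sketch (our own illustration, not an example drawn from the sources discussed here) shows how a sentence that crosses a verse-line boundary cannot be expressed as well-formed XML, and how encoders typically fall back on empty ‘milestone’ elements:

```python
# Illustrative sketch: XML enforces a single hierarchy, so a sentence that
# crosses a verse-line boundary cannot be marked up with two properly nested
# elements.
import xml.etree.ElementTree as ET

overlapping = (
    "<poem>"
    "<l>A sentence that <s>starts here</l>"  # <s> is still open when </l> closes
    "<l>and ends here</s></l>"
    "</poem>"
)

try:
    ET.fromstring(overlapping)
except ET.ParseError as err:
    print("not well-formed:", err)  # the parser rejects the crossing boundaries

# The usual workaround flattens one of the two hierarchies into empty
# 'milestone' elements (here an <lb/> line break, in the spirit of TEI),
# trading structural expressiveness for well-formedness.
milestone = "<poem><l>A sentence that <s>starts here<lb/>and ends here</s></l></poem>"
print(ET.fromstring(milestone).tag)  # parses fine: 'poem'
```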
A typical example of the overlap between geopolitical settings and technical choices can be found in organisations like the TEI (Text Encoding Initiative), an international consortium that defines guidelines for encoding cultural heritage documents in XML. However, the practice of defining encoding standards so that electronic documents can ultimately be processed by shared software at the presentation level does not work nearly as well for the encoding of historical primary sources (Schmidt 2014). Attempts to declare ‘standard’ names for textual features overlook significant variations in the interpretation, selection and application of those codes by different groups, individuals and cultures. Once again, the ‘practice’ of code is the result of the ‘theory’ reflected by certain groups and interests. Encoding tools and methodologies thus become examples of ‘symbolic capital’ (Bourdieu 1984) used by the TEI community to export a universalising, western-centric approach to the representation of cultural artefacts.
4. Universalising typography: Unicode
The Unicode standard aims to constitute a universal and inclusive mapping of all graphemes from existing and past writing systems. Yet some of the assumptions underlying the standard are shaped by the culture-specific standardisation of the graphical representation of language generated by Gutenberg’s invention of movable type printing.
The Unicode writing system, now forced upon all scripts, is based on a discrete ordered sequence of individual characters flowing in only one direction and dimension. However, in many writing systems based on handwriting, including contemporary Arabic and some Indic scripts, some graphemes ‘orbit’ around a central grapheme and can be written not only after it (to the left or right, depending on the main direction of the script), but also before, above or below the ‘main’ grapheme. This is the case of the ancient Greek ‘hypogegrammenon (subscribed) iota’, written under the vowel it follows when that vowel is lowercase, but after it when uppercase. A broader case study is provided by the left-to-right Indic scripts, where some vowels, although pronounced after a consonant, may be written before it (The Unicode Consortium 2014, chapter 12.1; Perri 2009; Constable 2001, II, par. 6.3).
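A minimal Python sketch (our own illustration, not drawn from the cited sources) makes this mismatch between stored ‘logical’ order and visual order visible for Devanagari:

```python
# Illustrative sketch: in Devanagari the vowel sign I is rendered to the LEFT
# of the consonant it follows phonetically. Unicode mandates a fixed 'logical'
# (phonetic) storage order and delegates the visual reordering to the rendering
# engine, i.e. treats it as a presentation issue.
import unicodedata

ki = "\u0915\u093F"  # कि : KA followed (in memory) by the vowel sign I

for ch in ki:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
# U+0915  DEVANAGARI LETTER KA
# U+093F  DEVANAGARI VOWEL SIGN I
```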
Another assumption underlying the Unicode model is a one-to-one correspondence between phoneme and grapheme. The European medieval handwriting conventions based on the Greek or Latin alphabets are an interesting case study in this respect: ligatures, brevigraphs (sometimes represented by apparent diacritics such as macrons) and logographs are systematic and ‘regular’ here, and the underlying model is incompatible with the Unicode/Gutenberg one-to-one correspondence between a phoneme and a character in a linear sequence. However, the XML/TEI Guidelines for the transcription of pre-modern primary sources simply recommend the use of Unicode for this purpose (TEI P5 Guidelines, chapter 5).
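The point can be illustrated with another minimal Python sketch (again our own, not taken from the cited sources), using the common medieval abbreviation of a nasal by means of a macron:

```python
# Illustrative sketch: a medieval brevigraph such as 'ū' (u with macron,
# commonly abbreviating Latin 'um'/'un') is a single written sign standing for
# two phonemes, and even its encoding oscillates between one and two code
# points depending on normalisation: there is no stable one-to-one mapping.
import unicodedata

precomposed = "\u016B"  # ū  LATIN SMALL LETTER U WITH MACRON
decomposed = unicodedata.normalize("NFD", precomposed)

print([f"U+{ord(c):04X}" for c in precomposed])  # ['U+016B']
print([f"U+{ord(c):04X}" for c in decomposed])   # ['U+0075', 'U+0304'] (u + combining macron)

# Whether to keep the sign as 'ū' or to expand it to 'um' is an editorial
# decision that the character layer alone cannot record; in TEI it is pushed
# into markup (e.g. <choice>, <abbr>, <expan>), i.e. outside Unicode itself.
```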
In all these cases, the current version of Unicode imposes the Gutenberg model (based on the Latin alphabet used in Europe in the Modern Age) upon writing systems based on different models, and all discrepancies are treated as ‘exceptions’ to be dealt with at the presentational rendering level through software workarounds.
5. Conclusions
These case studies show that digital ‘standards’ always reflect a cultural bias (Carey 2009), and that the level of encoding is never neutral, but tends to assume (and overlap with) universalising discourses that are usually invisible at the surface of technology (Galloway 2012).
Digital Universalism (Chan 2014) is deeply intertwined with the question of language, which in its turn controls our encoding practices. Code imperialism and linguistic imperialism (Phillipson 2009) are two sides of the same coin. We are facing a codex universalis, based on the English lingua franca, used to articulate and exert a power of standardisation and control.
But codes and encoding(s), before or alongside being ‘semiotic’, ‘hermeneutic’ or ‘cultural’ phenomena, are also political ones. Looking at boards, offices and steering committees reveals that consortia, associations and organisations like Unicode, ICANN, the TEI or the W3C are informed by Anglophone hegemonies, and their decisions are shaped by their commercial interests, ideologies and cultures. Thus apparently ‘neutral’, ‘technical’ decisions, as can be observed in Unicode, the TEI and other organisations, tend to oversimplify and standardise the complex diversity of languages and cultural artefacts.
As Friedrich Kittler put it: ‘Codes—by name and by matter—are what determine us today, and what we must articulate if only to avoid disappearing under them completely. … Today, technology puts code into the practice of realities, that is to say: it encodes the world’ (Kittler 2008: 40). So, is it still in our power to code (encode reality), or rather is code imposing on us its biases and constraints?
In the conclusion of our presentation we will try to propose solutions to avoid or reduce the impact of universalistic encoding, and report on alternative experiences and experiments that try to resist the effects of ‘colonial computing’ (Ali 2014).
References
Ali, Mustafa (2014). “Towards a decolonial computing”, CEPE 2013: Computer Ethics: Philosophical Enquiry, 1-3 July 2013, Lisbon, Portugal, International Society of Ethics and Information Technology, pp. 28–35.
Bauman, Zygmunt and Lyon, David (2013). Liquid Surveillance. A Conversation. Cambridge: Polity Press.
Bourdieu, Pierre (1984). Distinction: A social critique of the judgement of taste. Cambridge, MA: Harvard University Press.
Bowker, Geoffrey C. (2013). “Data Flakes: an afterword to ‘Raw Data’ is an oxymoron”. In Gitelman, Lisa (ed.) ‘Raw Data’ is an oxymoron. Cambridge (MA): MIT Press, pp. 167-171.
Carey, James W. (2009). Communication as Culture. Essays on Media and Society, Revised Edition. New York and Abingdon: Routledge.
Chan, Anita S. (2014). Networking Peripheries: Technological Futures and the Myth of Digital Universalism. Cambridge (MA): MIT Press.
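Chun, Wendy Hui Kyong (2011). Programmed Visions: Software and Memory. Cambridge (MA): MIT Press.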
Constable, Peter G. (2001). Understanding Unicode: a general introduction to the Unicode Standard. NRSI: Computers & Writing, 2001 (http://scripts.sil.org/cms/scripts/page.php?item_id=IWS-Chapter04b).
Fiormonte, Domenico, Martiradonna, Valentina, Schmidt, Desmond (2010). “Digital Encoding as a Hermeneutic and Semiotic Act: The Case of Valerio Magrelli”, Digital Humanities Quarterly, Vol. 4, N. 1 (http://digitalhumanities.org/dhq/vol/4/1/000082/000082.html).
Ford, Paul (2014). “On HTML5 and the Group That Rules the Web”, The New Yorker, November 20, 2014, (http://www.newyorker.com/tech/elements/group-rules-web).
Galloway, Alexander R. (2012). The interface effect. London: Polity Press.
Kittler, Friedrich (2008). “Code (or, How You Can Write Something Differently)”, in Matthew Fuller, Software Studies: A Lexicon. Cambridge (MA): MIT Press, pp. 40-47.
Perri, Antonio (2009). “Al di là della tecnologia, la scrittura. Il caso Unicode”, Annali dell’Università degli Studi Suor Orsola Benincasa, Vol. II, pp. 725-748.
Phillipson, Robert (2009). Linguistic imperialism continued. New York and London: Routledge.
Schmidt, Desmond (2014). “Towards an Interoperable Digital Scholarly Edition”, Journal of the Text Encoding Initiative [Online], Issue 7 (http://jtei.revues.org/979).
Sordi, Paolo (2010). “Leggere il codice del Web”, Testo e Senso, 11, 2010 (http://testoesenso.it/article/view/1).
TEI Consortium, eds. (2007). TEI P5: Guidelines for Electronic Text Encoding and Interchange, TEI Consortium (http://www.tei-c.org/Guidelines/P5/, last retrieved: February 27, 2015).
The Unicode Consortium (2014). The Unicode Standard, Version 7.0.0. Mountain View (CA): The Unicode Consortium.
Uspenskij, Boris Andreevich; Ivanov, Vyacheslav Vsevolodovich; Piatigorskij, Alexander; Lotman, Juri M. (1973). “Tezisy k semiotičeskomu izučeniju kul’tur (v primenenii k slavjanskim tekstam)”, in M. R. Mayenowa (ed.), Semiotyka i Struktura Tekstu. Studia poświęcone VII międz. Kongresowi Slawistów, Warszawa, pp. 9-32. Engl. transl. “Theses on the Semiotic Study of Culture (As Applied to Slavic Texts)”, in Jan van der Eng and Mojmír Grygar (eds.), Structure of texts and semiotics of culture, The Hague: Mouton, 1973, pp. 1-28.
Zeldman, Jeffrey and Marcotte, Ethan (2010). Designing with Web Standards (3rd edition). Berkeley: New Riders.
WHATWG (https://whatwg.org).
W3C (http://www.w3.org).