Patrice Riemens on Fri, 27 Mar 2009 05:36:23 -0400 (EDT)

<nettime> Ippolita Collective: The Dark Face of Google, Chapter 4 (second - and last - part)
NB: this book and its translation are published under a Creative Commons 2.0 licence (Attribution, Non-Commercial, Share-Alike). Commercial distribution requires the authorisation of the copyright holders: the Ippolita Collective and Feltrinelli Editore, Milano (.it).

Ippolita Collective
The Dark Side of Google (continued)

Chapter 4. Algorithms or Bust! (second part)

From 'Brand Identity' to 'Participative Interface'

Search, archiving and retrieval of data are procedures so complex that understanding them fully would require an amount of knowledge and explanation beyond the scope of this book. We will nevertheless look in some detail at aspects of their functioning further on. The interface deserves a closer look first, because it is the element Google fronts and manages as its core image, whereas the performance of the algorithms and the architecture of the database remain invisible to the user. In Google's case the interface is above all the 'blank box' [*N8], the empty window into which the user types her/his query, or 'search intention', on Google's universal homepage, a page designed to exude welcome, reassurance and closeness. The homepage's universal functionality stems from its being replicated across 104 languages and dialects and customisable for 113 different countries as of today [2007, -TR]. Throughout, the interaction model remains the same, unifying all search behaviours into one single, homogeneous format. Arriving at Google's homepage, one first notices a linear interface built from a few key elements, each with a very specific and universally recognisable function. The frame accepts search queries of varying nature and complexity, from simple keywords (e.g. 'Ippolita') to more complex assemblages of words in quotation marks (e.g. "authors collective"), and it allows a search to be narrowed more precisely: to a particular site, a specific language, a particular domain, documents in a specified format only, and so forth, depending on the level of specificity one is aiming at. It can be taken as an example of a successful interface insofar as it fulfils the ambitious goal of assigning a positive value to what would otherwise be white space on a page. The interface presents itself without adornment, almost empty, or rather filled with a single empty element, the 'blank box', which reassures the user and induces her/him into activity, warding off the loss of attention (and her/his leaving the site) that follows either from an absence of handles, something to hold on to, or, conversely, from an excess of visual stimuli. It thus avoids the confusion that so often accompanies pages filled to the brim (suffering, apparently, from the 'horror vacui' syndrome), pages that try to be attractive with a flurry of banners, graphics and animations and only manage to communicate anxiety to the user. On a Google page, surfing is not really possible: every component has a purely functional purpose. The goal is to give the user access to a service, not to lead her/him on a journey; using these components engenders behaviours that quickly turn into search routines and, within a very short time, into a default mode. The interface is designed to make the usage, the behavioural dynamics and the expectations of the average user iterative.
Thus, even allowing for per-user 'personalisation', search behaviours remain basically identical, so much so that one can speak of a 'universal tool'. The organisation of texts and images is linear; it relies on recurrent graphic elements, notably primary colours, and the images used are qualitatively homogeneous. The interface's display style is sober to the point of austerity, despite the 'brand (and corporate) identity' the design reverberates [*N9]. It is informed by a specific aesthetic that appeals to very elementary properties of perception, which are, in their very simplicity, extremely effective. From this almost instantaneous visual identification stems an ease of use far above that of competing search engines. The level of ergonomics Google achieves is mind-boggling: it does not even need to present itself in its interface as a basket of services; its visual architecture already conveys that message. The interfaces of the different services are autonomous, separate and largely independent of one another; they all carry the 'blank box' as a hallmark, and they are not directly linked to each other. It is not necessary, for instance, to go through complicated steps to reach code.google.com, a service addressed to technicians of all levels, when coming from images.google.com, which addresses a much larger public; you only need to 'go deeper' into the google.com site and know how to search. Despite this fragmentation, we are all able to recognise the network of services Google offers; moreover, visitors are able to use these information sources in a combined and complementary manner. This holds equally for the 'browse-only' types and for those who have developed a mild, or stark, addiction to Google's services (a.k.a. the 'Google-totally-addicted', joyfully jumping on the bandwagon of each and every Google novelty) [*N10]. This decentralisation of services results in a particular relational mechanism: Google users discover new sectors not so much through Google itself as through the informal network of other users, on other sites, where Google visitors describe their habits and discuss their tastes and preferences. Users then automatically 'localise' themselves within the extensive gamut of Google services, something that happens as soon as they log in to a new service: the appropriate language, for instance, is immediately offered according to the geographic area of the user's IP address. It also becomes easy for Google to approximate the sort of users a particular service is directed at, to evaluate the level of technical knowledge it requires, or the extent to which its users have affinities with one another. The word-of-mouth mechanism thus becomes akin to an informal, relationship-based PageRank system. As a first approximation, one could say that there is a local, relational dimension, in which word of mouth circulates among friends and acquaintances, together with a typological dimension of relationship, concerning particular classes of users, identifiable by statistical parameters (age, sex, occupation, etc.), who use a particular service and thereby bring a particular type of relational economy into being.
It would appear, incidentally, that Google even escapes the ten problems in website usability singled out by Jakob Nielsen, one of the most prominent specialists in user interfaces [*N11]: although written in HTML, Google's site sits completely outside the standards, and yet it manages to remain fully readable in all browsers in use today, whether graphical or text-based [*N12]. The neat graphic design of the Google homepage is further enhanced by an excellent visual organisation of its commercial aspects. There is no advertising link whatsoever on the homepage or on the documentation and information pages. At Google, ads are displayed only alongside query returns, clearly separated from them, though related to the subject of the query. One can therefore say that, in the arrangement of its interfaces, Google arrives at an acceptable compromise between the respect due to its users and the necessity of economic returns. Advertising, Google's main source of income, is displayed so as not to be invasive and not to distract users from their use of the services. Sponsored links are displayed dynamically, adjusting to the user's trajectory first within the search site and then across the Internet at large; commercial links are thus not static, they move along with the user's searches. This is made possible by the RSS feed (RDF Site Summary, or Really Simple Syndication), one of the most widely used formats for distributing web content, and by the many digital information sources (dailies, weeklies, press agencies, etc.) that Google draws upon to modify its homepage dynamically once a user has personalised it. Registered users can configure their Google start page completely through the addition of RSS feeds, putting the weather forecast for cities of their choice, or the history of their previous searches, at their fingertips. Managing bookmarks and keeping track of the latest incoming e-mail become possible as well, and even searching one's own, non-web-related computer files, thanks to the 'Google Desktop' application. The commercial promotion mechanism, the services and the sophisticated profiling of users thus appear to form a coherent whole, at the level of aesthetics as well as of content. As Google itself proclaims, sponsored links are nothing more than suggestions, graphically compatible and conceptually consistent with the search operation in progress. Google's economy is so well integrated into its interface that it can vanish, with no harm done, from the point of view of users who are not interested, while generating handsome profits from users who do follow the suggested commercial links. Yahoo! and many other search engines and portals offer the same sort of facilities for personalising their homepages, yet the quality and quantity of what Google has to offer remain unchallenged to this day. The configuration is rather simple, but it does require some familiarity with web interfaces and some time to set up.
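To make the RSS mechanism mentioned above a little more tangible, here is a minimal reading sketch in Python (standard library only; the feed URL is a placeholder, not a real Google or publisher address): an RSS 2.0 document is simply XML with a channel containing items, each carrying at least a title and a link, which a personalised start page can fetch and render.

    import urllib.request
    import xml.etree.ElementTree as ET

    EXAMPLE_FEED = "https://example.org/news/rss.xml"  # hypothetical feed URL

    def read_feed(url, limit=5):
        """Return (title, link) pairs for the first few items of an RSS 2.0 feed."""
        with urllib.request.urlopen(url) as response:
            root = ET.parse(response).getroot()
        # RSS 2.0 layout: <rss><channel><item><title>...</title><link>...</link>
        items = root.findall("./channel/item")[:limit]
        return [(item.findtext("title"), item.findtext("link")) for item in items]

    for title, link in read_feed(EXAMPLE_FEED):
        print(title, "->", link)

Aggregating a handful of such feeds, together with a few stored preferences, is essentially all a personalised homepage needs in order to look freshly composed at every visit.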
The threshold of attention on the Web is notoriously low: pages are viewed and then abandoned within a very short time span, often just a few seconds, so a user who 'invests' a couple of minutes, or even several, in a website reveals, through her/his choices, a great deal about her/himself and about her/his habits as a consumer. This information is then carefully stored by the company that owns the search engine (whether Google, Yahoo! or another firm) and represents a true source of wealth, produced free of charge by the users themselves and essential to the companies sponsoring targeted products and services. Personalising the homepage makes a site more attractive and more intimate: the site becomes a sort of private tool in which the user keeps investing time, choosing colours, tweaking its appearance and selecting her/his favourite content. A regular visitor who is able to configure her/his start page participates in the construction of the web interface. Giving users freedom of choice and control over a few pages means transforming them from simple targets of advertising into 'intelligent' consumers, that is, consumers from whom 'intelligence' can be extracted. Fostering interaction is surely the best, and the subtlest, way to achieve 'fidelity'. This is why participative interface environments multiply, with ever more personalised ads, so as to usher us all together into the golden world of Google.

PageRank[TM], or absolute authority within a closed world

The algorithm that enables Google to assign a value to the pages its spider indexes is known as PageRank[TM]. We have already seen that PageRank[TM] works on the basis of a web page's 'popularity', computed from the number of sites that link to it. Given an equal number of links, two different web pages will still have different PageRank[TM] values, according to the 'weight' of the pages linking to them: this constitutes the 'qualitative' aspect of sites.
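A minimal sketch, in Python and on an invented four-page link graph, of the kind of iterative computation just described (not Google's actual code, and without the many refinements Google has never disclosed): each page's score depends not only on how many pages link to it but on the score, the 'weight', of those linking pages.

    # Invented link graph: each page maps to the pages it links to.
    links = {
        "A": ["B", "C"],
        "B": ["C"],
        "C": ["A"],
        "D": ["C"],   # nobody links to D, so its score stays near the minimum
    }

    def pagerank(links, damping=0.85, iterations=50):
        pages = list(links)
        n = len(pages)
        rank = {p: 1.0 / n for p in pages}               # start from a uniform score
        for _ in range(iterations):
            new_rank = {p: (1.0 - damping) / n for p in pages}
            for page, outgoing in links.items():
                if not outgoing:
                    continue
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:                  # a page passes part of its own
                    new_rank[target] += share            # weight to the pages it links to
            rank = new_rank
        return rank

    print(pagerank(links))
    # C ends up on top: it is linked by A, B and D, and A (itself fed by the
    # 'heavy' page C) passes it more weight than the unlinked page D can.

With an equal number of inbound links, two pages would still come out differently here, exactly as the text says, because what each receives depends on the rank of the pages pointing at it.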
To take a concrete example: quite often, when one checks the access statistics of a site, one encounters an enormous number of links coming from pornographic sites. This is because Google assigns rank partly on the basis of the accessing links that appear in public statistics. There are therefore programs that exploit the invasive aspect of this logic of connection and node evaluation in order to push rankings up, and pornographic sites are well known as pioneers of this kind of clever experimentation (they were the first on the Web with image galleries and with online payment models). Since a number of such programs look for sites with the help of public access statistics, a very large number of links are actually established through bogus visits, originating in a fake link placed on another site, most often a pornographic one. This devious mechanism literally explodes the number of accesses to the target site, causing its statistics to swell and its Google ranking to rise, which in the last instance benefits the pornographic site that issued the fake link in the first place. It looks like a win-win situation, at least where visibility is concerned, and it is not an 'illegal operation' either: nothing forbids linking to an Internet site. The practice simply means that sites with public statistics end up with a higher ranking than sites whose statistics are not public. This mechanism illustrates how the 'technological magic' and 'objectivity' of Google's ranking are in fact connected to the 'underground' of the Net, and partially grounded in less savoury practices. Other, perfectly legitimate practices that exploit Google's approach to indexing have also been documented, such as Search Engine Optimisation (SEO), a suite of operations for pushing a website up in the search returns. Getting to the #1 position, for instance, is often achieved through spamming from improbable addresses by automatic programs, with stupendous effect: "We register your site with 910 search engines, registries and web catalogues! We bring your site into pole position on Google and Yahoo! Try us! No risk, just US$299 instead of US$349 - one shot, no obligations!". Confronted with this, Google naturally plays the transparency card: "nothing can guarantee that your site will appear as #1 on Google" [*N14]. Mathematically speaking, a feature of PageRank[TM], based as it is on the analysis of links, is that the database must be fully integrated; in other words, search operations can only take place within a circumscribed, albeit extremely vast, space: in Google's universe there is always a road leading from one indexed web page to another indexed web page. Searches will therefore tend to be functional, avoiding 'broken links' as far as possible, as well as returns that differ substantially from what has been archived in the cache memory. The ensuing problem is that users are falsely led to believe that the Internet is a closed world, entirely made up of transparent links, without secret paths or preferential trajectories, because any given query seems always to return a 'correct' response. This is the consequence of the 'Googolian' view of the Internet as consisting exclusively of the journeys its own spider accomplishes by jumping from one link to the next. If a web page is not linked from anywhere, it will never appear to a user, since Google's spider has had no opportunity to find, weigh and index it. But this in no way means that 'data islands' on the Net do not exist! Dynamic sites are a good example: their functionality is based entirely on the choices a user makes. A typical instance is the site http://voyages.sncf.com, owned by the French Railways: filling in the appropriate form gives you train times, onward connections, fastest itineraries and so on for any given destination, in real time. Google's system cannot fill in such forms and hence does not index the results that voyages.sncf.com creates dynamically; only a human being can overcome this hurdle. The only solution Google can provide, when an itinerary, timetable or destination is asked for, is to redirect the user to the railway companies' or airlines' own sites. This is why the supposed exhaustiveness of Google's databases must be challenged and discounted: it falsely conjures up the notion of a single universe for all of us, complete, closed, and called 'Google'. Quite the contrary is the case: mapping a trajectory in a complex network is always an exploration, with only relative and partial results.
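The point about unlinked pages and 'data islands' can be made concrete with a toy crawl, sketched below in Python on an invented graph: a spider that discovers pages only by following links will, by construction, never reach a page that nothing links to, however interesting its content.

    from collections import deque

    # Invented link graph: page "E" links out, but nothing links to it.
    links = {
        "seed": ["A", "B"],
        "A": ["B"],
        "B": ["seed"],
        "E": ["A"],
    }

    def crawl(start, links):
        """Breadth-first walk along links: the only discovery mechanism a spider has."""
        seen, queue = {start}, deque([start])
        while queue:
            page = queue.popleft()
            for target in links.get(page, []):
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
        return seen

    print(crawl("seed", links))   # {'seed', 'A', 'B'} -- "E" is never found or indexed

Pages generated on the fly behind a form, like the timetables of voyages.sncf.com, stay invisible for the same reason: no link points at them until a human fills in the form.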
The dream of a Google "with total knowledge of the Internet" is demagogic nonsense, whose sole aim is to perpetuate the idea that the information provided is accurate and exhaustive, elevating Google into a unique, truth-dispensing service - the Internet equivalent of the one-party system. Such an absolute fencing-off admittedly works well in everyday searches, because it leads quickly to results. But in reality, within a complex networked system there is no such thing as absolute truth, only a trajectory-dependent evaluation, or even a time-dependent one, depending on how long one is willing to spend on a search. The quality of a search also depends entirely on our subjective perception of the returns as acceptable, or less so. The networks we are able to explore, analyse and experience are complex objects whose nodes and linkages are constantly shifting. And if the decision to accept the results of a search, or not, rests in the last instance with the user, then the exercise of critical faculties is essential, together with a sharp awareness of the subjectivity of one's own viewpoint. To generate a trajectory that is truly worth analysing, one has to presuppose a limited and closed network, a world made up only of one's own exigencies, while knowing full well that this is a subjective localisation, neither absolute nor stable in time. To explore the Net means to be able to carve it up into smaller sub-nets for the sake of analysis; it amounts to creating small, localised and temporary worlds [*N15]. In everyday practice, it turns out, chance linkages are of the utmost importance: the emergence of new and unexpected relationships can in no way be predicted from the analysis of the web's separate elements, as Google's ranking system suggests. Such linkages function as 'dimensional gateways', lessening, or even abolishing outright, the distance between two nodes of the network.

PageRank[TM]: science's currency?

Contrary to common belief, the PageRank[TM] algorithm is not an original discovery of Google's: it builds on the work of the Russian mathematician Andrei Andreyevich Markov, who at the beginning of the twentieth century analysed statistical phenomena in closed systems, that is, systems in which every element is by necessity either the cause or the outcome of other elements of the system [*N16]. Sergey Brin's and Larry Page's work must have started from this theory, although the further advances they made have not been entirely disclosed to the public, apart from the Stanford patent.
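The Markov-chain connection can be stated compactly. In the form given in Brin and Page's original paper (a published formulation, not necessarily the version Google runs today), the rank of a page p linked to by pages q, each with L(q) outgoing links, with damping factor d (typically 0.85), reads:

    \[
      PR(p) \;=\; (1 - d) \;+\; d \sum_{q \to p} \frac{PR(q)}{L(q)}
    \]

In the normalised variant, where all ranks sum to one, the first term becomes (1-d)/N and the PageRank vector is exactly the stationary distribution of a Markov chain in which a 'random surfer' follows an outgoing link with probability d and jumps to a random page with probability 1-d: a closed system in Markov's sense, in which every page's value is at once cause and effect of the values of the others.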
Perhaps the best way to grasp the nature of the algorithm is to look at what happens among friends: in a community of friends, the more a shared event or experience is talked about, the more it grows in importance, to the point of becoming common lore; if knowledge of the event stays confined to a narrow circle, it never becomes famous. The same logic applies to celebrities in show business: the more they manage to be talked about, the higher their ranking rises and the more famous they become (hence the proliferation of self-referential shows on television, such as "Celebrity Farm" and the like). Google puts exactly the same mechanism to work in handling data. But Google is far more persuasive in its image management, spreading the idea that the Internet should be seen as a vast democracy, since the algorithm treats links as votes in favour of sites. Whether a link speaks well or ill of a site does not matter; what matters is to be spoken about, that is, linked, at all. The deception inherent in this 'global democracy' arrived at by algorithm is immediately obvious: as if democracy were something that came out of technology rather than out of the practices of human individuals! We have already stressed [*N17] that the cultural origins of such a worldview lie in the extremely elitist peer-review system practised by scientific publications, in which each researcher's contribution fits into a network of relationships, evaluations and verifications that enables the communication and control of research results. Google's 'global democracy' thus amounts to transferring the 'scientific method' of publishing onto the Web by way of the PageRank[TM] algorithm, which functions as a 'technological referee' able to weigh the information on the web objectively and to order it according to the choices expressed, through their links, by the 'People of the Net'. The parallel is striking. On the one hand, scientific publications acquire influence and authority according to their ranking within their discipline, a ranking obtained through citations, that is, through cross-references in the specialised literature; this is how scientific research guarantees its coherence, by ensuring that no new publication exists in a void but functions as the current state of the art within the long history of scientific endeavour. On the other hand, web pages have their links counted by Google's spider as so many 'citations' which increase their status and hence their ranking. Scientific elitism, the prime mover of the awe that 'science' inspires, is curiously founded on publication; and publishing, that is, making public, by no means implies making 'accessible' or 'understandable' [*N18]. Indeed, the sociologist Robert Merton contended in the nineteen-seventies that 'scientific discoveries', whether theoretical or experimental, cannot, will not and should not be considered truly scientific until they have been 'permanently integrated into the body of scientific knowledge' [*N19]. The statement may appear somewhat apodictic (after all, science in Antiquity was not transmitted 'publicly' at all: think of the Pythagorean school in ancient Greece, or of the distinction between 'esoteric' and 'exoteric' writings), but it clearly brings out the eminently public character of modern science. Communication, then, is not a by-product of research but an integral part of a form of knowledge based on accumulation and cooperation: since at least the sixteenth century, science has on the one hand striven for new results that augment its cognitive capital, while on the other recognising previous research as the necessary and unavoidable point of departure for them.
One could therefore write a history of scientific communication developing in parallel with that of its media supports: from the voluminous correspondence scientists used to maintain with one another, through periodical publication in scientific reviews, up to the electronic carriers of today. And it is no accident that the first Internet nodes were academic research centres, which needed to communicate and share information. The evolution of the carriers has nonetheless not altered the basic tenet of the scientific method's mode of communication, which remains based on citation. Dubbed 'science's currency' in some quarters, the citation functions as a token of honour paid by scientists to the people who taught and inspired them; more concretely, it links present research to past research, whether by the same or by different authors. It makes sense, then, to regard the number of citations a given piece of research attracts as a reflection of its importance, or at least of its impact, on the scientific community. With time, this system has itself become an object of research: bibliometrics is the discipline that uses mathematical and statistical models to analyse the way information is disseminated, especially in the field of publications. Bibliometrics, and above all its best-known indicator, the 'impact factor' [*N20], is now commonly used as an 'objective' criterion for measuring the scientific output of an individual researcher or of an academic institution. A vast archive of bibliometric data went online in 1993 - at Stanford, precisely, the cradle of Google. The SPIRES project (Stanford Public Information Retrieval System) had been born in 1974 out of the series of bibliographical notes on articles in high-energy physics kept by the Stanford University library. Because its domain of analysis is limited and well defined, SPIRES is an exhaustive, publicly accessible and free database on which complex searches over the body of citations can be carried out. It is likely that Brin and Page were able to study and emulate this methodology when developing their own PageRank[TM] system. But beyond the algorithm itself, other and more adaptive features have contributed to making Google a true 'global mediator' of the World Wide Web.

END of Chapter 4 (to be continued)

--------------------------
Translated by Patrice Riemens

This translation project is supported and facilitated by:
The Center for Internet and Society, Bangalore (http://cis-india.org)
The Tactical Technology Collective, Bangalore Office (http://www.tacticaltech.org)
Visthar, Dodda Gubbi post, Kothanyur-Bangalore (http://www.visthar.org)

# distributed via <nettime>: no commercial use without permission
# <nettime> is a moderated mailing list for net criticism,
# collaborative text filtering and cultural politics of the nets
# more info: http://mail.kein.org/mailman/listinfo/nettime-l
# archive: http://www.nettime.org contact: nettime@kein.org