Tjebbe van Tijen/Imaginary Museum Projects on Tue, 27 Dec 2005 22:18:06 +0100 (CET)
<nettime> Statistically Improbable Phrases and the 'real reader'
For a few months now Amazon Books has offered a new device: Statistically Improbable Phrases (shortened to SIP). To give an example, for the book

Armstrong, David F. / Stokoe, William C., Jr. / Wilcox, Sherman, "Gesture and the nature of language", 1995, Cambridge University Press

the following SIPs are given:

> Statistically Improbable Phrases (SIPs): (learn more)
> primary sign languages, visible gestural, spoken language phonology,
> language modular, visible gestures, signed languages, sublexical
> level, sign language word, gestural approach, semantic phonology,
> spatial syntax, grammar module, gestural theory, vocal gestures, deaf
> signers, associationist theories, perceptual categorization, image
> schemata, grammatical processing, primary consciousness, global
> mappings, iconic gestures, modular theories, adaptive complex, order
> consciousness

Clicking on one of these 'phrases' generates a web page listing other books that contain the same phrase, together with the number of occurrences of that particular SIP.

The idea of SIP is explained on the Amazon site:

> Amazon.com's Statistically Improbable Phrases, or "SIPs", are the most
> distinctive phrases in the text of books in the Search Inside!
> program. To identify SIPs, our computers scan the text of all books in
> the Search Inside! program. If they find a phrase that occurs a large
> number of times in a particular book relative to all Search Inside!
> books, that phrase is a SIP in that book.
> SIPs are not necessarily improbable within a particular book, but they
> are improbable relative to all books in Search Inside!.

and a new Wikipedia entry reads:

> Statistically Improbable Phrases is a system developed by Amazon.com
> to compare all of the books they index and find phrases in each that
> are the most unlikely to be found in any other book indexed.
> The system is used to find the most unique portions of books for use
> as a summary or keyword.
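To make the quoted description concrete, here is a minimal sketch of one way such a score could be computed. It is not Amazon's algorithm, which the quoted text does not specify. Everything in it is an assumption made for illustration: the restriction to two-word phrases, the function names `bigrams` and `improbable_phrases`, the `min_count` threshold, and the score itself (a phrase's relative frequency in one book divided by its relative frequency in the whole collection).

```python
from collections import Counter
import re


def bigrams(text):
    """Return the two-word phrases of a text, lower-cased."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(pair) for pair in zip(words, words[1:])]


def improbable_phrases(book, other_books, top=5, min_count=2):
    """Rank phrases of `book` by over-representation against the collection.

    Assumed score (not Amazon's published method): relative frequency of
    the phrase in `book` divided by its relative frequency in the whole
    collection, where the collection is `book` plus `other_books`.
    """
    book_counts = Counter(bigrams(book))
    corpus_counts = Counter(book_counts)  # the collection includes the book
    for text in other_books:
        corpus_counts.update(bigrams(text))

    book_total = sum(book_counts.values())
    corpus_total = sum(corpus_counts.values())

    scores = {}
    for phrase, n in book_counts.items():
        if n < min_count:  # ignore phrases too rare in the book itself
            continue
        in_book = n / book_total
        in_corpus = corpus_counts[phrase] / corpus_total
        scores[phrase] = in_book / in_corpus

    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top]
```

Under this assumed score every phrase that occurs only in one book gets the same value, which is why the sketch adds a minimum count. The score also depends directly on the two totals, so it cannot be interpreted without knowing how many books and words were indexed.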
This new device prompted me to send the following reaction to the Amazon Books team:

(1) Well, statistics of what? is my first question... I suggest you supply basic statistics about the source of your SIPs:
- how many books/titles have you indexed
- are these full text indexes or just indexes of the 'inside the book' pages you do supply on the web
- how many million words
- how many sentences
When this is not given it is like the manipulative percentages of a census or opinion poll without the total number of people that form the basis of these percentages.

(2) Though it might seem stupid to say, I would like you also to state explicitly that these SIPs are generated <automatically> according to a certain algorithm, and to explain in more detail what that algorithm entails.

(3) Just as people have been trying to jump 'up' Google's search ranking, an unanticipated effect might be that writers, editors and publishers check a new text before publication for the occurrence of SIPs and make alterations to get a higher score. This might produce a text that is more "outstanding" only in the statistical sense.

(4) We still need to value ourselves, us humans, the most, because we are the only ones who can 'read'. Machines can process text all right, but not with any form of understanding in the sense that each human reader becomes a re-writer when "processing" a text in her or his personal way. The readers' reviews on your website do give that kind of understanding and are often very helpful in learning about a book and its reception. Recently I started to archive some of the Amazon Books customer reviews in my bibliographical database. The on-line reader reviews are part of a very old tradition, like the Renaissance 'commonplace books' and the Greek/Roman 'hypomnemata', filled with quotations and remarks that students would make to keep for themselves and show to each other.
The value of readers' comments lies in the rephrasing and synthesizing of the content of a book, something that can only be appreciated by 'reading'. The mechanisms of 'rating', choosing the top ten, hundred or whatever, are an undeniable part of our market-oriented culture. Still, even in a purely commercial setting like Amazon Books, there can be a prominent place for personal exchange of opinions between 'real readers', beyond any automated statistics: an exchange that allows for both praise and critique outside the realm of professional and commercial reviewing.

(5) SIP ratings may well develop into a useful search instrument, but let it be well understood that they are just a product of 'machine processing', a secondary tool at most.

Tjebbe van Tijen
Imaginary Museum Projects
dramatizing historical information
http://imaginarymuseum.org

#  distributed via <nettime>: no commercial use without permission
#  <nettime> is a moderated mailing list for net criticism,
#  collaborative text filtering and cultural politics of the nets
#  more info: majordomo@bbs.thing.net and "info nettime-l" in the msg body
#  archive: http://www.nettime.org contact: nettime@bbs.thing.net