Big data has to show that it is not like Big Brother
Sales of George Orwell’s Nineteen Eighty-Four have risen since Edward Snowden revealed how the National Security Agency of the US gains access to telephone records and data from technology companies. So far, if people do not exactly love Big Brother, they are prepared to accept some invasion of their privacy in return for security.
What about “big data”? Companies that hold rapidly expanding amounts of personal information are using new kinds of data analysis and artificial intelligence to shape products and services, and to predict what customers will want. Larry Page, Google’s chief executive, describes his ideal form of technology as “a really smart assistant doing things for you so you don’t have to think about it”.
The vision of living in a virtual Downton Abbey, with a computer to plan your day, suggest the best route to travel, the films you might want to watch and the best flight to catch – even to book it for you – has an allure. We are all pressed for time and want an easy life. Instead of being bombarded with information and forced to choose, it’s nice to get personal service.
But just as the NSA disclosures have taken people by surprise, although the agency has existed for 60 years, I doubt whether many grasp either the size of the data trail they create daily, or the advances in technology that are permitting a select group of big data enterprises to exploit it. The technology is evolving so quickly that what was unthinkable two years ago is now routine.
“It is both a wonderful and scary future. Companies with huge amounts of data will know more about you than yourself. They will be able to predict what you might do next,” says Kai-Fu Lee, a Beijing-based investor and the former head of Google in China.
In a column last week I compared Google to General Electric in the late 19th century – an innovative industrial enterprise riding a wave of new technology. The flip side of that is that Google, Amazon, Microsoft and other technology giants are amassing powers that need to be controlled carefully.
The NSA and big data companies put their databases and computing power to different uses – one to identify spies and terrorists, and the others to match services to users. They have in common the use of very large databases and techniques such as pattern recognition and network analysis.
At the advanced end, this shades into artificial intelligence of the kind that, for example, intuits what you meant to search for even when you misspell the key words; can translate speech into another language in real time (as Microsoft demonstrated in China last year); or learns to recognise a photograph of a cat by viewing thousands of images.
The ability of computers to learn in a similar manner to humans is known as “deep learning”, and it is notable that Google has hired several pioneers in the field, including the scientist and author Ray Kurzweil. Among the technologies that the NSA offers to transfer to private US companies are “cutting-edge machine learning technologies”.
Such software can infer a lot from scraps of information, provided that it has enough of them, as shown by the NSA’s effort to analyse phone call metadata from Verizon (and perhaps other operators). President Barack Obama assured Americans that “no one is listening to your phone calls”, but the metadata alone are a trove.
A study by Latanya Sweeney, a professor at Harvard University, found that 87 per cent of people can be identified simply by knowing their age, gender and postcode, if these are cross-checked against public databases. That is typical of the data collected by social networks and internet companies.
The extraordinary power of big data companies comes from being able to combine the personal data of customers with observations about them, from which products they buy to where (as measured by global positioning satellite data from mobile phones) they are. That produces a set of “inferred data” about what they probably want.
If I search on an Android phone for “Taj Mahal” while standing in India, for example, Google will prioritise results for the shrine in Uttar Pradesh. If I do the same in Brick Lane, east London, it will suggest local Bangladeshi restaurants. How long before it offers to book a restaurant based on how I rated others as I walk around a foreign city at dusk?
At one level, I would be pleased if it did (as long as it was a good one) since it would save me doing the work myself. At another, as a World Economic Forum report on personal data put it: “Inferred data can feel like an all-knowing Big Brother watching the security camera.”
One of the concerns that springs from this is that big data companies with such software are very difficult to compete with. The more data that I and other users provide them with, the better they are at predicting what we want. The machine brain becomes cleverer with use.
Another is trust. Social networks have been poor at protecting users’ data, and they hold only a fraction of the information on people’s behaviour, habits and intentions that is held by the new generation of services. It is no wonder that the NSA turns to such companies – it has computing power and they have swaths of material.
A third is ownership. We each have rights over our own information, but what happens when it gets mixed up with that of others and combined into a vast database of intentions? If I change my mind, how can it be unscrambled?
Above all, we don’t know what this technology means because we are only at the beginning of the era of big data. There are plenty of aspects to admire but it will take some time to love.