User:Dr. Greywolf/有數學模型揭示創新係以乜規律出現嘅


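The following is an article translated by Dr. Greywolf (恐狼博士). The article comes from MIT Technology Review. The translation is not always literal, and the translator has added notes and subheadings to help readers who are less familiar with the field.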

Original article: Mathematical Model Reveals the Patterns of How Innovations Arise.

Cantonese title: 有數學模型揭示創新係以乜規律出現嘅

Annotated translation (from the Cantonese full text)

This research could lead to a new way of studying what is possible, and how the possible follows from what already exists.

Background

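Innovation is one of the most powerful driving forces in our world. For 21st-century society, the constant creation of new ideas and their transformation into technologies and products is an important and fundamental process. Indeed, many universities and institutes, along with regions such as Silicon Valley, work hard to foster this process.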

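Yet the process of innovation is something of a mystery. Many researchers, from economists and anthropologists to evolutionary biologists and engineers, have studied it. Their goal is to understand how innovation happens and what factors drive it, so that conditions can be made more favorable to future innovation.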

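So far, however, this approach has had limited success. Researchers have carefully measured the rates at which innovations appear and disappear. The rate at which innovations appear follows a set of patterns that scientists observe in many different settings, but nobody has been able to explain how these patterns arise or why they govern innovation.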

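Today, all of that changes thanks to the work of Vittorio Loreto of Sapienza University of Rome in Italy and several colleagues. They have created the first mathematical model that accurately reproduces the patterns that innovations follow. The work opens the way to a new approach to studying innovation: what is possible, and how these possibilities follow from what already exists.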

A mathematical model describing innovation

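The idea that innovation arises from the interplay between what already exists and what is possible was first formalized by the complexity theorist Stuart Kauffman (complexity theory is a body of theory that formalizes the notion of complexity). In 2002, Kauffman introduced the concept of the “adjacent possible” as a way of thinking about biological evolution. (Note: Kauffman himself applied the concept to biological evolution, but it can equally be applied to the evolution of language, culture, and so on.)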

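The adjacent possible is everything that is just one step away from what already exists, where “things” can be ideas, words, songs, molecules, genomes, technologies, and so on. (Note: for example, before the modern airplane could be invented, a simple airplane had to be invented first, so for a humanity with no flying machines at all, the “simple airplane” was an adjacent possible.) The adjacent possible connects the actual realization of a phenomenon with the space of unexplored possibilities.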

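But this idea (the adjacent possible) is hard to capture in a mathematical model. The space of unexplored possibilities includes everything that is easy to imagine and expect, but it also includes things that are entirely unexpected and hard to imagine. The former is tricky to model, and the latter is close to impossible.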

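What is more, each innovation changes what the “future possibilities” contain. So at every instant, the space of unexplored possibilities, that is, the adjacent possible, is changing.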

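Vittorio Loreto and his colleagues put it this way: “Though the creative power of the adjacent possible is widely appreciated at an anecdotal level, its importance in the scientific literature is, in our opinion, underestimated.”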

Heaps’ law

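Nevertheless, despite all this complexity, innovation seems to follow patterns that are predictable and easy to measure, patterns so ubiquitous that they have become known as “laws”. One of them is Heaps’ law, which states that the number of new things grows at a sublinear rate. In other words, innovation is governed by a power law of the form V(n) = k·n^β, where β lies between 0 and 1. (Note: V(n) is the number of distinct kinds of item in a collection of n items of the same type, and k and β are constants; the next two paragraphs give an example.)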

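This law can be used to think about the evolution of language (note: the evolution of a language is essentially people innovating with words), since a language is constantly evolving as new words appear and old words die out.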

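This process (note: the evolution of language) follows Heaps’ law. Take a text of n words: the number of distinct words in it, V(n), is proportional to n raised to the power β. In analyses of real-world texts, β turns out to be between 0.4 and 0.6.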
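The relationship can be checked on any tokenized text. The sketch below is a minimal illustration, not the method used in the study: the function names `vocabulary_growth` and `fit_heaps` are hypothetical, and the fit is a plain least-squares line on log-log axes, which is only one simple (and rough) way to estimate β.

```python
import math


def vocabulary_growth(tokens):
    """Return V(n) for n = 1..len(tokens): distinct tokens seen so far."""
    seen = set()
    growth = []
    for token in tokens:
        seen.add(token)
        growth.append(len(seen))
    return growth


def fit_heaps(growth):
    """Estimate (k, beta) in V(n) = k * n**beta.

    Assumption: an ordinary least-squares fit of log V against log n.
    `growth[i]` is V(i + 1) and must be positive.
    """
    if len(growth) < 2:
        raise ValueError("need at least two points to fit a line")
    xs = [math.log(n) for n in range(1, len(growth) + 1)]
    ys = [math.log(v) for v in growth]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                      # slope on log-log axes
    k = math.exp(mean_y - beta * mean_x)  # intercept, back on a linear scale
    return k, beta


if __name__ == "__main__":
    tokens = "the cat saw the dog and the dog saw the cat".split()
    print(vocabulary_growth(tokens))
```

On a real corpus, `fit_heaps(vocabulary_growth(tokens))` would be expected to return a β below 1; short toy inputs like the one above are too small for the estimate to mean much.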

Zipf’s law

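Another statistical pattern that is important in the study of innovation is Zipf’s law, which describes how the frequency of an innovation is related to its popularity. For example, in a text, the most frequent word occurs about twice as often as the second most frequent word and about three times as often as the third most frequent word, and so on. In English, the most frequent word is “the”, which accounts for about 7% of all words on average; the second is “of”, at about 3.5%; the third is “and”; and so on.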
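As a worked illustration of the idealized rank-frequency relation (the symbols f and r are introduced here and do not come from the article, and real texts follow the relation only approximately), the frequency of the word of rank r can be written as:

```latex
f(r) \approx \frac{f(1)}{r}, \qquad
f(1) \approx 7\% \;\Rightarrow\;
f(2) \approx \frac{7\%}{2} = 3.5\%, \quad
f(3) \approx \frac{7\%}{3} \approx 2.3\%
```

The rank-2 value matches the figure quoted for “of”; the rank-3 value is only what the idealized relation implies, not a measured frequency.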

The problem

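These patterns (note: Heaps’ law and Zipf’s law) are empirical laws: we know of them because we can directly measure the phenomena they describe. But why such patterns exist is unknown. Mathematicians can model innovation by plugging observed numbers into these formulas, but they would rather have a mathematical model that derives the equations from first principles (note: that is, a systematic theory that explains the whole phenomenon).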

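Enter Loreto and his colleagues (one of whom is the Cornell University mathematician Steve Strogatz). For the first time, they have created a model that explains why these patterns exist.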

Pólya’s urn

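They begin with a well-known mathematical model called Pólya’s urn. Imagine an urn filled with balls of many different colors. A ball is drawn at random and inspected, then placed back in the urn together with a number of additional balls of the same color, which raises the probability of drawing that color again in the future.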
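The drawing rule just described can be sketched in a few lines. This is an illustrative simulation only: the function name `polya_urn`, the parameter `extra` (how many same-colored balls are added per draw), and the fixed random seed are assumptions made for the example, not details given in the article.

```python
import random
from collections import Counter


def polya_urn(initial_counts, extra, steps, seed=0):
    """Simulate a classic Pólya urn.

    initial_counts: mapping color -> number of balls at the start.
    extra: hypothetical parameter, balls of the drawn color added per draw.
    Returns the final color counts.
    """
    rng = random.Random(seed)
    counts = Counter(initial_counts)
    for _ in range(steps):
        colors = list(counts)
        weights = [counts[c] for c in colors]
        # Draw one ball with probability proportional to its color's count.
        drawn = rng.choices(colors, weights=weights, k=1)[0]
        # The ball goes back, plus `extra` more of the same color.
        counts[drawn] += extra
    return counts


if __name__ == "__main__":
    print(polya_urn({"red": 1, "blue": 1}, extra=2, steps=50))
```

Because colors that are drawn early gain weight, runs with different seeds tend to end with quite different color shares, which is the rich-get-richer effect. Note that the set of colors never grows in this basic version.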

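Mathematicians use this model to study rich-get-richer effects and the emergence of power laws, so it makes a good starting point for modeling innovation. On its own, however, it does not produce the sublinear growth that Heaps’ law predicts.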

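That is because the (basic) Pólya urn model accounts for all the expected consequences of an innovation (of discovering a color), but not for the unexpected consequences of how an innovation affects the adjacent possible.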

The modified Pólya’s urn

So Loreto, Strogatz, and their colleagues modified the Pólya urn model so that discovering a new color can trigger entirely unexpected consequences. They call this model “Polya’s urn with innovation triggering”.

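In the new model, again imagine an urn filled with balls of many different colors. A ball is drawn at random, inspected, and placed back in the urn. What happens next depends on its color.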

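If the color has been drawn before, a number of additional balls of the same color are placed in the urn. But if the color has never been drawn before, a number of balls of entirely new colors are added to the urn.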
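The two rules above can be turned into a small simulation. This is a sketch that follows the wording of this article literally; it is not claimed to be the authors’ exact specification. The names `urn_with_triggering`, `rho` (same-colored balls added when a color repeats), and `nu` (brand-new colors added when a color is first drawn) are hypothetical, and colors are represented as integers purely for convenience.

```python
import random


def urn_with_triggering(n_initial, rho, nu, steps, seed=0):
    """Simulate an urn in which first-time draws add brand-new colors.

    Assumptions (not stated in the article): the urn starts with one ball
    of each of `n_initial` colors; `rho` and `nu` are fixed integers.
    Returns (sequence of drawn colors, distinct-count history, final urn).
    """
    rng = random.Random(seed)
    urn = list(range(n_initial))  # one ball per starting color
    next_color = n_initial        # label for the next never-used color
    seen = set()
    sequence, distinct = [], []
    for _ in range(steps):
        color = rng.choice(urn)   # the drawn ball stays in the urn
        sequence.append(color)
        if color in seen:
            # Seen before: reinforce this color with rho more balls.
            urn.extend([color] * rho)
        else:
            # First draw of this color: add nu balls of entirely new colors.
            seen.add(color)
            urn.extend(range(next_color, next_color + nu))
            next_color += nu
        distinct.append(len(seen))
    return sequence, distinct, urn


if __name__ == "__main__":
    seq, distinct, urn = urn_with_triggering(5, rho=4, nu=2, steps=1000)
    print(distinct[99], distinct[-1])  # distinct colors after 100 and 1000 draws
```

The `distinct` list plays the role of V(n), so it can be compared against a power law, and counting how often each color occurs in `sequence` gives the rank-frequency distribution. Whether a first-time draw should also reinforce the drawn color is a design choice that the description leaves open; the sketch does not do so.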

Results

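Loreto and his colleagues then calculate how the number of previously unseen colors drawn from the urn, and the frequency distribution of the colors, change over time. The result is that the model reproduces Heaps’ law and Zipf’s law as they appear in the real world, a mathematical first. As Loreto and his colleagues put it: “The model of Polya’s urn with innovation triggering, presents for the first time a satisfactory first-principle based way of reproducing empirical observations.”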

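The team has also shown that the model predicts how innovations appear in the real world. It accurately predicts how people edit Wikipedia pages, how tags emerge in social annotation systems, the sequence of words in texts, and how people discover new songs in online music catalogues.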

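Interestingly, these systems involve two different kinds of discovery. On the one hand, there are things that have existed all along but are new to the person who finds them, such as songs in an online music catalogue; on the other, there are things that never existed before and are entirely new, such as edits on Wikipedia.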

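Loreto and his colleagues call the former (things that already exist but are new to the person who finds them) “novelties” and the latter (things that never existed before and are entirely new) “innovations”.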

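Curiously, the same model accounts for both phenomena. It seems that the pattern behind how we discover novelties is the same as the pattern behind how innovations emerge from the adjacent possible.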

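This raises some interesting questions (beyond the obvious “why should this be so?”), and it also opens up a new way of thinking about innovation and the events that lead to it. As Loreto and his colleagues put it: “These results provide a starting point for a deeper understanding of the adjacent possible and the different nature of triggering events that are likely to be important in the investigation of biological, linguistic, cultural, and technological evolution.”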

We look forward to seeing how this work will influence the study of innovation.

Original text

The work could lead to a new approach to the study of what is possible, and how it follows from what already exists.

Innovation is one of the driving forces in our world. The constant creation of new ideas and their transformation into technologies and products forms a powerful cornerstone for 21st century society. Indeed, many universities and institutes, along with regions such as Silicon Valley, cultivate this process.

And yet the process of innovation is something of a mystery. A wide range of researchers have studied it, ranging from economists and anthropologists to evolutionary biologists and engineers. Their goal is to understand how innovation happens and the factors that drive it so that they can optimize conditions for future innovation.

This approach has had limited success, however. The rate at which innovations appear and disappear has been carefully measured. It follows a set of well-characterized patterns that scientists observe in many different circumstances. And yet, nobody has been able to explain how this pattern arises or why it governs innovation.

Today, all that changes thanks to the work of Vittorio Loreto at Sapienza University of Rome in Italy and a few pals, who have created the first mathematical model that accurately reproduces the patterns that innovations follow. The work opens the way to a new approach to the study of innovation, of what is possible and how this follows from what already exists.

The notion that innovation arises from the interplay between the actual and the possible was first formalized by the complexity theorist Stuart Kauffman. In 2002, Kauffman introduced the idea of the “adjacent possible” as a way of thinking about biological evolution.

The adjacent possible is all those things—ideas, words, songs, molecules, genomes, technologies and so on—that are one step away from what actually exists. It connects the actual realization of a particular phenomenon and the space of unexplored possibilities.

But this idea is hard to model for an important reason. The space of unexplored possibilities includes all kinds of things that are easily imagined and expected but it also includes things that are entirely unexpected and hard to imagine. And while the former is tricky to model, the latter has appeared close to impossible.

What’s more, each innovation changes the landscape of future possibilities. So at every instant, the space of unexplored possibilities—the adjacent possible—is changing.

“Though the creative power of the adjacent possible is widely appreciated at an anecdotal level, its importance in the scientific literature is, in our opinion, underestimated,” say Loreto and co.

Nevertheless, even with all this complexity, innovation seems to follow predictable and easily measured patterns that have become known as “laws” because of their ubiquity. One of these is Heaps’ law, which states that the number of new things increases at a rate that is sublinear. In other words, it is governed by a power law of the form V(n) = k·n^β where β is between 0 and 1.

Words are often thought of as a kind of innovation, and language is constantly evolving as new words appear and old words die out.

This evolution follows Heaps’ law. Given a corpus of words of size n, the number of distinct words V(n) is proportional to n raised to the β power. In collections of real words, β turns out to be between 0.4 and 0.6.

Another well-known statistical pattern in innovation is Zipf’s law, which describes how the frequency of an innovation is related to its popularity. For example, in a corpus of words, the most frequent word occurs about twice as often as the second most frequent word, three times as frequently as the third most frequent word, and so on. In English, the most frequent word is “the” which accounts for about 7 percent of all words, followed by “of” which accounts for about 3.5 percent of all words, followed by “and,” and so on.

These patterns are empirical laws—we know of them because we can measure them. But just why the patterns take this form is unclear. And while mathematicians can model innovation by simply plugging the observed numbers into equations, they would much rather have a model which produces these numbers from first principles.

Enter Loreto and his pals (one of whom is the Cornell University mathematician Steve Strogatz). These guys create a model that explains these patterns for the first time.

They begin with a well-known mathematical sand box called Polya’s Urn. It starts with an urn filled with balls of different colors. A ball is withdrawn at random, inspected and placed back in the urn with a number of other balls of the same color, thereby increasing the likelihood that this color will be selected in future.

This is a model that mathematicians use to explore rich-get-richer effects and the emergence of power laws. So it is a good starting point for a model of innovation. However, it does not naturally produce the sublinear growth that Heaps’ law predicts.

That’s because the Polya urn model allows for all the expected consequences of innovation (of discovering a certain color) but does not account for all the unexpected consequences of how an innovation influences the adjacent possible.

So Loreto, Strogatz, and co have modified Polya’s urn model to account for the possibility that discovering a new color in the urn can trigger entirely unexpected consequences. They call this model “Polya’s urn with innovation triggering.”

The exercise starts with an urn filled with colored balls. A ball is withdrawn at random, examined, and replaced in the urn.

If this color has been seen before, a number of other balls of the same color are also placed in the urn. But if the color is new—it has never been seen before in this exercise—then a number of balls of entirely new colors are added to the urn.

Loreto and co then calculate how the number of new colors picked from the urn, and their frequency distribution, changes over time. The result is that the model reproduces Heaps’ and Zipf’s Laws as they appear in the real world—a mathematical first. “The model of Polya’s urn with innovation triggering, presents for the first time a satisfactory first-principle based way of reproducing empirical observations,” say Loreto and co.

The team has also shown that its model predicts how innovations appear in the real world. The model accurately predicts how edit events occur on Wikipedia pages, the emergence of tags in social annotation systems, the sequence of words in texts, and how humans discover new songs in online music catalogues.

Interestingly, these systems involve two different forms of discovery. On the one hand, there are things that already exist but are new to the individual who finds them, such as online songs; and on the other are things that never existed before and are entirely new to the world, such as edits on Wikipedia.

Loreto and co call the former novelties—they are new to an individual—and the latter innovations—they are new to the world.

Curiously, the same model accounts for both phenomena. It seems that the pattern behind the way we discover novelties—new songs, books, etc.—is the same as the pattern behind the way innovations emerge from the adjacent possible.

That raises some interesting questions, not least of which is why this should be. But it also opens an entirely new way to think about innovation and the triggering events that lead to new things. “These results provide a starting point for a deeper understanding of the adjacent possible and the different nature of triggering events that are likely to be important in the investigation of biological, linguistic, cultural, and technological evolution,” say Loreto and co.

We’ll look forward to seeing how the study of innovation evolves into the adjacent possible as a result of this work.

See also