分散運算

分散式運算fan1 saan2 sik1 wan6 syun3（英文：distributed computing），喺粵文當中又有叫做分佈式運算fan1 bou3 sik1 wan6 syun3，係電腦科學嘅一個子領域，專門研究分散式系統（distributed system）嘅設計。一個分散式系統有多部位於唔同－彼此之間有網絡嘅－電腦，呢啲電腦會互相通訊，靠呢啲通訊協調彼此嘅行動，用嚟達到某啲共同目的^[1]，由用家嘅角度睇望落好似係一部單一嘅電腦噉^[2]。

總括嚟講，一個分散式系統有幾個有用嘅特徵^[2]^[3]：

共用資源（shared resources）：例如 5 部電腦組成一個分散式系統，共用一部打印機，其中一部電腦唔使用打印機嗰陣可以讓俾第部電腦用，更有效率噉使用打印資源；
相異性（heterogeneity）：例如 5 部電腦各有唔同硬件，當中有部嘅顯示卡零舍勁，所以呢部就喺個系統入面專門處理圖像輸出嘅工作；
同時性（concurrency）：幾部電腦同一時間各自做自己嘅工作，喺實際應用上，噉做可以減少好多系統工作上嘅延誤（睇埋平行運算）^[4]^[5]；
可縮放性（scalability）：如果個系統嘅用家數量變多，需要做嘅多數都只係增加電腦嘅數量，而一部單一電腦就算硬件再勁都會有個上限；
容錯性（fault tolerance）：10 部電腦入面就算有 1 部壞咗，只要有機制教啲電腦點樣重新分配工作，其餘嗰 9 部都可以繼續正常噉運作^[6]。

分散式系統嘅呢啲特性喺實用上好有價值^[7]。其中一個例子係大型多人網上遊戲（MMOG）嘅伺服器，一隻 MMOG 嘅一個玩家會用佢部電腦－遊戲機或者個人電腦等－行佢個遊戲程式，然後部電腦將個玩家俾嘅訊號（撳咗邊個掣或者俾咗乜指令呀噉）傳去伺服器嗰度，伺服器喺每一個時間點都要接收成千上萬嘅玩家俾嘅輸入，進行運算，計出遊戲世界嘅狀態應該要點樣改變，再將個輸出傳返去玩家嘅電腦嗰度，所以隻遊戲嘅玩家數量愈多，個伺服器要處理嘅數據量就愈大。喺呢個情況下，分散式系統嘅可縮放性特質就好有用：如果遊戲伺服器係一個分散式系統，噉每當玩家數量增加嘅時候，隻遊戲嘅開發者只係需要增加個系統嘅電腦總數量就得^[8]^[9]。

概論

定義

多個獨立節點

定義上，一個分散式系統有多個節點（node）：一個分散式系統嘅每個節點可以係一部超級電腦，又可以係一啲細啲簡單啲嘅運算機械，節點之間喺空間上可以相距好遠^[10]，而每個節點唔一定會知嗮成個系統嘅狀態^[6]；重點係每一個節點都有能力獨立於第啲節點做運算；foreach 節點 $N_{i}$ ，個節點有既定嘅輸入（input）同輸出（output）。每一個節點都有獨立嘅記憶體；個系統嘅設計者想要用呢柞節點合作一齊做某啲特定嘅運算，所以將呢啲節點編寫成會彼此之間互傳訊號^[11]，一個節點會將自己嘅運算結果傳俾另一個，等後者去做進一步嘅處理^[12]^{:p. 3}^[13]。

舉個好簡單嘅例子說明，一個設計師可能想要個系統曉幫一個伺服器做日誌，記住個伺服器做過啲乜，一個簡單嘅做法係整一個兩個節點嘅系統，第一個節點 $N_{1}$ 係個伺服器，第二個節點 $N_{2}$ 係個日誌紀錄器，設計師做好程式編寫嘅工作，教 $N_{1}$ 做親乜都傳個訊號去 $N_{2}$ 嗰度（傳嘅訊號），個訊號帶有「做咗啲乜」同「何年何月何日幾點幾分幾秒做」等嘅訊息（例：task 00248 date 01Feb2020 time 0548pm 表示「喺 2020 年 2 月 1 號下晝 5 點 48 分做咗工作 248」），然後 $N_{2}$ 內部有個程式，令 $N_{2}$ 將 $N_{1}$ 傳嗰啲訊號冚唪唥記低（「記低」就係 $N_{2}$ 所做嘅進一步處理）－形成一個簡單嘅雙節點系統^[14]。

望落似單一

另一方面，有唔少分散式系統嘅設計師認為，分散式系統由用家嘅角度睇要似一個單一嘅系統。呢一點就噉睇落好似好簡單，但查實可以好撈絞：首先，一個分散式系統有陣時可能會有其中一部機出現故障，於是個系統由嗰部機負責嘅工序就會撠咗喺度唔郁，同時系統第啲部份負責嘅工序會繼續行，打破個系統嘅運行以及「成個系統係一體」嘅觀感。「系統嘅一部份壞咗」係所有複雜系統都要面對嘅問題，但喺分散式系統當中，呢種情況零舍難以掩飾，例如美國電腦科學家萊斯利·藍波特（Leslie Lamport）係噉描述分散式系統嘅^[12]^{:p. 4}：

「

原版英文："[...] one in which the failure of a computer you did not even know existed can render your own computer unusable."

粵文翻譯：（分散式系統係）...一個系統，喺呢個系統下，一部你唔知佢存在嘅電腦故障可以搞到你自己部電腦用唔到。

」

分散式系統嘅一個可能模型（睇藍波特時間戳^[15]）；
1. X 軸表示時間，
2. A、B 同 C 分別係唔同嘅處理過程，
3. A1、A2 同 C5 等係事件，
4. 每件事件下面嗰一個格仔表達嗰件事件嘅時間戳，箭咀表示過程之間嘅訊號傳達，
5. 藍色 cause 入面嘅係 B4 嘅起因，紅色 effect 入面嘅係 B4 引致嘅事件，淺藍同淺紅表示同 B4 獨立嘅事件。

中介軟體

中介軟體（middleware）係指提供超越作業系統所提供嘅服務嘅軟件，用途在於確保個分散式系統嘅唔同節點之間能夠有效噉通訊同管理數據^[16]。中介軟體邏輯上一般會置喺每部個別電腦嘅作業系統之上，由個系統嘅唔同節點嗰度攞輸入，並且做「由一個節點接收訊號，再將個訊號翻譯俾第個節點知」等嘅工作：簡單嘅例子可以想像一個分散式系統，個中介軟體會由各個節點攞輸入，將輸入變成目標節點睇得明嘅程式語言；再先進啲嘅中介軟體仲會有「俾一個節點執行第個節點嘅子程序」同「確保幾個一開始唔係設計嚟做一個系統嘅應用程式順利噉一齊行」等嘅功能^[17]。

同平行運算嘅拏褦

分散式運算同平行運算（parallel computing）好有關係。平行運算喺最廣義上係指有多個計算過程喺同一時間執行（ $t_{i}$ 當做個系統入面第 $i$ 個過程嘅發生時間，個系統入面有多個過程 $t_{1}=t_{2}=t_{3}...$ ）。分散式運算同平行運算有相當嘅重疊^[18]：當個系統有好多個節點（ $N_{1},N_{2},N_{3}...$ ）嗰時，理論上可以喺一個節點做完計算先到下個節點做；不過喺實際應用上，比較慳時間資源嘅做法係俾至少其中一啲節點同時做計算－分散式運算多數時間都係喺度做平行運算嘅^[6]。一般認為，分散式運算同平行運算兩者最大嘅差別在於記憶體嘅分散性^[19]：

喺一般嘅平行運算當中，所有處理器都能夠使用同一柞記憶體，允許處理器之間直接分享訊息（睇附圖嘅 (c)）；
喺分散式運算當中，每個處理器（節點 $N_{i}$ ）都有獨立、第啲處理器唔能夠直接使用嘅記憶體，只可以透過處理器之間嘅訊號傳遞嚟分享訊息。

想像以下呢幾個過程嘅日誌（Ty 代表件事喺時間點 y 發生）^[20]：

 P1
   T1：Read input a from memory
   T1：Read input b from memory
   T2：Compute function_01 using a as input
   T2：Compute function_02 using b as input

 P2
   T1：Node 1 reads input a from Node 1's memory
   T2：Node 1 reads input b from Node 1's memory
   T3：Node 1 computes function_01 using a as input
   T4：Node 1 sends b to Node 2
   T5：Node 2 reads function_02 using b as input

P1 反映咗一個平行運算，function_01 同 function_02 呢兩個運算過程可以直接使用同一柞記憶；而 P2 反映咗一個分散式運算，Node 2 呢個節點唔可以直接用 Node 1 呢個節點嘅記憶，而係要 Node 1 將件記憶用訊號傳俾佢先可以用。喺某啲情況下，平行運算可以搞到個系統出一啲齋用非平行分散式運算唔會出嘅錯^[20]。

睇埋

文獻

Andrews, Gregory R. (2000), Foundations of Multithreaded, Parallel, and Distributed Programming, Addison–Wesley, ISBN 978-0-201-35752-3.
Arora, Sanjeev; Barak, Boaz (2009), Computational Complexity – A Modern Approach, Cambridge, ISBN 978-0-521-42426-4.
Christian Cachin; Rachid Guerraoui; Luís Rodrigues (2011), Introduction to Reliable and Secure Distributed Programming (2. ed.), Springer, ISBN 978-3-642-15259-7
Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L. (1990), Introduction to Algorithms (1st ed.), MIT Press, ISBN 978-0-262-03141-7.
Dolev, Shlomi (2000), Self-Stabilization, MIT Press, ISBN 978-0-262-04178-2.
Elmasri, Ramez; Navathe, Shamkant B. (2000), Fundamentals of Database Systems (3rd ed.), Addison–Wesley, ISBN 978-0-201-54263-9.
Ghosh, Sukumar (2007), Distributed Systems – An Algorithmic Approach, Chapman & Hall/CRC, ISBN 978-1-58488-564-1.
Lynch, Nancy A. (1996), Distributed Algorithms, Morgan Kaufmann, ISBN 978-1-55860-348-6.
Herlihy, Maurice P.; Shavit, Nir N. (2008), The Art of Multiprocessor Programming, Morgan Kaufmann, ISBN 978-0-12-370591-4.
Papadimitriou, Christos H. (1994), Computational Complexity, Addison–Wesley, ISBN 978-0-201-53082-7.
Peleg, David (2000), Distributed Computing: A Locality-Sensitive Approach, SIAM, ISBN 978-0-89871-464-7,

攷

↑ Peleg, D. (2000). Distributed computing. SIAM Monographs on discrete mathematics and applications, 5, 1-1.
↑ ^2.0 ^2.1 Tanenbaum, Andrew S.; Steen, Maarten van (2002). Distributed systems: principles and paradigms. Upper Saddle River, NJ: Pearson Prentice Hall.
↑ Key Characteristics of Distributed Systems 互聯網檔案館嘅歸檔，歸檔日期2019年5月28號，.. System Design Blog.
↑ Andrews, Gregory R. (2000), Foundations of Multithreaded, Parallel, and Distributed Programming, Addison–Wesley. p. 291 - 292.
↑ Dolev, Shlomi (2000), Self-Stabilization, MIT Press. p. 5.
↑ ^6.0 ^6.1 ^6.2 Peleg (2000), p. 4.
↑ Magnoni, L. (2015). "Modern Messaging for Distributed Sytems (sic)". Journal of Physics: Conference Series. 608 (1): 012038.
↑ Bharambe, A. R., Pang, J., & Seshan, S. (2005). A distributed architecture for interactive multiplayer games. School of Computer Science, Carnegie Mellon University.
↑ Distributed Gaming. gamedev.net.
↑ Lynch, Nancy A. (1996). p. 1.
↑ Andrews (2000), p. 291.
↑ ^12.0 ^12.1 van Steen, M., & Tanenbaum, A. S. (2016). A brief introduction to distributed systems. Computing, 98(10), 967-1009.
↑ Andrews (2000), p. 291–292.
↑ Burns, B., & Oppenheimer, D. (2016). Design patterns for container-based distributed systems (PDF). In 8th {USENIX} Workshop on Hot Topics in Cloud Computing (HotCloud 16).
↑ Ronsse, M. A., & Kranzlmuller, D. A. (1998, January). Rolt/sup MP/-replay of Lamport timestamps for message passing systems. In Proceedings of the Sixth Euromicro Workshop on Parallel and Distributed Processing-PDP'98- (pp. 87-93). IEEE.
↑ What is middleware?. Microsoft Azure.
↑ Bernstein P (1996) Middleware: a model for distributed system services. Commun ACM, 39(2): 87–98.
↑ Ghosh (2007), p. 10.
↑ Papadimitriou (1994), Chapter 15.
↑ ^20.0 ^20.1 Chapter 4: Distributed and Parallel Computing 互聯網檔案館嘅歸檔，歸檔日期2019年11月25號，..

拎

Distributed Computing，一本講分散式運算嘅期刊。
Journal of Parallel and Distributed Computing，又一本講分散式運算嘅期刊。

[1] Peleg, D. (2000). Distributed computing. SIAM Monographs on discrete mathematics and applications, 5, 1-1.

[basicdiscom-2] 2.0 ^2.1 Tanenbaum, Andrew S.; Steen, Maarten van (2002). Distributed systems: principles and paradigms. Upper Saddle River, NJ: Pearson Prentice Hall.

[3] Key Characteristics of Distributed Systems 互聯網檔案館嘅歸檔，歸檔日期2019年5月28號，.. System Design Blog.

[4] Andrews, Gregory R. (2000), Foundations of Multithreaded, Parallel, and Distributed Programming, Addison–Wesley. p. 291 - 292.

[5] Dolev, Shlomi (2000), Self-Stabilization, MIT Press. p. 5.

[peleg2000p4-6] 6.0 ^6.1 ^6.2 Peleg (2000), p. 4.

[7] Magnoni, L. (2015). "Modern Messaging for Distributed Sytems (sic)". Journal of Physics: Conference Series. 608 (1): 012038.

[8] Bharambe, A. R., Pang, J., & Seshan, S. (2005). A distributed architecture for interactive multiplayer games. School of Computer Science, Carnegie Mellon University.

[9] Distributed Gaming. gamedev.net.

[10] Lynch, Nancy A. (1996). p. 1.

[11] Andrews (2000), p. 291.

[vansteen2016-12] 12.0 ^12.1 van Steen, M., & Tanenbaum, A. S. (2016). A brief introduction to distributed systems. Computing, 98(10), 967-1009.

[13] Andrews (2000), p. 291–292.

[14] Burns, B., & Oppenheimer, D. (2016). Design patterns for container-based distributed systems (PDF). In 8th {USENIX} Workshop on Hot Topics in Cloud Computing (HotCloud 16).

[15] Ronsse, M. A., & Kranzlmuller, D. A. (1998, January). Rolt/sup MP/-replay of Lamport timestamps for message passing systems. In Proceedings of the Sixth Euromicro Workshop on Parallel and Distributed Processing-PDP'98- (pp. 87-93). IEEE.

[16] What is middleware?. Microsoft Azure.

[17] Bernstein P (1996) Middleware: a model for distributed system services. Commun ACM, 39(2): 87–98.

[ghosh2007p10-18] Ghosh (2007), p. 10.

[19] Papadimitriou (1994), Chapter 15.

[ch4-20] 20.0 ^20.1 Chapter 4: Distributed and Parallel Computing 互聯網檔案館嘅歸檔，歸檔日期2019年11月25號，..

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

[18]

[19]

[20]