決策樹學習

決策樹學習(decision tree analysis)係機械學習嘅一種模型,一個決策樹(decision tree)會有若干個節點(node),每個節點代表咗一個考慮緊嘅變數,並且喺接收到一個個案嗰陣,先後噉用呢啲變數作出預測,例如係附圖入面嗰個決策樹噉樣,佢會接收一個個案嘅數據,然後先後噉按照個個案嘅各種變數(性別、年紀、同有幾多個親屬喺船上呀噉)預測嗰個鐵達尼號乘客有幾大機會生還。而一個用決策樹嘅機械學習演算法要做嘅嘢就係用手上嘅數據,砌出一個噉樣嘅決策樹[1]。一個建立決策樹嘅演算法步驟大致如下[2]:
例如係以下嘅虛擬碼噉(呢個演算法就係所謂嘅 ID3)[3][4]:
計吓成個數據庫嘅 information entropy(資訊熵;簡單啲講就係指柞數據有幾接近完全隨機)
For 每一個用嚟做預測嘅變數
計吓用咗佢分類之後嘅總 entropy
計吓用咗佢分類之後嘅總 entropy 同成個數據庫嘅 entropy 差幾多(information gain)
揀 information gain 最高嗰個變數嚟分類
For 每一個分咗嘅類,用嗰個類內嘅個案做數據庫,做多次上述嘅過程,直至用嗮所有用嚟做預測嘅變數,或者到咗指定嘅分枝數上限為止。
睇埋[編輯]
參考[編輯]
- James, Gareth; Witten, Daniela; Hastie, Trevor; Tibshirani, Robert (2017). "Tree-Based Methods". An Introduction to Statistical Learning: with Applications in R. New York: Springer. pp. 303–336. ISBN 978-1-4614-7137-0.
攷[編輯]
- ↑ Decision Trees in Machine Learning. Towards Data Science.
- ↑ Decision Tree. Machine Learning.
- ↑ Chapter 4: Decision Trees Algorithms.
- ↑ Decision Trees: ID3 Algorithm Explained. Towards Data Science.