Web scraping

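Web scraping (Cantonese: 網頁刮料, Jyutping: mong5 jip6 gwaat3 liu2) is the extraction of useful data from web pages. In principle, web scraping can be done entirely by hand, but most users find manual work far too slow. In practice, therefore, web scraping is usually performed by automated computer programs that access the web through methods such as HTTP and then extract data from the pages they retrieve^[1].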

Basics

At its most basic, a web-scraping program performs two main functions^[2]^[3]:

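Fetch: retrieve the web pages to be examined, either from URLs the user specifies or by having the computer find a set of pages according to certain rules;
Extract: pull data out of the retrieved pages; in the simplest case, this can mean checking which characters a page contains or counting how many times each character appears.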
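The two functions above can be sketched with Python's standard library alone. This is a minimal illustration under stated assumptions, not the method of the cited papers: `fetch`, `find_links`, `extract_char_counts` and the User-Agent string are hypothetical names chosen for this example, and the "same host" filter in `find_links` is just one possible rule for finding further pages.

```python
"""Minimal fetch-and-extract sketch using only the Python standard library."""
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen


class PageParser(HTMLParser):
    """Collects visible text and <a href> links from an HTML document."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self._skip = 0  # depth inside <script>/<style>, whose content is not visible text

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text_parts.append(data)


def _parse(html):
    parser = PageParser()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of the input
    return parser


def fetch(url, timeout=10.0):
    """Fetch step: download one page over HTTP(S) and decode it as text."""
    # The User-Agent value is a hypothetical placeholder.
    req = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def find_links(base_url, html):
    """Rule-based page discovery: keep only links on the same host as base_url."""
    host = urlparse(base_url).netloc
    absolute = (urljoin(base_url, href) for href in _parse(html).links)
    return [u for u in absolute if urlparse(u).netloc == host]


def extract_char_counts(html):
    """Extract step: count each non-whitespace character in the page's visible text."""
    text = "".join(_parse(html).text_parts)
    return Counter(ch for ch in text if not ch.isspace())
```

For example, `extract_char_counts(fetch("https://example.com/")).most_common(5)` would list the five most frequent characters on that page (this needs network access). The sketch uses `html.parser` to stay dependency-free; many real scrapers use third-party parsing libraries instead.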

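In the early 21st century, web scraping has often been used to study web-related problems: for example, having a program automatically collect product information from online shopping sites (information reflected in the text on the pages)^[4], or gathering people's comments on social media and analyzing them to understand how people feel about different things^[5]. Because web scraping is so useful, many computer scientists have researched how to design algorithms that perform it efficiently^[2].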
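To make the online-shopping example concrete, the sketch below pulls product names and prices out of a page. It assumes a hypothetical markup in which each product sits in an element with class `product` and its fields in elements with classes `name` and `price`; real shopping sites use their own, often changing, markup, so these class names and the `Product`/`extract_products` names are illustrative only.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class Product:
    name: str
    price: str


class ProductParser(HTMLParser):
    """Collects (name, price) pairs from elements marked with hypothetical CSS classes."""

    FIELDS = ("name", "price")  # hypothetical class names; real sites differ

    def __init__(self):
        super().__init__()
        self.products = []
        self._current = {}  # fields collected so far for the product being read
        self._field = None  # which field the next text node belongs to, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "product" in classes:
            self._current = {}  # a new product block starts
        for field in self.FIELDS:
            if field in classes:
                self._field = field

    def handle_endtag(self, tag):
        self._field = None
        # Once every field has been seen, record the product and start afresh.
        if all(f in self._current for f in self.FIELDS):
            self.products.append(Product(**self._current))
            self._current = {}

    def handle_data(self, data):
        text = data.strip()
        if self._field and text:
            self._current[self._field] = text


def extract_products(html):
    """Return the products found in an HTML page that follows the assumed markup."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return parser.products
```

Because the field names live in one tuple, adapting the sketch to a site with different (still hypothetical) class names only means changing `FIELDS` and the `Product` fields together.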

See also

References

[1] Hirschey, J. K. (2014). Symbiotic Relationships: Pragmatic Acceptance of Data Scraping. Berkeley Technology Law Journal, 29(4).
[2] Mahto, D. K., & Singh, L. (2016, March). A dive into Web Scraper world. In 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom) (pp. 689-693). IEEE.
[3] Dastidar, B. G., Banerjee, D., & Sengupta, S. (2016). An intelligent survey of personalized information retrieval using web scraper. International Journal of Education and Management Engineering, 6(5), 24-31.
[4] Ullah, H., Ullah, Z., Maqsood, S., & Hafeez, A. (2018). Web scraper revealing trends of target products and new insights in online shopping websites. International Journal of Advanced Computer Science and Applications, 9(6).
[5] Ho, V. A., Nguyen, D. H.-C., Nguyen, D. H., Pham, L. T.-V., Nguyen, D.-V., Nguyen, K. V., & Nguyen, N. L.-T. (2019). Emotion Recognition for Vietnamese Social Media Text. In Proceedings of the 2019 International Conference of the Pacific Association for Computational Linguistics (PACLING 2019), Hanoi, Vietnam.

External links

Python Web Scraping Tutorial. GeeksForGeeks.