Prompt engineering
Prompt engineering is a concept in artificial intelligence (AI), particularly natural language processing (NLP). In prompt engineering, the description of the task is embedded in the input. Rather than being given explicit or implicit parameters, the model is typically asked a question directly in free-form text. Prompt engineering typically works by converting one or more tasks into a prompt-based dataset and training a language model with so-called "prompt-based learning". In zero-shot learning, including a phrase in the prompt that encourages a chain of thought (for example, "Let's think step by step") can improve a language model's performance on multi-step reasoning problems.[1][2][3][4][5]
Chain of thought
Chain-of-thought (CoT) prompting is a technique that allows large language models (LLMs) to solve a problem as a series of intermediate steps[6] before giving a final answer. In 2022, Google claimed that chain-of-thought prompting improves reasoning ability by inducing the model to answer a multi-step problem with steps of reasoning that mimic a train of thought.[6][7][8] According to announcements from Google and Amazon, chain-of-thought techniques can in principle help large language models tackle reasoning tasks that require logical thinking and multiple steps to solve, such as arithmetic or commonsense reasoning questions.[9][10][11]
For example, given the question "Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?", Google says a CoT prompt might induce the LLM to answer "A: The cafeteria had 23 apples originally. They used 20 to make lunch. So they had 23 - 20 = 3. They bought 6 more apples, so they have 3 + 6 = 9. The answer is 9."[7]
As originally proposed by Google,[7] each CoT prompt included a few question-and-answer examples, making it a few-shot prompting technique. However, according to researchers at Google and the University of Tokyo, simply appending the words "Let's think step by step"[12] has also proven effective, which makes CoT a zero-shot prompting technique. OpenAI has said that this prompt allows for better scaling, as a user no longer needs to formulate many specific CoT question-and-answer examples.[13]
When applied to PaLM, a 540-billion-parameter language model, Google says CoT prompting significantly aided the model, allowing it to perform comparably with task-specific fine-tuned models on several tasks and achieving state-of-the-art results at the time on the GSM8K mathematical reasoning benchmark.[7] According to Google, this capability can be strengthened further by fine-tuning models on CoT reasoning datasets, which also stimulates better interpretability.[14][15]
Example:[12]
Q: {question}
A: Let's think step by step.
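A minimal sketch of how zero-shot CoT prompting might be wired up in code. The `llm` helper is a hypothetical stand-in for any text-completion call, not a specific API:

```python
def llm(prompt: str) -> str:
    # Hypothetical placeholder: substitute any text-completion API here.
    raise NotImplementedError

def chain_of_thought(question: str) -> str:
    # Zero-shot CoT: append the trigger phrase so the model emits
    # intermediate reasoning steps before its final answer.
    prompt = f"Q: {question}\nA: Let's think step by step."
    return llm(prompt)
```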
Other techniques
Chain-of-thought prompting is just one of many prompt-engineering techniques. Many other techniques have been proposed in the literature; at least 29 distinct methods have been published.[16]
Chain-of-symbol (CoS) prompting
A research collaboration between Westlake University, the Chinese University of Hong Kong, and the University of Edinburgh found that chain-of-symbol prompting, combined with chain-of-thought prompting, helps large language models overcome their difficulty with spatial reasoning in text. In other words, using arbitrary symbols such as "/" helps the model interpret spatial relationships described in text, which the study reports improves the model's reasoning and performance.[17]
Example:[17]
Input:
There is a set of bricks. The yellow brick C is on top of the brick E. The yellow brick D is on top of the brick A. The yellow brick E is on top of the brick D. The white brick A is on top of the brick B. The brick B is white. Now we have to get a specific brick. The bricks must be grabbed from top to bottom, and if a lower brick is to be grabbed, the upper bricks must be removed first. How can we get brick D?
B/A/D/E/C C/E E/D D
Output:
So the answer is that we need to remove brick C first, then brick E, and only then can we grab brick D.
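A minimal sketch of constructing a CoS-style prompt, assuming the symbolic state (e.g. "B/A/D/E/C") has already been written out as in the published example; `llm` is again a hypothetical text-completion call:

```python
def llm(prompt: str) -> str:
    # Hypothetical placeholder for a text-completion call.
    raise NotImplementedError

def chain_of_symbol(description: str, symbolic_state: str, question: str) -> str:
    # Pair the natural-language description with a compact symbol string
    # so the model can ground the spatial relations before answering.
    return llm(f"{description}\n{symbolic_state}\n{question}")
```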
Few-shot learning
A prompt may include a few examples for the model to learn from, such as asking it to complete "maison house, chat cat, chien " (the expected answer being dog);[18] this is called few-shot learning.[19]
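A minimal sketch of building a few-shot prompt from worked examples; the `llm` helper is a hypothetical stand-in:

```python
def llm(prompt: str) -> str:
    # Hypothetical placeholder for a text-completion call.
    raise NotImplementedError

def few_shot(examples: list[tuple[str, str]], query: str) -> str:
    # Concatenate worked examples so the model infers the pattern in-context,
    # e.g. [("maison", "house"), ("chat", "cat")] with query "chien".
    demo = ", ".join(f"{x} {y}" for x, y in examples)
    return llm(f"{demo}, {query} ")
```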
Generated knowledge prompting
Generated knowledge prompting[20] first prompts the model to generate facts relevant to the task, and only then to complete the task. This often improves completion quality,[citation needed] because the model can condition its answer on the relevant facts.
Example:[20]
Generate some knowledge about the input concept.
Input: {question}
Knowledge:
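A minimal sketch of the two-stage pattern, assuming a hypothetical `llm` helper: the first call generates facts, the second answers conditioned on them:

```python
def llm(prompt: str) -> str:
    # Hypothetical placeholder for a text-completion call.
    raise NotImplementedError

def generated_knowledge(question: str) -> str:
    # Stage 1: ask the model for task-relevant facts.
    knowledge = llm(f"Generate some knowledge about the input concept.\n"
                    f"Input: {question}\nKnowledge:")
    # Stage 2: answer the question conditioned on those facts.
    return llm(f"{question}\nKnowledge: {knowledge}\nAnswer:")
```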
Least-to-most prompting
Least-to-most prompting[21] first prompts the model to list the sub-problems of a problem, then solves them in sequence, so that later sub-problems can benefit from the answers to earlier ones.
Example:[21]
Input:
Q: {question}
A: Let's break down this problem: 1.
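A minimal sketch of the decompose-then-solve loop, assuming a hypothetical `llm` helper and naive line-based parsing of the sub-problem list:

```python
def llm(prompt: str) -> str:
    # Hypothetical placeholder for a text-completion call.
    raise NotImplementedError

def least_to_most(question: str) -> str:
    # Step 1: ask the model to list the sub-problems.
    plan = llm(f"Q: {question}\nA: Let's break down this problem: 1.")
    subproblems = [s.strip() for s in plan.splitlines() if s.strip()]
    # Step 2: solve them in order, so each answer is visible to the next step.
    context = f"Q: {question}"
    for sub in subproblems:
        answer = llm(f"{context}\nSubproblem: {sub}\nAnswer:")
        context += f"\nSubproblem: {sub}\nAnswer: {answer}"
    return context
```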
Self-consistency decoding
Self-consistency decoding[22] performs several chain-of-thought rollouts, then selects the most commonly reached conclusion. If the rollouts disagree significantly, a human can be asked to pick the correct chain of thought.[23]
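A minimal sketch of sampling several rollouts and majority-voting on the final answer. `llm` is a hypothetical sampling completion call, and `extract_answer` is an assumed heuristic for pulling the conclusion out of a rollout:

```python
from collections import Counter

def llm(prompt: str, temperature: float = 1.0) -> str:
    # Hypothetical placeholder: a sampling text-completion call.
    raise NotImplementedError

def extract_answer(rollout: str) -> str:
    # Assumed heuristic: take whatever follows the last "The answer is".
    return rollout.rsplit("The answer is", 1)[-1].strip()

def self_consistency(question: str, n: int = 10) -> str:
    # Sample several CoT rollouts, then keep the most common final answer.
    prompt = f"Q: {question}\nA: Let's think step by step."
    answers = [extract_answer(llm(prompt, temperature=1.0)) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```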
Complexity-based prompting
Complexity-based prompting[24] performs several chain-of-thought rollouts, selects the rollouts with the longest chains of thought, and then selects the most commonly reached conclusion among those.
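A minimal sketch of the same voting idea restricted to the longest chains; chain length is approximated here by line count, an assumption of this sketch rather than the paper's exact criterion:

```python
from collections import Counter

def llm(prompt: str, temperature: float = 1.0) -> str:
    # Hypothetical placeholder: a sampling text-completion call.
    raise NotImplementedError

def complexity_based(question: str, n: int = 10, k: int = 3) -> str:
    # Sample rollouts, keep the k with the most reasoning steps
    # (approximated by line count), then majority-vote among those only.
    prompt = f"Q: {question}\nA: Let's think step by step."
    rollouts = [llm(prompt, temperature=1.0) for _ in range(n)]
    longest = sorted(rollouts, key=lambda r: len(r.splitlines()), reverse=True)[:k]
    answers = [r.rsplit("The answer is", 1)[-1].strip() for r in longest]
    return Counter(answers).most_common(1)[0][0]
```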
Self-refine
Self-refine[25] first prompts the model to solve the problem, then prompts it to criticize its own solution, and then prompts it to solve the problem again in view of the problem, the solution, and the criticism. This process repeats until it runs out of tokens or time, or until the model outputs a "stop" token.
Example criticism:[25]
I have some code. Give one suggestion to improve readability. Don't fix the code, just give a suggestion.
Code: {code}
Suggestion:
Example refinement:
Code: {code}
Let's use this suggestion to improve the code.
Suggestion: {suggestion}
New code:
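A minimal sketch of the solve/criticize/refine loop, assuming a hypothetical `llm` helper; a fixed round budget stands in for the token or time limits described above:

```python
def llm(prompt: str) -> str:
    # Hypothetical placeholder for a text-completion call.
    raise NotImplementedError

def self_refine(problem: str, max_rounds: int = 3) -> str:
    solution = llm(f"{problem}\nSolution:")
    for _ in range(max_rounds):  # round budget stands in for token/time limits
        critique = llm(f"Problem: {problem}\nSolution: {solution}\n"
                       f"Give one suggestion to improve this solution.\nSuggestion:")
        if "stop" in critique.lower():
            break  # the model signals it is satisfied
        solution = llm(f"Problem: {problem}\nSolution: {solution}\n"
                       f"Suggestion: {critique}\nImproved solution:")
    return solution
```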
Tree-of-thought
Tree-of-thought prompting[26] generalizes chain of thought by prompting the model to generate one or more "possible next steps", and then running the model on each of those next steps using breadth-first search, beam search, or some other tree search method.[27]
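A minimal breadth-first sketch, assuming a hypothetical `llm` helper and an assumed `score` helper that rates partial solutions (the papers use model-based evaluation of states):

```python
def llm(prompt: str) -> str:
    # Hypothetical placeholder for a text-completion call.
    raise NotImplementedError

def score(state: str) -> float:
    # Assumed helper: ask the model to rate a partial solution from 0 to 1.
    return float(llm(f"Rate this partial solution from 0 to 1:\n{state}\nScore:"))

def tree_of_thought(problem: str, width: int = 3, depth: int = 3) -> str:
    # Breadth-first search over model-proposed "possible next steps",
    # keeping only the `width` highest-scoring partial solutions per level.
    frontier = [problem]
    for _ in range(depth):
        candidates = []
        for state in frontier:
            for _ in range(width):
                step = llm(f"{state}\nPropose one possible next step:")
                candidates.append(state + "\n" + step)
        frontier = sorted(candidates, key=score, reverse=True)[:width]
    return frontier[0]
```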
Maieutic prompting
Maieutic prompting is similar to tree-of-thought. The model is first prompted to answer a question with an explanation, then prompted to explain parts of that explanation, and so on. Inconsistent explanation trees are pruned or discarded. This improves performance on complex commonsense reasoning.[28]
Example:[28]
Q: {question}
A: True, because

Q: {question}
A: False, because
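A heavily simplified sketch of the idea: elicit explanations for both polarities and keep the branch whose explanation survives a consistency check. The published method builds and prunes a full recursive explanation tree; this one-level version is only illustrative:

```python
def llm(prompt: str) -> str:
    # Hypothetical placeholder for a text-completion call.
    raise NotImplementedError

def maieutic(question: str) -> bool:
    # One-level consistency check (the paper recurses over explanations).
    pro = llm(f"Q: {question}\nA: True, because")
    con = llm(f"Q: {question}\nA: False, because")
    pro_ok = "true" in llm(f"Is this explanation correct? {pro}\nAnswer:").lower()
    con_ok = "true" in llm(f"Is this explanation correct? {con}\nAnswer:").lower()
    # Answer True only when the "True" branch is endorsed and the
    # contradictory branch is not; inconsistent pairs default to False.
    return pro_ok and not con_ok
```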
Directional-stimulus prompting
Directional-stimulus prompting[29] includes hints or cues, such as desired keywords, to guide the language model toward the desired output.
Example:[29]
Article: {article}
Keywords:

Article: {article}
Q: Write a short summary of the article in 2-4 sentences that accurately incorporates the provided keywords.
Keywords: {keywords}
A:
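A minimal sketch of injecting keywords as the directional stimulus, assuming a hypothetical `llm` helper:

```python
def llm(prompt: str) -> str:
    # Hypothetical placeholder for a text-completion call.
    raise NotImplementedError

def directional_stimulus(article: str, keywords: list[str]) -> str:
    # The keyword list acts as a "stimulus" steering the summary
    # toward including the desired terms.
    return llm(f"Article: {article}\n"
               f"Q: Write a short summary of the article in 2-4 sentences "
               f"that accurately incorporates the provided keywords.\n"
               f"Keywords: {', '.join(keywords)}\nA:")
```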
Prompting to disclose uncertainty
By default, the output of a language model may not contain estimates of uncertainty. The model may output text that appears confident even though the underlying token predictions have low likelihood scores. Large language models like GPT-4 can have accurately calibrated likelihood scores in their token predictions,[30] so when such scores are available, the model's output uncertainty can be estimated by reading them out directly.
If such scores are not accessible (for example, when the model is used through a restrictive API), uncertainty can still be estimated and incorporated into the model's output. One simple method is to prompt the model to estimate its uncertainty in words.[31] Another is to prompt the model to refuse to answer in a standardized way if the input does not satisfy certain conditions.[citation needed]
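A minimal sketch of the verbal-uncertainty approach, combining both ideas above; the exact wording of the instruction is an assumption of this sketch:

```python
def llm(prompt: str) -> str:
    # Hypothetical placeholder for a text-completion call.
    raise NotImplementedError

def answer_with_uncertainty(question: str) -> str:
    # Without access to token likelihoods, ask the model to state its own
    # confidence in words, and to refuse in a standardized way when needed.
    return llm(f"Q: {question}\n"
               f"Answer, then state your confidence as a percentage. "
               f"If the question is ambiguous or unanswerable, reply exactly "
               f"'I cannot answer'.\nA:")
```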
Prompting to estimate model sensitivity
Research has consistently shown that large language models are highly sensitive to subtle variations in prompt format, structure, and linguistic properties. Some studies have found accuracy differences of up to 76 percentage points across formatting changes in few-shot settings.[32] Linguistic features, including changes in morphology, syntax, and lexical semantics, strongly influence prompt effectiveness and can meaningfully improve performance across a variety of tasks.[33][34] For example, clausal syntax improves consistency and reduces uncertainty in knowledge retrieval.[35] This sensitivity persists even with larger model sizes, more few-shot examples, or instruction tuning.
To address this sensitivity and make models more robust, several methods have been proposed. FormatSpread facilitates systematic analysis by evaluating a range of plausible prompt formats, providing a more comprehensive performance interval.[36] Similarly, PromptEval estimates performance distributions across diverse prompts, enabling robust metrics such as performance quantiles and accurate evaluation under a limited budget.[37]
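An illustrative re-creation in the spirit of FormatSpread, not the authors' implementation: evaluate the same task under several plausible formats and report the resulting accuracy interval. The format list and exact-match scoring are assumptions of this sketch:

```python
def llm(prompt: str) -> str:
    # Hypothetical placeholder for a text-completion call.
    raise NotImplementedError

def format_spread(dataset: list[tuple[str, str]]) -> tuple[float, float]:
    # Run the same (question, answer) pairs under several prompt formats
    # and return the min/max accuracy as a performance interval.
    formats = ["Q: {q}\nA:", "Question: {q}\nAnswer:",
               "{q} ->", "INPUT: {q}\nOUTPUT:"]
    accuracies = []
    for fmt in formats:
        hits = sum(llm(fmt.format(q=q)).strip() == a for q, a in dataset)
        accuracies.append(hits / len(dataset))
    return min(accuracies), max(accuracies)
```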
History
In 2018, researchers first proposed that all previously separate NLP tasks could be cast as a question-answering problem. They also trained the first single, joint, multi-task model that could answer any task-related question, such as "What is the sentiment of this sentence?", "Translate this sentence into German", or "Who is the president?"[38]
In 2021, researchers fine-tuned a generatively pretrained model (T0) to perform 12 NLP tasks (using 62 datasets, as each task can have multiple datasets). The model performed well on new tasks, surpassing models trained directly to perform single tasks (without pretraining). To solve a task, T0 is given the task in a structured prompt; for example, If {{premise}} is true, is it also true that {{hypothesis}}? ||| {{entailed}}. is the prompt used to make T0 solve entailment.[39]
A repository of prompts reported over 2,000 public prompts covering roughly 170 datasets as of February 2022.[40]
In 2022, the chain-of-thought prompting technique was proposed by Google researchers.[7][41]
In 2023, several text-to-text and text-to-image prompt databases were made publicly available.[42][43]
References
- ↑ Liu, Vivian; Chilton, Lydia. "Design Guidelines for Prompt Engineering Text-to-Image Generative Models". ACM Digital Library. Association for Computing Machinery. Archived from the original on 2022-10-26. Retrieved 2022-10-26.
- ↑ Monge, Jim Clyde (2022-08-25). "Dall-E2 VS Stable Diffusion: Same Prompt, Different Results". MLearning.ai. Archived from the original on 2022-08-26. Retrieved 2022-08-31.
- ↑ McAuliffe, Zachary. "Google's Latest AI Model Can Be Taught How to Solve Problems". CNET. Retrieved 10 March 2023.
- ↑ "Prompt Chaining & Large Language Models".
- ↑ "Voiceflow: The Future of AI-Powered Conversational Interfaces".
- ↑ 6.0 6.1 McAuliffe, Zachary (11 March 2022). "Google's Latest AI Model Can Be Taught How to Solve Problems". CNET.
- ↑ 7.0 7.1 7.2 7.3 7.4 Wei, Jason; Wang, Xuezhi; Schuurmans, Dale; Bosma, Maarten; Ichter, Brian; Xia, Fei; Chi, Ed H.; Le, Quoc V.; Zhou, Denny (31 October 2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Advances in Neural Information Processing Systems (NeurIPS 2022). Vol. 35. arXiv:2201.11903.
- ↑ Sharan Narang and Aakanksha Chowdhery (2022-04-04). "Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance".
- ↑ Dang, Ekta (8 February 2023). "Harnessing the power of GPT-3 in scientific research". VentureBeat. Retrieved 10 March 2023.
- ↑ Montti, Roger (13 May 2022). "Google's Chain of Thought Prompting Can Boost Today's Best Algorithms". Search Engine Journal. Retrieved 10 March 2023.
- ↑ Ray, Tiernan. "Amazon's Alexa scientists demonstrate bigger AI isn't always better". ZDNET. Retrieved 10 March 2023.
- ↑ 12.0 12.1 Kojima, Takeshi; Shixiang Shane Gu; Reid, Machel; Matsuo, Yutaka; Iwasawa, Yusuke (2022). "Large Language Models are Zero-Shot Reasoners". arXiv:2205.11916 [cs.CL].
- ↑ Dickson, Ben (30 August 2022). "LLMs have not learned our language — we're trying to learn theirs". VentureBeat. Retrieved 10 March 2023.
- ↑ Chung, Hyung Won; Hou, Le; Longpre, Shayne; Zoph, Barret; Tay, Yi; Fedus, William; Li, Yunxuan; Wang, Xuezhi; Dehghani, Mostafa; Brahma, Siddhartha; Webson, Albert; Gu, Shixiang Shane; Dai, Zhuyun; Suzgun, Mirac; Chen, Xinyun; Chowdhery, Aakanksha; Castro-Ros, Alex; Pellat, Marie; Robinson, Kevin; Valter, Dasha; Narang, Sharan; Mishra, Gaurav; Yu, Adams; Zhao, Vincent; Huang, Yanping; Dai, Andrew; Yu, Hongkun; Petrov, Slav; Chi, Ed H.; Dean, Jeff; Devlin, Jacob; Roberts, Adam; Zhou, Denny; Le, Quoc V.; Wei, Jason (2022). "Scaling Instruction-Finetuned Language Models". arXiv:2210.11416 [cs.LG].
- ↑ Wei, Jason; Tay, Yi (29 November 2022). "Better Language Models Without Massive Compute". ai.googleblog.com. Retrieved 10 March 2023.
- ↑ Sahoo, Pranab; Singh, Ayush Kumar; Saha, Sriparna; Jain, Vinija; Mondal, Samrat; Chadha, Aman (2024-02-05). "A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications". arXiv:2402.07927 [cs.AI].
- ↑ 17.0 17.1 Hu, Hanxu; Lu, Hongyuan; Zhang, Huajian; Song, Yun-Ze; Lam, Wai; Zhang, Yue (2023-10-03). "Chain-of-Symbol Prompting Elicits Planning in Large Language Models". arXiv:2305.10276 [cs.CL].
- ↑ Garg, Shivam; Tsipras, Dimitris; Liang, Percy; Valiant, Gregory (2022). "What Can Transformers Learn In-Context? A Case Study of Simple Function Classes". arXiv:2208.01066 [cs.CL].
- ↑ Brown, Tom; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared D.; Dhariwal, Prafulla; Neelakantan, Arvind (2020). "Language models are few-shot learners". Advances in Neural Information Processing Systems. 33: 1877–1901. arXiv:2005.14165.
- ↑ 20.0 20.1 Liu, Jiacheng; Liu, Alisa; Lu, Ximing; Welleck, Sean; West, Peter; Le Bras, Ronan; Choi, Yejin; Hajishirzi, Hannaneh (May 2022). "Generated Knowledge Prompting for Commonsense Reasoning". Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Dublin, Ireland: Association for Computational Linguistics: 3154–3169. arXiv:2110.08387. doi:10.18653/v1/2022.acl-long.225. S2CID 239016123.
- ↑ 21.0 21.1 Zhou, Denny; Schärli, Nathanael; Hou, Le; Wei, Jason; Scales, Nathan; Wang, Xuezhi; Schuurmans, Dale; Cui, Claire; Bousquet, Olivier; Le, Quoc; Chi, Ed (2022-05-01). "Least-to-Most Prompting Enables Complex Reasoning in Large Language Models". arXiv:2205.10625 [cs.AI].
...least-to-most prompting. The key idea in this strategy is to break down a complex problem into a series of simpler subproblems and then solve them in sequence.
- ↑ Wang, Xuezhi; Wei, Jason; Schuurmans, Dale; Le, Quoc; Chi, Ed; Narang, Sharan; Chowdhery, Aakanksha; Zhou, Denny (2022-03-01). "Self-Consistency Improves Chain of Thought Reasoning in Language Models". arXiv:2203.11171 [cs.CL].
- ↑ Diao, Shizhe; Wang, Pengcheng; Lin, Yong; Zhang, Tong (2023-02-01). "Active Prompting with Chain-of-Thought for Large Language Models". arXiv:2302.12246 [cs.CL].
- ↑ Fu, Yao; Peng, Hao; Sabharwal, Ashish; Clark, Peter; Khot, Tushar (2022-10-01). "Complexity-Based Prompting for Multi-Step Reasoning". arXiv:2210.00720 [cs.CL].
- ↑ 25.0 25.1 Madaan, Aman; Tandon, Niket; Gupta, Prakhar; Hallinan, Skyler; Gao, Luyu; Wiegreffe, Sarah; Alon, Uri; Dziri, Nouha; Prabhumoye, Shrimai; Yang, Yiming; Gupta, Shashank; Prasad Majumder, Bodhisattwa; Hermann, Katherine; Welleck, Sean; Yazdanbakhsh, Amir (2023-03-01). "Self-Refine: Iterative Refinement with Self-Feedback". arXiv:2303.17651 [cs.CL].
- ↑ Long, Jieyi (2023-05-15). "Large Language Model Guided Tree-of-Thought". arXiv:2305.08291 [cs.AI].
- ↑ Yao, Shunyu; Yu, Dian; Zhao, Jeffrey; Shafran, Izhak; Griffiths, Thomas L.; Cao, Yuan; Narasimhan, Karthik (2023-05-17). "Tree of Thoughts: Deliberate Problem Solving with Large Language Models". arXiv:2305.10601 [cs.CL].
- ↑ 28.0 28.1 Jung, Jaehun; Qin, Lianhui; Welleck, Sean; Brahman, Faeze; Bhagavatula, Chandra; Le Bras, Ronan; Choi, Yejin (2022). "Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations". arXiv:2205.11822 [cs.CL].
- ↑ 29.0 29.1 Li, Zekun; Peng, Baolin; He, Pengcheng; Galley, Michel; Gao, Jianfeng; Yan, Xifeng (2023). "Guiding Large Language Models via Directional Stimulus Prompting". arXiv:2302.11520 [cs.CL].
The directional stimulus serves as hints or cues for each input query to guide LLMs toward the desired output, such as keywords that the desired summary should include for summarization.
- ↑ OpenAI (2023-03-27). "GPT-4 Technical Report". arXiv:2303.08774 [cs.CL]. [See Figure 8.]
- ↑ Eliot, Lance (2023-08-18). "Latest Prompt Engineering Technique Aims To Get Certainty And Uncertainty Of Generative AI Directly On The Table And Out In The Open". Forbes. Retrieved 2024-08-31.
If you explicitly indicate in your prompt that you want the generative AI to output a certainty or uncertainty qualification, you will almost assuredly get such an indication.
- ↑ Sclar, Melanie; Choi, Yejin; Tsvetkov, Yulia; Suhr, Alane (2024-07-01). "Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting". arXiv:2310.11324 [cs.CL].
- ↑ Wahle, Jan Philip; Ruas, Terry; Xu, Yang; Gipp, Bela (2024). "Paraphrase Types Elicit Prompt Engineering Capabilities". In Al-Onaizan, Yaser; Bansal, Mohit; Chen, Yun-Nung (eds.). Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Miami, Florida, USA: Association for Computational Linguistics. pp. 11004–11033. arXiv:2406.19898. doi:10.18653/v1/2024.emnlp-main.617.
- ↑ Leidinger, Alina; van Rooij, Robert; Shutova, Ekaterina (2023). Bouamor, Houda; Pino, Juan; Bali, Kalika (eds.). "The language of prompting: What linguistic properties make a prompt successful?". Findings of the Association for Computational Linguistics: EMNLP 2023. Singapore: Association for Computational Linguistics: 9210–9232. arXiv:2311.01967. doi:10.18653/v1/2023.findings-emnlp.618.
- ↑ Linzbach, Stephan; Dimitrov, Dimitar; Kallmeyer, Laura; Evang, Kilian; Jabeen, Hajira; Dietze, Stefan (June 2024). "Dissecting Paraphrases: The Impact of Prompt Syntax and supplementary Information on Knowledge Retrieval from Pretrained Language Models". In Duh, Kevin; Gomez, Helena; Bethard, Steven (eds.). Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). Mexico City, Mexico: Association for Computational Linguistics. pp. 3645–3655. arXiv:2404.01992. doi:10.18653/v1/2024.naacl-long.201.
- ↑ Sclar, Melanie; Choi, Yejin; Tsvetkov, Yulia; Suhr, Alane (2024-07-01). "Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting". arXiv:2310.11324 [cs.CL].
- ↑ Polo, Felipe Maia; Xu, Ronald; Weber, Lucas; Silva, Mírian; Bhardwaj, Onkar; Choshen, Leshem; de Oliveira, Allysson Flavio Melo; Sun, Yuekai; Yurochkin, Mikhail (2024-10-30). "Efficient multi-prompt evaluation of LLMs". arXiv:2405.17202 [cs.CL].
- ↑ McCann, Bryan; Shirish, Nitish; Xiong, Caiming; Socher, Richard (2018). "The Natural Language Decathlon: Multitask Learning as Question Answering". arXiv:1806.08730 [cs.CL].
- ↑ Sanh, Victor; et al. (2021). "Multitask Prompted Training Enables Zero-Shot Task Generalization". arXiv:2110.08207 [cs.LG].
- ↑ Bach, Stephen H.; Sanh, Victor; Yong, Zheng-Xin; Webson, Albert; Raffel, Colin; Nayak, Nihal V.; Sharma, Abheesht; Kim, Taewoon; M Saiful Bari; Fevry, Thibault; Alyafeai, Zaid; Dey, Manan; Santilli, Andrea; Sun, Zhiqing; Ben-David, Srulik; Xu, Canwen; Chhablani, Gunjan; Wang, Han; Jason Alan Fries; Al-shaibani, Maged S.; Sharma, Shanya; Thakker, Urmish; Almubarak, Khalid; Tang, Xiangru; Radev, Dragomir; Mike Tian-Jian Jiang; Rush, Alexander M. (2022). "PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts". arXiv:2202.01279 [cs.LG].
- ↑ Wei, Jason; Zhou, Denny (11 May 2022). "Language Models Perform Reasoning via Chain of Thought". ai.googleblog.com. Retrieved 10 March 2023.
- ↑ Chen, Brian X. (2023-06-23). "How to Turn Your Chatbot Into a Life Coach". The New York Times.
- ↑ Chen, Brian X. (2023-05-25). "Get the Best From ChatGPT With These Golden Prompts". The New York Times. ISSN 0362-4331. Retrieved 2023-08-16.