Parsing

In computing and linguistics, parsing^{[note 1]} (also called syntactic analysis) is the process of analyzing a sequence of linguistic input to determine its grammatical structure with respect to a given formal grammar.

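In computing, the linguistic input described above is not the source code itself but a sequence of tokens (rendered as 符記 in Taiwan). Tokens are the identifiers, reserved words, strings, numbers and other symbols obtained when a scanner (also called a lexer) makes one pass over the source code. Parsing recombines these tokens into a data structure that mirrors the syntactic structure of the original text, so that later stages can process it easily. That data structure is usually a tree, called a parse tree, and the component that builds it is called a parser.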

Human languages

Some machine translation and natural language processing systems parse human languages with computer programs. Human sentences are hard for programs to parse because the structure of human language is substantially ambiguous. To parse natural language data, researchers must first agree on the grammar to be used. The choice of grammar is shaped by both linguistic and computational concerns: for instance, some parsing systems use lexical functional grammar, but parsing for grammars of this type is in general known to be NP-complete. Head-driven phrase structure grammar is another linguistic formalism that has been popular in the parsing community, while other research efforts have focused on less complex formalisms such as the one used in the Penn Treebank. Shallow parsing aims to find only the boundaries of major constituents such as noun phrases. Another popular strategy for avoiding linguistic controversy is dependency grammar parsing.

Most modern parsers are at least partly statistical: they rely on a corpus of training data that has already been annotated (parsed by hand). This lets the system gather information about how often various constructions occur in specific contexts (see machine learning). Approaches that have been used include straightforward PCFGs (probabilistic context-free grammars), maximum entropy models, and neural networks. Most of the more successful systems use lexical statistics; that is, they consider the identities of the words involved as well as their parts of speech. However, such systems are vulnerable to overfitting and require some kind of smoothing to be effective.
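As a rough illustration of how a PCFG can be estimated from hand-annotated data, the following Python sketch counts the rules in a tiny treebank and turns the counts into relative frequencies. The treebank, the tuple encoding of trees, and the function names are all invented for this example. The `alpha` parameter is a crude add-alpha smoothing over observed rules only, a stand-in for the more careful smoothing real systems need.

```python
from collections import Counter, defaultdict

# Hypothetical hand-annotated trees: (label, child, ...); words are plain strings.
TREEBANK = [
    ("S", ("NP", "dogs"), ("VP", ("V", "chase"), ("NP", "cats"))),
    ("S", ("NP", "cats"), ("VP", ("V", "sleep"))),
]


def rules(tree):
    """Yield one (lhs, rhs) pair for every internal node of a tree."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield label, rhs
    for child in children:
        if not isinstance(child, str):
            yield from rules(child)


def estimate(treebank, alpha=0.0):
    """Estimate P(rhs | lhs) by relative frequency, with add-alpha smoothing."""
    counts = Counter(rule for tree in treebank for rule in rules(tree))
    by_lhs = defaultdict(list)
    for lhs, rhs in counts:
        by_lhs[lhs].append(rhs)
    probs = {}
    for lhs, rhs_list in by_lhs.items():
        total = sum(counts[(lhs, rhs)] for rhs in rhs_list) + alpha * len(rhs_list)
        for rhs in rhs_list:
            probs[(lhs, rhs)] = (counts[(lhs, rhs)] + alpha) / total
    return probs


probs = estimate(TREEBANK)
print(probs[("NP", ("cats",))])  # 2 of the 3 NP nodes rewrite to "cats"
```

With `alpha=1.0` the same rule drops from 2/3 to 3/5, which shows how smoothing pulls estimates away from the raw training counts.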

Parsing algorithms for natural language cannot rely on the grammar having the 'nice' properties of manually designed grammars for programming languages. As mentioned earlier, some grammar formalisms are computationally very difficult to parse; in general, even if the desired structure is not context-free, some context-free approximation to the grammar is used to perform a first pass. Algorithms that use context-free grammars often rely on some variant of the CKY algorithm, usually with a heuristic that prunes unlikely analyses to save time (see chart parsing). However, some systems trade accuracy for speed by using, e.g., linear-time versions of the shift-reduce algorithm. A somewhat recent development is parse reranking, in which the parser proposes a large number of analyses and a more complex system selects the best one.
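To make the CKY idea concrete, here is a minimal recognizer sketch in Python. It assumes a grammar already in Chomsky normal form; the toy grammar, the lexicon, and the name `cky_recognize` are hypothetical. It only answers whether a sentence is derivable and does no pruning or probability scoring, both of which practical systems would add.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form.
BINARY = {  # (B, C) -> set of A such that A -> B C
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
    ("Det", "N"): {"NP"},
}
LEXICAL = {  # word -> set of A such that A -> word
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "sees": {"V"},
}


def cky_recognize(words, start="S"):
    """Return True if the toy grammar derives the word sequence from `start`."""
    n = len(words)
    if n == 0:
        return False
    # chart[i][j] holds the nonterminals that derive words[i:j].
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = set(LEXICAL.get(word, ()))
    for span in range(2, n + 1):          # width of the span being filled
        for i in range(n - span + 1):     # left edge
            j = i + span                  # right edge
            for k in range(i + 1, j):     # split point
                for b, c in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= BINARY.get((b, c), set())
    return start in chart[0][n]


print(cky_recognize("the dog sees the cat".split()))  # True
print(cky_recognize("dog the sees".split()))          # False
```

The three nested loops over span, left edge, and split point give the cubic running time in sentence length that motivates pruning heuristics.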

Programming languages

The most common use of a parser is as a component of a compiler. The parser reads the source code of a computer programming language and creates some form of internal representation. Programming languages tend to be specified in terms of a context-free grammar because fast and efficient parsers can be written for them. Parsers are usually not written by hand but are generated by parser generators.

Context-free grammars are limited in how fully they can express the requirements of a language. Informally, the reason is that the memory of such a grammar is limited: it cannot remember the presence of a construct over an arbitrarily long input, which is necessary for a language in which, for example, a name must be declared before it may be referenced. More powerful grammars that can express this constraint, however, cannot be parsed efficiently. A common strategy is therefore to create a relaxed parser for a context-free grammar that accepts a superset of the desired language constructs (that is, it accepts some invalid constructs); the unwanted constructs can be filtered out later.
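The filtering step can be sketched as a separate pass over the output of the relaxed parser. The Python example below assumes, purely for illustration, that the parser has already reduced a program to a flat list of `("declare", name)` and `("use", name)` statements; that representation and the function name `undeclared_uses` are hypothetical.

```python
def undeclared_uses(statements):
    """Return (index, name) for every use of a name that was not declared earlier."""
    declared = set()
    errors = []
    for index, (kind, name) in enumerate(statements):
        if kind == "declare":
            declared.add(name)
        elif kind == "use" and name not in declared:
            errors.append((index, name))
    return errors


# The relaxed grammar accepts this program; the later pass rejects the use of "y".
program = [("declare", "x"), ("use", "x"), ("use", "y"), ("declare", "y")]
print(undeclared_uses(program))  # [(2, 'y')]
```

The `declared` set is exactly the unbounded memory that the context-free grammar lacks, which is why the check lives outside the grammar.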

Overview of process

The following example demonstrates the common case of parsing a computer language with two levels of grammar: lexical and syntactic.

The first stage is token generation, or lexical analysis, in which the input character stream is split into meaningful symbols defined by a grammar of regular expressions. For example, a calculator program would look at an input such as "12*(3+4)^2" and split it into the tokens 12, *, (, 3, +, 4, ), ^ and 2, each of which is a meaningful symbol in the context of an arithmetic expression. The lexer would contain rules telling it that the characters *, +, ^, ( and ) mark the start of a new token, so meaningless tokens like "12*" or "(3" will not be generated.
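A minimal lexer for the calculator input above might look like the following Python sketch, built on the standard `re` module. The token categories (`NUMBER`, `OP`, `SKIP`, `ERROR`) and the function name `tokenize` are choices made for this example only.

```python
import re

# Each token category is one regular expression; order matters for the alternation.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("OP", r"[*+^()]"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Split an arithmetic expression into a list of token strings."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "ERROR":
            raise ValueError(f"unexpected character {match.group()!r} at {match.start()}")
        tokens.append(match.group())
    return tokens


print(tokenize("12*(3+4)^2"))
# ['12', '*', '(', '3', '+', '4', ')', '^', '2']
```

Because `\d+` matches a whole run of digits and each operator is matched as a single character, fragments such as "12*" or "(3" can never come out as one token.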

The next stage is syntactic parsing or syntactic analysis, which checks that the tokens form an allowable expression. This is usually done with reference to a context-free grammar that recursively defines the components that can make up an expression and the order in which they must appear. However, not all rules defining programming languages can be expressed by context-free grammars alone, for example type validity and proper declaration of identifiers. These rules can be formally expressed with attribute grammars.
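One way to check tokens against a context-free grammar is a hand-written recursive descent parser, sketched below in Python. The grammar (given in the docstring), the precedence and associativity choices, and the nested-tuple tree shape are assumptions made for this example; the tree is a simplified one that omits grammar-only nodes, so it is closer to an abstract syntax tree than a full parse tree.

```python
def parse(tokens):
    """Parse a token list with this assumed grammar and return a nested tuple.

    expr    -> term ('+' term)*
    term    -> factor ('*' factor)*
    factor  -> primary ('^' factor)?      (right-associative)
    primary -> NUMBER | '(' expr ')'
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    def term():
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def factor():
        base = primary()
        if peek() == "^":
            advance()
            return ("^", base, factor())
        return base

    def primary():
        token = advance()
        if token is not None and token.isdigit():
            return int(token)
        if token == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {token!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse(["12", "*", "(", "3", "+", "4", ")", "^", "2"]))
# ('*', 12, ('^', ('+', 3, 4), 2))
```

Each grammar rule becomes one function, so the recursive definition of an expression is mirrored directly by recursive calls; a token sequence that fits no rule raises `SyntaxError`.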

The final phase is semantic parsing or analysis, which works out the implications of the expression just validated and takes the appropriate action. In the case of a calculator, the action is to evaluate the expression; a compiler, on the other hand, would generate code. Attribute grammars can also be used to define these actions.
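For the calculator case, the semantic action can be sketched as a recursive walk over the tree. The Python example below assumes the tree is encoded as nested tuples of the form `(operator, left, right)` with integers at the leaves; that encoding and the name `evaluate` are illustrative choices, not something fixed by the text.

```python
import operator

# Map each operator symbol to the arithmetic it stands for.
OPERATIONS = {"+": operator.add, "*": operator.mul, "^": operator.pow}


def evaluate(node):
    """Evaluate a tree of (operator, left, right) tuples with integer leaves."""
    if isinstance(node, int):
        return node
    op, left, right = node
    return OPERATIONS[op](evaluate(left), evaluate(right))


# Assumed tree for 12*(3+4)^2.
tree = ("*", 12, ("^", ("+", 3, 4), 2))
print(evaluate(tree))  # 588
```

A compiler would walk the same kind of tree but emit instructions at each node instead of computing a value.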

Types of parsers

Examples of parsers

References

Parsing Techniques - A Practical Guide (the book's web page includes a downloadable PDF).

See also

Parser concepts

Software for building parsers

See also: List of Parsers comparison table.

Free software

Wikimedia

Commercial

Notes

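In linguistics, parsing has more than one meaning; it can also refer to dividing continuous speech into phonemes. To make clear that grammatical analysis is meant, one can say syntactic parsing or syntactic analysis.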
