模組:Sandbox/Al12si/z

From Wikipedia, the free encyclopedia

Synopsis

Test page for CS1 localization and notes on localization

Description

If any new subpage is needed, it must be created through 模組:Sandbox.

Check the test output below, but first make sure Template:沙盤 has the right #invoke, which should look like this:

{{safesubst:#invoke:Sandbox/Al12si/zz|citation|CitationClass=work}}

Outputs

Test

{{subst:Template:沙盤/還原|

{{subst:Template:沙盤/還原|

{{subst:Template:沙盤/還原|

{{subst:Template:沙盤/還原|

{{subst:Template:沙盤/還原|

{{subst:Template:沙盤/還原|

Production (for comparison)

{{subst:Template:沙盤/還原|

A quarterly magazine (香港中文). 2016年春. 喺2022年9月13號搵到.

A quarterly magazine (繁體中文). 2016年夏. 喺2022年9月13號搵到.

A quarterly magazine (美國英文). 2016年秋. 喺2022年9月13號搵到.

A journal (加拿大法文). 2018年6月. 喺2022年9月13號搵到.

A newspaper (因紐特文). 2017年. 喺2022年9月13號搵到.

Notes

Lua

Each Lua module on Wikipedia is defined by building a local table (an associative array) whose keys are the exported function names and whose values are the corresponding functions, and then returning that table. After a module has been loaded with require, the exported functions are accessed through the dot operator.
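A minimal sketch of that convention in plain Lua follows. The module name example and the function greet are hypothetical, and package.preload is used only as a stand-in for a module page so that the snippet runs outside the wiki.

```lua
-- Stand-in for a module page: the loader builds a table of exported
-- functions and returns it. "example" and "greet" are hypothetical names.
package.preload['example'] = function()
    local p = {}

    function p.greet(name)
        return 'Hello, ' .. name
    end

    return p
end

-- After require, exported functions are reached with the dot operator.
local example = require('example')
local greeting = example.greet('CS1')
```

On the wiki the argument to require would be a module page name rather than a preloaded entry.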

Lua patterns are very simple and are not full regular expressions. There is no alternation operator, so many things that are taken for granted in Perl, or even JavaScript, are not possible.
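One common workaround for the missing alternation is to try several patterns in turn. The sketch below assumes a hypothetical helper, match_any, which is not part of CS1.

```lua
-- Hypothetical helper: returns the first match of any pattern in the
-- list, or nil when none of them matches.
local function match_any(s, patterns)
    for _, pattern in ipairs(patterns) do
        local found = s:match(pattern)
        if found then
            return found
        end
    end
    return nil
end

-- Roughly what /(Spring|Summer|Autumn|Winter)/ would do in Perl.
local season = match_any('Summer 2016', { 'Spring', 'Summer', 'Autumn', 'Winter' })
```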

Lua’s built-in pattern functions do not appear to be Unicode-safe, but Unicode-safe versions of match and gsub are available through mw.ustring; so instead of writing (for example) s:match('some string') you should write mw.ustring.match(s, 'some string').

However, with mw.ustring the character classes match on Unicode character categories rather than on ASCII only. In particular, full-width digits will be matched by %d. This creates problems when, for example, the result of the match is postprocessed by tonumber, which does not understand full-width digits; the result is a latent error that is difficult to diagnose.
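The failure mode can be reproduced in plain Lua without mw.ustring. The string below stands for what a Unicode-aware %d+ match might capture; the helper to_year is hypothetical and only shows one way to surface the problem early instead of letting a nil propagate.

```lua
-- Full-width digits, as a Unicode-aware %d+ might capture them.
local captured = '２０２２'

-- Hypothetical helper: converts a captured year and reports a reason
-- when tonumber cannot interpret the string.
local function to_year(s)
    local n = tonumber(s)
    if n == nil then
        return nil, 'not a number tonumber understands: ' .. s
    end
    return n
end

local year, err = to_year(captured)   -- conversion fails for full-width digits
local ascii_year = to_year('2022')    -- ASCII digits convert normally
```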

The only apparent way to emit debug messages is to call mw.log in the code and read the output in the debug console on the module page. In practice this did not seem to produce any output, so there was effectively no way to get debug messages.

Configuration

Error messages are defined in the Configuration module. The “Check date values in” error is bad_date.

The local_digits key can be used to teach the module how to understand full-width digits (and possibly Chinese digits) if and only if there’s wider consensus that this is desirable.

The date_names array controls certain “advanced” features in date validation, such as month ranges, seasons etc. The comments incorrectly state that the local tables (except quarter and season tables) should be filled in only when the module isn’t pulling date names from MediaWiki. Everything must in fact be translated because is_valid_month_range_style depends on both the long and short month tables having been filled in (if we implement month ranges).

CS1 main module

The main entry point for cite calls is the citation function in the invoked module (grep “function citation(”). This function must be invoked from a template; invoking it from a page (e.g., this one) will not work because it expects a “frame” (presumably something created by templates) but it’s not clear at this point what a “frame” is.

Further modules are loaded at invoke time through the mechanism described above. These include the Configuration (loaded into cfg) and Date Validation (loaded into validation) modules.

The job of the citation function is to massage the parameters passed by the template (how this works has not been figured out), then pass them to citation0, which does the real work.
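For orientation, Scribunto passes a frame object to a function called via #invoke: its args field holds the parameters written in the #invoke itself, and frame:getParent() returns the frame of the page or template that contains the #invoke. The sketch below mocks both frames in plain Lua. The helper collect_args and the sample parameter values are hypothetical, and this is not necessarily how the CS1 citation function merges its parameters.

```lua
-- Mock of the template's frame: parameters an editor wrote in the cite call.
local template_frame = {
    args = { title = 'A journal', language = 'fr-CA' },
}

-- Mock of the frame created by #invoke: parameters from the #invoke itself.
local invoke_frame = {
    args = { CitationClass = 'work' },
    getParent = function(self)
        return template_frame
    end,
}

-- Hypothetical helper: merges both argument sets into one plain table,
-- letting the #invoke parameters win on conflict.
local function collect_args(frame)
    local merged = {}
    for key, value in pairs(frame:getParent().args) do
        merged[key] = value
    end
    for key, value in pairs(frame.args) do
        merged[key] = value
    end
    return merged
end

local merged = collect_args(invoke_frame)
```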

citation0 checks the passed parameters (now in an associative array). Dates are “checked” here. There are special cases for the date parameter.

The real checking appears to be done through validation.dates, using a short list of parameter names that are expected to contain date values.

name_tag_get is the function responsible for generating language names. Region tags and script tags trigger different logic. An incorrectly matched region tag (e.g., “Chinese (Hong Kong)” for zh-HK or “Canadian French”) is generated by the 5th return (after the cfg.mw_languages_by_tag_t[lang_param_lc]; call), but an ignored script tag (e.g., “因紐特文” for iu-Latn) is generated by the 6th return (after the lang_param_lc:match ('^(%a%a%a?)%-.*') call).
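The fallback behaviour for script tags can be sketched in plain Lua. The table names_by_tag and the function lookup_language_name are hypothetical stand-ins, not the real CS1 data or code; only the pattern is taken from the notes above.

```lua
-- Hypothetical stand-in for the language-name table.
local names_by_tag = {
    iu = '因紐特文',
    ['zh-hk'] = '香港中文',
}

-- Hypothetical lookup: try the full tag first, then fall back to the
-- base language code in front of the first hyphen.
local function lookup_language_name(tag)
    local tag_lc = tag:lower()
    local name = names_by_tag[tag_lc]
    if name then
        return name
    end
    local base = tag_lc:match('^(%a%a%a?)%-.*')
    if base then
        return names_by_tag[base]
    end
    return nil
end

local inuktitut = lookup_language_name('iu-Latn')  -- script tag is dropped
local hong_kong = lookup_language_name('zh-HK')    -- full tag is found
```

With this kind of fallback the script subtag is silently discarded, which matches the “因紐特文” output seen for iu-Latn.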

Date Validation module

The function called by citation0 in the module is dates. It goes through the list of date values passed by citation0 and performs a few hard-coded checks for the date, year and pmc-embargo-date parameters; in the default case it calls check_date.

The function check_date checks the passed date_string using the patterns defined in the patterns array. Note that the ymd pattern is special (it’s assumed to be ASCII-only because it’s ISO 8601). Since we need to define a Cantonese-specific pattern (e.g., call it 年月日), a special Unicode-safe check must be added to check_date (and presumably to is_valid_embargo_date).

In fact, every check defined in patterns seems to be hard-coded. (It is not obvious why they’re even defined in an array.) All local patterns must have a corresponding hard-coded check in check_date.

The logic in check_date (and the corresponding definitions in patterns) refers to “anchor year” (a) values. These are used to construct author-year references.[1] It’s probably safe to always set the anchor year to the same value as “year” (y).

Note that when check_date uses mw.ustring.match on date strings, %d will match on all Unicode digit classes (e.g., full-width digits), which can create problems. There are three ways to work around this problem:

  1. Don’t use the Unicode-safe mw.ustring.match (this is what the old localization code did);
  2. Teach CS1 about full-width digits by defining local_digits in the Configuration module;
  3. Match on [0-9] instead of %d when defining local strings in patterns.

Option 1 is clearly sub-optimal, since it would mean not being able to use character sets containing non-ASCII characters in patterns (and specifically not being able to match on [日號]). Option 2 works, but it changes behaviour, so it has been rolled back. Option 3 is what’s currently being done.

Note that if local_digits is defined then using [0-9] will accomplish nothing, because by the time the pattern match is done, full-width digits will have already been changed into their corresponding half-width forms.
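This effect can be sketched in plain Lua, assuming a hypothetical helper normalize_digits that stands in for whatever local_digits does inside CS1. Once full-width digits have been rewritten to ASCII, %d and [0-9] accept exactly the same input.

```lua
-- Hypothetical stand-in for the local_digits step: rewrite full-width
-- digits to ASCII digits before any pattern matching happens.
local function normalize_digits(s)
    -- Full-width digits U+FF10..U+FF19 are the UTF-8 bytes EF BC 90..99.
    return (s:gsub('\239\188([\144-\153])', function(last_byte)
        return string.char(last_byte:byte() - 144 + 48)
    end))
end

local normalized = normalize_digits('２０２２年９月１３號')

-- After normalization both spellings of "digit" capture the same year.
local year_d = normalized:match('^(%d%d%d%d)年')
local year_09 = normalized:match('^([0-9][0-9][0-9][0-9])年')
```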

Next steps

The current mods handle 年月日, 年月, and 年. Because mw.log does not work, it’s unclear whether the code is actually working as intended, although it doesn’t produce any errors.

Many cases are currently not handled, including date ranges and seasons.

As an alternative way to localize this module (and to reduce the amount of code needed to handle date ranges and seasons), it should be possible to convert Cantonese dates to ISO 8601 format at the start of check_date. This should result in more manageable code and avoid duplicating code that’s already in the standard logic.
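A minimal sketch of such a conversion, assuming a hypothetical helper cantonese_to_iso that is not part of CS1. It only handles full 年月日 dates written with ASCII digits and returns anything else unchanged. Plain string.match is used so that the snippet runs standalone, which is why the two day suffixes are tried in turn instead of being matched with [日號]; inside the module mw.ustring.match would presumably be used instead.

```lua
-- Hypothetical helper: rewrite "2022年9月13號" or "2022年9月13日" as
-- "2022-09-13"; leave every other input untouched.
local function cantonese_to_iso(date_string)
    for _, suffix in ipairs({ '日', '號' }) do
        local y, m, d = date_string:match('^(%d%d%d%d)年(%d%d?)月(%d%d?)' .. suffix .. '$')
        if y then
            return string.format('%s-%02d-%02d', y, tonumber(m), tonumber(d))
        end
    end
    return date_string
end

local iso = cantonese_to_iso('2022年9月13號')
local untouched = cantonese_to_iso('June 2018')
```

The converted string could then go through the existing ymd check, so range validation of the month and day would not need to be duplicated.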

Country-tagged languages

name_tag_get in the main module has no knowledge of the Lang module’s IANA regions, IANA variants and related tables, so region-tagged language codes will always half-fail: the lookup does not actually fail (which would be helpful), but an English name (provided by the wiki software) is returned instead.

See also

References

Bugs

This is supposed to be a module sandbox, but it was created in User space. It can’t be moved back, so I might as well use it as a test page.