User:Hillgentleman/persondata

Persondata is a special kind of metadata that can be added to biographical articles. Unlike conventional Wikipedia content, this metadata can be extracted and processed automatically. It consists of a set of standardized data fields that hold basic information about the person, such as name, birthday, and place of birth. The metadata can be used for a variety of purposes, including advanced search, statistical analysis, automated categorization, and birthday lists. Adding persondata does not affect the normal display of an article, because the information remains hidden unless a user sets their user stylesheet to display it.

As of August 25, 2007, persondata had been added to 12,200 articles on the English Wikipedia. In July 2007, 163,400 articles on the German Wikipedia had "Personendaten" ([1]).

How to use the template

To use the {{Persondata}} template, copy the wikitext below to the end of a biographical article and fill in the parameters manually, or use this JavaScript, which can add the template and fill in the information semi-automatically from infoboxes. If you add the template manually, place it just before the categories and interlanguage links. ({{DEFAULTSORT}} is not a real template but a direct part of categorization, and should therefore be placed between the persondata and the categories.)

<!-- Metadata: see [[Wikipedia:Persondata]] -->
{{Persondata
|NAME              = 
|ALTERNATIVE NAMES = 
|SHORT DESCRIPTION = 
|DATE OF BIRTH     = 
|PLACE OF BIRTH    = 
|DATE OF DEATH     = 
|PLACE OF DEATH    = 
}}

Next, fill out the data fields. Enter the name with the surname first (the same way you would in a category listing). Do not delete empty data fields: for example, if a person is still alive, leave the date and place of death blank. Here is an example of a properly filled-out template:

<!-- Metadata: see [[Wikipedia:Persondata]] -->
{{Persondata
|NAME              = Magellan, Ferdinand
|ALTERNATIVE NAMES = Magalhães, Fernão de (Portuguese); Magallanes, Fernando de (Spanish)
|SHORT DESCRIPTION = Sea explorer
|DATE OF BIRTH     = Spring [[1480]]
|PLACE OF BIRTH    = [[Sabrosa]], [[Portugal]]
|DATE OF DEATH     = [[April 27]], [[1521]]
|PLACE OF DEATH    = [[Mactan Island]], [[Cebu]], [[Philippines]]
}}
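The blank and filled-in templates above differ only in their values, so the block can also be generated. The following is a minimal Python sketch, not the JavaScript tool mentioned earlier; the function name `render_persondata` and the choice to reject unknown field names are assumptions made for illustration.

```python
# The seven fields, in the order used by the template above.
FIELDS = [
    "NAME",
    "ALTERNATIVE NAMES",
    "SHORT DESCRIPTION",
    "DATE OF BIRTH",
    "PLACE OF BIRTH",
    "DATE OF DEATH",
    "PLACE OF DEATH",
]


def render_persondata(values):
    """Return Persondata wikitext (hypothetical helper).

    Fields missing from `values` are emitted empty rather than dropped,
    following the advice not to delete empty data fields.
    """
    unknown = set(values) - set(FIELDS)
    if unknown:
        raise ValueError(f"unknown field(s): {sorted(unknown)}")
    width = max(len(name) for name in FIELDS)  # align the '=' signs
    lines = ["<!-- Metadata: see [[Wikipedia:Persondata]] -->", "{{Persondata"]
    for name in FIELDS:
        lines.append(f"|{name.ljust(width)} = {values.get(name, '')}")
    lines.append("}}")
    return "\n".join(lines)
```

Calling `render_persondata({"NAME": "Magellan, Ferdinand"})` yields the full seven-field block with only the name filled in.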

Viewing persondata

File:Persondatascreen.png

A screenshot showing Persondata from Mahatma Gandhi

By default, persondata is invisible to normal users. In order to make persondata visible, you must edit your user stylesheet. To do this, first make sure you are logged in. Then create a page at User:YourUserName/monobook.css and add the following line:

table.persondata {display:table;}

or, if you use Microsoft Internet Explorer:

table.persondata {display:block;}

Tip: After saving User:YourUserName/monobook.css, clear your browser cache to see the changes: Mozilla/Firefox: Shift-Ctrl-R, Internet Explorer: Ctrl-F5, Opera: F5, Safari: ⌘-R, Konqueror: Ctrl-R.

If you can see the following block about Ferdinand Magellan, you have successfully made persondata visible:

Template:Persondata

To make persondata invisible again, simply remove the line of CSS given above from your user stylesheet.

Data fields

The data fields NAME, ALTERNATIVE NAMES, SHORT DESCRIPTION, DATE OF BIRTH, PLACE OF BIRTH, DATE OF DEATH, and PLACE OF DEATH are used to construct a persondata record. Further fields may be added in the future.

Fieldname	Examples
NAME	Magellan, Ferdinand
	Bush, George Walker
	Beethoven, Ludwig van
	Van Zandt, Townes
	Brutus of Troy
	King, Martin Luther, Jr.
	Wainwright, Loudon III
	John Paul II
	Elizabeth II
	John the Baptist
	Francis of Assisi, Saint
	Tokugawa, Ieyasu
	Fujiwara no Michinaga
ALTERNATIVE NAMES	Magalhães, Fernão de (Portuguese); Magallanes, Fernando de (Spanish)
	Clemens, Samuel Langhorne (real name)
SHORT DESCRIPTION	Sea explorer
	German philosopher
	Anarchist writer and publisher
	39th President of the United States
DATE OF BIRTH	1480
	October 25, 1806
	circa 470 BCE
PLACE OF BIRTH	Sabrosa, Portugal
	Texas
	Newark, New Jersey
DATE OF DEATH	April 27, 1521
	January, 1945
	1421
PLACE OF DEATH	Mactan Island, Cebu, Philippines
	Mount Juliet, Tennessee

Wikilinks in the persondata are not currently necessary; however, they may be useful for some future application.
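Because fields may or may not contain wikilinks, a consumer of the data will probably want to normalize them to plain text. The following is a small Python sketch under that assumption; the helper name `strip_wikilinks` is hypothetical, and it handles only simple `[[target]]` and `[[target|label]]` links, not nested markup.

```python
import re

# Matches [[target]] or [[target|label]] and captures the visible text.
_WIKILINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]|]*)\]\]")


def strip_wikilinks(value):
    """Replace each simple wikilink with the text a reader would see."""
    return _WIKILINK.sub(r"\1", value)
```

For example, `strip_wikilinks("[[April 27]], [[1521]]")` gives `"April 27, 1521"`, which matches the unlinked style used in the table above.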

Name

When specifying the person's name, use the following format: [surname], [forename] [middle names], [title]. In most cases this will be straightforward: for example, "George Walker Bush" becomes "Bush, George Walker". In some cases, however, there may be ambiguity about a person's surname. When in doubt, format the name according to how you would expect it to be alphabetized. For example, Ludwig van Beethoven would be alphabetized under "Beethoven", while Townes Van Zandt would be alphabetized under "Van Zandt". If you're not sure, ask someone familiar with the subject how they would alphabetize the name, or consult a cataloguing guide such as the AACR2.

It is usually a good idea to list as much of a person's name as possible in the name field to avoid confusion with similar names. Do not include honorifics (such as "Dr.", "Professor", or "PhD"), however, unless they are part of a title of nobility.
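The naming format can be sketched in Python as follows. Deciding which part of a name is the surname is a judgement call, so this hypothetical helper `persondata_name` does not guess: the editor passes the surname explicitly (or an empty string for people known by a single name), and the function only rearranges the parts. It assumes the surname comes last in the written name, so names written surname-first need to be handled by hand.

```python
def persondata_name(full_name, surname, title=""):
    """Format a name as '[surname], [forename] [middle names], [title]'.

    `surname` must be the trailing part of `full_name`; pass "" for
    names such as "John Paul II" that have no separate surname.
    """
    full_name = full_name.strip()
    if surname:
        if not full_name.endswith(surname):
            raise ValueError("surname must be the last part of the full name")
        given = full_name[: -len(surname)].strip()
        parts = [surname] + ([given] if given else [])
    else:
        parts = [full_name]
    if title:
        parts.append(title)
    return ", ".join(parts)
```

With this sketch, `persondata_name("Ludwig van Beethoven", "Beethoven")` gives `"Beethoven, Ludwig van"`, while `persondata_name("Townes Van Zandt", "Van Zandt")` gives `"Van Zandt, Townes"`.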

Motivation

Without uniform formatting, it is very difficult to automatically extract useful information from biographical articles. It is also impossible to automatically alphabetize all the biographical articles since the titles typically begin with the person's first name. By adding standardized metadata to such articles, we can facilitate the creation of new applications for Wikipedia content, such as Wikipedia CD-ROMs, custom search applications, etc. Hopefully, this will be the first of many steps towards enriching Wikipedia with semantic content.

Extracting persondata

Extracting from the SQL database

Using an SQL query, the persondata can be filtered from Wikipedia articles stored in a database. As an example, here is an SQL query that can be used to extract persondata from wikisign.org:

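-- Select every main-namespace page that transcludes Template:Persondata
-- and cut the template text out of the page source (MySQL syntax).
SELECT
   pages.cur_namespace,
   pages.cur_title,
   -- From the first '{{Persondata' up to and including the next '}}'.
   -- A nested template inside a field would end the match early.
   SUBSTRING(SUBSTRING(pages.cur_text FROM INSTR(pages.cur_text,'{{Persondata')), 1,
      INSTR(SUBSTRING(pages.cur_text FROM INSTR(pages.cur_text,'{{Persondata')),'}}')+1)
      AS 'Persondata'
FROM cur AS pd
JOIN templatelinks AS tl
   ON pd.cur_namespace = tl.tl_namespace
   AND pd.cur_title = tl.tl_title
JOIN cur AS pages
   ON tl.tl_from = pages.cur_id
   AND pages.cur_namespace = 0   -- articles only
WHERE pd.cur_namespace = 10      -- Template: namespace
AND pd.cur_title = 'Persondata';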

In order to be useful, however, the persondata must be further divided into individual data fields.
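Splitting an extracted template into its fields can be done outside the database. The following Python sketch shows one way; the function name `parse_persondata` is hypothetical, and it assumes the one-parameter-per-line layout shown in the template earlier on this page, so a template written on a single line would need a more careful parser.

```python
def parse_persondata(template_text):
    """Split a '{{Persondata ...}}' block into a dict of field -> value.

    Assumes each parameter sits on its own line and starts with '|'.
    """
    body = template_text.strip()
    if body.startswith("{{"):
        body = body[2:]
    if body.endswith("}}"):
        body = body[:-2]
    fields = {}
    for line in body.splitlines():
        line = line.strip()
        # Skip the template name line and anything that is not 'key = value'.
        if not line.startswith("|") or "=" not in line:
            continue
        name, _, value = line[1:].partition("=")
        fields[name.strip()] = value.strip()
    return fields
```

Only the leading `|` of each line is treated as a separator, so piped links such as `[[Cebu|Cebu Island]]` inside a value are left intact. Empty fields come back as empty strings.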

Extracting from the XML dump

To be done. The following may help:

At http://tools.wikimedia.de/~voj/pd/staging-area/ there is Persondata extracted from the German Wikipedia, along with a script to extract this data from the XML dump. After transformation, the data is loaded into a database (an application of Extract, transform, load). You may need the following scripts:

Extract - calls three piped STX-scripts to extract Persondata templates
- addNamespaces.stx
- extractPersonendaten.stx - change "Personendaten" to "Persondata" in this script
- pd2tab.stx - change parameter names in this script
Transform - needs to be rewritten for the English Wikipedia
load.pl - you must create a MySQL database first
Full ETL-Process - runs the other scripts

See also

[2], [3] - corresponding template and example in Semantic Mediawiki (SMW); note that in SMW either the whole data field is a single link (relation), or the data field is not linked at all (attribute).
hCard - a microformat with similar properties.