Internationalization: An Introduction [PDF] The Internationalization and Unicode Conference tutorial. This version is the one presented at IUC31.
Character Encodings and Unicode [PowerPoint 2007] Tutorial slides for character encodings and Unicode. This version is being prepared for future presentation.
[PPT of IUC32 version] Note that this version does not match the PDF.
Warning: these are very large files.
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 2.5 License.
I have chosen this license to allow others to use these materials for training and learning purposes. A waiver will generally be granted for commercial use or modification. However, I would like to know about modifications and especially improvements so that these can be shared with the community.
[RFC 5646] AUTH48 Copy: Tags for the Identification of Languages
RFC 4646bis: Tags for the Identification of Languages
Mark Davis and Addison Phillips, editors
A revision to RFC 4646 that incorporates ISO 639-3 and ISO 639-5 language codes into the language subtag registry.
HTML Version, draft-00 (TXT) (XML)(wdiff)
draft-01 (HTML)- (TXT)- (XML)(wdiff)
draft-02 [2006-12-18] (HTML)- (TXT)- (XML)(wdiff)
draft-03 [2007-03-28] (HTML)- (TXT)- (XML)(wdiff)
draft-04 [2007-04-05] (HTML)- (TXT)- (XML)(wdiff)
draft-05 [2007-04-22] (HTML)- (TXT)- (XML)(wdiff)
draft-06 [2007-05-10] (HTML)- (TXT)- (XML)(wdiff)
draft-07 [2007-07-17] (HTML)- (TXT)- (XML)(wdiff)
draft-08 [2007-08-24] (HTML)- (TXT)- (XML)(wdiff)
draft-09 [2007-11-14] (HTML)- (TXT)- (XML)(wdiff)
draft-10 [2007-12-03] (HTML)- (TXT)- (XML)(wdiff)
draft-11 [2007-12-14] (HTML)- (TXT)- (XML)(wdiff)
draft-12 [2008-03-14] (HTML)- (TXT)- (XML)(wdiff)
draft-13 [2008-04-29] (HTML)- (TXT)- (XML)(wdiff)
draft-14 [2008-05-16] (HTML)- (TXT)- (XML)(wdiff)
draft-15 [2008-06-09] (HTML)- (TXT)- (XML)(wdiff)
draft-16 [2008-07-09] (HTML)- (TXT)- (XML)(wdiff)
draft-17 [2008-08-17] (HTML)WGLC- (TXT)- (XML)(wdiff)
draft-18 [2008-10-31] (HTML)post-WGLC- (TXT)- (XML)(wdiff)
draft-19 [2008-12-02] (HTML)post-WGLC- (TXT)- (XML)(wdiff)
draft-20 [2008-12-10] (HTML)IETF LC?- (TXT)- (XML)(wdiff)
draft-21 [2009-02-23] (HTML)IETF LC- (TXT)- (XML)(wdiff)
draft-22 [2009-05-18] (HTML)AD comments- (TXT)- (XML)(wdiff)
draft-22 [2009-06-11] (HTML)AD comments- (TXT)- (XML)(wdiff)
Final version AUTH48- (TXT)- (XML)(wdiff)
Unicode 32 Presentations
Internationalization & Unicode Conference 32 Presentations
These are my presentations for IUC32 (IUC0x20!) in PowerPoint format. Some are in the newer Office 2008 format. You are welcome to look at these, but not to adapt them for any purpose (with the exception that you may use the Internationalization tutorial subject to the (cc) license shown above).
If you see these here, come introduce yourself at the conference next week!
[RFC 4646] Tags for the Identification of Languages
RFC 4646: Tags for the Identification of Languages
Mark Davis and Addison Phillips, editors
Developed by the IETF Language Tag Registry Working Group (LTRU). The RFC that defines language tags used in various Internet standards and protocols, as well as in HTML, XML, locale standards (such as CLDR or .NET), and so forth. This RFC obsoletes RFC 3066 and RFC 1766.
W3C I18N Article: Understanding the New Language Tags
by Richard Ishida, taken from my Multilingual article
- [RFC 4647] Matching of Language Tags
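For readers who have not looked inside a language tag before, here is a minimal sketch in Python of how the most common subtags (language, script, region, variants) can be picked apart. It is an illustration only, not code from the RFC: the `parse_tag` name and the returned dictionary layout are hypothetical, and extended language subtags, extensions, private-use subtags, and grandfathered tags are deliberately left out. It checks shape only and does not consult the IANA registry.

```python
import re

# Simplified pattern: language, optional script, optional region, variants.
# Extended language subtags, extensions, private use, and grandfathered
# tags are intentionally not handled in this sketch.
TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)$"
)


def parse_tag(tag: str) -> dict:
    """Split a simple language tag into its subtags (hypothetical helper)."""
    match = TAG.match(tag)
    if match is None:
        raise ValueError(f"not a simple language tag: {tag!r}")
    variants = [v for v in match.group("variants").split("-") if v]
    return {
        # Tags are case-insensitive; this applies the recommended casing.
        "language": match.group("language").lower(),
        "script": (match.group("script") or "").title() or None,
        "region": (match.group("region") or "").upper() or None,
        "variants": [v.lower() for v in variants],
    }


if __name__ == "__main__":
    for example in ("zh-Hant-TW", "es-419", "de-CH-1996", "sl-rozaj"):
        print(example, parse_tag(example))
```

Subtag type is decided by length and position alone, which is why a four-letter subtag after the language is read as a script and a two-letter or three-digit subtag as a region.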
[RFC 4645] Language Subtag Initial Registry
RFC 4645: Initial Language Subtag Registry
Doug Ewell, editor
This RFC defines the initial IANA Language Subtag Registry and contains the instructions for assembling that registry. Doug is also currently editing RFC 4645bis, and his personal website contains the various documents related to that effort.
W3C Workshop: Constraints and Capabilities: Internationalization of Web Services
Constraints and Capabilities Position Paper: Internationalization of Web Services
HTML version: A paper that examines the internationalization of Web services and how policy technologies might be affected. Those interested in the subject could do worse than see the Web Services Internationalization Usage Scenarios and Requirements documents produced by the W3C Internationalization Working Group. These papers can be found linked from the WS Task Force page.
[Internet-Draft] The Record-Jar Format
[Internet-Draft] The record-jar Format
Addison Phillips, author
A Guide to Configuring Computers to Edit Unicode
A Guide to Configuring Computers to Display and Process Non-ASCII Text
Learn to Type Japanese (and other languages): What you need to know in order to configure your computers to display and type text in languages other than English. Includes screenshots and instructions for many flavors of Windows, Mac, and Unix.
Learning to Test with non-ASCII Data
A little piece on how to plan internationalization testing, with a focus on the management of test matrices.
- It’s About Time
Command Line Interfaces: Internationalizing Them.
C and Encodings: A short, not altogether complete primer on working with
A Delphi Internationalization Cookbook talks about multibyte enabling.
Character Sets in JSP: Demonstrates the use of the page directives, taglibs, and other niceties with JSP and servlets. This demo is evolving.
Java Locales: Lightweight demo showing some of the data associated with a Java locale.
People's Names and Software: Under construction. Some information about how to handle personal names in software. A more extensive paper is linked elsewhere on this page.
Papers and Presentations
IMUG February 2013 Presentation and links to demos, etc.
IUC35 Presentations and Demos Includes links to the PowerPoint slides and to some demo pages. In addition to the Internationalization Tutorial there was a presentation with Richard Ishida called Towards the Promised Land: Globalization Developments in Web Standards
IUC34 IRIs: Beyond the Napkin (off-site link) Current developments in IRIs and URIs from the co-chair and co-editor of the IETF IRI WG.
IUC30: The Theory and Practice of Pseudo-Translation [pdf] [ppt] Discusses pseudo-translation and how it can be used, particularly in testing software for non-ASCII character support.
10th Open Forum on Metadata Registries presentation: Making Sense of Language Tags (in PowerPoint format). Covers the history of language identification starting with ISO 639 and leading up to RFC 4646bis and the latest changes in BCP 47. Presented at the Metadata Forum in New York City, July 2007.
Language Standards for Global Business keynote: How Standards Happen (and why sometimes they don't) (in PowerPoint format). The last-second presentation I wrote for the LSGB conference in Barcelona, May 2006.
W3C I18N Article: Understanding the New Language Tags: Adapted from my Multilingual magazine article of the same name. Richard Ishida has a lot to do with it appearing here. May 2006.
xml:lang in XML Document Schemas
Discusses when to use the xml:lang attribute in your XML documents and when to use a different element or attribute to identify natural languages.
W3C Note: Working with Time Zones describes the problems you might encounter when working with the date and time types in XML Schema.
Something for Nothing? (a version of this paper appeared in Multilingual #69): Is Translation Memory delivering on its ROI promises?
Unicode 27: Language Tags: A Status Report is a companion piece to the slide presentation that Mark Davis ended up delivering at IUC27.
Unicode 26: Personal Names in Software reviews how to handle people's names in software. Slides (PPT). Name Games is a very cursory look at two-dimensional resource problems when displaying personal names.
ESWC 2005: RFC 3066bis and the Semantic Web paper with Jeremy Carroll talks about how language tags that follow the structure of RFC 3066bis could be adapted to the Semantic Web to provide better searching and matching of content. This paper was published by Springer-Verlag in Lecture Notes in Computer Science and is available on-line here.
Unicode 25: Web Services and Internationalization explores the work of the W3C Internationalization Web Services Task Force.
Unicode 25: Language and Locale Tags gives a few of the reasons behind the RFC 3066bis effort (see above). Look for the presentation here eventually.
Unicode 24: Approaches to Delivering Localized Software examines some of the different ways that localized (translated) software can be created, managed, and delivered to customers. [PDF Format]
Presentation: Managing Multi-Lingual Websites Presentation by Addison Phillips at the September 19, 2000 meeting of the Bay Area Publication Manager's Forum.
Presentation: Creating Multi-Lingual and Multi-Locale Databases International Unicode Conference 19 Presentation [PowerPoint]. SqlDriverConnect for ODBC. OTN note on Unicode connections.
Whitepaper: Creating Multi-Lingual and Multi-Locale Databases International Unicode Conference 19 Whitepaper [PDF]. The content of this document expands on that of the presentation.
Whitepaper: Four ACEs, a Survey of ASCII Compatible Encodings International Unicode Conference 22 Whitepaper [PDF].
Presentation: ULocale Tags from IUC23 IUC23 slides discussing the need for locale tags and the ideas behind ULocales. [PDF format].
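As a rough illustration of the pseudo-translation idea mentioned in the IUC30 entry above, here is a minimal sketch in Python. It is not taken from the paper: the `pseudo_translate` name, the look-alike table, the bracket markers, and the 30% padding are all assumptions chosen for the example. The idea is to swap ASCII letters for accented look-alikes (to expose encoding bugs), pad the string (to expose layout truncation), and bracket it (to expose concatenation and clipped strings), while leaving format placeholders alone.

```python
import re

# Hypothetical look-alike mapping; a real tool would cover more letters.
LOOKALIKES = str.maketrans({
    "a": "\u00e0",  # a with grave
    "e": "\u00e9",  # e with acute
    "i": "\u00ee",  # i with circumflex
    "o": "\u00f6",  # o with diaeresis
    "u": "\u00fc",  # u with diaeresis
    "A": "\u00c0",  # A with grave
    "E": "\u00c9",  # E with acute
    "O": "\u00d6",  # O with diaeresis
})

# Format placeholders such as {0} or {name} must survive untouched.
PLACEHOLDER = re.compile(r"(\{[^{}]*\})")


def pseudo_translate(message: str, expansion: float = 0.3) -> str:
    """Return a pseudo-translated copy of message (illustrative only)."""
    parts = PLACEHOLDER.split(message)
    # re.split with a capturing group puts the placeholders at odd indexes.
    converted = [
        part if index % 2 else part.translate(LOOKALIKES)
        for index, part in enumerate(parts)
    ]
    # Pad to simulate the text growth that real translation tends to cause.
    padding = "~" * int(len(message) * expansion)
    return "[" + "".join(converted) + padding + "]"


if __name__ == "__main__":
    print(pseudo_translate("Open file {0}"))
```

The output stays readable to an English-speaking tester, which is much of the appeal: a string that shows up on screen without brackets or accents was probably never sent through the resource bundle at all.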
I18N Gurus The resource for finding out about internationalization.
Unicode The Universal Character Set.
The UTF-8 and Unicode FAQ for Unix/Linux Contains a variety of useful information about using Unicode and especially the UTF-8 encoding of Unicode in a Unix environment.
MS Shell Font The Microsoft documentation on the Shell font, useful for certain kinds of "code page" programs on Windows.
Web Services Internationalization @ W3C Home page of the Task Force, where you'll find useful material on internationalization of Web services.
ICU International Components for Unicode. The IBM Open Source library for C and Java, which provides internationalization capabilities and Unicode support.
UTF-8 and Unicode FAQ for Unix by Markus Kuhn. This has a lot of useful information on dealing with Unicode in your C programs on UNIX.
PC Keyboard Where do you get a real, old-fashioned, clicky IBM keyboard? This is the company that bought the patents from Lexmark.