First commit

David Štaleker
2023-05-12 09:00:07 +02:00
parent d3ffe93e42
commit 03b92525d7
14757 changed files with 9251133 additions and 53 deletions


@@ -0,0 +1,15 @@
## .attributes
`.attributes` is an Object created during instance initialization (construction). It is used internally by `.get()` to replace dynamic parts of an item path.
| Attribute | Field |
| --- | --- |
| `language` | Language Subtag ([spec](http://www.unicode.org/reports/tr35/#Language_Locale_Field_Definitions)) |
| `script` | Script Subtag ([spec](http://www.unicode.org/reports/tr35/#Language_Locale_Field_Definitions)) |
| `region` or `territory` | Region Subtag ([spec](http://www.unicode.org/reports/tr35/#Language_Locale_Field_Definitions)) |
| `languageId` | Language Id ([spec](http://www.unicode.org/reports/tr35/#Unicode_language_identifier)) |
| `maxLanguageId` | Maximized Language Id ([spec](http://www.unicode.org/reports/tr35/#Likely_Subtags)) |
- `language`, `script`, `territory` (also aliased as `region`), and `maxLanguageId` are computed by [adding likely subtags](./src/likely-subtags.js) according to the [specification](http://www.unicode.org/reports/tr35/#Likely_Subtags).
- `languageId` is always in the succinct form, obtained by [removing the likely subtags from `maxLanguageId`](./src/remove-likely-subtags.js) according to the [specification](http://www.unicode.org/reports/tr35/#Likely_Subtags).


@@ -0,0 +1,9 @@
## new Cldr( locale )
Create a new instance of Cldr.
| Parameter | Type | Example |
| --- | --- | --- |
| *locale* | String | `"en"`, `"pt-BR"` |
More information in the [specification](http://www.unicode.org/reports/tr35/#Locale).


@@ -0,0 +1,18 @@
## .get( path )
Get the item data given its path, or `undefined` if missing.
| Parameter | Type | Example |
| --- | --- | --- |
| *path* | String or<br>Array | `"/cldr/main/{languageId}/numbers/symbols-numberSystem-latn/decimal"`<br>`[ "cldr", "main", "{languageId}", "numbers", "symbols-numberSystem-latn", "decimal" ]` |
Note that the leading "/cldr" of the *path* parameter can be omitted. Also note that the Array form accepts subpaths, e.g., `[ "cldr/main", "{languageId}", "numbers/symbols-numberSystem-latn/decimal" ]`.
The [locale attributes](#cldrattributes), e.g., `{languageId}`, are replaced with their appropriate values.
If extended with the `cldr/unresolved.js` module, `.get()` also looks up missing item data by following [locale inheritance](http://www.unicode.org/reports/tr35/#Locale_Inheritance). When a value is found, it is stored in a local resolved cache (for faster subsequent access); otherwise, `undefined` is returned.
```javascript
ptBr.get( "main/{languageId}/numbers/symbols-numberSystem-latn/decimal" );
// ➡ ","
```
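The path handling described above can be sketched roughly as follows. This is a minimal illustration, not the library's source: `normalizePath` and the `attributes` object literal are hypothetical names, and the sketch assumes only what this section states (String or Array input, subpaths inside Array items, an optional leading `cldr`, and `{attribute}` replacement).

```javascript
// Hypothetical helper: not part of the Cldr API, for illustration only.
function normalizePath( path, attributes ) {
  // Accept either a String or an Array whose items may contain subpaths.
  var parts = ( Array.isArray( path ) ? path.join( "/" ) : path )
    .split( "/" )
    .filter(function( part ) {
      return part !== ""; // drop empty parts caused by leading/trailing slashes
    });

  // The leading "cldr" is optional, so strip it when present.
  if ( parts[ 0 ] === "cldr" ) {
    parts.shift();
  }

  // Replace {attribute} placeholders; unknown placeholders are left untouched.
  return parts.map(function( part ) {
    return part.replace( /\{([^}]+)\}/g, function( match, name ) {
      return Object.prototype.hasOwnProperty.call( attributes, name ) ?
        attributes[ name ] : match;
    });
  });
}

var attributes = { languageId: "en" }; // assumed value, for illustration

var fromString = normalizePath(
  "/cldr/main/{languageId}/numbers/symbols-numberSystem-latn/decimal",
  attributes
);
var fromArray = normalizePath(
  [ "cldr/main", "{languageId}", "numbers/symbols-numberSystem-latn/decimal" ],
  attributes
);
// Both yield [ "main", "en", "numbers", "symbols-numberSystem-latn", "decimal" ]
```

Normalizing both forms to the same array of keys means the String and Array inputs can share one traversal routine.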


@@ -0,0 +1,23 @@
## Cldr.load( json, ... )
Load resolved or unresolved [1] JSON data.
| Parameter | Type | Description |
| --- | --- | --- |
| *json* | Object | Resolved or unresolved [1] CLDR JSON data |
```javascript
Cldr.load({
"main": {
"pt-BR": {
"numbers": {
"symbols-numberSystem-latn": {
"decimal": ","
}
}
}
}
});
```
1: Unresolved processing is **only available** after loading the `cldr/unresolved.js` extension module.
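Because `Cldr.load()` accepts several JSON objects, the loaded pieces have to end up in one data tree. One way to picture that is a recursive merge. The sketch below is an assumption about the general idea, not the library's implementation; `mergeJson` and `store` are hypothetical names.

```javascript
function isPlainObject( value ) {
  return value !== null && typeof value === "object" && !Array.isArray( value );
}

// Hypothetical helper: recursively merges plain JSON objects into `target`.
function mergeJson( target ) {
  var sources = Array.prototype.slice.call( arguments, 1 );
  sources.forEach(function( source ) {
    Object.keys( source ).forEach(function( key ) {
      if ( key === "__proto__" ) {
        return; // never merge into the prototype
      }
      var value = source[ key ];
      if ( isPlainObject( value ) ) {
        // Descend into nested objects, creating them when missing.
        target[ key ] = mergeJson( isPlainObject( target[ key ] ) ? target[ key ] : {}, value );
      } else {
        // Leaf values simply overwrite.
        target[ key ] = value;
      }
    });
  });
  return target;
}

var store = mergeJson( {},
  { main: { "pt-BR": { numbers: { "symbols-numberSystem-latn": { decimal: "," } } } } },
  { main: { "pt-BR": { numbers: { "symbols-numberSystem-latn": { group: "." } } } } }
);
// store.main[ "pt-BR" ].numbers[ "symbols-numberSystem-latn" ] now holds both keys
```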


@@ -0,0 +1,12 @@
## .main( path )
It's an alias for `.get([ "main/{languageId}", ... ])`.
| Parameter | Type | Example |
| --- | --- | --- |
| *path* | String or<br>Array | See `cldr.get()` for more information |
```javascript
ptBr.main( "numbers/symbols-numberSystem-latn/decimal" );
// ➡ ","
```


@@ -0,0 +1,10 @@
## get ➡ ( path, value )
Triggered before `.get()` (or any alias) returns. The listener receives the normalized *path* and the *value* found.
| Parameter | Description |
| --- | --- |
| *path* | See [.get()](../core/get.md) for more information |
| *value* | See [.get()](../core/get.md) for more information |
See [Cldr.on()](global_on.md) or [.on()](on.md) for an example.


@@ -0,0 +1,10 @@
## Cldr.off( event, listener )
Removes a listener function from the specified event globally (for all instances).
| Parameter | Type | Example |
| --- | --- | --- |
| *event* | String | `"get"` |
| *listener* | Function | |
See [Cldr.on()](global_on.md) for an example.


@@ -0,0 +1,32 @@
## Cldr.on( event, listener )
Add a listener function to the specified event globally (for all instances).
| Parameter | Type | Example |
| --- | --- | --- |
| *event* | String | `"get"` |
| *listener* | Function | |
```javascript
Cldr.load({
foo: "bar"
});
function log( path, value ) {
console.log( "Got", path, value );
}
Cldr.on( "get", log );
var en = new Cldr( "en" );
en.get( "foo" );
// Got foo bar (logged)
// ➡ bar
var zh = new Cldr( "zh" );
zh.get( "foo" );
// Got foo bar (logged)
// ➡ bar
Cldr.off( "get", log );
```


@@ -0,0 +1,28 @@
## Cldr.once( event, listener )
Add a listener function to the specified event globally (for all instances). It will be automatically removed after its first execution.
| Parameter | Type | Example |
| --- | --- | --- |
| *event* | String | `"get"` |
| *listener* | Function | |
```javascript
Cldr.load({
foo: "bar"
});
function log( path, value ) {
console.log( "Got", path, value );
}
Cldr.once( "get", log );
var cldr = new Cldr( "en" );
cldr.get( "foo" );
// Got foo bar (logged)
// ➡ bar
cldr.get( "foo" );
// ➡ bar
```
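The "removed after its first execution" behavior can be pictured as a wrapper that unregisters itself before calling the real listener. The sketch below is a hedged illustration only: `TinyEmitter` and its `trigger` method are hypothetical and are not the event implementation used by the library.

```javascript
// Hypothetical minimal emitter, for illustration only.
function TinyEmitter() {
  this.listeners = {};
}

TinyEmitter.prototype.on = function( event, listener ) {
  ( this.listeners[ event ] = this.listeners[ event ] || [] ).push( listener );
};

TinyEmitter.prototype.off = function( event, listener ) {
  this.listeners[ event ] = ( this.listeners[ event ] || [] ).filter(function( registered ) {
    // Also match the wrapper that once() created for this listener.
    return registered !== listener && registered.original !== listener;
  });
};

TinyEmitter.prototype.once = function( event, listener ) {
  var self = this;
  function wrapper() {
    self.off( event, wrapper ); // unregister first, so it runs only once
    listener.apply( this, arguments );
  }
  wrapper.original = listener;
  this.on( event, wrapper );
};

TinyEmitter.prototype.trigger = function( event ) {
  var args = Array.prototype.slice.call( arguments, 1 );
  // Iterate over a copy so removals during iteration do not skip listeners.
  ( this.listeners[ event ] || [] ).slice().forEach(function( listener ) {
    listener.apply( null, args );
  });
};

var calls = [];
var emitter = new TinyEmitter();
emitter.once( "get", function( path, value ) {
  calls.push( path + "=" + value );
});
emitter.trigger( "get", "foo", "bar" );
emitter.trigger( "get", "foo", "bar" );
// calls holds a single entry, "foo=bar": the second trigger finds no listener
```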


@@ -0,0 +1,10 @@
## .off( event, listener )
Removes a listener function from the specified event for this instance.
| Parameter | Type | Example |
| --- | --- | --- |
| *event* | String | `"get"` |
| *listener* | Function | |
See [cldr.on()](on.md) for an example.


@@ -0,0 +1,26 @@
## .on( event, listener )
Add a listener function to the specified event for this instance.
| Parameter | Type | Example |
| --- | --- | --- |
| *event* | String | `"get"` |
| *listener* | Function | |
```javascript
Cldr.load({
foo: "bar"
});
function log( path, value ) {
console.log( "Got", path, value );
}
var cldr = new Cldr( "en" );
cldr.on( "get", log );
cldr.get( "foo" );
// Got foo bar (logged)
// ➡ bar
cldr.off( "get", log );
```


@@ -0,0 +1,27 @@
## .once( event, listener )
Add a listener function to the specified event for this instance. It will be automatically removed after its first execution.
| Parameter | Type | Example |
| --- | --- | --- |
| *event* | String | `"get"` |
| *listener* | Function | |
```javascript
Cldr.load({
foo: "bar"
});
function log( path, value ) {
console.log( "Got", path, value );
}
var cldr = new Cldr( "en" );
cldr.once( "get", log );
cldr.get( "foo" );
// Got foo bar (logged)
// ➡ bar
cldr.get( "foo" );
// ➡ bar
```


@@ -0,0 +1,12 @@
## .supplemental( path )
It's an alias for `.get([ "supplemental", ... ])`.
| Parameter | Type | Example |
| --- | --- | --- |
| *path* | String or<br>Array | See `cldr.get()` for more information |
```javascript
en.supplemental( "gender/personList/{language}" );
// ➡ "neutral"
```


@@ -0,0 +1,8 @@
## .supplemental.timeData.allowed()
Helper function. Returns the supplemental timeData `allowed` value for the locale's territory.
```javascript
en.supplemental.timeData.allowed();
// ➡ "H h"
```


@@ -0,0 +1,8 @@
## .supplemental.timeData.preferred()
Helper function. Returns the supplemental timeData `preferred` value for the locale's territory.
```javascript
en.supplemental.timeData.preferred();
// ➡ "h"
```


@@ -0,0 +1,8 @@
## .supplemental.weekData.firstDay()
Helper function. Returns the supplemental weekData `firstDay` value for the locale's territory.
```javascript
en.supplemental.weekData.firstDay();
// ➡ "sun"
```


@@ -0,0 +1,8 @@
## .supplemental.weekData.minDays()
Helper function. Returns the supplemental weekData `minDays` value for the locale's territory as a Number.
```javascript
en.supplemental.weekData.minDays();
// ➡ 1
```
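These helpers read a value keyed by the locale's territory. A rough sketch of that kind of lookup follows. It assumes the supplemental table carries a world-wide `001` entry to fall back on when the territory has no entry of its own; that fallback is an assumption of this sketch, not something stated above. `lookupByTerritory` and the sample data are hypothetical, and the values are illustrative only.

```javascript
// Illustrative sample shaped like supplemental weekData (values are examples only).
var weekData = {
  firstDay: { "001": "mon", "US": "sun" },
  minDays: { "001": "1", "GB": "4" }
};

// Hypothetical helper: territory-specific value, else the world ("001") default.
function lookupByTerritory( table, territory ) {
  return Object.prototype.hasOwnProperty.call( table, territory ) ?
    table[ territory ] : table[ "001" ];
}

var firstDayUs = lookupByTerritory( weekData.firstDay, "US" ); // own entry
var firstDayBr = lookupByTerritory( weekData.firstDay, "BR" ); // falls back to "001"

// The JSON stores minDays as a String, so convert it to a Number.
var minDaysGb = parseInt( lookupByTerritory( weekData.minDays, "GB" ), 10 );
```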


@@ -0,0 +1,12 @@
## .get( path )
Overloads (extends) `.get()` so that it either gets the item data directly or looks it up by following [locale inheritance](http://www.unicode.org/reports/tr35/#Locale_Inheritance). When a value is found, it is stored in a local resolved cache (for faster subsequent access); otherwise, `undefined` is returned.
| Parameter | Type | Example |
| --- | --- | --- |
| *path* | String or<br>Array | See `cldr.get()` above for more information |
```javascript
ptBr.get( "main/{languageId}/numbers/symbols-numberSystem-latn/decimal" );
// ➡ ","
```



@@ -0,0 +1,134 @@
## Bundle Lookup Matcher
Bundle Lookup is the process of selecting the right dataset for the requested locale. We run this process during instance creation and store the result in `instance.attributes.bundle`, which is then used when traversing items of the **main** dataset.
Users must load likelySubtags and any desired main datasets before creating an instance. For example:
```javascript
Cldr.load(
require( "cldr-data/supplemental/likelySubtags" ), // JSON data from supplemental/likelySubtags.json
require( "cldr-data/main/en-US/ca-gregorian" ), // JSON data from main/en-US/ca-gregorian.json
require( "cldr-data/main/en-GB/ca-gregorian" ) // JSON data from main/en-GB/ca-gregorian.json
);
var enUs = new Cldr( "en-US" );
console.log( enUs.attributes.bundle ); // "en-US"
console.log( enUs.main( "dates/calendars/gregorian/dateFormats/short" ) ); // "M/d/yy"
var enGb = new Cldr( "en-GB" );
console.log( enGb.attributes.bundle ); // "en-GB"
console.log( enGb.main( "dates/calendars/gregorian/dateFormats/short" ) ); // "dd/MM/y"
```
When an instance is created, its `.attributes.bundle` reveals the matched bundle. The `.main` method uses this information to traverse the correct main item.
What happens if we add `main/en/ca-gregorian` to the above example?
```javascript
Cldr.load(
require( "cldr-data/supplemental/likelySubtags" ), // JSON data from supplemental/likelySubtags.json
require( "cldr-data/main/en/ca-gregorian" ), // JSON data from main/en/ca-gregorian.json
require( "cldr-data/main/en-US/ca-gregorian" ), // JSON data from main/en-US/ca-gregorian.json
require( "cldr-data/main/en-GB/ca-gregorian" ) // JSON data from main/en-GB/ca-gregorian.json
);
var enUs = new Cldr( "en-US" ); // English as spoken in United States.
console.log( enUs.attributes.bundle ); // "en"
console.log( enUs.main( "dates/calendars/gregorian/dateFormats/short" ) ); // "M/d/yy"
var enGb = new Cldr( "en-GB" ); // English as spoken in Great Britain.
console.log( enGb.attributes.bundle ); // "en-GB"
console.log( enGb.main( "dates/calendars/gregorian/dateFormats/short" ) ); // "dd/MM/y"
```
Now the requested locale `en-US` uses the `en` bundle (not the `en-US` bundle used in the first example), while `en-GB` still uses the `en-GB` bundle. Why? Because `en` is the default content for `en-US` (deduced from the likelySubtags data). Default content means that the child's content is entirely contained in the parent, so `en` and `en-US` are identical. Our bundle lookup matching algorithm always picks the topmost available parent. Note that the retrieved main item is still the correct one (as it should be).
An attentive reader may notice that loading both `main/en/ca-gregorian` and `main/en-US/ca-gregorian` is redundant. Loading both is not a problem, but loading either the `en` or the `en-US` bundle alone is enough.
Let's add a bit of sugar to the requested locales.
```javascript
var en = new Cldr( "en" ); // English.
console.log( en.attributes.bundle ); // "en"
var enUs = new Cldr( "en-US" ); // English as spoken in United States.
console.log( enUs.attributes.bundle ); // "en"
var enLatnUs = new Cldr( "en-Latn-US" ); // English in Latin script as spoken in the United States.
console.log( enLatnUs.attributes.bundle ); // "en"
```
All instances above match the same `en` bundle, because (a) `en` is the default content for `en-US` and (b) `en-US` is the default content for `en-Latn-US`.
What happens if the requested locale includes [Unicode extensions][]?
```javascript
var en = new Cldr( "en-US-u-cu-USD" );
console.log( en.attributes.bundle ); // "en"
console.log( en.main( "numbers/currencies/{u-cu}/displayName" ) ); // "US Dollar"
```
[Unicode extensions][] are ignored during bundle lookup. Note that they remain accessible via variable replacement.
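To illustrate how a `-u-` extension could be separated from the rest of the language tag and exposed for variable replacement, here is a rough sketch. `splitUnicodeExtension` is a hypothetical helper, and the `u-<key>` attribute naming is inferred from the `{u-cu}` placeholder in the example above. It only handles a trailing `-u-` extension with simple key/value keywords.

```javascript
// Hypothetical helper: splits "en-US-u-cu-USD" into the base tag and "u-*" attributes.
function splitUnicodeExtension( locale ) {
  var index = locale.indexOf( "-u-" );
  if ( index === -1 ) {
    return { base: locale, attributes: {} };
  }
  var attributes = {};
  var key = null;
  locale.slice( index + 3 ).split( "-" ).forEach(function( subtag ) {
    if ( subtag.length === 2 ) {
      // Two-character subtags are keys, e.g. "cu" (currency).
      key = "u-" + subtag;
      attributes[ key ] = "";
    } else if ( key !== null ) {
      // Longer subtags belong to the value of the current key.
      attributes[ key ] += ( attributes[ key ] ? "-" : "" ) + subtag;
    }
  });
  return { base: locale.slice( 0, index ), attributes: attributes };
}

var parsed = splitUnicodeExtension( "en-US-u-cu-USD" );
// parsed.base is "en-US" (used for bundle lookup);
// parsed.attributes[ "u-cu" ] is "USD" (available for path replacement)
```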
Below are other non-obvious lookups.
```javascript
Cldr.load(
require( "cldr-data/supplemental/likelySubtags" ), // JSON data from supplemental/likelySubtags.json
require( "cldr-data/main/sr-Cyrl/numbers" ), // JSON data from main/sr-Cyrl/numbers.json
require( "cldr-data/main/sr-Latn/numbers" ), // JSON data from main/sr-Latn/numbers.json
require( "cldr-data/main/zh-Hant/numbers" ) // JSON data from main/zh-Hant/numbers.json
);
var srCyrl = new Cldr( "sr-Cyrl" );
console.log( srCyrl.attributes.bundle ); // "sr-Cyrl"
console.log( srCyrl.main( "numbers/decimalFormats-numberSystem-latn/short/decimalFormat/1000-count-one" ) );
// ➜ "0 хиљ'.'"
var srRS = new Cldr( "sr-RS" );
console.log( srRS.attributes.bundle ); // "sr-Cyrl"
console.log( srRS.main( "numbers/decimalFormats-numberSystem-latn/short/decimalFormat/1000-count-one" ) );
// ➜ "0 хиљ'.'"
var srLatnRS = new Cldr( "sr-Latn-RS" );
console.log( srLatnRS.attributes.bundle ); // "sr-Latn"
console.log( srLatnRS.main( "numbers/decimalFormats-numberSystem-latn/short/decimalFormat/1000-count-one" ) );
// ➜ "0 hilj'.'"
var zhTW = new Cldr( "zh-TW" );
console.log( zhTW.attributes.bundle ); // "zh-Hant"
console.log( zhTW.main( "numbers/symbols-numberSystem-hanidec/nan" ) ); // "非數值"
```
Finally, if an instance is created whose bundle hasn't been loaded yet, its `.attributes.bundle` is set to `null`. If this instance is used to traverse a main dataset, an error is thrown. It can still be used to traverse any non-main dataset (e.g., supplemental/postalCodeData.json).
```javascript
var zhCN = new Cldr( "zh-CN" );
console.log( zhCN.attributes.bundle ); // null
console.log( zhCN.main( /* something */ ) ); // Error: E_MISSING_BUNDLE
```
### Implementation details
[UTS#35][] doesn't specify how a bundle lookup matcher should be implemented. [RFC 4647][] section 3.4 "Lookup" has an algorithm for that, although it fails in various cases listed above. Mark Davis, the co-founder and president of the Unicode Consortium, said (via the CLDR mailing list and via the [Fixing Inheritance doc][]) that bundle lookup should happen via [LanguageMatching](http://www.unicode.org/reports/tr35/#LanguageMatching).
We believe LanguageMatching is a great algorithm for a Best Fit Matcher, but it is overkill for a Lookup Matcher.
ICU (a well-known CLDR implementation) doesn't use LanguageMatching for its Bundle Lookup Matcher either. It has its own implementation, which has its own flaws, as Mark Davis says in the Fixing Inheritance doc: "ICU uses the %%ALIAS element to counteract some of these problems... It doesnt fix all of them, and the data is not derivable from CLDR."
We also believe ICU's alias approach is not the best solution. Instead, we propose the following approach, whose result matches LanguageMatching with a score threshold of 100%.
`BundleLookupMatcher( requestedLocale, availableBundles )` is used for bundle lookup given an arbitrary `requestedLocale`.
1. Create a Hash (aka Dictionary or Key-Value-Pair) object, named `availableBundlesMap`, that maps each `availableBundle` (value) to its respective [Remove Likely Subtags][] result (key).
   1. In case of a duplicate key, keep the smaller value, i.e., keep the available bundle locale whose length is the smallest; e.g., keep { "en": "en" } instead of { "en": "en-US" }.
1. [Remove Likely Subtags][] from `requestedLocale` and let `minRequestedLocale` keep its result.
1. Return `availableBundlesMap[ minRequestedLocale ]`.
This algorithm is faster than LanguageMatching and needs no extra CLDR data to be created and maintained (likelySubtags is sufficient). Note that the `availableBundlesMap` can be cached for improved performance on repeated calls.
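The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: `removeLikelySubtags` here is a hypothetical stand-in backed by a hard-coded table (`MINIMIZED`) that covers only the locales from the examples in this document, whereas a real implementation derives the result from the likelySubtags data.

```javascript
// Assumed results of Remove Likely Subtags for the locales used in this document.
var MINIMIZED = {
  "en": "en",
  "en-US": "en",
  "en-Latn-US": "en",
  "en-GB": "en-GB",
  "sr-Cyrl": "sr",
  "sr-RS": "sr",
  "sr-Latn": "sr-Latn",
  "sr-Latn-RS": "sr-Latn"
};

// Hypothetical stand-in: unknown locales are returned unchanged.
function removeLikelySubtags( locale ) {
  return Object.prototype.hasOwnProperty.call( MINIMIZED, locale ) ?
    MINIMIZED[ locale ] : locale;
}

// Map each minimized form (key) to the shortest available bundle (value).
function createAvailableBundlesMap( availableBundles ) {
  var map = Object.create( null );
  availableBundles.forEach(function( bundle ) {
    var key = removeLikelySubtags( bundle );
    if ( !( key in map ) || bundle.length < map[ key ].length ) {
      map[ key ] = bundle;
    }
  });
  return map;
}

// Minimize the requested locale and look it up; null means no bundle is loaded.
function bundleLookupMatcher( requestedLocale, availableBundles ) {
  var availableBundlesMap = createAvailableBundlesMap( availableBundles );
  var minRequestedLocale = removeLikelySubtags( requestedLocale );
  return availableBundlesMap[ minRequestedLocale ] || null;
}

var bundles = [ "en", "en-US", "en-GB", "sr-Cyrl", "sr-Latn" ];
var forEnLatnUs = bundleLookupMatcher( "en-Latn-US", bundles ); // "en"
var forEnGb = bundleLookupMatcher( "en-GB", bundles ); // "en-GB"
var forSrRs = bundleLookupMatcher( "sr-RS", bundles ); // "sr-Cyrl"
var forZhCn = bundleLookupMatcher( "zh-CN", bundles ); // null
```

In this sketch the map is rebuilt on every call for brevity; building it once per set of loaded bundles corresponds to the caching mentioned above.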
[Fixing Inheritance doc]: https://docs.google.com/document/d/1qZwEVb4kfODi2TK5f4x15FYWj5rJRijXmSIg5m6OH8s/edit
[Remove Likely Subtags]: http://www.unicode.org/reports/tr35/tr35.html#Likely_Subtags
[RFC 4647]: http://www.ietf.org/rfc/rfc4647.txt
[Unicode extensions]: http://Www.unicode.org/reports/tr35/#u_Extension
[UTS#35]: http://www.unicode.org/reports/tr35