
Added in API level 1

public class RuleBasedCollator extends Collator

java.lang.Object
↳	java.text.Collator
	↳	java.text.RuleBasedCollator

Class Overview

A concrete implementation of the Collator class.

RuleBasedCollator has the following restrictions for efficiency (other subclasses may be used for more complex languages):

- If a French secondary ordering is specified it applies to the whole collator object.
- All non-mentioned Unicode characters are at the end of the collation order.
- If a character is not located in the RuleBasedCollator, the default Unicode Collation Algorithm (UCA) rule-based table is automatically searched as a backup.

The collation table is composed of a list of collation rules, where each rule takes one of three forms:

 <modifier>
 <relation> <text-argument>
 <reset> <text-argument>

The rule elements are defined as follows:

Modifier: There is a single modifier which is used to specify that all accents (secondary differences) are backwards:
- '@' : Indicates that accents are sorted backwards, as in French.
Relation: The relations are the following:
- '<' : Greater, as a letter difference (primary)
- ';' : Greater, as an accent difference (secondary)
- ',' : Greater, as a case difference (tertiary)
- '=' : Equal
Text-Argument: A text-argument is any sequence of characters, excluding special characters (that is, common whitespace characters [0009-000D, 0020] and rule syntax characters [0021-002F, 003A-0040, 005B-0060, 007B-007E]). If those characters are desired, you can put them in single quotes (for example, use '&' for ampersand). Note that unquoted white space characters are ignored; for example, b c is treated as bc.
Reset: There is a single reset which is used primarily for contractions and expansions, but which can also be used to add a modification at the end of a set of rules:
- '&' : Indicates that the next rule follows the position to where the reset text-argument would be sorted.

This sounds more complicated than it is in practice. For example, the following are equivalent ways of expressing the same thing:

 a < b < c
 a < b & b < c
 a < c & a < b

Notice that the order is important, as the subsequent item goes immediately after the text-argument. The following are not equivalent:

 a < b & a < c
 a < c & a < b
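
The effect of reset placement can be checked directly. The following is a minimal sketch, not a definitive reference: the class `ResetOrderDemo` and its helpers are hypothetical names, the rule strings carry the leading "<" described under Ignorable Characters, and the behavior assumes a runtime that treats the rule string as the complete table (as OpenJDK does). A runtime that applies rules as a delta to the UCA may behave differently.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

class ResetOrderDemo {
    // Hypothetical helper: builds a collator and converts the checked
    // ParseException into an unchecked one to keep call sites short.
    static RuleBasedCollator collator(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid rules: " + rules, e);
        }
    }

    // Reports whether "b" sorts before "c" under the given rules.
    static boolean bBeforeC(String rules) {
        return collator(rules).compare("b", "c") < 0;
    }

    public static void main(String[] args) {
        // Three spellings of the same order: a, then b, then c.
        System.out.println(bBeforeC("< a < b < c"));
        System.out.println(bBeforeC("< a < b & b < c"));
        System.out.println(bBeforeC("< a < c & a < b"));
        // Here c is inserted immediately after a, ahead of b.
        System.out.println(bBeforeC("< a < b & a < c"));
    }
}
```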

Either the text-argument must already be present in the sequence, or some initial substring of the text-argument must be present. For example, "a < b & ae < e" is valid since "a" is present in the sequence before "ae" is reset. In this latter case, "ae" is not entered and treated as a single character; instead, "e" is sorted as if it were expanded to two characters: "a" followed by an "e". This difference appears in natural languages: in traditional Spanish "ch" is treated as if it contracts to a single character (expressed as "c < ch < d"), while in traditional German a-umlaut is treated as if it expands to two characters (expressed as "a,A < b,B ... & ae;ä & AE;Ä", where ä and Ä are a-umlaut, U+00E4 and U+00C4).
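
Contractions and expansions can be compared side by side. This is a sketch under stated assumptions rather than the class's canonical usage: `ContractionExpansionDemo`, its helpers, and the tiny rule strings are invented for illustration, æ (U+00E6) stands in for an expanding letter, and the runtime is assumed to accept a rule string that starts with "<" and to treat it as the complete table.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

class ContractionExpansionDemo {
    // Hypothetical helper: wraps the checked ParseException.
    static RuleBasedCollator collator(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid rules: " + rules, e);
        }
    }

    // Contraction: "ch" is entered as one unit between c and d.
    static int withContraction(String left, String right) {
        return collator("< c < ch < d < h < z").compare(left, right);
    }

    // No contraction: "ch" is simply c followed by h.
    static int withoutContraction(String left, String right) {
        return collator("< c < d < h < z").compare(left, right);
    }

    // Expansion: the reset text "ae" is not in the sequence, but its
    // initial substring "a" is, so U+00E6 is sorted as "a" followed by
    // "e", with a secondary difference from the plain letters.
    static int withExpansion(String left, String right, int strength) {
        RuleBasedCollator c = collator("< a < b < e & ae ; \u00E6");
        c.setStrength(strength);
        return c.compare(left, right);
    }

    public static void main(String[] args) {
        System.out.println(withContraction("cz", "ch"));
        System.out.println(withoutContraction("cz", "ch"));
        System.out.println(withExpansion("\u00E6", "ae", Collator.PRIMARY));
        System.out.println(withExpansion("\u00E6", "ae", Collator.TERTIARY));
    }
}
```

Setting the strength to PRIMARY hides the accent-level difference, which is why the expansion helper takes the strength as a parameter.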

Ignorable Characters

For ignorable characters, the first rule must start with a relation (the examples we have used above are really fragments; "a < b" really should be "< a < b"). If, however, the first relation is not "<", then all text-arguments up to the first "<" are ignorable. For example, ", '-' < a < b" makes "-" an ignorable character (the hyphen is quoted because it is a rule syntax character).
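
A short sketch of an ignorable hyphen follows. The class `IgnorableHyphenDemo` and its method are hypothetical, the hyphen is written in single quotes because it is a rule syntax character, and the outcome assumes a runtime that treats the rule string as the complete table.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

class IgnorableHyphenDemo {
    // Compares two strings at the given strength, using rules in which
    // '-' appears before the first '<' and so has no primary weight.
    static int compareAt(int strength, String left, String right) {
        try {
            RuleBasedCollator c = new RuleBasedCollator(", '-' < a < b");
            c.setStrength(strength);
            return c.compare(left, right);
        } catch (ParseException e) {
            throw new IllegalStateException("Unexpected rule error", e);
        }
    }

    public static void main(String[] args) {
        // At PRIMARY strength only letter differences are considered.
        System.out.println(compareAt(Collator.PRIMARY, "a-b", "ab"));
        // At TERTIARY strength the ignorable character still breaks ties.
        System.out.println(compareAt(Collator.TERTIARY, "a-b", "ab"));
    }
}
```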

Normalization and Accents

RuleBasedCollator automatically processes its rule table to include both pre-composed and combining-character versions of accented characters. Even if the provided rule string contains only base characters and separate combining accent characters, the pre-composed accented characters matching all canonical combinations of characters from the rule string will be entered in the table.

This allows you to use a RuleBasedCollator to compare accented strings even when the collator is set to NO_DECOMPOSITION. However, if the strings to be collated contain combining sequences that may not be in canonical order, you should set the collator to CANONICAL_DECOMPOSITION to enable sorting of combining sequences. For more information, see The Unicode Standard, Version 3.0.
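
The decomposition setting can be exercised with canonically equivalent strings. This is a minimal sketch: `DecompositionDemo`, its method, and the three-letter rule string are assumptions made for illustration, and only the CANONICAL_DECOMPOSITION case is relied on here.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

class DecompositionDemo {
    // Compares two strings with the given decomposition mode.
    static int compare(int decomposition, String left, String right) {
        try {
            RuleBasedCollator c = new RuleBasedCollator("< a < b < e");
            c.setDecomposition(decomposition);
            return c.compare(left, right);
        } catch (ParseException e) {
            throw new IllegalStateException("Unexpected rule error", e);
        }
    }

    public static void main(String[] args) {
        String precomposed = "\u00E9";          // e with acute, one code point
        String combining = "e\u0301";           // e followed by combining acute
        String canonicalOrder = "a\u0323\u0301"; // dot below, then acute
        String swappedOrder = "a\u0301\u0323";   // same marks, swapped

        int mode = Collator.CANONICAL_DECOMPOSITION;
        System.out.println(compare(mode, precomposed, combining));
        System.out.println(compare(mode, canonicalOrder, swappedOrder));
    }
}
```

With canonical decomposition enabled, both pairs are normalized to the same sequence before comparison, so each pair is expected to compare as equal.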

Errors

The following rules are not valid:

- A text-argument contains unquoted punctuation symbols, for example "a < b-c < d".
- A relation or reset character is not followed by a text-argument, for example "a < , b".
- A reset where the text-argument (or an initial substring of the text-argument) is not already in the sequence or allocated in the default UCA table, for example "a < b & e < f".

If you produce one of these errors, RuleBasedCollator throws a ParseException.
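
One way to probe a rule string before using it is to attempt construction and catch the exception. The sketch below is illustrative only: `RuleValidationDemo` and `isValid` are hypothetical names, and the rule strings add the leading "<" that a complete rule needs. Whether the reset case is rejected depends on the runtime, because an implementation that falls back to the UCA table may find "e" there.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

class RuleValidationDemo {
    // Returns true when the rules parse, false on a ParseException.
    static boolean isValid(String rules) {
        try {
            new RuleBasedCollator(rules);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A well-formed rule string.
        System.out.println(isValid("< a < b < c"));
        // Unquoted punctuation inside a text-argument.
        System.out.println(isValid("< a < b-c < d"));
        // A relation that is not followed by a text-argument.
        System.out.println(isValid("< a < , b"));
        // A reset on text that is not in the sequence (runtime dependent).
        System.out.println(isValid("< a < b & e < f"));
    }
}
```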

Examples

Normally, to create a rule-based collator object, you will use Collator's factory method getInstance. However, to create a rule-based collator object with specialized rules tailored to your needs, you construct the RuleBasedCollator with the rules contained in a String object. For example:

 String Simple = "< a < b < c < d";
 RuleBasedCollator mySimple = new RuleBasedCollator(Simple);

Or:

 String Norwegian = "< a,A< b,B< c,C< d,D< e,E< f,F< g,G< h,H< i,I"
         + "< j,J< k,K< l,L< m,M< n,N< o,O< p,P< q,Q< r,R"
         + "< s,S< t,T< u,U< v,V< w,W< x,X< y,Y< z,Z"
         + "< \u00E5=a\u030A,\u00C5=A\u030A" // precomposed å/Å = letter + combining ring above
         + ";aa,AA< æ,Æ< ø,Ø";
 RuleBasedCollator myNorwegian = new RuleBasedCollator(Norwegian);

Combining Collators is as simple as concatenating strings. Here is an example that combines two Collators from two different locales:

 // Create an en_US Collator object
 RuleBasedCollator en_USCollator = (RuleBasedCollator)Collator
         .getInstance(new Locale("en", "US", ""));

 // Create a da_DK Collator object
 RuleBasedCollator da_DKCollator = (RuleBasedCollator)Collator
         .getInstance(new Locale("da", "DK", ""));

 // Combine the two collators
 // First, get the collation rules from en_USCollator
 String en_USRules = en_USCollator.getRules();

 // Second, get the collation rules from da_DKCollator
 String da_DKRules = da_DKCollator.getRules();

 RuleBasedCollator newCollator = new RuleBasedCollator(en_USRules + da_DKRules);
 // newCollator has the combined rules

The next example shows how to make changes to an existing table to create a new Collator object. For example, add "& C < ch, cH, Ch, CH" to the rules of the en_USCollator object to create your own:

 // Create a new Collator object with additional rules
 String addRules = "& C < ch, cH, Ch, CH";

 RuleBasedCollator myCollator = new RuleBasedCollator(en_USCollator.getRules() + addRules);
 // myCollator contains the new rules

The following example demonstrates how to change the order of non-spacing accents:

 // old rule
 String oldRules = "= ¨ ; ¯ ; ¿" + "< a , A ; ae, AE ; æ , Æ"
         + "< b , B < c, C < e, E & C < d, D";

 // change the order of accent characters
 String addOn = "& ¿ ; ¯ ; ¨";

 RuleBasedCollator myCollator = new RuleBasedCollator(oldRules + addOn);

The last example shows how to put a new primary ordering in before the default setting. For example, in the Japanese Collator, you can sort English characters either before or after Japanese characters:

 // get en_US Collator rules
 RuleBasedCollator en_USCollator = (RuleBasedCollator)
     Collator.getInstance(Locale.US);

 // add a few Japanese characters to sort before English characters
 // suppose the last character before the first base letter 'a' in
 // the English collation rule is ア
 String jaString = "& ア , ー < ト";

 RuleBasedCollator myJapaneseCollator =
     new RuleBasedCollator(en_USCollator.getRules() + jaString);

Summary


Inherited Constants

From class java.text.Collator

Public Constructors
	RuleBasedCollator(String rules) Constructs a new instance of `RuleBasedCollator` using the specified `rules`.

Public Methods
Object	clone() Returns a new collator with the same collation rules, decomposition mode and strength value as this collator.
int	compare(String source, String target) Compares the `source` text to the `target` text according to the collation rules, strength and decomposition mode for this `RuleBasedCollator`.
boolean	equals(Object obj) Compares the specified object with this `RuleBasedCollator` and indicates if they are equal.
CollationElementIterator	getCollationElementIterator(String source) Obtains a `CollationElementIterator` for the given string.
CollationElementIterator	getCollationElementIterator(CharacterIterator source) Obtains a `CollationElementIterator` for the given `CharacterIterator`.
CollationKey	getCollationKey(String source) Returns the `CollationKey` for the given source text.
String	getRules() Returns the collation rules of this collator.
int	hashCode() Returns an integer hash code for this object.


Inherited Methods

From class java.text.Collator

Object	clone() Returns a new collator with the same decomposition mode and strength value as this collator.
abstract int	compare(String string1, String string2) Compares two strings to determine their relative order.
int	compare(Object object1, Object object2) Compares two objects to determine their relative order.
boolean	equals(Object object) Compares this collator with the specified object and indicates if they are equal.
boolean	equals(String string1, String string2) Compares two strings using the collation rules to determine if they are equal.
static Locale[]	getAvailableLocales() Returns an array of locales for which custom `Collator` instances are available.
abstract CollationKey	getCollationKey(String string) Returns a `CollationKey` for the specified string for this collator with the current decomposition rule and strength value.
int	getDecomposition() Returns the decomposition rule for this collator.
static Collator	getInstance() Returns a `Collator` instance which is appropriate for the user's default `Locale`.
static Collator	getInstance(Locale locale) Returns a `Collator` instance which is appropriate for `locale`.
int	getStrength() Returns the strength value for this collator.
abstract int	hashCode() Returns an integer hash code for this object.
void	setDecomposition(int value) Sets the decomposition rule for this collator.
void	setStrength(int value) Sets the strength value for this collator.

From class java.lang.Object

Object	clone() Creates and returns a copy of this `Object`.
boolean	equals(Object o) Compares this instance with the specified object and indicates if they are equal.
void	finalize() Invoked when the garbage collector has detected that this instance is no longer reachable.
final Class<?>	getClass() Returns the unique instance of `Class` that represents this object's class.
int	hashCode() Returns an integer hash code for this object.
final void	notify() Causes a thread which is waiting on this object's monitor (by means of calling one of the `wait()` methods) to be woken up.
final void	notifyAll() Causes all threads which are waiting on this object's monitor (by means of calling one of the `wait()` methods) to be woken up.
String	toString() Returns a string containing a concise, human-readable description of this object.
final void	wait() Causes the calling thread to wait until another thread calls the `notify()` or `notifyAll()` method of this object.
final void	wait(long millis, int nanos) Causes the calling thread to wait until another thread calls the `notify()` or `notifyAll()` method of this object or until the specified timeout expires.
final void	wait(long millis) Causes the calling thread to wait until another thread calls the `notify()` or `notifyAll()` method of this object or until the specified timeout expires.

From interface java.util.Comparator

Public Constructors

public RuleBasedCollator (String rules)

Added in API level 1

Constructs a new instance of RuleBasedCollator using the specified rules. The rules are usually either hand-written based on the class description or the result of a former getRules() call.

Note that the rules are actually interpreted as a delta to the standard Unicode Collation Algorithm (UCA). This differs slightly from other implementations which work with full rules specifications and may result in different behavior.

Parameters

rules	the collation rules.

Throws

NullPointerException	if `rules == null`.
ParseException	if `rules` contains rules with invalid collation rule syntax.

Public Methods

public Object clone ()

Added in API level 1

Returns a new collator with the same collation rules, decomposition mode and strength value as this collator.

Returns

a shallow copy of this collator.

public int compare (String source, String target)

Added in API level 1

Compares the source text to the target text according to the collation rules, strength and decomposition mode for this RuleBasedCollator. See the Collator class description for an example of use.

General recommendation: If the same strings are to be compared multiple times, it is more efficient to generate CollationKey objects for the strings and use CollationKey.compareTo(CollationKey) for the comparisons. If each string is compared only once, using RuleBasedCollator.compare(String, String) has better performance.

Parameters

source	the source text.
target	the target text.

Returns

an integer which may be a negative value, zero, or else a positive value depending on whether source is less than, equivalent to, or greater than target.
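
The general recommendation about CollationKey objects can be sketched as follows. This is one possible approach, not a prescribed one: `CollationKeySortDemo`, `sortWithKeys`, and the reversed-alphabet rule string in `main` are invented for illustration.

```java
import java.text.CollationKey;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;

class CollationKeySortDemo {
    // Sorts the words by building one CollationKey per word, so each
    // string is transformed once instead of once per comparison.
    static String sortWithKeys(String rules, String... words) {
        RuleBasedCollator collator;
        try {
            collator = new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid rules: " + rules, e);
        }
        CollationKey[] keys = new CollationKey[words.length];
        for (int i = 0; i < words.length; i++) {
            keys[i] = collator.getCollationKey(words[i]);
        }
        // CollationKey implements Comparable<CollationKey>.
        Arrays.sort(keys);
        String[] sorted = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            sorted[i] = keys[i].getSourceString();
        }
        return String.join(",", sorted);
    }

    public static void main(String[] args) {
        // A deliberately reversed three-letter alphabet: c, then b, then a.
        System.out.println(sortWithKeys("< c < b < a", "abc", "cab", "bca"));
    }
}
```

Keys are only comparable with keys produced by the same collator, so the sketch keeps key creation and sorting inside one method.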

public boolean equals (Object obj)

Added in API level 1

Compares the specified object with this RuleBasedCollator and indicates if they are equal. In order to be equal, obj must be an instance of Collator with the same collation rules and the same attributes.

Parameters

obj	the object to compare with this object.

Returns

true if the specified object is equal to this RuleBasedCollator; false otherwise.
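
The relationship between clone(), equals(Object) and hashCode() can be illustrated with a small sketch. The class `CloneEqualsDemo`, its methods, and the rule string are hypothetical, and the second check assumes the default strength is not PRIMARY.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

class CloneEqualsDemo {
    // Hypothetical helper: wraps the checked ParseException.
    static RuleBasedCollator collator(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid rules: " + rules, e);
        }
    }

    // A fresh clone has the same rules and attributes as its source.
    static boolean cloneIsEqual() {
        RuleBasedCollator original = collator("< a < b < c");
        RuleBasedCollator copy = (RuleBasedCollator) original.clone();
        return original.equals(copy)
                && original.hashCode() == copy.hashCode();
    }

    // Changing an attribute on the clone makes the two unequal.
    static boolean differsAfterStrengthChange() {
        RuleBasedCollator original = collator("< a < b < c");
        RuleBasedCollator copy = (RuleBasedCollator) original.clone();
        copy.setStrength(Collator.PRIMARY);
        return !original.equals(copy);
    }

    public static void main(String[] args) {
        System.out.println(cloneIsEqual());
        System.out.println(differsAfterStrengthChange());
    }
}
```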

public CollationElementIterator getCollationElementIterator (String source)

Added in API level 1

Obtains a CollationElementIterator for the given string.

Parameters

source	the source string.

Returns

the CollationElementIterator for source.
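
A brief sketch of walking the collation elements of a string follows. The class `ElementIteratorDemo`, its method, and the rule strings are assumptions for illustration; the element counts rely on a runtime that treats the rule string as the complete table, where a contraction such as "ch" yields a single element.

```java
import java.text.CollationElementIterator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

class ElementIteratorDemo {
    // Counts the collation elements produced for the text.
    static int countElements(String rules, String text) {
        RuleBasedCollator collator;
        try {
            collator = new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid rules: " + rules, e);
        }
        CollationElementIterator it = collator.getCollationElementIterator(text);
        int count = 0;
        int element;
        while ((element = it.next()) != CollationElementIterator.NULLORDER) {
            // The primary weight can be extracted from each element.
            int primary = CollationElementIterator.primaryOrder(element);
            System.out.println("primary weight: " + primary);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "ch" is a contraction in the first rule set only.
        System.out.println(countElements("< c < ch < d < h < z", "ch"));
        System.out.println(countElements("< c < d < h < z", "ch"));
    }
}
```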

public CollationElementIterator getCollationElementIterator (CharacterIterator source)

Added in API level 1

Obtains a CollationElementIterator for the given CharacterIterator. The source iterator's integrity will be preserved since a new copy will be created for use.

Parameters

source	the source character iterator.

Returns

a CollationElementIterator for source.

public CollationKey getCollationKey (String source)

Added in API level 1

Returns the CollationKey for the given source text.

Parameters

source	the specified source text.

Returns

the CollationKey for the given source text.

public String getRules ()

Added in API level 1

Returns the collation rules of this collator. These rules can be fed into the RuleBasedCollator(String) constructor.

Note that the rules are actually interpreted as a delta to the standard Unicode Collation Algorithm (UCA). Hence, an empty rules string results in the default UCA rules being applied. This differs slightly from other implementations which work with full rules specifications and may result in different behavior.

Returns

the collation rules.
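
Feeding getRules() back into the constructor, with or without extra rules appended, might look like the sketch below. `RulesRoundTripDemo` and its methods are hypothetical, and the appended rule "& a < z" is an invented tailoring that places z directly after a.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

class RulesRoundTripDemo {
    // Hypothetical helper: wraps the checked ParseException.
    static RuleBasedCollator collator(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid rules: " + rules, e);
        }
    }

    // Rebuilds a collator from the rules of an existing one.
    static boolean roundTripIsEqual(String rules) {
        RuleBasedCollator original = collator(rules);
        RuleBasedCollator rebuilt = collator(original.getRules());
        return original.equals(rebuilt);
    }

    // Appends a reset to existing rules so that z follows a directly.
    static boolean extendedPutsZBeforeB() {
        RuleBasedCollator base = collator("< a < b < c");
        RuleBasedCollator extended = collator(base.getRules() + " & a < z");
        return extended.compare("z", "b") < 0;
    }

    public static void main(String[] args) {
        System.out.println(roundTripIsEqual("< a < b < c"));
        System.out.println(extendedPutsZBeforeB());
    }
}
```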

public int hashCode ()

Added in API level 1

Returns an integer hash code for this object. By contract, any two objects for which equals(Object) returns true must return the same hash code value. This means that subclasses of Object usually override both methods or neither method.

Note that hash values must not change over time unless information used in equals comparisons also changes.

See Writing a correct hashCode method if you intend to implement your own hashCode method.

Returns

this object's hash code.
