- A String is made up of characters.
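- Every character has a Unicode code point value. Java stores a string as a sequence of UTF-16 char values, so a code point above U+FFFF takes two char values, called a surrogate pair.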
- The String code point methods return the Unicode code points of the characters in a string.
- There are four methods in the String class related to code points.
Java String Code Point Methods
Here is the list of 4 methods related to code points.
- codePointAt(int index): returns the integer representing the Unicode code point at the given index. If the index is invalid, the IndexOutOfBoundsException is thrown.
- codePointBefore(int index): returns the character code point before the given index. The valid value for the index is from 1 to the length of the string. If the index is less than 1 or greater than the length of the string, the IndexOutOfBoundsException is thrown.
- codePointCount(int beginIndex, int endIndex): returns the number of Unicode code points between the two indexes. The beginIndex is included and the endIndex is excluded in the calculation. The IndexOutOfBoundsException is thrown for invalid index values.
- offsetByCodePoints(int index, int codePointOffset): returns the index within this String that is offset from the given index by codePointOffset code points.
The first method is the one you will use most often to get the Unicode code point of a character in a string.
The other three methods are mainly useful when the string contains supplementary characters (code points above U+FFFF), which Java stores as surrogate pairs.
Let’s look at the examples of all of these methods.
1. codePointAt(int index)
jshell> String str = "Hello";
str ==> "Hello"
jshell> int codePoint_H = str.codePointAt(0);
codePoint_H ==> 72
jshell> int codePoint_l = str.codePointAt(2);
codePoint_l ==> 108
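The jshell session above only uses characters that fit in a single char. As a rough sketch of what happens with a surrogate pair, the example below uses a test string containing U+1F600. The class name CodePointAtDemo and the constant TEXT are made up for illustration and are not part of any library.

```java
class CodePointAtDemo {
    // Hypothetical sample: 'a', then U+1F600 (stored as the pair D83D DE00), then 'b'.
    static final String TEXT = "a\uD83D\uDE00b";

    public static void main(String[] args) {
        System.out.println(TEXT.length());       // 4, because the pair takes two char values
        System.out.println(TEXT.codePointAt(0)); // 97, the code point of 'a'
        System.out.println(TEXT.codePointAt(1)); // 128512 (0x1F600), read from both surrogates
        System.out.println(TEXT.codePointAt(2)); // 56832 (0xDE00), the low surrogate on its own
    }
}
```

When the index points at the second half of a pair, codePointAt returns the surrogate value itself, so the index should normally point at the start of a character.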
2. codePointBefore(int index)
jshell> String str = "Hello";
str ==> "Hello"
jshell> int codePoint_H = str.codePointBefore(1);
codePoint_H ==> 72
jshell> int codePoint_o = str.codePointBefore(str.length());
codePoint_o ==> 111
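To illustrate the index limits and the surrogate pair behavior of codePointBefore, here is a minimal sketch. The class CodePointBeforeDemo and the helper safeCodePointBefore are hypothetical names introduced only for this example. The helper returns -1 instead of letting the exception propagate, which is a choice made for the demo rather than a recommended pattern.

```java
class CodePointBeforeDemo {
    // Hypothetical sample: 'a', then U+1F600 (stored as the pair D83D DE00), then 'b'.
    static final String TEXT = "a\uD83D\uDE00b";

    // Illustrative helper: returns -1 when the index is outside 1..length.
    static int safeCodePointBefore(String s, int index) {
        try {
            return s.codePointBefore(index);
        } catch (IndexOutOfBoundsException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(safeCodePointBefore(TEXT, 3)); // 128512, the whole pair before index 3
        System.out.println(safeCodePointBefore(TEXT, 2)); // 55357 (0xD83D), the high surrogate alone
        System.out.println(safeCodePointBefore(TEXT, 0)); // -1, index is less than 1
        System.out.println(safeCodePointBefore(TEXT, 5)); // -1, index is greater than the length
    }
}
```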
3. codePointCount(int beginIndex, int endIndex)
jshell> String str = "JavaString";
str ==> "JavaString"
jshell> str.codePointCount(0, str.length());
$34 ==> 10
jshell> str.codePointCount(1, 5);
$35 ==> 4
Since we don’t have a surrogate pair in our string, the codePointCount for the entire string will be the same as the length of the string.
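For contrast, the following sketch shows a string where the two numbers differ. It assumes a sample string containing U+1F600, written as a surrogate pair. The class name CodePointCountDemo and the constant TEXT are illustrative only.

```java
class CodePointCountDemo {
    // Hypothetical sample: 'a', then U+1F600 (stored as the pair D83D DE00), then 'b'.
    static final String TEXT = "a\uD83D\uDE00b";

    public static void main(String[] args) {
        System.out.println(TEXT.length());                        // 4 char values
        System.out.println(TEXT.codePointCount(0, TEXT.length())); // 3 code points
        System.out.println(TEXT.codePointCount(1, 3));            // 1, the pair counts once
        System.out.println(TEXT.codePointCount(1, 2));            // 1, an unpaired surrogate also counts as one
    }
}
```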
4. offsetByCodePoints(int index, int codePointOffset)
jshell> String str = "Hello World";
str ==> "Hello World"
jshell> str.offsetByCodePoints(3, 4);
$37 ==> 7
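In the example above the result is simply 3 + 4, because every character takes one char. A sketch of where the method becomes useful is shown below: finding the char index of the n-th code point and stepping through a string one code point at a time. The class OffsetByCodePointsDemo and the helper indexOfCodePoint are hypothetical names used only for illustration.

```java
class OffsetByCodePointsDemo {
    // Hypothetical sample: 'a', then U+1F600 (stored as the pair D83D DE00), then 'b'.
    static final String TEXT = "a\uD83D\uDE00b";

    // Illustrative helper: char index at which the n-th code point (0-based) starts.
    static int indexOfCodePoint(String s, int n) {
        return s.offsetByCodePoints(0, n);
    }

    public static void main(String[] args) {
        System.out.println(indexOfCodePoint(TEXT, 1));       // 1
        System.out.println(indexOfCodePoint(TEXT, 2));       // 3, the pair is skipped as one unit
        System.out.println(TEXT.offsetByCodePoints(3, -1));  // 1, a negative offset moves backwards

        // Walk the string one code point at a time.
        for (int i = 0; i < TEXT.length(); i = TEXT.offsetByCodePoints(i, 1)) {
            System.out.println(String.format("%d: U+%04X", i, TEXT.codePointAt(i)));
        }
    }
}
```

The loop visits the indexes 0, 1, and 3, so the low surrogate at index 2 is never read on its own.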
Conclusion
We learned how to get the Unicode code point values of string characters using the codePointAt() method. The other three methods are rarely needed unless you are working with strings that contain surrogate pairs.