Strings in Java

Yuvaraj
7 min read · Jul 19, 2024

A string is a sequence of characters, written in source code as a literal enclosed in double quotes. In Java, strings are objects of the String class, which is part of the java.lang package. The Java platform provides the String class to create and manipulate strings.

Strings in Java are immutable, meaning that once a string object is created, its value cannot be changed. Any operation that modifies a string will result in the creation of a new string object.

String str = "Hello";
str = str + ", World!"; // str now references a new string "Hello, World!"
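
To make the point about immutability concrete, here is a minimal sketch. The class name ImmutabilityDemo and the helper appendWorld are illustrative names chosen for this example, not part of any library. It shows that "modifying" a string leaves the original object untouched and produces a different object.

```java
// Illustrative sketch: class and method names are assumptions for this example.
class ImmutabilityDemo {

    // Returns a new string; the argument itself is never modified.
    static String appendWorld(String s) {
        return s + ", World!";
    }

    public static void main(String[] args) {
        String original = "Hello";
        String alias = original;                 // second reference to the same object

        String result = appendWorld(original);

        System.out.println(original);            // Hello
        System.out.println(result);              // Hello, World!
        System.out.println(alias == original);   // true: still the same object
        System.out.println(result == original);  // false: a new object was created
    }
}
```

Because the original never changes, a String can be shared freely between methods and threads without defensive copies.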

Creating Strings

There are multiple ways to create strings in Java:

  • Using String Literals:

String str = "Hello, World!";

  • Using the new keyword:

String str = new String("Hello, World!");

  • Using a character array:

char[] charArray = {'H', 'e', 'l', 'l', 'o'};
String str = new String(charArray);

When a string literal is created, the Java virtual machine (JVM) checks the string pool to see if it already exists. The string constant pool is a memory area where strings are stored. If the value exists, the new reference points to the existing pooled string. If the value does not exist, the JVM creates a new string and adds it to the pool.

The string constant pool lives inside the heap. When a string is created from a literal, the text between the double quotes goes into the string constant pool. A local variable that holds the string lives on the stack and contains a reference to the object in the pool. All objects are stored on the heap, including string literals, which are kept in the string constant pool inside the heap.

[Figure: Stack & Heap Memory]

// Java program demonstrating String creation with new
public class StringExample {

    public static void main(String[] args) {
        // The literal "Medium" is placed in the string constant pool
        // (if it is not already there), and new String(...) creates
        // a second, separate object on the heap. The variable str
        // refers to the heap object, not the pooled one.
        String str = new String("Medium");

        System.out.println(str);
    }
}

// Output:
// Medium
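
The difference between a pooled literal and an object created with new can be observed directly. The following is a small sketch under the assumption that reference comparison (==) is used only for demonstration; StringPoolDemo and sameObject are hypothetical names introduced here.

```java
// Illustrative sketch: class and method names are assumptions for this example.
class StringPoolDemo {

    // True only when both references point to the very same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "Medium";               // pooled
        String created = new String("Medium");   // separate heap object
        String interned = created.intern();      // looks up the pooled instance

        System.out.println(sameObject(literal, created));   // false
        System.out.println(literal.equals(created));        // true
        System.out.println(sameObject(literal, interned));  // true
    }
}
```

In everyday code, compare string contents with equals; == only tells you whether two references share one object.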

Common String Methods

Java provides many methods for manipulating strings. Here are some of the most commonly used ones:

  • Length of a String:
int length = str.length();
  • Concatenation:
String str1 = "Hello";
String str2 = "World";
String str3 = str1.concat(", ").concat(str2);
  • Character Extraction:
char ch = str.charAt(0); // Gets the first character
  • Index of a Character or Substring:
int index = str.indexOf('o'); // 4, assuming str is "Hello, World!"
int index2 = str.indexOf("World"); // 7
  • Comparison:
boolean equals = str1.equals(str2); // false
int comparison = str1.compareTo(str2); // Compares lexicographically
  • Case Conversion:
String lower = str.toLowerCase();
String upper = str.toUpperCase();
  • Trimming Whitespace:
String trimmed = str.trim();
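
The snippets above can be combined into one runnable sketch. The input value "  Hello, World!  " and the names StringMethodsDemo and describe are assumptions made for this illustration.

```java
// Illustrative sketch: class and method names are assumptions for this example.
class StringMethodsDemo {

    // Trims the input, then collects the results of several String methods.
    static String describe(String raw) {
        String s = raw.trim();
        return "length=" + s.length()
                + ", first=" + s.charAt(0)
                + ", indexOfO=" + s.indexOf('o')
                + ", indexOfWorld=" + s.indexOf("World")
                + ", upper=" + s.toUpperCase();
    }

    public static void main(String[] args) {
        // length=13, first=H, indexOfO=4, indexOfWorld=7, upper=HELLO, WORLD!
        System.out.println(describe("  Hello, World!  "));

        // compareTo is negative because "Hello" sorts before "World"
        System.out.println("Hello".compareTo("World") < 0);        // true
        System.out.println("Hello".concat(", ").concat("World"));  // Hello, World
    }
}
```

Each call returns a new value and leaves the receiver unchanged, which is why the results are chained or assigned rather than applied in place.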

StringBuffer and StringBuilder

For situations where you need to modify strings frequently, Java provides the StringBuffer and StringBuilder classes. Both represent mutable sequences of characters, so appending to them does not create a new object each time.

StringBuffer:

StringBuffer sb = new StringBuffer("Hello");
sb.append(", World!");

StringBuilder:

StringBuilder sb = new StringBuilder("Hello");
sb.append(", World!");

Both classes offer similar functionality, but StringBuilder is generally preferred when thread-safety is not a concern, as it is faster due to the lack of synchronization.
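
A typical use is building a string inside a loop. The sketch below is one possible way to do it; BuilderDemo and joinNumbers are hypothetical names used only for this example.

```java
// Illustrative sketch: class and method names are assumptions for this example.
class BuilderDemo {

    // Joins the numbers 1..n with ", " using a single mutable buffer.
    static String joinNumbers(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= n; i++) {
            if (i > 1) {
                sb.append(", ");
            }
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinNumbers(5));           // 1, 2, 3, 4, 5

        StringBuilder sb = new StringBuilder("Hello");
        StringBuilder same = sb.append(", World!");   // append returns the same builder
        System.out.println(same == sb);               // true: modified in place
    }
}
```

Writing the same loop with the + operator on a String would create a fresh String object on every iteration.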

String Pool

Java maintains a string pool for optimization, where it stores string literals. When you create a string using a literal, the JVM first checks the pool. If the string already exists, it returns the reference to the pooled instance. Otherwise, it adds the new string to the pool.

String str1 = "Hello";
String str2 = "Hello";
boolean same = (str1 == str2); // true, both refer to the same instance in the pool
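
Pooling applies to literals and compile-time constants, not to strings assembled at run time. The following sketch illustrates the difference; EqualityDemo and buildAtRuntime are names invented for this example.

```java
// Illustrative sketch: class and method names are assumptions for this example.
class EqualityDemo {

    // Builds "Hello" at run time. Because part is an ordinary (non-final)
    // variable, the concatenation is not a compile-time constant.
    static String buildAtRuntime() {
        String part = "Hel";
        return part + "lo";
    }

    public static void main(String[] args) {
        String literal = "Hello";
        String constant = "Hel" + "lo";   // folded by the compiler into "Hello"
        String runtime = buildAtRuntime();

        System.out.println(literal == constant);          // true
        System.out.println(literal == runtime);           // false
        System.out.println(literal.equals(runtime));      // true
        System.out.println(literal == runtime.intern());  // true
    }
}
```

This is the main reason to compare strings with equals rather than ==.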

Example Usage

Here’s a practical example demonstrating some of the concepts mentioned:

public class Main {
    public static void main(String[] args) {
        String greeting = "Hello, World!";
        System.out.println("Length: " + greeting.length());
        System.out.println("Character at index 1: " + greeting.charAt(1));
        System.out.println("Substring: " + greeting.substring(7, 12));
        System.out.println("Uppercase: " + greeting.toUpperCase());

        StringBuilder sb = new StringBuilder("Hello");
        sb.append(", World!");
        System.out.println("StringBuilder: " + sb.toString());
    }
}

/* Output:

Length: 13
Character at index 1: e
Substring: World
Uppercase: HELLO, WORLD!
StringBuilder: Hello, World! */

Note: Choosing meaningful variable names is a good practice that enhances the understandability and maintainability of your code.

Strings and Character Encodings

A string in Java is a sequence of characters. Characters themselves are abstract representations of symbols, letters, digits, etc. For a computer to store and manipulate these characters, they need to be encoded into bytes. This is where character encodings come into play.

  • Strings in Java: Represent sequences of characters using the String class, whose API exposes the text as UTF-16 code units.
  • Character Encodings: ASCII, UTF-8, UTF-16LE, and UTF-16BE are methods to encode characters into bytes. ASCII is limited to 128 characters, while UTF-8 and UTF-16 can represent all Unicode characters.
  • Encoding and Decoding: Java provides methods to convert strings to various encodings and back, enabling support for internationalization and interoperability.

Character Encodings Overview

1. ASCII:

  • Represents characters using 7 bits (1 byte with the leading bit as 0).
  • Can represent 128 characters including control characters (like newline, tab), numbers, English letters, and some punctuation.
  • Example: The character ‘A’ is represented as 65 in ASCII (01000001 in binary).

2. UTF-8:

  • A variable-width encoding that uses 1 to 4 bytes.
  • Compatible with ASCII for the first 128 characters (0–127), where each ASCII character is represented by a single byte.
  • Can represent all Unicode characters.
  • Example: ‘A’ is represented as 65 in UTF-8 (same as ASCII), while ‘€’ (Euro sign) is represented as three bytes: 0xE2, 0x82, 0xAC.

3. UTF-16LE and UTF-16BE:

  • Use 2 or 4 bytes to represent characters.
  • LE (Little Endian) means the least significant byte comes first.
  • BE (Big Endian) means the most significant byte comes first.
  • Example: ‘A’ is represented as 0x0041 in both UTF-16LE and UTF-16BE, but in UTF-16LE it would be stored as 0x41 0x00 and in UTF-16BE as 0x00 0x41.
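
The byte values quoted in this overview can be checked with a few lines of Java. This is a sketch only; EncodingWidthDemo and hex are hypothetical names, and the Euro sign is written as a Unicode escape so the source file stays plain ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: class and method names are assumptions for this example.
class EncodingWidthDemo {

    // Formats the encoded bytes of text as space-separated hex pairs.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // the Euro sign

        System.out.println(hex("A", StandardCharsets.US_ASCII));  // 41
        System.out.println(hex("A", StandardCharsets.UTF_8));     // 41
        System.out.println(hex(euro, StandardCharsets.UTF_8));    // E2 82 AC
        System.out.println(hex("A", StandardCharsets.UTF_16LE));  // 41 00
        System.out.println(hex("A", StandardCharsets.UTF_16BE));  // 00 41
        System.out.println(hex(euro, StandardCharsets.UTF_16LE)); // AC 20
        System.out.println(hex(euro, StandardCharsets.UTF_16BE)); // 20 AC
    }
}
```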

String Representation in Java

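In Java, the String class exposes a sequence of characters as UTF-16 code units. Each element of a Java String is a char, which is a 16-bit UTF-16 code unit. Most characters fit in a single char, but characters beyond U+FFFF are represented by two char values, known as a surrogate pair.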
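
One consequence of 16-bit char values is that length() counts UTF-16 code units, not user-visible characters. The sketch below uses the code point U+1F600 as an arbitrary example of a character outside the 16-bit range; CodePointDemo and codePoints are names made up for this illustration.

```java
// Illustrative sketch: class and method names are assumptions for this example.
class CodePointDemo {

    // Number of Unicode code points, which can be smaller than length().
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1F600 does not fit in one 16-bit char, so toChars returns two chars.
        String emoji = new String(Character.toChars(0x1F600));

        System.out.println(emoji.length());                            // 2
        System.out.println(codePoints(emoji));                         // 1
        System.out.println(Character.isHighSurrogate(emoji.charAt(0))); // true
        System.out.println(Integer.toHexString(emoji.codePointAt(0))); // 1f600

        System.out.println("Hello".length());                          // 5
        System.out.println(codePoints("Hello"));                       // 5
    }
}
```

For text that may contain such characters, iterating with codePoints() or codePointAt is safer than iterating char by char.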

Example of a String in Java

public class Main {
    public static void main(String[] args) {
        String text = "Hello, World!";

        // Display the string
        System.out.println("Original String: " + text);

        // Display the internal representation in UTF-16
        for (char ch : text.toCharArray()) {
            System.out.format("Character: %c -> Code: %04x\n", ch, (int) ch);
        }
    }
}

/* Output:

Original String: Hello, World!
Character: H -> Code: 0048
Character: e -> Code: 0065
Character: l -> Code: 006c
Character: l -> Code: 006c
Character: o -> Code: 006f
Character: , -> Code: 002c
Character:   -> Code: 0020
Character: W -> Code: 0057
Character: o -> Code: 006f
Character: r -> Code: 0072
Character: l -> Code: 006c
Character: d -> Code: 0064
Character: ! -> Code: 0021 */

Encoding a String

Java provides methods to convert strings to and from different encodings using the String.getBytes method and the String constructor.

To convert a string to a specific byte encoding:

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Main {
    public static void main(String[] args) throws Exception {
        String text = "Hello, World!";

        // ASCII encoding
        byte[] asciiBytes = text.getBytes(StandardCharsets.US_ASCII);
        System.out.println("ASCII: " + Arrays.toString(asciiBytes));

        // UTF-8 encoding
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8: " + Arrays.toString(utf8Bytes));

        // UTF-16LE encoding
        byte[] utf16leBytes = text.getBytes(StandardCharsets.UTF_16LE);
        System.out.println("UTF-16LE: " + Arrays.toString(utf16leBytes));

        // UTF-16BE encoding
        byte[] utf16beBytes = text.getBytes(StandardCharsets.UTF_16BE);
        System.out.println("UTF-16BE: " + Arrays.toString(utf16beBytes));
    }
}


/* Output:

ASCII: [72, 101, 108, 108, 111, 44, 32, 87, 111, 114, 108, 100, 33]
UTF-8: [72, 101, 108, 108, 111, 44, 32, 87, 111, 114, 108, 100, 33]
UTF-16LE: [72, 0, 101, 0, 108, 0, 108, 0, 111, 0, 44, 0, 32, 0, 87, 0, 111, 0, 114, 0, 108, 0, 100, 0, 33, 0]
UTF-16BE: [0, 72, 0, 101, 0, 108, 0, 108, 0, 111, 0, 44, 0, 32, 0, 87, 0, 111, 0, 114, 0, 108, 0, 100, 0, 33] */
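
The ASCII and UTF-8 results above are identical only because "Hello, World!" contains nothing outside the ASCII range. The following sketch shows what happens otherwise: getBytes substitutes the charset's replacement byte, which for US-ASCII is '?' (63). UnmappableDemo and toAscii are hypothetical names, and the sample text is an assumption chosen for the demonstration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative sketch: class and method names are assumptions for this example.
class UnmappableDemo {

    // Encodes text as US-ASCII; unmappable characters become '?'.
    static byte[] toAscii(String text) {
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String price = "5 \u20AC"; // digit 5, a space, then the Euro sign

        byte[] ascii = toAscii(price);
        System.out.println(Arrays.toString(ascii));   // [53, 32, 63]
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // 5 ?

        // UTF-8 keeps the character, using three bytes for it
        byte[] utf8 = price.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(utf8));    // [53, 32, -30, -126, -84]
    }
}
```

The negative numbers appear because Java's byte type is signed: -30, -126 and -84 are 0xE2, 0x82 and 0xAC.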

Decoding a String

The String constructor converts a byte array back into a string using the specified charset:

import java.nio.charset.StandardCharsets;

public class Main {
    public static void main(String[] args) throws Exception {
        // UTF-8 byte array representing "Hello, World!"
        byte[] utf8Bytes = {72, 101, 108, 108, 111, 44, 32, 87, 111, 114, 108, 100, 33};

        // Decode from UTF-8
        String decodedString = new String(utf8Bytes, StandardCharsets.UTF_8);
        System.out.println("Decoded String: " + decodedString);
    }
}


/* Output:

Decoded String: Hello, World! */

  • Encoding a String: The getBytes method of the String class is used to convert the string into a byte array using the specified charset:
      • StandardCharsets.US_ASCII for ASCII encoding.
      • StandardCharsets.UTF_8 for UTF-8 encoding.
      • StandardCharsets.UTF_16LE for UTF-16 Little Endian encoding.
      • StandardCharsets.UTF_16BE for UTF-16 Big Endian encoding.
  • Decoding a String: The String constructor is used to convert a byte array back into a string using the specified charset.
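
Decoding only recovers the original text when it uses the same charset that produced the bytes. The sketch below illustrates a mismatch, using ISO-8859-1 as an arbitrary "wrong" charset; MismatchDemo and transcode are names invented for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: class and method names are assumptions for this example.
class MismatchDemo {

    // Encodes with one charset and decodes the bytes with another.
    static String transcode(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // the Euro sign

        String ok = transcode(euro, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String garbled = transcode(euro, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(ok.equals(euro));       // true: round trip succeeds
        System.out.println(garbled.equals(euro));  // false: wrong charset
        System.out.println(garbled.length());      // 3: each byte became one char
    }
}
```

This is why it is worth passing an explicit charset to both getBytes and the String constructor instead of relying on a platform default.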

Encode and Decode Strings in Different Formats in Java:

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Main {
    public static void main(String[] args) throws Exception {
        String text = "Hello, World!";

        // ASCII
        byte[] asciiBytes = text.getBytes(StandardCharsets.US_ASCII);
        System.out.println("ASCII: " + Arrays.toString(asciiBytes));

        // UTF-8
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8: " + Arrays.toString(utf8Bytes));

        // UTF-16LE
        byte[] utf16leBytes = text.getBytes(StandardCharsets.UTF_16LE);
        System.out.println("UTF-16LE: " + Arrays.toString(utf16leBytes));

        // UTF-16BE
        byte[] utf16beBytes = text.getBytes(StandardCharsets.UTF_16BE);
        System.out.println("UTF-16BE: " + Arrays.toString(utf16beBytes));

        // Decoding back to string
        String decodedUtf8 = new String(utf8Bytes, StandardCharsets.UTF_8);
        System.out.println("Decoded UTF-8: " + decodedUtf8);
    }
}


/* Output:

ASCII: [72, 101, 108, 108, 111, 44, 32, 87, 111, 114, 108, 100, 33]
UTF-8: [72, 101, 108, 108, 111, 44, 32, 87, 111, 114, 108, 100, 33]
UTF-16LE: [72, 0, 101, 0, 108, 0, 108, 0, 111, 0, 44, 0, 32, 0, 87, 0, 111, 0, 114, 0, 108, 0, 100, 0, 33, 0]
UTF-16BE: [0, 72, 0, 101, 0, 108, 0, 108, 0, 111, 0, 44, 0, 32, 0, 87, 0, 111, 0, 114, 0, 108, 0, 100, 0, 33]
Decoded UTF-8: Hello, World!

*/

This code snippet demonstrates how to encode a string in different formats and then decode it back to the original string in Java.
