Java: N-gram generation

Eagerly: Generating a list of n-grams

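import java.util.ArrayList;
import java.util.List;

// Returns every contiguous substring of length n, in order of position.
public static List<String> ngrams(int n, String str) {
    List<String> ngrams = new ArrayList<>();
    // The last n-gram starts at index str.length() - n.
    for (int i = 0; i < str.length() - n + 1; i++)
        ngrams.add(str.substring(i, i + n));
    return ngrams;
}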

Example: ngrams(3, "abcde") = ["abc", "bcd", "cde"].
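The loop bound also covers the boundary cases. A minimal sketch, assuming the same method wrapped in a hypothetical class named NgramEdgeCases (the class name is not part of the original snippet): when n equals the string length there is exactly one n-gram, and when n exceeds it the loop body never runs, so the result is an empty list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper class, used only to make the sketch runnable.
class NgramEdgeCases {
    static List<String> ngrams(int n, String str) {
        List<String> ngrams = new ArrayList<>();
        // When n > str.length(), the bound is <= 0 and the loop is skipped.
        for (int i = 0; i < str.length() - n + 1; i++)
            ngrams.add(str.substring(i, i + n));
        return ngrams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams(5, "abcde")); // prints [abcde]
        System.out.println(ngrams(6, "abcde")); // prints []
        System.out.println(ngrams(3, ""));      // prints []
    }
}
```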

Lazily: Creating an iterator

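import java.util.Iterator;
import java.util.NoSuchElementException;

class NgramIterator implements Iterator<String> {
    private final String str;
    private final int n;
    private int pos = 0; // start index of the next n-gram

    public NgramIterator(int n, String str) {
        this.n = n;
        this.str = str;
    }

    @Override
    public boolean hasNext() {
        return pos < str.length() - n + 1;
    }

    @Override
    public String next() {
        // Honor the Iterator contract when no n-grams remain.
        if (!hasNext())
            throw new NoSuchElementException();
        return str.substring(pos, pos++ + n);
    }
}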

Example:

new NgramIterator(3, "abcde")
        .forEachRemaining(System.out::println);

prints

abc
bcd
cde
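Lazy generation can also be expressed with the Java 8 stream API instead of a hand-written iterator. The following is a sketch of that alternative, not part of the original recipe; the class name NgramStreams and the method name ngramStream are hypothetical. Each substring is only computed when the stream is consumed, so limit(2) below produces just the first two n-grams.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical class and method names, chosen for illustration only.
class NgramStreams {
    // Lazily yields the n-grams of str; nothing is computed until a terminal operation runs.
    static Stream<String> ngramStream(int n, String str) {
        // IntStream.range is empty when the end bound is <= 0, i.e. when n > str.length().
        return IntStream.range(0, str.length() - n + 1)
                .mapToObj(i -> str.substring(i, i + n));
    }

    public static void main(String[] args) {
        List<String> firstTwo = ngramStream(3, "abcde")
                .limit(2)
                .collect(Collectors.toList());
        System.out.println(firstTwo); // prints [abc, bcd]
    }
}
```

If an Iterator is needed, calling iterator() on the returned stream provides one.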
