Java: N-gram generation
Eagerly: Generating a list of n-grams
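import java.util.ArrayList;
import java.util.List;

// Collects every substring of length n, sliding one character at a time.
public static List<String> ngrams(int n, String str) {
    List<String> ngrams = new ArrayList<>();
    for (int i = 0; i < str.length() - n + 1; i++)
        ngrams.add(str.substring(i, i + n));
    return ngrams;
}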
Example: ngrams(3, "abcde") = ["abc", "bcd", "cde"].
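The same eager list can also be built with streams. The following is only a sketch of that alternative, not part of the original snippet: the class name NgramStreams and the explicit check on n are assumptions added for illustration. It also shows what happens when n is longer than the string.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class NgramStreams {
    // Hypothetical stream-based variant of ngrams(int, String).
    public static List<String> ngrams(int n, String str) {
        // Assumption: non-positive n is rejected; the loop version does not check this.
        if (n <= 0)
            throw new IllegalArgumentException("n must be positive");
        // rangeClosed is empty when the upper bound is below the lower bound,
        // so a string shorter than n simply yields an empty list.
        return IntStream.rangeClosed(0, str.length() - n)
                .mapToObj(i -> str.substring(i, i + n))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(ngrams(3, "abcde")); // [abc, bcd, cde]
        System.out.println(ngrams(6, "abcde")); // []
    }
}
```

Both variants allocate the whole list up front, which is fine for short strings but wasteful when only the first few n-grams are needed.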
Lazily: Creating an iterator
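import java.util.Iterator;
import java.util.NoSuchElementException;

class NgramIterator implements Iterator<String> {
    private final String str;
    private final int n;
    private int pos = 0; // start index of the next n-gram

    public NgramIterator(int n, String str) {
        this.n = n;
        this.str = str;
    }

    @Override
    public boolean hasNext() {
        return pos < str.length() - n + 1;
    }

    @Override
    public String next() {
        // The Iterator contract requires this exception once the elements run out.
        if (!hasNext())
            throw new NoSuchElementException();
        return str.substring(pos, pos++ + n);
    }
}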
Example:
new NgramIterator(3, "abcde")
    .forEachRemaining(System.out::println);
prints
abc
bcd
cde
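One way to see the benefit of the lazy version is to wrap the iterator in a Stream and take only the first few n-grams. This is a rough sketch: the class NgramStreamDemo and the helper ngramStream are hypothetical names, and the block carries its own compact copy of the iterator (which throws NoSuchElementException when exhausted, as the Iterator contract expects) so that it compiles on its own.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class NgramStreamDemo {
    // Compact copy of the iterator so this sketch is self-contained.
    static class NgramIterator implements Iterator<String> {
        private final String str;
        private final int n;
        private int pos = 0;

        NgramIterator(int n, String str) {
            this.n = n;
            this.str = str;
        }

        @Override
        public boolean hasNext() {
            return pos < str.length() - n + 1;
        }

        @Override
        public String next() {
            if (!hasNext())
                throw new NoSuchElementException();
            return str.substring(pos, pos++ + n);
        }
    }

    // Hypothetical helper: exposes the iterator as a sequential, ordered Stream.
    public static Stream<String> ngramStream(int n, String str) {
        Spliterator<String> split = Spliterators.spliteratorUnknownSize(
                new NgramIterator(n, str), Spliterator.ORDERED | Spliterator.NONNULL);
        return StreamSupport.stream(split, false);
    }

    public static void main(String[] args) {
        // limit(2) stops pulling from the iterator once two n-grams are produced.
        List<String> firstTwo = ngramStream(3, "abcde")
                .limit(2)
                .collect(Collectors.toList());
        System.out.println(firstTwo); // [abc, bcd]
    }
}
```

Because the stream pulls elements on demand, the remaining n-grams are never built as substrings.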