Java regex is including the newline in the match
I'm trying to match a regular expression against textbook definitions that I
get from a website. Each entry always consists of the word, a newline, and
then the definition. For example:
Zither
Definition: An instrument of music used in Austria and Germany It has
from thirty to forty wires strung across a shallow sounding board which
lies horizontally on a table before the performer who uses both hands in
playing on it Not to be confounded with the old lute shaped cittern or
cithern
In my attempts to get just the word (in this case "Zither"), I keep getting
the newline character along with it.
I tried both ^(\w+)\s and ^(\S+)\s without much luck. I thought ^(\S+)$
might work, but it doesn't match the word at all. I've been testing with
Rubular (http://rubular.com/r/LPEHCnS0ri), which matches all of my attempts
the way I want, even though Java doesn't.
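A likely cause of the Rubular/Java difference: Rubular runs Ruby regexes, where ^ and $ always match at line boundaries, while Java's ^ and $ match only at the start and end of the whole input unless Pattern.MULTILINE (or the embedded (?m) flag) is set. The sketch below illustrates this with ^(\S+)$; the class name AnchorDemo, the helper firstMatch, and the shortened sample string are hypothetical, made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchorDemo {
    // Default flags: ^ and $ anchor only to the start and end of the whole input.
    static final Pattern WHOLE_INPUT = Pattern.compile("^(\\S+)$");
    // MULTILINE: ^ and $ also match right after / right before a line terminator.
    static final Pattern PER_LINE = Pattern.compile("^(\\S+)$", Pattern.MULTILINE);

    // Hypothetical helper: returns the whole first match, or null if there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Sample shaped like the question: the word, a newline, then the definition.
        String str = "Zither\nDefinition: An instrument of music used in Austria and Germany";
        System.out.println(firstMatch(WHOLE_INPUT, str)); // null
        System.out.println(firstMatch(PER_LINE, str));    // Zither
    }
}
```

Because $ is zero-width, the MULTILINE match stops just before the newline, so group() contains only the word.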
Here's my snippet:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = ...; // The word and definition taken from the website, as in the example above.
Pattern rgx = Pattern.compile("^(\\S+)$");
Matcher mtch = rgx.matcher(str);
if (mtch.find()) {
    String result = mtch.group();
    terms.add(new SearchTerm(result, System.nanoTime()));
}
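The snippet calls mtch.group(), which returns the entire match. With the earlier attempts such as ^(\w+)\s, the entire match includes the whitespace character consumed by \s (here the newline), whereas group(1) returns only what the parentheses captured. A minimal sketch of the difference; GroupDemo, wholeMatch, and firstGroup are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // One of the question's attempts: a word followed by one whitespace character.
    static final Pattern WORD_THEN_SPACE = Pattern.compile("^(\\w+)\\s");

    // group() is the same as group(0): everything the pattern consumed, including \s.
    static String wholeMatch(String input) {
        Matcher m = WORD_THEN_SPACE.matcher(input);
        return m.find() ? m.group() : null;
    }

    // group(1): only the text captured by the parentheses, so \s is left out.
    static String firstGroup(String input) {
        Matcher m = WORD_THEN_SPACE.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String str = "Zither\nDefinition: An instrument of music";
        // Escape the newline so it is visible in the output.
        System.out.println("[" + wholeMatch(str).replace("\n", "\\n") + "]"); // [Zither\n]
        System.out.println("[" + firstGroup(str) + "]");                      // [Zither]
    }
}
```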
This is easily solved by trimming the resulting string, but that seems like
it should be unnecessary when I'm already using a regular expression.
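One regex-only alternative to trimming, sketched here as an assumption rather than the only option, is a lookahead: it checks that a line break follows the word without consuming it, so even group() contains no newline. The pattern ^\S+(?=\r?\n) and the names LookaheadDemo and headword are hypothetical; the optional \r is there in case the page uses Windows-style line endings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // \S+ consumes the word; the lookahead (?=\r?\n) only checks that a line
    // break follows, without making the line break part of the match.
    static final Pattern WORD_BEFORE_NEWLINE = Pattern.compile("^\\S+(?=\\r?\\n)");

    // Returns the word on the first line, or null if no line break follows it.
    static String headword(String input) {
        Matcher m = WORD_BEFORE_NEWLINE.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(headword("Zither\nDefinition: An instrument of music"));   // Zither
        System.out.println(headword("Zither\r\nDefinition: An instrument of music")); // Zither
    }
}
```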
All help is greatly appreciated. Thanks in advance!