Finding words
A common problem when processing incoming text is to isolate the words in the text. This is made more difficult by the punctuation; words have commas, ``quote marks'', (even brackets) next to them, or hy-phens in the middle of the word. This punctuation doesn't count as letters when the words have to be looked up in a dictionary by the program.
For this problem, you must separate out ``clean'' words from text, that is, words with no attached or embedded
non-letters. A ``word'' is any continuous string of non-whitespace characters, with whitespace characters on
each side of it. For this problem, a ``whitespace'' character is a space character or an end-of-line character,
or the start or end of the file (so that, for example, if the input file consists of `Anne Bob', where there is a
space character between the A and B but no other, then there are two words, `Anne' and `Bob').
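
The definitions above can be sketched in Python as follows. This is only an illustrative sketch, not a reference solution: the function name `clean_words` is hypothetical, ``letters'' are assumed to be the ASCII letters A-Z and a-z, and a word left with no letters at all is assumed to be dropped. It does not handle the special end-of-line hyphen rule.

```python
import re
import string

# Assumption: "letters" means ASCII A-Z and a-z only.
LETTERS = set(string.ascii_letters)


def clean_words(text: str) -> list[str]:
    """Split text into words on spaces and end-of-line characters,
    then strip every non-letter from each word.

    Assumption: a word with no letters left is dropped.
    The end-of-line hyphen rule is NOT handled here.
    """
    cleaned = []
    # Only the space and the newline count as whitespace in this problem;
    # the start and end of the text act as implicit boundaries.
    for raw in re.split(r"[ \n]+", text):
        letters = "".join(ch for ch in raw if ch in LETTERS)
        if letters:
            cleaned.append(letters)
    return cleaned
```

For instance, `clean_words("Anne Bob")` returns `['Anne', 'Bob']`, matching the two-word example in the definition.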
There is a special rule for a hyphen (`-') when it is the very last character in a line:
A common problem when processing incoming text is to isolate the words in the text. This is made more difficult by the punctuation; words have commas, "quote marks", (even brackets) next to them, or hy- phens in the middle of the word. This punctuation doesn't count as letters when the words have to be looked up in a # dictionary by the 12345 "**&! program. #
A common problem when processing incoming text is to isolate the words in the text This is made more difficult by the punctuation words have commas quote marks even brackets next to them or hyphens in the middle of the word This punctuation doesnt count as letters when the words have to be looked up in a dictionary by the program