REGULAR EXPRESSIONS - "regexp (case insensitive)"
As Used in Eudora 5.1 and later for Windows
Regular Expressions are also available in Eudora for
Macintosh starting in Version 6.
"Regular expressions" are used to match patterns of
characters in many programming languages. There are various
implementations of regular expressions, and Eudora's is based on the POSIX
implementation. Regular expressions use a set of special
characters and notation to allow functions such as wildcard characters,
character set substitutions, the logical "or" operator, grouping of
characters or expressions into sub-expressions, and searching from the
beginning or end of a line. The ability to match patterns is a powerful
tool when creating email filters, and regular expressions allow the
creation of complex and effective filtering rules.
There are two forms of the regular expression verb in
Eudora:
• "matches regexp"
• "matches regexp (case insensitive)"
These verbs are found in the Eudora Filters window, in the
drop-down list where you assign the relationship between an email header
and your text search string(s). The "matches regexp" is case
sensitive. I've heard rumors that it's buggy but I haven't tested it much
yet. So far I've used the "matches regexp (case insensitive)"
option exclusively, but I'm beginning some testing of filter rules using
case sensitivity.
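To make the difference between the two verbs concrete, here is a minimal sketch using Python's `re` module as a stand-in. Python is not Eudora's engine, and the sample subject line is made up, so treat this as an illustration of case sensitivity in general rather than a test of Eudora itself.

```python
import re

subject = "ADV: Click Here for a FREE offer"

# Case-sensitive search, loosely analogous to "matches regexp".
sensitive = re.search(r"click here", subject)

# Case-insensitive search, loosely analogous to
# "matches regexp (case insensitive)".
insensitive = re.search(r"click here", subject, re.IGNORECASE)

print(sensitive)               # None: the capital letters do not match
print(insensitive.group(0))    # Click Here
```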
|
These are the characters with special powers when used
with regular expressions:
. ? * + | \ [ ]
{ } ( ) ^ $
Because they are special, you can't search for them by
themselves. To search for one of the special characters, you must put
the \ (backslash) "escape" character in front of it. For
example, if you want to search for a literal period " . " you
would put "backslash period" like this " \. " To find a dollar
sign you would search for "\$". And to find a backslash, use "\\".
The minus " - " is
special only inside the square brackets and only if it is between
other characters "[a-z0-9 ]".
|
| |
regular expressions - the
Special Characters
|
| |
|
|
. |
. Period - A
"wildcard", used to represent any one character including spaces.
Especially handy with a multiplier after it (asterisk, question mark or
plus sign), to find zero, one or many of any unspecified characters.
Example1: ".1" - Finds "a1"
or "B1" or "c1" or " 1" (<sp>1) etc.
Example2: "1.3" - Finds "123" or "1z3" or "1 3"
or "1\3".
Example3: "123.*567" - Finds "123567" or "1234567"
or "123lots of stuff can go here when used with an asterisk!567"
Weirdness* (Tested only in v5.1)
When searching for non-alphanumeric characters (punctuation marks and
white space):
- a single non-alphanumeric character will be found by
one, two or three periods
- two sequential non-alphanumeric characters will be
found by two, three or four periods
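The period examples can be tried out with a short sketch in Python's `re` module, used here only as a stand-in for Eudora. The helper name `finds` is made up for this illustration. Two caveats: Python's period does not match a newline unless `re.DOTALL` is set, and the Eudora 5.1 "weirdness" described above is not something Python reproduces.

```python
import re

def finds(pattern, text):
    """Return True if the pattern occurs anywhere in the text."""
    return re.search(pattern, text) is not None

one_char = r"1.3"        # exactly one character of any kind between 1 and 3
any_run = r"123.*567"    # any run of characters, including none at all

print(finds(one_char, "123"), finds(one_char, "1 3"), finds(one_char, "13"))
# True True False
print(finds(any_run, "123567"), finds(any_run, "123 lots of stuff 567"))
# True True
```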
|
| |
|
|
? |
? Question Mark - A
multiplier for the previous character, character [set] or (group|sub-expression).
It will match zero or one of the previous entity.
Example1: "Web-?Site" - Finds "Website" or
"Web-Site"
Example2: "Clicki?n?g?" - Finds "Click" or "Clicking"
or "Clicki" or "Clickin"
Example3: "Click(ing)? Here" - Finds "Click Here" or "Clicking
Here".
Example4: "Click ?Here" - Finds "Click Here" or "ClickHere".
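Examples 3 and 4 can be combined into one pattern. The sketch below uses Python's `re` module as a stand-in for Eudora, and the sample strings are invented for the illustration.

```python
import re

# "(ing)?" makes the whole group optional; " ?" makes the space optional.
pattern = re.compile(r"Click(ing)? ?Here", re.IGNORECASE)

samples = ["Click Here", "Clicking Here", "ClickHere", "Clicked Here"]
results = {text: pattern.search(text) is not None for text in samples}

for text, found in results.items():
    print(f"{text!r}: {found}")
```

Only "Clicked Here" is rejected, because "ed" is neither the optional "ing" nor nothing at all.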
|
| |
|
|
* |
* Asterisk - A multiplier
for the previous character, character [set] or (group|sub-expression). It
will match zero or more of the previous entity.
Example1: "12*3" - Finds "123" or "13"
or "1223" or "12222222223"
Example2: ".*\.com" - Finds ".com" and also "anything.com",
"aol.com" or "www.fountainofspam.com".
|
| |
|
|
+ |
+ Plus Sign - A multiplier
for the previous character, character [set] or (group|sub-expression). It
will match one or more of the previous entity, but not zero.
Example1: "12+3" - Finds "123" or "1223"
or "1222222223" etc.
Example2: "http://[0-9]+\.[0-9]+\.[0-9]" - Finds "http://1.2.3"
or "http://123.45.678" etc.
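The only difference between the asterisk and the plus sign is whether zero occurrences are allowed. A small sketch in Python's `re` module (a stand-in, not Eudora's engine) shows the two side by side, followed by a numeric-URL pattern in the spirit of Example2. The test strings and the trailing "+" on the last digit group are choices made for this sketch.

```python
import re

star = re.compile(r"12*3")   # zero or more 2s
plus = re.compile(r"12+3")   # one or more 2s

candidates = ["13", "123", "12223"]
star_hits = [s for s in candidates if star.search(s)]
plus_hits = [s for s in candidates if plus.search(s)]

numeric_url = re.compile(r"http://[0-9]+\.[0-9]+\.[0-9]+")
url_match = numeric_url.search("visit http://123.45.678/offer today")

print(star_hits)             # ['13', '123', '12223']
print(plus_hits)             # ['123', '12223']
print(url_match.group(0))    # http://123.45.678
```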
|
| |
|
|
| |
| "OR" - Used between characters, words or
phrases to find "one or the other"
Example1: "This|that|the other"
Example2: "This and (that|the other)"
Notes* Eudora's regexp Help page states that the parentheses are
required with the "or" symbol - but that is not correct. The parentheses
are useful as shown above, but they are not required. Mind your
spaces when using the "or" symbol, as they are included in the
search.
The "|" character is located on your keyboard above the "\" (it may appear
on the key to be split in the center), so to type this character just press <shift>
and <backslash> "\".
(The ASCII value of "|" is Dec 124, Hex 7C)
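Here is a rough sketch of how the parentheses change the scope of the "or", and of how stray spaces become part of the search. It uses Python's `re` module as a stand-in for Eudora, and the sample strings are made up.

```python
import re

bare = re.compile(r"This|that|the other", re.IGNORECASE)
grouped = re.compile(r"This and (that|the other)", re.IGNORECASE)
spaced = re.compile(r"this | that", re.IGNORECASE)

# Without parentheses, any one of the three alternatives is enough.
bare_hit = bare.search("Just that")

# With parentheses, "This and " must come before one of the alternatives.
grouped_miss = grouped.search("Just that")
grouped_hit = grouped.search("this and the other thing")

# The spaces around "|" belong to the alternatives: "this " or " that".
spaced_miss = spaced.search("this")
spaced_hit = spaced.search("this one")

print(bare_hit.group(0))       # that
print(grouped_miss)            # None
print(grouped_hit.group(0))    # this and the other
print(spaced_miss)             # None
print(repr(spaced_hit.group(0)))
```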
|
| |
|
|
\ |
\ Backslash - The "escape"
character - it foils the special powers of any other special character
immediately following it, rendering it "ordinary". Makes it possible to
search for the special characters.
Example1: "123\.456" - Finds 123.456
Example2: "123\\456" - Finds 123\456
Example3: "attached-file\.(com|exe|bat)" - Finds
"attached-file.com" or "attached-file.exe" or "attached-file.bat"
|
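The effect of forgetting the backslash is easy to see in a small sketch. Python's `re` module stands in for Eudora here. `re.escape` is a Python convenience with no counterpart in the Eudora filter window; it appears only to show what escaping every special character in a literal string looks like.

```python
import re

escaped = re.compile(r"attached-file\.(com|exe|bat)", re.IGNORECASE)
unescaped = re.compile(r"attached-file.(com|exe|bat)", re.IGNORECASE)

print(bool(escaped.search("attached-file.exe")))      # True
print(bool(escaped.search("attached-fileXexe")))      # False
# Without the backslash the period is a wildcard again:
print(bool(unescaped.search("attached-fileXexe")))    # True

# Escaping a literal that contains "$" and "." in one step.
price = re.compile(re.escape("$9.99"))
print(bool(price.search("only $9.99 today")))         # True
print(bool(price.search("only $9x99 today")))         # False
```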
| |
|
|
[ ] |
[ ] Brackets - used to create a
[set] of characters, from which we will find one and only one
item {unless told otherwise}. The "minus" sign takes
on special meaning inside the brackets if it is between other characters,
forming a range of characters to include in the search, such as "[a-z]"
or "[1-9]". The minus sign is normal if it is the first or last
character in the brackets. The "caret" sign "^" when placed
first in the brackets changes the meaning inside the brackets, to "not
this" or "not these", for alphanumeric characters only.
The caret sign ^ has NO special meaning if not in the first position. The
other special characters are also all stripped of their special powers
when placed inside the brackets, and become ordinary.
Example1: [aeiou] - Finds any one occurrence of
an "a" or "e" or "i" or "o" or "u".
Example2: [howdy ] - Finds any one occurrence of "h" or "o"
or "w" or "d" or "y" or a <space>.
Example3: [0-9a-z] - Finds any number "0" to "9" or
any letter "a" through "z" or "A" through "Z"
(with case insensitive search).
Example4: [$0-9]{5} - Finds any five sequential "$" signs
and/or numbers "0" through "9".
Example5: 123[^A-F]456 - Finds "123(any one thing except A
through F)456"
Example6: "[A-Z]<!--" - Finds words broken by HTML comment tags.
For example:
"S<!-- haha -->EX" or "Nor<!-- html comment -->ton
Antivirus".
Weirdness* The caret sign seems either not to work, or to work
erratically, for negating punctuation marks or white space
characters, and sometimes won't work for alphanumerics if there are
adjacent non-alphanumeric characters (v5.1). I haven't quite figured out
what the rules are for this yet.
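A few of the bracket examples, sketched with Python's `re` module as a stand-in for Eudora. Python negates punctuation and white space without trouble, so the v5.1 weirdness noted above will not show up here. The sample strings are invented.

```python
import re

vowel = re.compile(r"[aeiou]")
not_a_to_f = re.compile(r"123[^A-F]456")       # case-sensitive on purpose
dollars_or_digits = re.compile(r"[$0-9]{5}")   # "$" is ordinary inside brackets
hyphen_or_letter = re.compile(r"[-a-z]")       # leading "-" is a plain hyphen

print(vowel.search("rhythm"))                               # None
print(bool(not_a_to_f.search("123G456")))                   # True
print(bool(not_a_to_f.search("123B456")))                   # False
print(dollars_or_digits.search("pay $1234 now").group(0))   # $1234
print(hyphen_or_letter.search("12-34").group(0))            # -
```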
|
| |
|
|
{ } |
{ } Squiggly Brackets - Put
a number {2} or a range of numbers {1,5} between them to
specify how many of the previous character or group you wish to
find together (sequentially): an exact count, or a minimum and maximum.
Example1: "ABC{4}D" - Finds "ABCCCCD".
Example2: "(Http:.*){3}" - Finds any three occurrences of "Http:"
separated by anything (because of the period-asterisk wildcard combination
included in the parenthesis).
Example3: "(Http:){3}" - Finds "Http:Http:Http:"
Example4: "\$.?.?.,?[0-9]{3}" - Finds "$1000" or "$1,000"
or "$22,000" or "$399,456,789".
Example5: "ABC{2,4}D" - Finds "ABCCD" or "ABCCCD" or
"ABCCCCD" but not "ABCD".
|
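The counting behaviour of the squiggly brackets can be checked with a short sketch. Python's `re` module is a stand-in for Eudora, and the candidate strings are chosen for the illustration. Note that the count applies only to the "C", the single character just before the brackets.

```python
import re

exact = re.compile(r"ABC{4}D")       # exactly four Cs
ranged = re.compile(r"ABC{2,4}D")    # two, three or four Cs
price = re.compile(r"\$.?.?.,?[0-9]{3}")

candidates = ["ABCD", "ABCCD", "ABCCCD", "ABCCCCD", "ABCCCCCD"]
exact_hits = [s for s in candidates if exact.search(s)]
ranged_hits = [s for s in candidates if ranged.search(s)]
price_hits = [price.search(s).group(0) for s in ["$1000", "$1,000", "$22,000"]]

print(exact_hits)     # ['ABCCCCD']
print(ranged_hits)    # ['ABCCD', 'ABCCCD', 'ABCCCCD']
print(price_hits)     # ['$1000', '$1,000', '$22,000']
```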
| |
|
|
( ) |
( ) Parentheses - used to
make a group of things or a sub-expression. Works well with the "|"
(or) symbol. Groups and sub-expressions can be used anywhere a single
character could be used in a regular expression, and repeated
sub-expressions using the multipliers " ?*+
" are allowed.
Example1: "This and (that|the other)"
Finds "This and that" or "This and the other".
Example2: "123( optional words )?456" - finds "123456",
or "123 optional words 456"
Example3: "123( )*ABC" - finds "123(
any number of spaces )ABC"
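Examples 2 and 3 show a multiplier applied to a whole group. They can be sketched as follows, with Python's `re` module standing in for Eudora and a few made-up test strings.

```python
import re

optional = re.compile(r"123( optional words )?456")   # group appears once or not at all
spaces = re.compile(r"123( )*ABC")                    # group repeats any number of times

optional_hits = [
    s for s in ["123456", "123 optional words 456", "123 other 456"]
    if optional.search(s)
]
spaces_hits = [s for s in ["123ABC", "123   ABC", "123-ABC"] if spaces.search(s)]

print(optional_hits)    # ['123456', '123 optional words 456']
print(spaces_hits)      # ['123ABC', '123   ABC']
```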
|
| |
|
|
^ |
^ Caret - (When not in
square brackets) Represents the start of the line - the character
following it must be the first character of a line. When used as the first
character in square brackets, the Caret means "[^not these]".
Example1: "^Adv:" - If applied to the Subject
Header will find email with subjects starting with "Adv:".
Example2: "^<X-html>" - Finds "<X-html>" but only if it is
at the beginning of a line.
NOTE* Eudora treats each header as one line, and
the entire body of an email message as one line.
|
| |
|
|
$ |
$ Dollar sign - Represents
the end of the line. The character preceding it must be the last character
on the line for a match to be found.
Example: "^A.*Z$" If applied to any
header, will look for any header whose first character is an "A"
and whose last character is a "Z".
NOTE* Eudora treats each header as one line, and
the entire body of an email message as one line.
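A sketch of both anchors, with Python's `re` module standing in for Eudora. Without the `re.MULTILINE` flag, Python ties "^" and "$" to the start and end of the whole string, which loosely mirrors the one-line treatment described in the note. The header values are invented.

```python
import re

starts_with_adv = re.compile(r"^Adv:", re.IGNORECASE)
a_to_z = re.compile(r"^A.*Z$", re.IGNORECASE)

print(bool(starts_with_adv.search("Adv: buy now")))       # True
print(bool(starts_with_adv.search("Re: Adv: buy now")))   # False

print(bool(a_to_z.search("A to Z")))                      # True
print(bool(a_to_z.search("A to Z!")))                     # False
print(bool(a_to_z.search("The A to Z")))                  # False
```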
|
| |
|
|
- |
- Minus sign or
Hyphen - When used inside the square brackets [ ] and between two
alpha or numeric characters it denotes a range of characters to search
for. Otherwise it is treated as normal.
Example1: [a-z] - Finds any one letter "a"
through "z" or "A" through "Z" (with case-insensitive
match)
Example2: [0-9a-z] - Finds any one number "0" to "9"
or any letter "a" through "z" or "A" through "Z"
Example3: [-a-z] - Finds any one "-", "a" through "z"
or "A" through "Z"
Example4: [0-9-] - Finds any one number "0" through "9"
or a hyphen.
|
| |
|
| |
The following "character class" sets are written in
lower case only.
This way: "[[:alpha:]]", and not like this:
"[[:ALPHA:]]"
|
| |
|
|
[[:alpha:]] |
[[:alpha:]] - Represents any one alphabet
character; same as "[a-z]" in a case-insensitive search. Other characters may be included in the
search by placing them within the outer brackets.
[^[:alpha:]] matches one non-alpha character.
|
| |
|
|
[[:digit:]] |
[[:digit:]] - Represents any one number
character; same as "[0-9]". Other characters may be included in the
search by placing them within the outer brackets.
[^[:digit:]] matches one non-numeric
character.
Example1: "abc[[:digit:]]def" - Finds "abc1def"
or "abc7def" etc.
Example2: "a[[:digit:]!@$]b" - Finds "a1b" or "a!b"
or "a@b" or "a$b".
|
| |
|
|
[[:blank:]] |
[[:blank:]] - Represents a <space> or
<tab>. Other characters may also be included in the search by placing
them within the outer brackets.
[^[:blank:]] matches one non-blank character.
|
| |
|
|
[[:punct:]] |
[[:punct:]] - Represents any one punctuation
character. If it's a character you can see, and it's not [A-Z] or
[0-9], this probably gets it. (Does not catch space or tab). Other
characters may be included in the search by placing them within the outer
brackets.
[^[:punct:]] matches one non-punctuation character.
Example1: "123[[:punct:]]456" - Finds "123!456"
or "123&456" or "123@456".
|
| |
|
|
[[:space:]] |
[[:space:]] - Represents any one whitespace
character - space, tab, carriage return, linefeed. Other characters may be
included in the search by placing them within the outer brackets.
[^[:space:]] matches one non-space character.
Example1: "123[[:space:]]456" - Finds "123 456"
or "123<tab>456" or "123<cr>456". But will not find "123<cr><lf>456".
Example2: "Click([[:space:]]{3})?Here" - Finds "ClickHere"
or "Click<sp><sp><sp>Here" or "Click<sp><cr><lf>Here" etc. (exactly three
whitespace characters, or none at all).
|
| |
|
|
[[:graph:]] |
[[:graph:]] - Finds any one displayable
character. Under the POSIX definition this means a visible character, so a
space should not be included.
[^[:graph:]] matches one non-displayable
character.
Example1: "123[[:graph:]]456" - Finds "123!456"
or "123@456" or "123A456" etc.
|
| |
|
|
[[:cntrl:]] |
[[:cntrl:]] - Matches any one non-printable
character such as carriage return or linefeed.
[^[:cntrl:]] matches one printable character.
Example1: "Hello[[:cntrl:]]{2}Bye!" - Will find:
"Hello
Bye!"
|
| |
|
|
[[:alnum:]]
|
[[:alnum:]] - Finds any one alpha or numeric
character. Same as "[0-9a-z]".
[^[:alnum:]] matches one non-alphanumeric character.
|
| |
|
|
[[:xdigit:]]
|
[[:xdigit:]] - Finds any one hexadecimal
character "0123456789ABCDEF" (the letters in either case).
[^[:xdigit:]] matches one non-hexadecimal character.
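Python's `re` module does not understand the "[[:name:]]" notation, so the character classes cannot be pasted into it directly. The sketch below uses a hypothetical helper, `translate`, and a hand-written table covering only a few of the classes for plain ASCII text. It rewrites a pattern into bracket contents Python accepts. This approximates the classes as described above and is not a reproduction of Eudora's implementation.

```python
import re

# Hypothetical translation table: POSIX class name -> equivalent bracket
# contents for ASCII text, written in Python's regular expression syntax.
POSIX_CLASSES = {
    "alpha": r"a-zA-Z",
    "digit": r"0-9",
    "alnum": r"0-9a-zA-Z",
    "blank": r" \t",
    "space": r" \t\r\n\v\f",
    "xdigit": r"0-9A-Fa-f",
}

def translate(pattern):
    """Replace each [:name:] inside a bracket set with its ASCII equivalent."""
    for name, chars in POSIX_CLASSES.items():
        pattern = pattern.replace("[:" + name + ":]", chars)
    return pattern

print(translate("abc[[:digit:]]def"))    # abc[0-9]def
print(translate("a[[:digit:]!@$]b"))     # a[0-9!@$]b
print(bool(re.search(translate("123[[:space:]]456"), "123\t456")))      # True
print(bool(re.search(translate("123[[:space:]]456"), "123\r\n456")))    # False
```

The last line fails to match for the same reason given in the [[:space:]] example: one class stands for one character, and a carriage return plus linefeed is two.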
|
| |
|
|
\< |
Doesn't Seem to Work
But according to the Eudora help page it represents
"the start of a word."
|
| |
|
|
\> |
Doesn't Seem to Work
But according to the Eudora help page it represents
"the end of a word."
|
| |
|
|
|
Note* "contains", "doesn't contain", "regexp", and
"regexp(case insensitive)" will all search for individual characters,
groups or words within other larger words or character strings, so keep
this in mind in choosing your search terms. Searching for the word "sex"
for example will also find "Essex" or "heterosexual" or "sextant".
Phrase searches can run into a similar trap: Searching for "the other"
will also locate "Team Grenthe otherwise won the match", for
example.
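The substring trap, and one possible way around it, can be sketched with Python's `re` module as a stand-in for Eudora. The "bounded" pattern is a hypothetical workaround that uses only syntax described on this page: it demands a non-letter, or the edge of the text, on each side of the word. It has not been tested in Eudora, and because it relies on a negated set it may run into the caret weirdness noted in the brackets section.

```python
import re

loose = re.compile(r"sex", re.IGNORECASE)

# Hypothetical workaround: a non-letter (or the start/end of the text)
# must sit on both sides of the word.
bounded = re.compile(r"(^|[^a-zA-Z])sex([^a-zA-Z]|$)", re.IGNORECASE)

samples = ["Essex", "heterosexual", "sextant", "free sex!", "sex"]
loose_hits = [s for s in samples if loose.search(s)]
bounded_hits = [s for s in samples if bounded.search(s)]

print(loose_hits)      # ['Essex', 'heterosexual', 'sextant', 'free sex!', 'sex']
print(bounded_hits)    # ['free sex!', 'sex']
```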

|