I'm not ashamed to say that, over the years, I've been more of a consumer of publicly crafted regexes than an author of them. Every text I found on the subject was cryptic enough that I never bothered looking into it seriously, and boy-oh-boy do I wish I had done this sooner!
Today I took that step, and I'm already on my way to enlightenment thanks to:
- #regex on irc.freenode.net and ShiningThrough for his help
- http://regex101.com/ which is the best online regex tool I've come across to date (thanks ShiningThrough!)
- http://regexone.com/ for interactive lessons on grokking regexes
I've compiled my solutions as a Gist for reference as I make my way through these lessons, and it would be great if you could share your solutions as well. Here are a couple of examples:
#### Lesson 9 - http://regex101.com/r/yZ4qG2
Objective:
match 1. abc
match 2. abc
match 3. abc
skip 4.abc
Solutions:
([\d\.]+\s+[a-c]+) # matches any whitespace
([\d\.]+[ \t]+[a-c]+) # specifically matches spaces and tab-based whitespace
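To check the two solutions outside regex101, here is a minimal Ruby sketch. The sample lines are my own approximation of the lesson text (the exact whitespace is an assumption), and the constant and method names are made up for illustration. The second pattern is written with the class `[ \t]`, because inside a character class quotes and `|` are literal characters rather than delimiters or alternation.

```ruby
# Sample lines modelled on the lesson; the exact whitespace is an assumption.
LINES = ["1.   abc", "2.\tabc", "3.      abc", "4.abc"].freeze

# Digits and dots, then at least one whitespace character of any kind.
ANY_WHITESPACE = /([\d\.]+\s+[a-c]+)/

# Same idea, but only spaces and tabs count as the separator.
SPACES_OR_TABS = /([\d\.]+[ \t]+[a-c]+)/

# Return the lines that the given pattern matches somewhere.
def matching_lines(pattern, lines)
  lines.select { |line| line.match?(pattern) }
end

puts matching_lines(ANY_WHITESPACE, LINES).inspect
puts matching_lines(SPACES_OR_TABS, LINES).inspect
```

Both patterns should keep the first three lines and skip `4.abc`, which has no whitespace after the dot. They only differ on input such as a newline between the number and the letters, which `\s` accepts and `[ \t]` does not.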
#### Lesson 11 - http://regex101.com/r/hZ7kE1
Objective:
match file_a_record_file.pdf
match file_yesterday.pdf
skip testfile_fake.pdf.tmp
Solutions:
([a-z_]+)\.pdf$
([a-z_]+)(?=\.pdf$) # using positive lookahead
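The difference between the two solutions is easier to see when you look at what each one actually consumes. The sketch below is a rough Ruby illustration, not part of the lesson: the helper names are hypothetical, and the character class is written as `[a-z_]`, since `+` and `?` inside a class are literal characters rather than quantifiers.

```ruby
FILES = [
  "file_a_record_file.pdf",
  "file_yesterday.pdf",
  "testfile_fake.pdf.tmp"
].freeze

# Consumes the ".pdf" suffix as part of the overall match.
WITH_SUFFIX = /([a-z_]+)\.pdf$/

# Only asserts that ".pdf" follows; the suffix is not part of the match.
WITH_LOOKAHEAD = /([a-z_]+)(?=\.pdf$)/

# Text of the whole match, or nil when the pattern does not match.
def whole_match(pattern, name)
  m = name.match(pattern)
  m && m[0]
end

# Text of the first capture group, or nil when the pattern does not match.
def first_group(pattern, name)
  m = name.match(pattern)
  m && m[1]
end

FILES.each do |name|
  puts [name, whole_match(WITH_SUFFIX, name), whole_match(WITH_LOOKAHEAD, name)].inspect
end
```

Both patterns capture the same file name in group 1, but only the lookahead version leaves `.pdf` out of the overall match. Neither matches `testfile_fake.pdf.tmp`, because `.pdf` is not at the end of the line there.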
There's a whole lot more to regexes, and I'm really liking what I've learned so far, especially the power of lookarounds. Also remember that regex results vary depending on the engine in question (Perl, PCRE, or even Vim's take on regexes), and that support differs between programming languages.
Fellow Rubyists should also take note that \A and \z, the start-of-string and end-of-string anchors, are the ones to use in Ruby-based regexes. Ruby differs from other languages in that it always behaves as if "multiline mode" were on, so ^ and $ match at the start and end of every line rather than of the whole string.
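A small sketch of why this matters, assuming a hypothetical check that a string is exactly `admin` (the input and constant names are invented for illustration):

```ruby
# Hypothetical two-line input: the second line is what we are checking for.
INPUT = "harmless\nadmin"

# ^ and $ anchor to line boundaries in Ruby, so any matching line is enough.
LINE_ANCHORED = /^admin$/

# \A and \z anchor to the start and end of the whole string.
STRING_ANCHORED = /\Aadmin\z/

puts INPUT.match?(LINE_ANCHORED)     # the second line satisfies ^admin$
puts INPUT.match?(STRING_ANCHORED)   # the string as a whole is not "admin"
puts "admin".match?(STRING_ANCHORED) # exact match

# \Z (capital) tolerates one trailing newline; \z does not.
puts "admin\n".match?(/\Aadmin\Z/)
puts "admin\n".match?(STRING_ANCHORED)
```

The line-anchored pattern accepts the two-line input, while the string-anchored one rejects it. That is the behaviour you usually want when validating a whole value.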
You should at least have a read through Exploring Ruby’s Regular Expression Algorithm for details of its fascinating internals.