Regular Expressions

Regular expressions, or regex for short, are a concise but often cryptic way to search, parse, and manipulate strings. They allow very precise searches in files. Many text editors aimed at programmers support searching with regex, though you may need to enable it in a setting somewhere. Regex are also useful when you know how a string is formatted and need to capture one specific part of it, as when you are parsing a well-formatted input file. Note that regex matching is typically case sensitive by default.
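As an illustration of the case sensitivity point, here is a minimal sketch using Python's built-in re module. The sample text is made up for the example, and other tools may expose case-insensitive matching differently (often as a checkbox or a flag).

```python
import re

# Made-up sample text for illustration.
text = "Regex is short for regular expression; regex can look cryptic."

# Default behavior in Python's re module: matching is case sensitive,
# so the capitalized "Regex" is not found.
default_hits = re.findall(r"regex", text)

# The re.IGNORECASE flag makes the same pattern ignore case.
ignorecase_hits = re.findall(r"regex", text, flags=re.IGNORECASE)

print(default_hits)     # ['regex']
print(ignorecase_hits)  # ['Regex', 'regex']
```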

Examples

Although complicated regex can look like complete nonsense, simple regex are actually quite reasonable. If you want to search for “color” or “colour” in a file, you might do two separate searches. With regex, you could simply search for “colou?r”, and it would find both of them. The “?” indicates that the preceding character (u) can appear 0 or 1 times.
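The same search can be tried in code. This is a small sketch with Python's re module; the sample sentence is invented, and it includes a misspelling with two u's to show that “?” allows at most one.

```python
import re

# "u?" means the letter u may appear zero times or one time.
pattern = re.compile(r"colou?r")

# Invented sample text: two valid spellings and one typo with two u's.
sample = "The color red; the colour blue; the colouur typo."

matches = pattern.findall(sample)
print(matches)  # ['color', 'colour']
```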

Similarly, if you want to search for “blaaahhh” but don’t know how many a’s will appear, you could search for “bla+h”. The “+” indicates that the preceding character (a) must appear at least once, but may repeat any number of times. We might even want to search for “bla+h*”, where the “*” indicates that there can be any number of h’s, from 0 to an unlimited number. Strings that would match this pattern include “bla”, “blah”, “blaaaaahhhhh”, and anything in between.
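To check which strings the “bla+h*” pattern accepts, one option is Python's re.fullmatch, which requires the whole string to match instead of just a piece of it. The candidate strings below are made up for the sketch.

```python
import re

# "a+" needs at least one a; "h*" allows zero or more h's.
pattern = re.compile(r"bla+h*")

# Made-up candidates: the last two have no "a", so they should fail.
candidates = ["bla", "blah", "blaaaaahhhhh", "bl", "blh"]

# fullmatch succeeds only if the entire string fits the pattern.
results = {s: pattern.fullmatch(s) is not None for s in candidates}

for s, ok in results.items():
    print(s, ok)
```

An editor's search box usually behaves more like re.search, which finds the pattern anywhere inside a longer line.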

Parsing

Using regex for parsing in code is very neat, though I never really learned how to do it. It can be handy in the online programming screens used when applying for software jobs, which sometimes involve annoying string parsing. The syntax varies by language, but it will probably rely on capture groups, which are indicated by parentheses.
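For a rough idea of what capture groups look like in practice, here is a sketch in Python. The input format ("name: ..., age: ...") and the function name parse_line are hypothetical, chosen only for illustration; other languages expose groups with different syntax.

```python
import re

# Hypothetical line format: "name: <word>, age: <digits>".
# Each pair of parentheses is a capture group.
LINE_PATTERN = re.compile(r"name: (\w+), age: (\d+)")

def parse_line(line):
    """Return (name, age) if the line fits the format, else None."""
    match = LINE_PATTERN.fullmatch(line)
    if match is None:
        return None
    # group(1) is the first parenthesized part, group(2) the second.
    return match.group(1), int(match.group(2))

print(parse_line("name: Ada, age: 36"))  # ('Ada', 36)
print(parse_line("not a valid line"))    # None
```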

Wildcards

Wildcards are very similar to regular expressions, though generally simpler and less standardized. The context I use them in most often is to select multiple files or folders in a terminal. The main wildcards are *, which matches any string, and ?, which matches any single character.

To actually search for a * or ? character, put a backslash (\) before it. Square brackets match any one of the characters listed inside them, so [ab]* would match anything that starts with a or b. A range also works: [a-c]* would match anything starting with a, b, or c.
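The bracket patterns can be tried outside a terminal with Python's fnmatch module, which implements shell-style wildcards. The names below are made up. One difference to be aware of: fnmatch does not use the backslash escape; its documentation suggests wrapping the character in brackets instead, as in [?].

```python
from fnmatch import fnmatchcase

# Made-up names for illustration.
names = ["apple", "banana", "cherry", "date"]

# [ab]* : first character is a or b, followed by anything.
starts_a_or_b = [n for n in names if fnmatchcase(n, "[ab]*")]

# [a-c]* : first character is in the range a through c.
starts_a_to_c = [n for n in names if fnmatchcase(n, "[a-c]*")]

# In fnmatch, a literal ? is written as [?] rather than \?.
literal_question = fnmatchcase("what?", "what[?]")
not_literal = fnmatchcase("whats", "what[?]")

print(starts_a_or_b)   # ['apple', 'banana']
print(starts_a_to_c)   # ['apple', 'banana', 'cherry']
print(literal_question, not_literal)  # True False
```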

Here are some examples of how to use this in a terminal/command line. If you have not yet used a terminal, this bit is likely to be confusing and irrelevant.

If you want to list all of your files, you typically type ls. If you just want to check for a certain file, you can type ls file_name.txt. With wildcards, you can view all files whose names end in .txt by typing ls *.txt. The * can go anywhere in the string, so you could also do ls page_*.txt to list all files that start with page_ and end in .txt.
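To see what those two patterns select without touching a real folder, here is a sketch that applies them to a made-up list of file names using Python's fnmatch.filter. The shell itself does this expansion before ls ever runs; the code only mimics the matching step.

```python
import fnmatch

# Made-up directory listing for illustration.
files = ["page_1.txt", "page_2.txt", "notes.txt", "page_3.md", "main.c"]

# Equivalent in spirit to: ls *.txt
txt_files = fnmatch.filter(files, "*.txt")

# Equivalent in spirit to: ls page_*.txt
page_txt_files = fnmatch.filter(files, "page_*.txt")

print(txt_files)       # ['page_1.txt', 'page_2.txt', 'notes.txt']
print(page_txt_files)  # ['page_1.txt', 'page_2.txt']
```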

Of course, this wildcard can be used with commands other than ls. If you want to move several files of C code to another folder but leave some reference material behind, you can do mv *.c FolderName/. Or just move everything with mv * FolderName/.
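The same idea can be scripted. Below is a rough Python sketch of the mv *.c FolderName/ case, run inside a temporary directory so nothing real is moved. The helper name move_matching and the sample file names are assumptions made for the example.

```python
import shutil
import tempfile
from pathlib import Path

def move_matching(src, dest, pattern):
    """Move files in src whose names match pattern into dest.

    Returns the sorted list of file names that were moved.
    """
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    # Path.glob applies the wildcard to entries directly inside src.
    for path in sorted(src.glob(pattern)):
        if path.is_file():
            shutil.move(str(path), str(dest / path.name))
            moved.append(path.name)
    return moved

# Work in a throwaway directory with made-up files.
with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)
    for name in ["main.c", "util.c", "reference.pdf"]:
        (work / name).write_text("")

    moved = move_matching(work, work / "FolderName", "*.c")
    remaining = sorted(p.name for p in work.iterdir() if p.is_file())

print(moved)      # ['main.c', 'util.c']
print(remaining)  # ['reference.pdf']
```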


Further Reading

RegexOne, a tutorial

learn-regex, another tutorial

RegExr, if you prefer to learn by experimentation

On Wildcards