Disclaimer
This post is not a comprehensive manual about regular expressions. It’s just a genuine attempt to provide the most to-the-point crash course for SEO experts who want to quickly get the idea of regular expressions and start using them for real life tasks.
First Things First
Before we go any further, let’s see what regular expressions actually are. The official definition goes as follows:
Regular expressions are special characters that match or capture portions of a field, as well as the rules that govern all characters.
Long story short, regular expressions are just special characters that match certain words, digits or signs. That may sound really difficult, but it’s not rocket science if you take a closer look at it.
What You Need Regex For
As an SEO, you will have to use regex (regular expressions) in all sorts of cases, but you’ll be able to use most of their power when dealing with Google Analytics. You may want to use them for creating filters, goals, and fine-tuning your funnel steps.
Now you’ll learn the basics of regular expressions and the ways you can use them in practice. I’ll use my site’s Google Analytics account (WebDesy.com) in the tutorial.
I’ll touch on one metacharacter at a time and show what you can use it for.
1. Backslash
The backslash special character in regular expressions allows you to convey the literal meaning of characters. Let’s say you want to find all dynamic URLs in your Google Analytics account. Since dynamic URLs use question marks, your first guess may be to use a question mark to find all your dynamic URLs. So, you need to log in to your Google Analytics account, go to the Content section and select Overview.
2. Select View Full Report
3. Click The Advanced Option On The New Page
4. Go To Second Drop-down Option And Select Matching RegExp.
5. Start Using It
That done, you can use a regular expression to find specific pages. As decided above, we’ll just search for a question mark in order to find all dynamic pages of the site. So, just type in a question mark and click Apply.
6. Error Message?
But you’re going to see an error message, because a question mark plays a different role in regular expressions. You need to show that you don’t want that special role for your question mark.
7. Remember, Backslash!
That’s exactly where a backslash comes into play. If you need to match the literal meaning of a character rather than the function it performs in regex (regular expressions), you need to put a backslash before that character. So, add a backslash before the question mark (“\?”) and click Apply one more time.
8. Dynamic URLs. Oh Yes.
You’ll see a list of your dynamic URLs (those with question marks in them). So, now it works.
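Outside Google Analytics, the same behavior can be sketched in a few lines of Python. This is only an illustration: Python’s re module stands in for the Analytics search box, and the page paths are invented for the example.

```python
import re

# Hypothetical page paths, invented for illustration only.
pages = [
    "/blog/regex-basics/",
    "/search?q=parallax",
    "/products?id=42&ref=home",
]

# A bare "?" is a quantifier with nothing before it to act on,
# so Python rejects the pattern (much like the error message above).
try:
    re.compile("?")
    bare_question_mark_ok = True
except re.error:
    bare_question_mark_ok = False

# Escaped with a backslash, "\?" matches a literal question mark.
dynamic_pages = [p for p in pages if re.search(r"\?", p)]

print(bare_question_mark_ok)
print(dynamic_pages)
```

The try/except part mirrors the error from step 6, and the last filter mirrors the working search from step 8.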
9. Pipe
The pipe | special character allows you to define alternatives. For example, “blog|wordpress” means that you want to find URLs containing either “blog” or “wordpress”. So, just type in “blog|wordpress” and hit the Apply button.
10. Bingo!
And as you may have guessed, that’s exactly what you’ll see now. URLs containing either blog or wordpress.
That’s not a very practical usage, but you get the point.
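If you’d like to try the pipe without opening a report, here is a minimal sketch in Python. The URLs are made up, and Python’s re module is assumed to behave like the Analytics filter for this simple case.

```python
import re

# Hypothetical URLs, invented for illustration only.
urls = [
    "/blog/seo-tips/",
    "/wordpress-themes/",
    "/about/",
    "/tutorials/wordpress-blog/",
]

# "blog|wordpress" means: match "blog" OR "wordpress" anywhere in the URL.
either = re.compile(r"blog|wordpress")
matches = [u for u in urls if either.search(u)]

print(matches)
```

Only “/about/” is dropped, because it contains neither alternative.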
11. The Question Mark
The question mark ? special character means that the character it goes after is optional. For example, you may need to search your Google Analytics keywords report for the singular and plural forms of a specific keyword. So, you would use it as follows: “keywords?” (since ? goes after s, it means the s is optional). Imagine that your keyword is tutorial and you want to find both its singular and plural forms. For starters, you need to go to the Traffic Sources -> Overview section.
12. Click View Full Report
13. Advanced Search (Again)
As in the case with the backslash, click advanced (search) and select Matching RegExp. Having done that, you need to enter the following: “tutorials?”. That means you want a report with the tutorial keyword, but it’s also OK with you if it shows the plural form of the word (the final s is optional). In other words, it will search for both tutorial and tutorials. If the plural form is not found, it will still work and won’t give any error messages. Hit the Apply button and take a look at the result.
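Here is a rough Python sketch of the same search. The keyword list is invented, and Python’s re module is used as a stand-in for the Analytics keyword filter.

```python
import re

# Hypothetical search keywords, invented for illustration only.
keywords = [
    "photoshop tutorial",
    "css tutorials",
    "free templates",
    "tutorial for beginners",
]

# The "?" makes the preceding "s" optional.
optional_s = re.compile(r"tutorials?")

# Keep the keywords that match, and note which form was actually found.
found = {k: optional_s.search(k).group() for k in keywords if optional_s.search(k)}

print(found)
```

Both the singular and the plural keywords are kept, while “free templates” is left out.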
14. Parentheses
In the previous example, you learned how to make a single character optional (just put a question mark after it). But what if you want to make several characters optional? You just need to enclose your optional characters in parentheses () and put the question mark after the closing parenthesis.
Let’s return to the Content -> Overview section and select View Full Report. Now click advanced and select Matching RegExp (refer to the previous portion of the tutorial for more info). Type in “(web)?site” and hit Apply.
It’ll bring up results with both “website” and “site”, because the question mark makes the whole “web” group optional.
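The placement of the parentheses and the question mark decides which part is optional. The sketch below compares two variants in Python. It uses re.fullmatch (whole-string matching) only to make the difference visible; the word list is invented.

```python
import re

# Hypothetical words, invented for illustration only.
words = ["website", "site", "web", "webinar"]

# "(web)?site": the group "web" is optional, "site" is required.
optional_web = [w for w in words if re.fullmatch(r"(web)?site", w)]

# "web(site)?": "web" is required, the group "site" is optional.
optional_site = [w for w in words if re.fullmatch(r"web(site)?", w)]

print(optional_web)   # ['website', 'site']
print(optional_site)  # ['website', 'web']
```

Without the question mark, the parentheses only group characters and nothing becomes optional.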
15. Square Brackets
Square brackets [ ] are usually used to specify a set or range of characters (letters or digits for the most part), any one of which may appear in that position. If you have an online store and you have products with the following IDs: product1, product2, product3, etc., you may want to select all such pages in your Google Analytics report. Searching for such pages one by one would be really daunting and just a waste of time. You can use the following regular expression for that purpose: product[1-9]. It will match all the IDs that start with ‘product’ followed by a digit from 1 to 9.
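To see what product[1-9] does and doesn’t catch, here is a small Python sketch. The paths are hypothetical, and the search matches anywhere in the string, as the report filter appears to do in the examples above.

```python
import re

# Hypothetical product paths, invented for illustration only.
paths = ["/product1", "/product7", "/product0", "/products", "/product12"]

# [1-9] stands for exactly one character in the range 1 through 9.
one_to_nine = re.compile(r"product[1-9]")
product_pages = [p for p in paths if one_to_nine.search(p)]

print(product_pages)
```

Note that “/product12” is included because it contains “product1”, while “/product0” is not, since 0 is outside the 1-9 range. If your IDs can contain a zero, [0-9] would be the safer range.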
16. Braces
Braces { } repeat the preceding “piece” of the expression a specific number of times. Say, you need to find URLs that have 4 digits in a row. You may search for them with the following regular expression: [0-9][0-9][0-9][0-9]. But that looks a bit like overkill. Instead of repeating [0-9] 4 times, you can just put it as follows: [0-9]{4}. So, the value in braces defines how many times the item that precedes it should be repeated.
As you can see on this screen shot, it surely works.
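A quick way to convince yourself that the short and the long form are equivalent is to run both against the same data. Here is a minimal Python sketch with invented URLs.

```python
import re

# Hypothetical URLs, invented for illustration only.
urls = ["/2013/05/regex/", "/archive/123/", "/post-20145/"]

long_form = re.compile(r"[0-9][0-9][0-9][0-9]")
short_form = re.compile(r"[0-9]{4}")  # same thing: [0-9] repeated 4 times

with_long = [u for u in urls if long_form.search(u)]
with_short = [u for u in urls if short_form.search(u)]

print(with_short)
print(with_long == with_short)
```

“/archive/123/” is skipped because it only has three digits in a row, while “/post-20145/” is kept because four of its five digits are enough for a match.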
17. The Dot
A dot matches any one character. You may use it if you want to select a certain range of IP addresses with an expression like “123\.45\.67\.25.” (without the quotes). The dots with backslashes just mean that they should be used as literal dots (see the section about backslashes if that does not seem to make sense to you). The last dot, though, is sort of a placeholder that matches any one character. Since IP addresses can only have the digits 0 through 5 in that particular location, it will match 123.45.67.250, 123.45.67.251, 123.45.67.252, 123.45.67.253, 123.45.67.254, 123.45.67.255.
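The difference between an escaped dot and a plain dot can be sketched in Python as follows. The pattern is the IP expression with its literal dots escaped by backslashes, the sample strings are invented, and re.fullmatch is used so that the whole string has to fit the pattern.

```python
import re

# Escaped dots (\.) are literal dots; the final plain dot matches any one character.
ip_range = re.compile(r"123\.45\.67\.25.")

# Hypothetical values, invented for illustration only.
candidates = [
    "123.45.67.250",
    "123.45.67.255",
    "123.45.67.25",   # too short: nothing left for the final dot
    "123.45.67.25x",  # the final dot accepts any character, not only digits
    "123x45.67.251",  # fails: the first separator must be a literal dot
]

matched = [c for c in candidates if ip_range.fullmatch(c)]

print(matched)
```

The “25x” case shows that the plain dot is not limited to digits. In practice that does not matter, because real IP addresses only have digits there.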
18. Plus Sign
A plus sign matches one or more of the previous item. You can use it in combination with a dot to define that you’re searching for one character or more: “.+”. It will match a, aa, aaa, or any other string that has at least one character.
19. Star
A star is much like a plus sign, but it matches zero or more of the previous item, so it will also match if there’s no character at all. For example, “.*” will match no character at all or a, aa, aaa, etc.
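The plus sign and the star differ only in the “nothing at all” case, which is easy to check in Python. The paths below are invented, and re.fullmatch is used so that the whole path has to fit the pattern.

```python
import re

# Hypothetical paths, invented for illustration only.
paths = ["/blog/", "/blog/regex-basics/"]

# ".+" needs at least one character after "/blog/".
one_or_more = [p for p in paths if re.fullmatch(r"/blog/.+", p)]

# ".*" is also happy with nothing after "/blog/".
zero_or_more = [p for p in paths if re.fullmatch(r"/blog/.*", p)]

print(one_or_more)   # ['/blog/regex-basics/']
print(zero_or_more)  # ['/blog/', '/blog/regex-basics/']
```

So “.+” is the one to pick when the blog index page itself should stay out of the results.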
20. Caret
A caret anchors your regex to the beginning of the string, so it will only match values that start the same way as your regex does. For example, if you want to match all the posts that start with “/parallax”, you need to use the following regular expression: “^/parallax”.
Now you can click Apply and you’ll see a list of URLs that start with “/parallax“.
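To see what the caret changes, you can compare the same search with and without it. This is a Python sketch with invented paths, not the Analytics report itself.

```python
import re

# Hypothetical paths, invented for illustration only.
paths = [
    "/parallax-scrolling-tutorial/",
    "/blog/parallax-effects/",
    "/parallax/",
]

# Without the caret, "/parallax" may appear anywhere in the path.
anywhere = [p for p in paths if re.search(r"/parallax", p)]

# With the caret, the path has to start with "/parallax".
at_start = [p for p in paths if re.search(r"^/parallax", p)]

print(anywhere)
print(at_start)
```

The caret drops “/blog/parallax-effects/”, which contains the text but doesn’t start with it.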
21. Dollar Sign
You need to put a dollar sign to show what you want your keyword (URL, etc) to end with. You may want to find all your pages that have names ending with “form/“. In that case, you need to use the following regular expression: “form/$“.
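The dollar sign works like a mirror image of the caret. Here is a minimal Python sketch of the “form/$” example, again with invented paths.

```python
import re

# Hypothetical paths, invented for illustration only.
paths = ["/contact-form/", "/form/submit/", "/order-form/"]

# "form/$" only matches when "form/" is the very end of the path.
ends_with_form = [p for p in paths if re.search(r"form/$", p)]

print(ends_with_form)
```

“/form/submit/” contains “form/”, but not at the end, so it is left out.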
22. The Conclusion
It goes without saying that Google Analytics is a great tool as is, but you can get more info out of it if you know tricks like regular expressions. This sort of skill allows you to really fine-tune both your internal and external link profiles, plus it won’t cost you anything extra.