Arabic keyword filtering in Google Analytics

If you are anything like me, you have struggled to find a clear way to view Google Analytics data for Arabic traffic alone. In this post I will explain one method I use to get a better picture of the Arabic traffic coming to the sites I work on. First, though, allow me to explain a common way many people try to find Arabic traffic.

Those familiar with Google Analytics are aware that within the Audience Demographics section of the GA dashboard you can view visitor language statistics.  When you drill down into that section you see the Language dashboard that looks something like this.

[Screenshot: GA languages dashboard]

At first glance this appears useful. The traffic from each language appears to be divided up nicely, and in some cases it is broken down by language and country. However, when you drill down into a specific language and then add Keyword as a secondary dimension, you often get something like this.

[Screenshot: Arabic language dashboard]

As you can see from the snapshot above, none of the top searches that GA attributes to Arabic are actually Arabic keywords. Why is this? The reason is that language in Google Analytics is based on your browser’s language setting, not necessarily on the language a keyword is written in. This causes a few different problems if you are trying to use this data to understand Arabic traffic to your site.

  1. Many Internet users never change their browser’s language settings, so for them the browser setting has little to no bearing on the language they search in.
  2. Many Internet users, for one reason or another, have set their browser language to one language but actually do their searches in another.
  3. Some multilingual Internet users search in several different languages and don’t update their language settings every time they switch.

In light of these realities, the language demographic section of Google Analytics may not be the best way to understand trends in Arabic traffic to your site.

The Solution

There isn’t currently a built-in way to show only keywords in a certain language within GA, but you can use filters to get this data in Google Analytics.

Filtering for only queries with the Arabic script in them

Using some custom filters, it is possible to segment most of your Arabic search traffic so you can start to analyze the traffic that comes from Arabic search queries. We can accomplish this with a relatively simple regular expression and the correct filter settings. Below is a three-step approach to getting this data.

Step 1 – Select search traffic overview

Within the Traffic Sources section, select Search and then Overview. Once this dashboard appears, set the Primary Dimension to Keyword.

[Screenshot: Search traffic overview]

Step 2 – Create an Arabic script Regex filter

Those of you who aren’t familiar with regular expressions might want to check out this site to learn more. Here, however, I will use just one regular expression character, multiple times, to accomplish my task. The character I will use is the pipe symbol “|”, which means “this or that”. The rest of the characters I will use are Arabic letters. The first thing I did was to type out the Arabic alphabet, as I have done below.

ي و ه ن م ل ك ق ف غ ع ظ ط ض ص ش س ز ر ذ د خ ح ج ث ت ب أ

Then I replaced each space with a pipe symbol, as below.

ي|و|ه|ن|م|ل|ك|ق|ف|غ|ع|ظ|ط|ض|ص|ش|س|ز|ر|ذ|د|خ|ح|ج|ث|ت|ب|أ

This is the main regular expression that can be used to filter for Arabic script keywords.
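
If you prefer not to type the pipes by hand, the same pattern can be built with a few lines of Python. This is only a sketch of Step 2 (nothing here is part of Google Analytics), and the variable names are my own. Splitting on whitespace, rather than literally replacing each space, also guards against a subtle mistake: a double space turns into “||”, an empty alternative that matches every keyword.

```python
import re

# The alphabet typed out in Step 2, letters separated by spaces.
ARABIC_ALPHABET = "ي و ه ن م ل ك ق ف غ ع ظ ط ض ص ش س ز ر ذ د خ ح ج ث ت ب أ"

# split() with no argument treats any run of whitespace as one separator,
# so each letter becomes exactly one alternative.
letters = ARABIC_ALPHABET.split()
arabic_pattern = "|".join(letters)
print(arabic_pattern)
print(len(letters), "letters")

# The pitfall: replacing each space literally turns a double space into "||".
# The empty alternative between the two pipes matches any string at all.
naive_pattern = "ي و  ه".replace(" ", "|")  # gives "ي|و||ه"
print(re.search(naive_pattern, "hello") is not None)  # True: matches non-Arabic text
```

The printed pattern is what gets pasted into the filter box in Step 3.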

Step 3 – Input the Arabic regular expression into Google Analytics

Once you have created this Arabic regular expression, you will need to add it to the filter section within the dashboard. Click the Advanced Filter link to edit a filter. Then select “Include” and select “Keyword” as the dimension. Select “Matching RegExp” from the next drop-down menu to the right of the dimension, and paste the Arabic script regular expression into the input box. Once this is complete, click Apply and wait for Google to process the filter.

[Screenshot: Arabic keyword traffic]

There you have it! You will now only see keyword search traffic that contains Arabic letters in the keywords.

How it works

In essence, this filter tells the GA interface that you only want to see search traffic containing at least one letter of the Arabic alphabet. As long as a keyword has an Arabic letter in it, it will show up under this filter.
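
The same logic can be tried locally before touching GA. The sketch below assumes, as the description above implies, that “Matching RegExp” succeeds when the pattern is found anywhere in the keyword (an unanchored search, like Python’s re.search) rather than requiring the whole keyword to match. The keywords are made-up examples, not real report data, and has_arabic_letter is a hypothetical helper name.

```python
import re

ARABIC_ALPHABET = "ي و ه ن م ل ك ق ف غ ع ظ ط ض ص ش س ز ر ذ د خ ح ج ث ت ب أ"
ARABIC_RE = re.compile("|".join(ARABIC_ALPHABET.split()))

def has_arabic_letter(keyword: str) -> bool:
    """True if any single Arabic letter from the list appears in the keyword."""
    # re.search scans the whole string, so one matching letter is enough.
    return ARABIC_RE.search(keyword) is not None

# Made-up sample keywords for illustration only.
keywords = ["arabic lessons", "تعلم العربية", "دبي hotels", "(not provided)"]
arabic_keywords = [k for k in keywords if has_arabic_letter(k)]
print(arabic_keywords)  # the second and third keywords only
```

Note that the mixed query “دبي hotels” is kept: a single Arabic letter anywhere in the keyword is enough.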

One step further – Filtering out keywords from other Arabic script languages

Since some other languages also use the Arabic script, you could see some of their traffic show up when you use the above filter.

If you want to try to filter out some of this traffic, there are mainly three sizable languages other than Arabic that use the Arabic script: Farsi with 110 million speakers, Pashto with 50 million speakers, and Urdu with 66 million speakers. Though the number of speakers of these languages is relatively high, the number of Internet users who regularly search in these languages is much smaller.

Regardless of this fact, I have created a regular expression that contains letters from these three languages that aren’t used in Arabic.  We can then use this regular expression to exclude keywords that contain these letters from our filtered results.

ے|ڈ|ژ|چ|پ|ځ|ږ|څ|ړ
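
A pattern like this is easy to break when it is copied from a web page. Right-to-left text can carry invisible direction marks (Unicode format characters such as U+200E), stray spaces, or “presentation form” glyphs such as U+FB7A in place of the plain letter U+0686 (چ) that a keyboard actually produces. Any of these stops an alternative from matching. The hypothetical clean_pattern helper below sketches one way to tidy a pasted pattern in Python: it removes format characters and spaces, and applies NFKC normalization, which maps presentation forms back to their base letters.

```python
import unicodedata

def clean_pattern(pasted: str) -> str:
    """Tidy a pasted "a|b|c" alternation of single letters.

    Hypothetical helper: removes invisible format characters, maps
    presentation-form glyphs to ordinary letters, and drops empty
    alternatives (an empty alternative would match every keyword).
    """
    parts = []
    for alt in pasted.split("|"):
        # Unicode category "Cf" covers bidi controls such as U+200E and U+202B.
        visible = "".join(ch for ch in alt if unicodedata.category(ch) != "Cf")
        # NFKC turns e.g. U+FB7A (tcheh, isolated form) into U+0686.
        visible = unicodedata.normalize("NFKC", visible).strip()
        if visible:
            parts.append(visible)
    return "|".join(parts)

# Escapes make the hidden characters visible in the source.
pasted = "\ufb7a|\u0681\u200e|\u202b\u0696\u200e| \u200e\u0693\u200e"
cleaned = clean_pattern(pasted)
for letter in cleaned.split("|"):
    print(f"U+{ord(letter):04X}", unicodedata.name(letter))
```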

Now to implement it. In the GA dashboard, add a second Keyword dimension below your initial filter with the same settings as the first filter, except this time choose Exclude instead of Include. Paste the regular expression into the input box and click Apply again.
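
To see what the two filters do together, here is a local Python sketch that applies the Include pattern and then the Exclude pattern to a few made-up keywords. It assumes the two GA filter rows are combined with AND, which is how stacking them is described here; the exclusion letters are written as escape codes (with tcheh as the ordinary letter U+0686) so the right-to-left text cannot be mis-copied. The last Farsi sample also shows the limitation of this approach: a word spelled only with letters the two alphabets share passes both filters.

```python
import re

# Include list: the 28 letters from Step 2.
ARABIC_LETTERS = "ي و ه ن م ل ك ق ف غ ع ظ ط ض ص ش س ز ر ذ د خ ح ج ث ت ب أ".split()
# Exclude list: Farsi/Pashto/Urdu letters not used in Arabic
# (ے ڈ ژ چ پ ځ ږ څ ړ), written as escapes to avoid copy errors.
OTHER_SCRIPT_LETTERS = ["\u06d2", "\u0688", "\u0698", "\u0686", "\u067e",
                        "\u0681", "\u0696", "\u0685", "\u0693"]

INCLUDE_RE = re.compile("|".join(ARABIC_LETTERS))
EXCLUDE_RE = re.compile("|".join(OTHER_SCRIPT_LETTERS))

def passes_filters(keyword: str) -> bool:
    # Filter 1 (Include): at least one Arabic letter somewhere in the keyword.
    # Filter 2 (Exclude): none of the Farsi/Pashto/Urdu-only letters.
    return INCLUDE_RE.search(keyword) is not None and EXCLUDE_RE.search(keyword) is None

# Made-up sample keywords; labels give a transliteration and meaning.
samples = {
    "funduq (Arabic, hotel)": "\u0641\u0646\u062f\u0642",
    "pitza (Farsi, pizza)": "\u067e\u06cc\u062a\u0632\u0627",
    "ketab (Farsi, book)": "\u06a9\u062a\u0627\u0628",
    "hotel": "hotel",
}
for label, keyword in samples.items():
    print(label, "->", "kept" if passes_filters(keyword) else "filtered out")
```

Here “pitza” is removed because it contains پ, while “ketab” is kept because every letter it uses either appears in the Arabic list or is absent from the exclusion list.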

[Screenshot: Arabic script traffic without Farsi traffic]

There you go!  You should now see many of the keywords (if there are any) from other Arabic script languages filtered out.

As a disclaimer, if there are keywords in these other languages that contain only letters from the Arabic alphabet, these keywords will still show up in the filtered traffic. Since Google Analytics should be used to understand trends rather than exact site traffic numbers, you should still find enough relevant data despite this issue. This trend data should help you make better decisions about your online Arabic marketing efforts.

Do you have any ideas on how to better filter Arabic keyword traffic in Google Analytics? I would love to hear your thoughts.