Of course you don't have to use PHP, or MySQL for that matter. But it's my method of choice for most web apps, and it's also a pretty common one. The general gist is to do a lookup on a database of geographical locations for IP addresses, having taken your visitor's IP address from the PHP superglobal array $_SERVER. Yes there are caveats: the database is not 100% complete/accurate, and some ISPs (like AOL!) use proxies across different countries so the user will appear to come from somewhere other than their true country of origin. Boo hoo, let's do it anyway; according to MaxMind, their free(!) GeoLite Country database is 99.3% accurate, and their licensed version, 99.8%.

The database is released monthly in CSV format, so I'll have to import it into MySQL using mysqlimport, or LOAD DATA INFILE. I prefer the first option. Those of you that are MySQL fans probably know that there is a CSV storage engine available, but that's only in version 5.1 which is still in Release Candidate stage, so I'll stick to mysqlimport.

Download the GeoLite database from Maxmind, extract the CSV file and rename it to something more convenient; mysqlimport uses the filename for the name of the MySQL table it imports into:

root@jim-desktop:/home/jim/data# wget http://www.maxmind.com/download/geoip/database/GeoIPCountryCSV.zip
root@jim-desktop:/home/jim/data# unzip GeoIPCountryCSV.zip
root@jim-desktop:/home/jim/data# mv GeoIPCountryWhois.csv geo_csv.csv

Before we import the data into MySQL we need to create a table for it to go into. The following DDL accurately describes the structure of the data. Obviously create a new database if you want; here I have one called geo_ip:

CREATE TABLE  `geo_ip`.`geo_csv` (
`start_ip` char(15) NOT NULL,
`end_ip` char(15) NOT NULL,
`start` int(10) unsigned NOT NULL,
`end` int(10) unsigned NOT NULL,
`cc` char(2) NOT NULL, `cn` varchar(50) NOT NULL )
ENGINE=MyISAM DEFAULT CHARSET=latin1;

If you look at the data in the CSV file you'll see it's delimited by commas and the text is qualified by double quotes. With that in mind, we use the following statement to import the data:

root@jim-desktop:/home/jim/data# mysqlimport --fields-terminated-by=","  --fields-optionally-enclosed-by="\"" \
                    --lines-terminated-by="\n" geo_ip /home/jim/data/geo_csv.csvgeo_ip.geo_csv
Records: 102957  Deleted: 0  Skipped: 0  Warnings: 0

If the mysqlimport binary is not in your environment path, use locate or find to find it. If you don't have it at all then use LOAD DATA INFILE.

So we now have the raw data imported into MySQL, but how do we use it? First let's take a look at the data:

mysql> select * from geo_csv order by rand() limit 10;
+---------------+----------------+------------+------------+----+----------------+
| start_ip      | end_ip         | start      | end        | cc | cn
| +---------------+----------------+------------+------------+----+----------------+
| 207.209.7.0   | 207.209.7.255  | 3486582528 | 3486582783 | AU | Australia      |
| 79.99.200.0   | 79.99.207.255  | 1331939328 | 1331941375 | BE | Belgium        |
| 217.27.192.0  | 217.27.207.255 | 3642474496 | 3642478591 | DE | Germany        |
| 194.59.180.0  | 194.59.180.255 | 3258692608 | 3258692863 | FR | France         |
| 81.16.160.0   | 81.16.175.255  | 1360044032 | 1360048127 | SE | Sweden         |
| 62.23.198.192 | 62.23.198.207  | 1041745600 | 1041745615 | GB | United Kingdom |
| 64.49.231.240 | 64.49.232.15   | 1077012464 | 1077012495 | US | United States  |
| 83.217.68.32  | 83.217.68.95   | 1406747680 | 1406747743 | BE | Belgium        |
| 91.193.20.0   | 91.193.23.255  | 1539380224 | 1539381247 | CH | Switzerland    |
| 194.37.249.0  | 194.37.249.255 | 3257268480 | 3257268735 | SE | Sweden         |
+---------------+----------------+------------+------------+----+----------------+
10 rows in set (0.22 sec)

The table is essentially a big list (~103k records) of IP ranges, given in both dot-decimal and decimal form. The decimal form is the most efficient to search on as the datatype int requires less memory than char, and with integers we can reliably make use of operators such as greater than, less than, BETWEEN, etc. Exactly how you will use the data will depend on your scenario. I began looking into this when I was working on a German website that wanted to know when a visitor was from Switzerland, so it could display prices in CHF rather than Euros. So in fact, all I needed to know was whether the visitor was Swiss, and if they were from any other country, they would see Euros. So the only columns I'll need from the table are start and end, and all the rows belonging to Switzerland, or 'CH'. So to make searches more efficient I'll grab only the data I need and put it in a new table called ch_ip:

mysql> create table ch_ip as select start,end from geo_csv where cc='CH';
Query OK, 2023 rows affected (0.05 sec)Records: 2023  Duplicates:  Warnings: 0

Great, that's cut the data from nearly 103 thousand records to just over 2 thousand, and we've also lopped off four columns. I'll now be searching on a table that's 18K in size, rather than the original 5.3MB. Maybe you need every row of data in your scenario, but in many cases you only need a fraction. And in any case, you really don't need the start_ip and end_ip columns (as you will see shortly). You could also split off the country names (cn column) into another table so that cc becomes a foreign key. Or you could ditch the country names completely and create an array of CC => CN in your application; there are only 239 unique CCs after all:

So I have my table of 2,023 Swiss IP ranges. Now I need to grab a visitor's IP address and convert it into decimal notation. For this we can use PHP's built-in function ip2long. We use sprintf to ensure the result is always unsigned:

<?php  $ip_num = sprintf("%u", ip2long($_SERVER['REMOTE_ADDR']));  ?>

Once we have $ip_num we can create our MySQL query:

<?php $qry = "SELECT '' FROM ch_ip WHERE $ip_num BETWEEN start AND end"; ?>

All we need to know is whether the query returns > 0 rows. If it does, then the visitor is Swiss and we'll set their locale appropriately. Obviously we don't want to be performing this query on every page, so once it has been performed once for the visitor we'll set a session variable. So the final code looks like this:

<?php
session_start();
if (!session_is_registered("locale")) {
    //check if the session variable has already been set first
    $con = mysql_connect('localhost', 'geo_user', 'geo_password');
    if ($con) {
        $ip_num = sprintf("%u", ip2long($_SERVER['REMOTE_ADDR']));
        mysql_select_db("geo_ip", $con);
        $result = mysql_query( "SELECT '' FROM ch_ip WHERE $ip_num BETWEEN start AND end" );
        $num_rows = mysql_num_rows($result);
        if ($num_rows > 0) {
            $_SESSION['locale'] = "ch";
         }
         else { $_SESSION['locale'] = "de"; }
    }
    else {
        //If no db connection can be made then set their locale to German
        $_SESSION['locale'] = "de";
    }
}
?>

Comments

comments powered by Disqus