Build your own CMS, part 1

Today I stumbled upon this really terrible example of “How to create an admin panel”. It’s terrible in so many ways that it made me cry, and yet it was posted on January 14th, 2012. So I thought to myself: why not share my knowledge of the field I’ve been working in for the past 10 years? It’s not a rare topic, and there are probably plenty of tutorials like this around the web, but for the sake of noting things down in a personal blog, I’m going to do it.

Prerequisites

First things first. You’ll need a decent environment to work in, and this tutorial will start with this setup:

  • Debian Linux
  • Apache HTTP server version 2.2
  • PHP 5.3.x
  • PostgreSQL 8.4 (yes, yes, yes, not what you expected, but that’s mostly because I’m more used to it and it’s closer to the SQL standard than MySQL :))

But the environment doesn’t actually matter that much: you can stick to Windows and IIS as long as you can get PHP and PostgreSQL up and running. In later parts we’ll add some sugar to the system (like mod_rewrite), and those parts will concentrate more on Apache and its configuration.

File system layout

Download the source code of this tutorial here.

The file system layout is not that important in the first steps of creating your own CMS, but it can get complicated later on if you don’t think about it early enough. There are two options:

  1. All the files reside in webroot (a.k.a. httpdocs or /var/www/). This is the only option if your FTP access points directly to the webroot directory, or if the administrator is not the kindest lad in town and you can’t ask him for any custom configuration. In that case you’ll have to add some protection against direct execution to your include files.
  2. Only the main entry points reside in webroot (index.php, article.php, etc.) and everything else is moved outside it. This is a good solution: it protects you from module loading failures and from attacks where somebody tries to reach an include file directly (for example, requesting config.inc and reading it as plain text if your administrator was not smart enough to deny access to *.inc), or some file named delete_system.php that was supposed to be included only after a confirmation dialog :)

I would prefer the second option, but for all the newbies reading this, I’ll go with the first one. It’s still the most common web hosting setup, so better safe than sorry.

The first step of protection is giving all your files a .php extension, so that they always pass through the PHP parser and never reveal their source (a .inc extension is usually not mapped to PHP, so such files would be served as plain text). The second step is checking for a master constant in your include files. For example:

if (!defined('MASTER_INC')){
	echo 'Forbidden';
	exit;
}

And in your main entry point (which from now on I’ll assume is index.php, but it might be default.php, depending on your environment configuration) you should set this before any inclusion begins:

define('MASTER_INC', 1);

Good, now we are safe: nobody can trigger unexpected behavior by requesting an include file directly. The exception is the small possibility of a missing PHP module, in which case the web server sends every PHP file as plain text. Yikes! That’s why I’d go for option 2, where all the business logic is hidden outside webroot. But for now, we’ll have to pray to the new gods and the old ones as well for it never to happen.

Database layout

It’s normal procedure to start with the database layout and the administration side of your system, so that you can lay out all the database relations and plan ahead, and only then write the content “management” code.

Let’s create a database the PostgreSQL way; I’ll assume you are using either the psql console or pgAdmin. We’ll create a new database user that can log in (by the way, that is not the default for CREATE ROLE in PostgreSQL :)) and a new database, which will be UTF-8 encoded and owned by the new user.

--
-- Create database user
--
CREATE ROLE cms_tutorial LOGIN PASSWORD 'asdf';
--
-- Create database and set its owner
--
CREATE DATABASE cms_tutorial ENCODING 'unicode' OWNER cms_tutorial;

Next, we need some basic data structures for the administration panel authorization. Connect to the newly created database as the newly created user, so that it becomes the owner of all the objects created later on and you won’t have to alter the owner of each table afterwards. Now run this create script:

--
-- This is the main incremental sequence
--
CREATE SEQUENCE pkey;
--
-- CMS administrator account table
--
CREATE TABLE admin (
  id integer NOT NULL DEFAULT nextval('pkey'),
  username varchar(64) NOT NULL,
  password varchar(32) NOT NULL,
  status smallint DEFAULT 0
);
ALTER TABLE admin ADD PRIMARY KEY (id);

This probably calls for some explanation for those who are used to MySQL. PostgreSQL does not have AUTO_INCREMENT; it has something far better: sequences, which increment on request and return the new value. They can be used globally, across multiple tables (and not only tables), and they are safe under concurrent use (nextval() never hands out the same value twice). Great feature, lovin’ it! As you can see, we’ve created a sequence called “pkey”, and we’re going to use it across many tables. A sequence can count up to a 64-bit maximum, while the admin.id column here is a plain 32-bit integer, but that still leaves over two billion values, which is more than enough for a small or medium system. To get the next value from a sequence, call PostgreSQL’s nextval() function with the sequence name as its argument; that is exactly what the default value of the “admin.id” field does with our “pkey” sequence.

Create a connection

It’s good practice to create a configuration file for global settings that are used all around your code and need to change when you move your site from one host to another. Let’s start by putting our database connection settings in config.inc.php:

<?php
if (!defined('MASTER_INC')){
	echo 'Forbidden';
	exit;
}

define('DB_HOST', 'localhost');
define('DB_PORT', 5432);
define('DB_NAME', 'cms_tutorial');
define('DB_USER', 'cms_tutorial');
define('DB_PASS', 'asdf');
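// The closing PHP tag is omitted on purpose: stray whitespace after it would be sent as output and break session_start() later.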

Next, let’s create the connection to our new database in index.php, right after inclusion of config.inc.php (so that PHP knows about our new database configuration constants).

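$conn_str = 'host='.DB_HOST.' dbname='.DB_NAME.' user='.DB_USER.' password='.DB_PASS.' port='.DB_PORT;
$conn = pg_connect($conn_str);
if ($conn === false){
	// Connection failed: check pg_hba.conf and that the pgsql extension is loaded
	echo 'Database connection failed';
	exit;
}
$r = pg_query($conn, "SELECT nextval('pkey')");
if ($r !== false && pg_num_rows($r) > 0){
	list ($pkey) = pg_fetch_row($r);
	var_dump($pkey);
}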

And we’re done. If you open your site, you should see a number printed by var_dump(). If you get errors about a failed database connection, check the PostgreSQL access configuration in pg_hba.conf and make sure connections are allowed from the host where your HTTP server runs to the PostgreSQL server. Also check the phpinfo() output to see whether the pgsql extension is installed. I once came across a virtual server provider who claimed PostgreSQL support but didn’t have this extension installed; it was like buying a car without an engine.
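Two small helpers can make these checks less manual. This is a sketch, not part of the tutorial’s code: extension_loaded('pgsql') answers the “is the extension installed?” question without scrolling through phpinfo(), and the hypothetical build_pg_conn_string() quotes each value the way libpq expects (each value wrapped in single quotes, with backslashes and single quotes escaped by a backslash), so a password containing a space or a quote no longer breaks the connection string.

```php
<?php
// Log a clear message if the pgsql extension is missing
// (the same information phpinfo() shows, without the scrolling).
if (!extension_loaded('pgsql')) {
    error_log('The pgsql extension is not loaded; pg_connect() will not be available.');
}

/**
 * Hypothetical helper: builds a libpq connection string from key/value pairs.
 * Every value is wrapped in single quotes, and backslashes and single quotes
 * inside it are escaped with a backslash, as libpq's keyword/value format requires.
 */
function build_pg_conn_string(array $params)
{
    $parts = array();
    foreach ($params as $key => $value) {
        // strtr() with an array never re-replaces its own output
        $escaped = strtr((string) $value, array('\\' => '\\\\', "'" => "\\'"));
        $parts[] = $key . "='" . $escaped . "'";
    }
    return implode(' ', $parts);
}

// With the constants from config.inc.php this would be used as:
// pg_connect(build_pg_conn_string(array(
//     'host' => DB_HOST, 'port' => DB_PORT, 'dbname' => DB_NAME,
//     'user' => DB_USER, 'password' => DB_PASS,
// )));
```

Quoting every value, rather than only the ones that look dangerous, keeps the helper simple; libpq accepts quoted values for any keyword, including the port.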

Authorization sessions

Alright, we’re getting close to the fun stuff, but first we need to talk about sessions. What’s a session, you say? A session is a small persistent storage in PHP that keeps values across page reloads, and as you might know, PHP is all about reloading: it chews your source code, executes it and then shuts down. The session’s identifier is stored in a cookie, which is a similar persistent storage that resides in the client’s browser. And this is where we hit a security issue called “session hijacking”: stealing someone’s session by stealing its ID from their cookies, and before you know it, someone has started breaking stuff in your content management system on your behalf. But no worries, I have some tricks up my sleeve.

  1. Regenerate session ID on every page load – this might be the most popular one out there;
  2. Track user’s IP – not that you can trust the user’s IP retrieved from your HTTP server, but it’s one more thing to keep the bad guys further away;
  3. Add some fingerprinting – or, more precisely, a second session key generated alongside the session ID, which you store in a cookie and which has to match another key stored in the session itself. (Don’t put anything session related in the address instead; it’s a really bad practice, like leaving your keys on the doorstep.)

Anyway, nothing that travels across the world wide web is safe from interception, the fingerprint cookie included, but it can still give our perpetrator some headaches.

To start a session, just call session_start() in your code, but do it before any output has been sent, because setting a cookie requires that no output has started yet (cookies are sent as header fields, which must precede the page body). So your entry script should start to look like this:

<?php
define('MASTER_INC', 1);

require 'config.inc.php';

session_start();
$old_sessionid = session_id();
session_regenerate_id(true); // true also deletes the old session, so the old ID stops working
$new_sessionid = session_id();

echo 'Old Session: '.$old_sessionid.'<br />';
echo 'New Session: '.$new_sessionid.'<br />';

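$conn_str = 'host='.DB_HOST.' dbname='.DB_NAME.' user='.DB_USER.' password='.DB_PASS.' port='.DB_PORT;
$conn = pg_connect($conn_str);
if ($conn === false){
	// Connection failed: check pg_hba.conf and that the pgsql extension is loaded
	echo 'Database connection failed';
	exit;
}
$r = pg_query($conn, "SELECT nextval('pkey')");
if ($r !== false && pg_num_rows($r) > 0){
	list ($pkey) = pg_fetch_row($r);
	var_dump($pkey);
}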

The session_start() call starts the session: it tries to read the session ID from a cookie and creates a new session if there isn’t one. The session_regenerate_id() call then generates a new session ID, so every time you reload the page, the Old and New Session IDs in the output should differ.

Now we’re going to add a tracking code for client’s IP and some salty tokens with fingerprints. Welcome to the security kitchen.

session_start();
$client_ip = (isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : 'unknown');
$regenerate = false;
if (isset($_SESSION['client_session'])){
	if (!isset($_COOKIE['client_token'])){
		// Token does not exist - might be missing in case of hijack or cookie has timed out
		session_destroy();
		echo 'Session timeout';
		exit;
	}
	$client_fp = md5($_SESSION['client_session']['secret'].(isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : 'Unknown').$_COOKIE['client_token']);
	if ($_SESSION['client_session']['fingerprint'] !== $client_fp || $_SESSION['client_session']['ip'] !== $client_ip){
		// Fingerprint or an IP does not match - do the automatic log-out
		session_destroy();
		echo 'Forbidden';
		exit;
	}
	if ($_SESSION['client_session']['time'] < time() - (30)){
		// 30 seconds have passed, we should force regeneration process
		$regenerate = true;
	}
} else {
	$regenerate = true;
}
if ($regenerate){
	// Regenerate fingerprints
	session_regenerate_id(true); // true also deletes the old session data
	$time = time();
	$token = md5(session_id().$time);
	$secret = md5(session_id());
	$fingerprint = md5($secret.(isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : 'Unknown').$token);
	$_SESSION['client_session'] = array(
		'fingerprint' => $fingerprint,
		'secret' => $secret,
		'ip' => $client_ip,
		'time' => $time
	);
	// Token expires in 1 hour
	setcookie('client_token', $token, $time + (60 * 60), '/');
}

Good, we’ve added some security to our sessions. It doesn’t make them completely safe, but, as I said, it might give our enemies some headaches.
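The same fingerprint idea can be pulled out into small, testable functions. This sketch assumes PHP 7 or later (for random_bytes() and hash_equals(), which PHP 5.3 does not have), and all function names are hypothetical. Two things differ from the code above: the secret and the token come from a random source instead of being derived from the session ID, so knowing the ID alone is not enough to recompute them, and the fingerprint is a SHA-256 hash compared with hash_equals(), which runs in constant time.

```php
<?php
// Hypothetical helpers; requires PHP 7+ (random_bytes, hash_equals).

/** Creates a fresh random secret (kept in $_SESSION) and token (sent as a cookie). */
function new_fingerprint_pair()
{
    return array(
        'secret' => bin2hex(random_bytes(16)),
        'token'  => bin2hex(random_bytes(16)),
    );
}

/** Combines the server-side secret, the browser's User-Agent and the cookie token. */
function make_fingerprint($secret, $userAgent, $token)
{
    return hash('sha256', $secret . '|' . $userAgent . '|' . $token);
}

/** True only if both the fingerprint and the client IP match what the session stored. */
function fingerprint_matches(array $stored, $userAgent, $token, $ip)
{
    $expected = make_fingerprint($stored['secret'], $userAgent, $token);
    return hash_equals($stored['fingerprint'], $expected) && $stored['ip'] === $ip;
}

// Usage sketch: on (re)generation, store this array in $_SESSION['client_session']
// and send $pair['token'] in the client_token cookie; on later requests, call
// fingerprint_matches() with the stored array, the User-Agent, the cookie and the IP.
$pair = new_fingerprint_pair();
$stored = array(
    'secret'      => $pair['secret'],
    'fingerprint' => make_fingerprint($pair['secret'], 'Example UA', $pair['token']),
    'ip'          => '192.0.2.1',
);
```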

Log-in

Before we delve into this HTML/CSS/PHP and maybe some JavaScript madness, we have to add some data to our newly created database:

--
-- Insert new admin user
--
INSERT INTO admin (username, password, status) VALUES ('myuser', md5('mypass'), 1);

Rule of security No.1 – NEVER ever shalt thou store any password in plain text in the database. That’s why we are using PostgreSQL’s built-in md5() function. MD5 is a hash, not encryption, and it is no longer a safe choice for passwords, because there are a lot of MD5 rainbow tables floating around the web; an unsalted MD5 hash is no stronger than the unsalted SHA-1 hashes that were stolen from LinkedIn. Still, it’s far better than a plain-text password.
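If your PHP is 5.5 or newer, a stronger option than MD5 is PHP’s own password_hash() / password_verify() pair, which uses a salted, deliberately slow algorithm (bcrypt by default). This is a swapped-in technique, not what the rest of this tutorial uses: the hash is computed in PHP rather than by PostgreSQL’s md5(), and it is longer than 32 characters (60 for bcrypt), so the admin.password column would have to be widened, for example to varchar(255).

```php
<?php
// Requires PHP 5.5+ (password_hash / password_verify).

// On account creation: store this string instead of md5('mypass').
// Each call produces a different hash because a random salt is embedded in it.
$hash = password_hash('mypass', PASSWORD_DEFAULT);

// On log-in: fetch the stored hash by username only (not by username AND password),
// then let password_verify() re-hash the submitted password with the embedded salt.
$ok = password_verify('mypass', $hash);        // true
$wrong = password_verify('not-mypass', $hash); // false
```

Because the salt differs per hash, the same password never produces the same stored value twice, which is exactly what makes precomputed rainbow tables useless.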

Now to the fancy part. We should create a login form:

<form method="post" action="" id="login">
	<h1>Admin Login</h1>
	<div>
		<label for="login-username">Username:</label>
		<input type="text" name="username" id="login-username" placeholder="Your Username" />
	</div>
	<div>
		<label for="login-password">Password:</label>
		<input type="password" name="password" id="login-password" placeholder="Your Password" />
	</div>
	<div class="submit">
		<input type="submit" name="login" id="login-submit" value="Log-In" />
	</div>
</form>

This should be pretty straightforward, but let me say a few words about HTTP methods. HTTP defines several of them (HEAD, OPTIONS and a few others as well), but these four are the ones worth knowing here:

  • GET – as the name suggests, it is to be used to get stuff from your server – never send sensitive data using a GET method, because this method adds all the parameters to the URL and it’s really easy to mess up and, for example, share your log-in information on Facebook;
  • POST – also, as the name implies, it’s for posting stuff – sending information to the server and that’s it. This is the right way of sending information to the server, like, posting a comment, sending log-in information, etc.;
  • PUT – not available to plain HTML forms, but it was intended to … yes, exactly … put stuff on the server. This method has made a comeback thanks to the buzzword REST, an architectural style for building services on top of HTTP;
  • DELETE – well, this should be self-explanatory. And yes, REST is bringing this one back too.

But, as I said, in everyday use you’ll probably only need GET and POST, and you should use them sensibly: GET only when you’re requesting information from the server (search forms, page listings, etc.) and POST when you’re sending information to the server (log-in, posting comments, writing blog entries and uploading files).
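On the server side, the same rule can be enforced by checking which method a request actually used before acting on it. A minimal sketch: is_post_request() is a hypothetical helper that reads REQUEST_METHOD from the server array, which the web server fills in (it is absent on the command line, hence the isset() check).

```php
<?php
/**
 * Hypothetical helper: true only when the request was sent with POST.
 * Takes the server array as a parameter so it can be tried without a web server.
 */
function is_post_request(array $server)
{
    return isset($server['REQUEST_METHOD'])
        && strtoupper($server['REQUEST_METHOD']) === 'POST';
}

// Typical use at the top of the log-in handling code:
// if (is_post_request($_SERVER) && isset($_POST['username'])) { ... }
```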

OK, now we have to process our client’s input. Let’s add some code:

$login_errors = array();
if (isset($_POST['username'])){
	$username = trim($_POST['username']);
	$password = md5($_POST['password']);

	$q = "SELECT id 
				FROM admin 
				WHERE username='".pg_escape_string($username)."' AND password='".pg_escape_string($password)."' AND status=1";
	$r = pg_query($q);
	if (pg_num_rows($r) > 0){
		list ($admin_id) = pg_fetch_row($r);
		$_SESSION['admin_id'] = intval($admin_id);
		header('Location: index.php');
		exit;
	} else {
		array_push($login_errors, 'Username or a password is incorrect');
	}
}

Rule of security No.2 – ALWAYS escape strings and integers when building a SQL query.

So what happens in this portion of finely cooked code? First, we check whether a POST variable “username” was sent to the server (I’ll assume that a “password” was sent too; it’s not good practice to assume, but for simplicity’s sake we’ll omit that check). We take the username and password and query the admin’s ID from the database, looking for a row with this username and hashed password whose status is 1 (we assume that zero means a disabled user). If we find a match, we save the ID in the session, so after the page reloads we still know whom we are dealing with.
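Escaping works, but PHP’s pgsql extension also offers pg_query_params(), which sends the values separately from the SQL text and refers to them as $1, $2 and so on, so there is nothing left to forget to escape. This is an alternative sketch, not the tutorial’s code: admin_login_query() and find_admin_id() are hypothetical names, and the sketch keeps the same MD5 comparison as above.

```php
<?php
/**
 * Hypothetical helper: returns the SQL with $n placeholders and the matching
 * parameter list for the admin log-in check (same MD5 scheme as the tutorial).
 */
function admin_login_query($username, $password)
{
    // Single quotes keep PHP from interpolating $1 and $2
    $sql = 'SELECT id FROM admin WHERE username = $1 AND password = $2 AND status = 1';
    return array($sql, array(trim($username), md5($password)));
}

/** Runs the query on an open connection; returns the admin ID, or 0 when there is no match. */
function find_admin_id($conn, $username, $password)
{
    list($sql, $params) = admin_login_query($username, $password);
    $r = pg_query_params($conn, $sql, $params);
    if ($r === false || pg_num_rows($r) === 0) {
        return 0;
    }
    list($admin_id) = pg_fetch_row($r);
    return intval($admin_id);
}
```

Splitting the query building from the database call keeps the SQL and its parameters easy to inspect without a live connection.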

Now let’s update our HTML to tell the user whether authorization succeeded or failed:

<?php
if (isset($_SESSION['admin_id']) && intval($_SESSION['admin_id']) > 0){
?>
<div id="account">
	<h1>Welcome!</h1>
	<p><a href="?logout=1">Log-out</a></p>
</div>
<?php
} else {
?>
<form method="post" action="" id="login">
	<h1>Admin Login</h1>
<?php
	if (isset($login_errors) && is_array($login_errors) && count($login_errors) > 0){
		echo '<p class="error">'.implode('<br />', $login_errors).'</p>';
	}
?>
	<div>
		<label for="login-username">Username:</label>
		<input type="text" name="username" id="login-username" placeholder="Your Username" />
	</div>
	<div>
		<label for="login-password">Password:</label>
		<input type="password" name="password" id="login-password" placeholder="Your Password" />
	</div>
	<div class="submit">
		<input type="submit" name="login" id="login-submit" value="Log-In" />
	</div>
</form>
<?php
}
?>

Here we check whether the session value has been set and, if so, offer a log-out option. If the log-in failed for some reason, we display the error message and let our client retry.

Log-out

The log-out procedure is pretty straightforward: destroy the session by calling session_destroy() and forget that the user was ever logged in. But that is not what I’d suggest, for a couple of reasons:

  1. Sessions are great for storing not only authorization data but also other settings that we don’t want to store any other way (like cookies or URL parameters), for example site options customized by the client (think about whether to show a “tip of the day” or not);
  2. The session will be recreated on the next request anyway, because of the session_start() at the beginning of our code, so destroying it just deletes the session file (or removes the key from memcache, depending on how your sessions are handled) and creates a new one;
  3. And, as a personal preference, it’s better to be in control than to just discard and recreate.

So we’re going to clean out only unnecessary stuff from our sessions:

if (isset($_GET['logout'])){
	unset($_SESSION['admin_id']);
	header('Location: index.php');
	exit;
}
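If authorization ends up spread over several session keys, the same selective clean-up can be wrapped in a helper that removes only those keys and leaves preferences alone. A sketch: logout_admin() is hypothetical, and it takes the session array by reference so it can be tried without a real session.

```php
<?php
/**
 * Hypothetical helper: removes only the listed authorization keys and keeps
 * everything else (such as a "tip of the day" preference) in the session.
 */
function logout_admin(array &$session, array $authKeys = array('admin_id'))
{
    foreach ($authKeys as $key) {
        unset($session[$key]);
    }
}

// In index.php this would be called as:
// logout_admin($_SESSION);
// session_regenerate_id(true); // optional: also issue a fresh session ID
```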

That’s it folks…

… for now. This is the first post from the series, I hope you liked it and I hope that I made some stuff clear (especially about the security). Don’t worry if this post did not contain any buzzwords, like “Framework”, “OOP”, or “MVC” … good things come to those who wait.

Oh yes, and if you were too lazy to type or copy everything I wrote here, you can download the final code of this part here.
