Can You Hack Your Own Site? A Look at Some Essential Security Considerations
By Ben Charnock
Version one goes gold! Visitors are landing from every corner of the globe. You know there are likely to be a few teething problems; I mean, this is 1.0.0.0... all those zeroes are meant to allow us a little grace, right?
Maybe that dastardly style sheet just won’t cascade elegantly on browser X. An incomplete comment chucks out some broken mark-up. Maybe you should have persisted those database connections after all. Hey, we all overlook things in the excitement of getting our first version running – but how many of these oversights can we happily stomach, and how many might just leave a bitter taste in our mouths and, more painfully, in our client’s...
This article walks through the brainstorming stage of planning for what is, in this instance, a hypothetical user-centric web application. Although you won’t be left with a complete project, nor a market-ready framework, my hope is that each of you, when faced with future workloads, may muse on the better practices described. So, without further ado... Are you sitting comfortably?
The Example
Our client has asked us to incorporate a book review system into an existing site. The site already has user accounts, and allows anonymous commentary.
After a quick chat with the client, we have the following specification to implement, and only twenty-four hours to do it:

Note: The client's server is running PHP5, and MySQL – but these details are not critical to understanding the bugbears outlined in this article.
The Processes:

Our client has given us a PHP include to gain access to the database:



We don’t actually need the source to this file to use it. In fact, had the client merely told us where it lived we could have used it with an include statement and the $db variable.
On to authorisation... within the datatable schema we are concerned with the following column names:
- username, varchar(128) – stored as plain text.
- password, varchar(128) – stored as plain text.
Given that we’re working against the clock... let’s write a PHP function as quickly as we can that we can re-use to authenticate our users:

$_REQUEST Variables
In the code above you will notice I’ve highlighted an area amber, and an area red.
Why did I highlight the not-so-dangerous $_REQUEST variables?
Although this doesn’t expose any real danger, what it does allow for is a lax approach when it comes to client side code. PHP has three arrays that most of us use to get our posted data from users, and more often than not we might be tempted to use $_REQUEST. This array conveniently gives our PHP access to the POST and GET variables, but herein lies a potential hang-up...
Consider the following scenario. You write your client side code to use POST requests, but you hand over the project while you grab a break – and when you get back, your sidekick has written a couple of GET requests into the project. Everything runs okay – but it shouldn’t.
A little while later, an unsuspecting user types an external link into a comment box, and before you know it, that external site has a dozen username/password combinations in its referrer log.
By referencing the $_POST variables instead of $_REQUEST, a stray GET request simply fails during testing, so we can’t accidentally publish working code that passes credentials in the URL.
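A minimal sketch of the idea, assuming a login form with fields named username and password. The field names and the helper readLoginFields are hypothetical and are not taken from the article's source files. The helper takes the array to read from as a parameter so it can be tried outside a web request; in a real page you would hand it $_POST, never $_REQUEST.

```php
<?php
// Hypothetical helper: pull login credentials from POST data only.
// Returns null when either field is missing, so a login that arrives
// in the query string (GET) fails instead of silently working.
function readLoginFields(array $post): ?array
{
    if (!isset($post['username'], $post['password'])) {
        return null;
    }
    if (!is_string($post['username']) || !is_string($post['password'])) {
        return null; // reject array-style parameters such as username[]=x
    }
    return [
        'username' => $post['username'],
        'password' => $post['password'],
    ];
}

// In the page that handles the login form:
// $credentials = readLoginFields($_POST);
```

Because the helper never looks at the query string, a form or link that was switched to GET by mistake stops working straight away, which is exactly the early warning we want.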
The same principle applies to session identifiers. If you find you’re writing session variables into URLs, you’re either doing something wrong or you have a very good reason to do so.

SQL Injection
Referring again to the PHP code: the red highlighted line might have leaped out at some of you? For those who didn’t spot the problem, I’ll give you an example and from there see if something strikes you as risky...

This image makes clear the flaw in embedding variables directly into SQL statements. Although it can’t be said exactly how much control a malicious user could gain, one thing is guaranteed: if you use this method to string together an SQL statement, your server is barely protected. The example above is dangerous enough on a read-only account; the powers a read/write connection has are only limited by your imagination.
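To make the flaw concrete, here is a small sketch. The table name users and the function buildLoginQuery are hypothetical; only the username and password column names come from the schema described earlier. No database is touched: the code only builds the SQL text, which is enough to see how the input rewrites the statement.

```php
<?php
// Hypothetical, deliberately UNSAFE query builder: user input is pasted
// straight into the SQL text, which is the pattern to avoid.
function buildLoginQuery(string $username, string $password): string
{
    return "SELECT * FROM users WHERE username = '" . $username
         . "' AND password = '" . $password . "'";
}

// An honest login attempt.
echo buildLoginQuery('ann', 's3cret'), PHP_EOL;

// A malicious one: the leading quote closes the string literal early,
// and the rest of the input is read by the database as SQL, not as data.
echo buildLoginQuery('ann', "' OR '1'='1"), PHP_EOL;
```

The second statement ends in `password = '' OR '1'='1'`, so the WHERE clause is true for every row and the password check no longer means anything.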
To protect against SQL injection is actually quite easy. Let’s first look at the case of quote enclosed string variables:
The quickest protection is to strip the enclosure characters or escape them. Since PHP 4.3.0 the function mysql_real_escape_string has been available to cleanse incoming strings. The function takes the raw string as a single parameter and returns the string with the volatile characters escaped. However, mysql_real_escape_string doesn’t escape every character that has a special meaning in SQL... the highlighted elements in the image below show the techniques I use to sanitise String, Number and Boolean values.

The first highlight, the line that sets $string_b uses a PHP function called addcslashes. This function has been part of PHP since version 4 and as is written in the above example, is my preferred method for SQL string health and safety.
A wealth of information is available in the PHP documentation, but I’ll briefly explain what addcslashes does and how it differs from mysql_real_escape_string.

From the diagram above you can see that mysql_real_escape_string doesn’t add slashes to the (%) percent character.
The % is used in SQL LIKE patterns, where it behaves as a wildcard and not as a literal character; the underscore (_) is a wildcard there too. So wherever user input becomes part of a LIKE pattern, these characters should be escaped with a preceding backslash.
The second parameter I pass to addcslashes, which is bold in the image, is the character group PHP will add slashes for. In most cases it will split the string you provide into characters, and then operate on each. It is worth noting that this character group can also be fed a range of characters, although that is beyond the scope of this article – in the scenarios we’re discussing, we can use alphanumeric characters literally, e.g. “abcd1234”, and all other characters as either their C-style literal “\r\n\t”, or their ASCII index “\x0A\x0D\x09”.
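As a rough illustration of that second parameter, the sketch below escapes the two LIKE wildcards and the backslash itself. The character group is an assumption chosen for this example, not the exact list from the article's source files, and escapeLikePattern is a hypothetical name. It covers only the wildcard problem; quotes still need their own escaping or a parameterised query.

```php
<?php
// Hypothetical helper: make user input literal inside a LIKE pattern.
// The second argument to addcslashes() is the character group; every
// character in it gets a backslash put in front of it.
function escapeLikePattern(string $input): string
{
    // Group: percent sign, underscore and the backslash itself.
    return addcslashes($input, '%_\\');
}

echo escapeLikePattern('100%_sure'), PHP_EOL; // prints 100\%\_sure
```

One thing to bear in mind when reusing the older technique: mysql_real_escape_string belongs to the original mysql extension, which was removed in PHP 7.0, whereas addcslashes is a plain string function and is still available.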

The next highlight makes our number values safe for SQL statements.
This time we don’t want to escape anything, we just want to have nothing but a valid numerical value – be it an integer or floating point.
You might have noticed line 10, and perhaps wondered as to the purpose. A few years ago I worked on a call centre logging system that was using variable += 0; to ensure numerical values. Why this was done, I cannot honestly say... unless prior to PHP 4 that was how we did it?! Maybe somebody reading can shed some light on the subject. Other than that, if you, like I did, come across a line like that in the wild, you’ll know what it’s trying to do.
Moving forward then; lines 11 and 12 are all we need to prepare our numerical input values for SQL. I should say, had the input string $number_i contained any non-numerical characters in front of the numerical ones... our values $number_a, $number_b and $number_c would all equal 0.
We’ll use floatval to clean our input numbers; PHP only prints decimal places when they exist in the input value – so printing them into an SQL statement won’t cause any errors if no decimal was in the input. As long as our server code is safe, we can leave the more finicky validating to our client side code.
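The behaviour described above can be sketched in a few lines. The helper name sqlNumber and the sample inputs (including the table name reviews) are made up for illustration; floatval itself is the only part taken from the article.

```php
<?php
// Hypothetical helper: reduce raw input to text that is safe to print
// into an SQL statement where a number is expected.
function sqlNumber(string $raw): string
{
    // floatval() reads a leading number and ignores what follows it;
    // a string that does not start with a number becomes 0.
    return (string) floatval($raw);
}

foreach (['42', '3.14', '12abc', 'abc12', '1; DROP TABLE reviews'] as $raw) {
    echo $raw, ' => ', sqlNumber($raw), PHP_EOL;
}
```

Whatever the user typed after the leading digits never reaches the query, and an input without leading digits collapses to 0.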
Before we move on to a final listing for our PHP, we’ll glance at the final code highlight, the Boolean boxing.
Like the C++ equivalent, a Boolean in PHP behaves like an integer in arithmetic. As in, True + True = Two. There are countless ways to translate an input string to a Boolean type, my personal favourite being: does the lower case string contain the word true?
You each may have your own preferred methods; does the input string explicitly equal “true” or is the input string “1” etcetera... what is important is that the value coming in, whatever it might look like, is represented by a Boolean (or integer) before we use it.

My personal philosophy is simply, if X is true or false, then X is a Boolean. I’ll blissfully write all the code I might need to review later with Booleans and not short, int, tinyint or anything that isn’t Boolean. What happens on the metal isn’t my concern, so what it looks like to a human is far more important.
So, as with numbers and strings, our Booleans are guaranteed safe from the moment we pull them into our script. Moreover our hygienic code doesn’t need additional lines.
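A short sketch of the Boolean boxing described above, using the “does the lower case string contain the word true?” rule. The function names toBoolean and sqlBoolean are hypothetical.

```php
<?php
// Hypothetical helper: box an incoming string as a real Boolean.
// Rule: the lower-cased input must contain the word "true".
function toBoolean(string $input): bool
{
    return strpos(strtolower($input), 'true') !== false;
}

// For SQL, print the Boolean as 1 or 0 instead of echoing the raw input.
function sqlBoolean(bool $value): string
{
    return $value ? '1' : '0';
}

echo sqlBoolean(toBoolean('TRUE')), PHP_EOL;     // prints 1
echo sqlBoolean(toBoolean('1 OR 1=1')), PHP_EOL; // prints 0
```

Note that under this particular rule the string “1” counts as false. Whichever rule you prefer, the point is the same: only the 1 or 0 derived from the Boolean is ever printed into the statement, never the text the user sent.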

Processing HTML
Now that we have protected our SQL from injections, and we’ve made certain only a POST login can work with our script, we are ready to implement our review submission feature.
Our client wants to allow review enabled users to format their contributions as regular HTML. This would seem straightforward enough, but we also know that email addresses are ten a penny, and bookstore accounts are created programmatically – so in the better interests of everyone we’ll make sure only the tags we approve get through.
Deciding how we check the incoming review might seem daunting. The HTML specification has a rather wholesome array of tags, many of which we’re happy to allow.
As long-winded as the task might seem, I eagerly advise everyone – choose what to allow, and never what to deny. Browser and server mark-up languages all adhere to XML-like structuring, so we can base our code on the fundamental fact that executable code must be surrounded by, or be part of, angle bracketed tags.
Granted, there are several ways we can achieve the same result. For this article I will describe one possible regular expression pipeline:

These regular expressions won’t produce a flawless output, but in the majority of cases – they should do a near elegant job.
Let’s take a look at the regular expressions we’ll be using in our PHP. You’ll notice two arrays have been declared, $safelist_review and $safelist_comment – this is so we can use the same functions to validate reviews and, later, comments:

...and here is the main function that we will call to sanitise the review and comment data:

The input parameters, I have highlighted red and blue. $input is the raw data as submitted by the user and $list is a reference to the expression array; $safelist_review or $safelist_comment depending of course on which type of submission we wish to validate.
The function returns the reformatted version of the submitted data – any tags that don’t pass any of the regular expressions in our chosen list are converted to HTML encoded equivalents. In the simplest terms, this turns < and > into &lt; and &gt;. Other characters are modified too, but none of these really pose a security threat to our client or the users.
Note: The functions: cleanWhitespace and getTags are included in the article’s source files.
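The following is a minimal sketch of the allow-list idea, not the article's own function. The patterns in $safelist_review, the function name sanitiseMarkup and the choice of htmlspecialchars for the encoding step are all assumptions made for illustration; it does not use cleanWhitespace or getTags.

```php
<?php
// Hypothetical safelist: each pattern must match one complete tag.
$safelist_review = [
    '/^<\/?(b|i|em|strong|p)>$/i',               // bare formatting tags
    '/^<br\s*\/?>$/i',                            // line breaks
    '/^<a href="(\/|#|https?:\/\/)[^"<>]*">$/i', // links with a plain href only
    '/^<\/a>$/i',                                 // closing anchor
];

function sanitiseMarkup(string $input, array $list): string
{
    // Split the input into tags and the text between them, keeping the tags.
    $pieces = preg_split('/(<[^>]*>)/', $input, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $output = '';
    foreach ($pieces as $piece) {
        $allowed = false;
        foreach ($list as $pattern) {
            if (preg_match($pattern, $piece) === 1) {
                $allowed = true;
                break;
            }
        }
        // Anything not explicitly allowed is HTML-encoded, so its angle
        // brackets reach the browser as &lt; and &gt; and are shown, not run.
        $output .= $allowed
            ? $piece
            : htmlspecialchars($piece, ENT_QUOTES, 'UTF-8');
    }
    return $output;
}

echo sanitiseMarkup('<b>Hi</b><script>alert(1)</script>', $safelist_review), PHP_EOL;
```

Because the patterns accept only bare tags, a tag carrying any extra attribute (an onclick on a bold tag, say) fails the match and is encoded along with everything else that was not explicitly allowed.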
You’d be correct to assume all we have really done is help preserve the aesthetics of our site’s pages, and not done everything to protect the user’s security. There still remains a rather enormous security hole even with the SQL safe, request spoofing cured and mark-up manipulated: JavaScript injection.
This particular flaw could be fixed by a few more regular expressions, and/or modification to the ones we are already using. Our anchor regular expression only allows “/...”, “h...” and “#...” values as the href attribute – which is really only an example of a solution. Browsers across the board understand a huge variety of script visible attributes, such as onClick, onLoad and so forth.
We have in essence created a thorny problem for ourselves: we wanted to allow HTML, but now we have a near endless list of keywords to strip. There is, of course, a less than perfect but quite quickly written way to do this:

On reflection you’d be absolutely justified in asking, “Why didn’t we just use BBCode or Textile or...?”

Myself, if I were dealing with mark-up processing, I might even go for XML walking. After all the incoming data should be valid XML.
However, this article is not meant to teach us how to regex, how to PHP or how to write anything in one particular language. The rationale behind it simply being, don’t leave any doors ajar.
So let’s finish off then, with a quick review of what we've looked at:

This article hasn't equipped you with any off-the-shelf project. My primary purpose in writing it was not to scare away the designers who code, or to nitpick the work of coders anywhere, but to encourage everyone to author robust code from the off. That said, I do plan to revisit certain elements of this article in more detail later.
Until then, safe coding!
Sample Code
You can grab the sample PHP code used in this article here
Comments
MGK
July 16th, 2008
Thank you a lot !! that was pretty damn usefull and well thought.
I mean, the way of thinking can certainly help for further coding sessions !
Next time, are you going to write a tuto about the main php vulnerabilities/exploits and tell us how to fix them ? could be usefull.
thanks a lot !
Alex Coleman
July 16th, 2008
Very useful! Will definitely devote some time putting this article into effect. Thanks!
Elliott Cost
July 16th, 2008
Looks insanely awesome. I’ll have to make time to read this.
Tommy M
July 16th, 2008
Great post. I have been working on a simple CRM for a friend’s landscaping company. I’ll be sure to use this article as a checklist when running security checks. Thanks!
Alexis
July 16th, 2008
You seriously just used Comic Sans, didn’t you?
Adam Jackett
July 16th, 2008
Great article. The only thing I see missing, and maybe this goes without saying, is validate your input on the server side, not just client side. Make sure your email addresses, phone numbers, etc etc are all formatted properly, but don’t just rely on javascript. You must do this server side as well, javascript validation should strictly be used for usability purposes, not security, as potential “hackers” can bypass the original form altogether leaving the client side validation useless.
Andrei Constantin
July 17th, 2008
just perfect. too bad many of us just throw everything on the internet before checking
Thomas Milburn
July 17th, 2008
Oh dear!
This article is certainly a start to but there are still quite a few large holes in the examples above. For example in the last piece of code the function replaceUnsafeWords is not in the least bit secure. You can bypass it by writing JAVASCRIPT in capitals or by adding in a style=”expression()” attribute. Always check your filtering methods with the XSS cheat sheet at http://ha.ckers.org/xss.html
Another thing not mentioned is passwords should be hashed with a salt before being inserted into the database. If you want more security articles, a good site is http://shiflett.org/
Shane
July 17th, 2008
Interesting article, and security is every bit as important (more so) than an accordian or hover effect!
RM
July 17th, 2008
OMG, best ever in Nettuts!
Scaring and useful post by the way.
Gilbert
July 17th, 2008
Nice tut. Defensive programming is always the best way to go. I agree that whitelisting is always safer then blacklisting.
One question. Instead of using the long complex regular expression for sanitizing HTML, why not use php’s strip_tags() function? In the parameter just put any tags you want to allow.
Ben Charnock
July 17th, 2008
Thanks Thomas Milburn; of course this isn’t the definitive security 101. I appreciate your mention of hashed passwords - but if I may be frank, encrypted storage is the last line of defence in a domain strangers _shouldn’t_ even be in. If you get my meaning?
In reply to Gilbert, I chose regex because it’s easily applicable to client and server code; be that PHP, VB, any of the ECMA derivatives etc etc. The article ended up larger than I’d planned… in an ideal world, strip_tags all the way.
No that’s a lie, PHP is not in my ideal world at all.
Rijalul Fikri
July 17th, 2008
Wow, really need this kind of tutorial. Thank you for it. I like everything except for the Comic Sans
Ben Charnock
July 17th, 2008
Okay okay… jeez… so I chose the single most obvious cursive font.
One of my lecturers told me a few years ago Comic Sans was most readable my most people.
This isn’t getting me off the hook is it?
Jon
July 17th, 2008
Thanks for a great article. I’m not a developer, but I do occasionally have to do php/mysql myself. On a whole I found this article eye opening, and incredibly helpful.
But you did lose me completely on cleaning booleans.
Thomas Milburn
July 17th, 2008
@Ben
Sure, encrypted storage isn’t essential but it is an extra layer of security if someone finds a database backup or your source code. The replaceUnsafeWords function should be a whitelist instead of a blacklist. What happens if someone tries to link to javascript.com, it turns into zilch.com!!
BTW Thanks for showing me some new functions addcslashes and mysql_pconnect I haven’t come across those before.
Ben Charnock
July 17th, 2008
You’re quite correct Thomas. When I wrote the unsafe words function, I was aware it would yield hideous results.
My intention was to discourage mark-up in the first place by painting it to be painfully long winded… I can think of more than a handful of *cough* places where user feedback is HTML - and should one be having a particularly bad day, they could floor MySQL.
It would probably serve me well to insert an edit emphasising the “less than perfect” part haha!
Ben Charnock
July 17th, 2008
Jon, the Boolean thing…
The SQL reason:
Say you pass something like this…
myPage.php?showOutOfStock=1
…printed into SQL directly like…
SELECT * FROM `table` WHERE `outofstock` = 1
You could pass anything in place of 1.
The philosophical reason:
“true” is a string, “1” is a number… a boolean is true or false, because, that’s what it is.
Reading if(foobar) and if(!foobar) makes more logical sense than if(foobar == “true”) and if(foobar == “false”)
ali
July 17th, 2008
this one needs some special time. Looks like an excellent tutorial!
Craig Farrall
July 17th, 2008
Fantastic tutorial, I will definately have to try this out one day.
Matt Radel
July 17th, 2008
Damn, I need to learn php & db shizz. Looks like a great tut!
Guillaume
July 17th, 2008
Nice tutorial, I didn’t know about the addcslashes method. Seems to be a right way to avoid SQL Injection.
Faith Rivenbank
July 17th, 2008
I’ve enjoyed the article, and definitely learned a thing or two.
Notepad++’s default comment font is Comic Sans.
._. weird.
gumbah
July 17th, 2008
LOL @Alexis
Andy
July 17th, 2008
I’m sorry, but this really isn’t a well-planned article. The example code presented is insecure and the explanations are too brief. Plus, the code will crash and burn under certain conditions.
What you readers really should do is:
* follow the best practices that’s documented in the PHP manual.
* Read Chris Shiflett’s blog on PHP security (as Thomas Milburn also suggested).
* Flip through the Web Application Security slides as well as following the links on the page.
It’s a shame that nettuts’ articles hasn’t really lived up to the standards of Professional Web Development. There’s a lot of things missing from the articles I’ve encountered here at nettuts which should be focused on (such as semantics, statistically proven design approaches, solid backend coding, etc.) and which I find to be a distinctive marker between a Professional Web Developer and a hobbyist.
So far, nettuts have seemingly been a source for mediocre web developers to push out mediocre articles to an oblivious crowd of readers.
So a pledge to the authors who decide to write an article for nettuts:
Please! If you intend to write an article for nettuts, make sure you know what you are talking about! If you are going to teach, make sure you teach them right!
Abethebabe
July 17th, 2008
@Ben
I forgive the comic sans, this time…
Gonzalo
July 17th, 2008
Or.. you could just use a good web development framework.
alphadog
July 17th, 2008
The first security risk is rushing the job to fit under some ridiculous “under 24hrs” criteria… rush job always = insecure site.
yamaniac
July 17th, 2008
Duh! Good article! But I din understand a word
Miles Johnson
July 17th, 2008
Very nice, you could just use a singleton or db access class.
Charles
July 17th, 2008
pconnect is *dangerous* and should *never* be used unless you have configured both Apache AND MySQL to deal with the persistent connections. If the configuration is not correct, you’re just going to end up locking people out of your site with a “too many connections” error straight from MySQL.
Ben Griffiths
July 17th, 2008
This is a really great article, thanks
Jon
July 17th, 2008
@Ben
It’s clicked and I see where you’re coming from now.
Thanks for clearing that up for me.
Connor
July 17th, 2008
Great tutorial, looks to be very useful.
Lamin Barrow
July 17th, 2008
Thanks for publishing this article here. It should help expose some of the security threats web applications face as new ways are always devised to expose them.
Lamin Barrow
July 17th, 2008
I meant exploit them. Excuse my clumsiness.
Ed Mort
July 17th, 2008
Ok ! It’s right !
PHP are a secure language, but most part of security need by “handmade”.
All PHP developers must have this on mind !
Thanks
Danny
July 17th, 2008
This is really usefull, thanks
Kia Kroas
July 17th, 2008
Miles Johnson, I don’t mean to be picky, I just wanted to clarify things a bit. A singleton doesn’t necessarily keep persistent connections. If Ben Charnock wanted one connection only, he could have just referenced it through $db (without making more connects). Also I think he’s using procedural and not object oriented style. I use my own mysql connection class with its own singleton when doing PHP5 OOP…really speeds things up.
Anyway, persistent connections stay on even after the php script closes, and that’s the advantage–and the disadvantage is that each child spawns its own persistent connection. In comparison, a singleton doesn’t keep the connection going once the script exits. I haven’t ever needed to code a site that needed persistent connections though (and I’m too lazy to deal with the MySQL and PHP tunings to make sure it works flawlessly).
Also, in LIKE statements, it’s not just %, there’s also the _ that acts as a wildcard. I’ve always trusted mysql_real_escape_string (except in parameters with wildcards) because I can’t be sure of the current character encoding and don’t want to deal with it.
And lastly, I’d recommend people use HTMLTidy ( http://tidy.sourceforge.net/ ) or HTML Purifier ( http://htmlpurifier.org/ ) for html cleansing. Unless it’s a small project, it’s too risky (not to mention bothersome) to code your own.
Overall, nice beginners tutorial. Keep it up.
Dan
July 17th, 2008
This looks like a great tutorial, albeit a bit above my head, as I’m not even positive on the wrong ways to do things yet.
I would love to see some fairly basic, but useful PHP (&& / || MySQL) tutorials show up here.
Braden Keith
July 17th, 2008
Oh boy, time to add to my hacking skills. bahaha. I’m to lazy to do it to my own site, I just hope that everyone that visits is friendly. With this new one coming up here I will take the measures though. Thanks.
Joefrey Mahusay
July 17th, 2008
Very interesting tutorial… Thanks
Patrick
July 18th, 2008
Hm, nothing new to me. And some things are really wrong.
never use mysql_pconnect, unless you know how it works. Most of the MySQL-Servers use only 1024 connections. If there are more, your users will get an error message. And this is a serverwide setting - if your site is hosted on a multiuser-machine you’ll get this error sooner as you thought.
this tutorial will just keep “scriptkids” away - there are several ways more to inject javascript code or hack your database.
There’s just ONE impotant thing to know: Validate ANY data, which comes from the outside. This includes also $_SERVER and $_COOKIE. This server-side variables are definitively manipulable by anyone and this is nowhere written in this tutorial.
Taylor Satula
July 18th, 2008
Umbody just used comic sans didn’t they
Another Blog
July 18th, 2008
Nice, I came from a security background (wrong side of the fence if you get me) into programing and web design and I would agree with these. Another key one though being remote / local file inclusion. For instance if you are including files from user input
for instance
Andrew
Topher
July 18th, 2008
Nice intro to security, hard to take it seriously with all that comic sans though.
Ben Charnock
July 18th, 2008
I think it’s fair I defend myself…
This article was written to draw attention to a limited number of huge flaws written into sites I frequent, most of you frequent, and some of you *operate*.
It isn’t a project, a product or a library.
If you read it, and you think it is… please read it again.
PS: mysql_pconnect shouldn’t be used? That tickles, really, stop that…
Steve
July 19th, 2008
I will have to read this more carefully. Thank you for the great advice!
wiz
July 20th, 2008
none of the images/code show up for me. I would like to take a look at how to make my code more secure, but I can’t see the lines which you are giving examples with. Would be nice if someone could fix the broken images.
Mark Abucayon
July 20th, 2008
very cool, I will study this one later. Good Job
Maicon
July 21st, 2008
This is great! I have a overdose of useful information. I wait for more articles like this!
David Millar
July 22nd, 2008
We recently came across an interesting attack and we’ve posted a link to your article, the sample attack, the solution and some suggested tips.
Please see the article at:
http://www.rtraction.com/blog/devit/sql-injection-hack-using-cast.html
Craig
July 23rd, 2008
Excellent article, hopefully will push myself to check more thoroughly instead of subconsciously skipping the parts of my site that are a bit sketchy!
Dan Donald
July 26th, 2008
Great post! There are so many attack vectors that can be used now, it’s great to see more practical measures you can work into your code.
Keep it up!
James
July 31st, 2008
A very interesting read… thank you!
Raj
August 4th, 2008
very nice tutorial. definitely useful read. will try it out. Thanx Ben.
Lucas
August 5th, 2008
definetly a great article… but like two guys said before: it’s not a good idea to store passwords in plain text… I can highly recommend to encrypt them using md5 or base64
Matthew Prasinov
August 5th, 2008
Thanks, some interesting stuff here
konan
August 7th, 2008
Areally good post!
We are using mysql_pdo which automatically protects against all these mysql injections.
Manisha
August 10th, 2008
Thanks for this grate source of information.