Occasionally I read through some comments on the PHP Manual, sometimes to get ideas on different methods of doing things, other times just to try to keep current with some of the vast array of functions available.
Sometimes I see things that really scare me: code that is written and published with the best will in the world by its author, yet displays a lack of deeper understanding of how to solve the problem. One such case was the invert_case() and rand_case() functions, which looped through the string one character at a time, doing whatever they had to do to each character as they went. Highly inefficient.
Remember, the only difference in ASCII between an uppercase letter and its lowercase counterpart is a single bit, the one with value 32 (hex 20), which is 0 for uppercase and 1 for lowercase. For example, 'A' is 01000001 and 'a' is 01100001.
This brief tutorial is based on code available at:
http://www.pgregg.com/projects/php/code/str_case.phps
and you can see example output at:
http://www.pgregg.com/projects/php/code/str_case.php
Surely it would be possible to write some code that would simply flip this bit in each character to the value you want:
– AND that bit with 0 to force uppercase
– OR that bit with 1 to force lowercase
– XOR that bit with 1 to invert the case
– randomly set that bit to 1 or 0 to get random case.
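To make the four operations concrete, here is a minimal sketch on single characters. It relies on PHP applying bitwise operators byte by byte when both operands are strings. The variable names are mine, chosen for illustration only.

```php
<?php
// Illustrative names only; they do not come from the original code.
$caseBit   = "\x20"; // 00100000: the case bit, which is also the space character
$clearMask = "\xDF"; // 11011111: every bit set except the case bit

$upper = 'a' & $clearMask; // AND the case bit with 0 -> 'A'
$lower = 'A' | $caseBit;   // OR the case bit with 1  -> 'a'
$flipA = 'a' ^ $caseBit;   // XOR the case bit with 1 -> 'A'
$flipB = 'A' ^ $caseBit;   // XOR the case bit with 1 -> 'a'

// Show the two bit patterns side by side.
printf("%08b %08b\n", ord('A'), ord('a')); // 01000001 01100001
```

Only letters should be treated this way. Applying the same mask to digits or punctuation would change them into different characters, which is why the solutions below restrict the operation to letters.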
There are two ways to achieve this. The first builds a simple character mask and performs a bitwise operation on the string as a whole to change it as required. This method is designed to help teach how the technique works. The second uses the power of the PCRE engine, with a regex that calculates the changes and applies them in one simple step.
Both solutions are, I believe, elegant and are presented here for you.
Solution #1:
Code:
// Code that will invert the case of every letter in $input
// The solution is to flip the 3rd bit from the left (value 32) in each character
// if the character is a letter. This is done with XOR against a space (decimal 32, hex 20)
$stringmask = preg_replace("/[^a-z]/i", chr(0), $input); // replace non-letters with a NUL byte
$stringmask = preg_replace("/[a-z]/i", ' ', $stringmask); // replace letters with a space
return $input ^ $stringmask;
The method here is to generate a string mask, in two stages, that will act as a bitmask to XOR the 3rd bit of every letter in the string. Stage 1 replaces every non-letter with a NUL byte (all zeros), and Stage 2 replaces every letter with a space (ASCII 32), which just happens to be a byte with only the 3rd bit from the left set to 1, i.e. 00100000.
All we have to do then is XOR our input with the string mask and, as if by magic, the case of every letter in the string is flipped. Non-letters are XORed with zero and so are left unchanged.
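As a worked example of the mask method, the sketch below wraps the two stages in functions and prints the mask in hex so you can see which bytes will be flipped. The function names letter_mask() and invert_case_mask() are hypothetical, used here only for illustration.

```php
<?php
// Hypothetical helper: build the mask described above.
function letter_mask($input)
{
    $mask = preg_replace('/[^a-z]/i', chr(0), $input); // stage 1: non-letters -> NUL
    return preg_replace('/[a-z]/i', ' ', $mask);       // stage 2: letters -> space (0x20)
}

// Hypothetical wrapper: XOR the whole string against its mask in one operation.
function invert_case_mask($input)
{
    return $input ^ letter_mask($input);
}

echo bin2hex(letter_mask('Ab1!')), "\n";           // 20200000
echo invert_case_mask('Ab1!'), "\n";               // aB1!
echo invert_case_mask('Hello, World 42!'), "\n";   // hELLO, wORLD 42!
```

The mask is always the same length as the input, so the XOR covers every byte. Only ASCII letters are matched, so accented or multibyte characters pass through unchanged.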
Solution #2:
Code:
return preg_replace('/[a-z]+/ie', '\'$0\' ^ str_pad(\'\', strlen(\'$0\'), \' \')', $input);
This is much more compact. It works by using a regex that looks for letters, with the i (case insensitive) modifier and, most importantly, the e (evaluate) modifier, so that the replacement is produced by executing PHP code. In this case, we look for batches of letters and replace each batch with itself XORed with a string of spaces of the same length. Note that the e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so this form only runs on older versions.
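On current PHP versions the same idea can be sketched with preg_replace_callback(), which passes each match to a function instead of evaluating a code string. This is my adaptation rather than the code from the linked source, and the function name invert_case_regex() is hypothetical.

```php
<?php
// Hypothetical name; a callback-based take on Solution #2.
function invert_case_regex($input)
{
    return preg_replace_callback(
        '/[a-z]+/i', // batches of ASCII letters, case insensitive
        function ($m) {
            // XOR the batch with an equally long run of spaces to flip the case bit
            return $m[0] ^ str_repeat(' ', strlen($m[0]));
        },
        $input
    );
}

echo invert_case_regex('Hello, World 42!'), "\n"; // hELLO, wORLD 42!
```

str_repeat() is used here in place of str_pad('', ...) purely for readability. Both produce a string of spaces of the requested length.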
Similar principles apply to the random case example, but we complicate this slightly by adding an invert mask (to the Solution 1 method). This invert mask is created by taking a random number of spaces (between 0 and the length of the input string). We then pad this out to the length of the original string with NUL bytes and finally randomise the order with str_shuffle(). We then bitwise AND the string mask and the invert mask, which gives a new mask in which a random selection of the letter positions hold spaces and everything else holds NUL. We then XOR this with the original string as before, and before you know it you have a randomly capitalised string.
The Solution 2 version requires you to remove the + so that we only match a single letter at a time (or else our randomly chosen case would apply to whole words at a time), and we use a ternary to randomly decide between using a space and leaving the letter alone:
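The steps for the mask-based random case version can be sketched as follows. The function name rand_case_mask() and the choice of mt_rand() are my assumptions. The linked source may differ in both.

```php
<?php
// Hypothetical name; follows the invert-mask steps described above.
function rand_case_mask($input)
{
    $len = strlen($input);

    // String mask: space where the input has a letter, NUL everywhere else.
    $stringmask = preg_replace('/[^a-z]/i', chr(0), $input);
    $stringmask = preg_replace('/[a-z]/i', ' ', $stringmask);

    // Invert mask: a random number of spaces, padded with NUL bytes, then shuffled.
    $spaces     = str_repeat(' ', mt_rand(0, $len));
    $invertmask = str_shuffle(str_pad($spaces, $len, chr(0)));

    // Keep a space only where both masks have one, then flip those letters.
    return $input ^ ($stringmask & $invertmask);
}

echo rand_case_mask('Hello, World 42!'), "\n"; // output varies from run to run
```

Because the output is random, the only fixed properties are that the length is preserved, non-letters are untouched, and the result matches the input when both are lowercased.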
Code:
return preg_replace('/[a-z]/ie', '(rand(0,1) ? \'$0\' ^ \' \' : \'$0\')', $input);
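For PHP versions that no longer support the e modifier, the one-letter-at-a-time replacement can be sketched with preg_replace_callback(). As before, this is my adaptation, and rand_case_regex() is a hypothetical name.

```php
<?php
// Hypothetical name; a callback-based take on the random case one-liner.
function rand_case_regex($input)
{
    return preg_replace_callback(
        '/[a-z]/i', // one ASCII letter at a time, so each letter gets its own coin toss
        function ($m) {
            // Flip the case bit on roughly half of the letters, leave the rest alone
            return mt_rand(0, 1) ? ($m[0] ^ ' ') : $m[0];
        },
        $input
    );
}

echo rand_case_regex('Hello, World 42!'), "\n"; // output varies from run to run
```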
I hope this has been a worthwhile read and I would certainly welcome feedback on this article.