In the world of HipHop for PHP, developers use HPHPi (the HipHop interpreter) when writing, debugging and testing PHP code. After the PHP code is committed, the PHP files are translated into C++ files which are then compiled and linked into the HipHop server. The HipHop server runs on production machines with a huge increase in performance.
However, it is well known that HPHPi itself is slow. Despite its slowness, HPHPi is useful for developers running HipHop because it is semantically similar to HipHop-compiled code, and can catch bugs before the code gets compiled. We use HPHPi because the old PHP interpreter can no longer run our production code and Facebook developers need to see the result of changes they make immediately.
As the number of PHP developers and the size of PHP codebase continue to grow, there have been more and more complaints on the poor performance of HPHPi. So, we decided to spend some time to improve HPHPi itself so that PHP developers everywhere can have a faster turnaround cycle in their daily work. Here’s how we’re doing it:
- Sharing parse trees among different sandboxes. On a shared server, the first request from a user usually times out because the HPHPi server needs to parse the files. For a PHP codebase as big as Facebook’s, most of the files would remain the same across sandboxes. If the same file has already been parsed in one sandbox, the parse tree can be reused in a different sandbox. This saves the parse time and also significantly reduces the memory footprint that was used to store essentially duplicate parse trees. The same benefit can be realized in a development environment if a developer has several sandboxes.
- Moving PHP file parsing out of a critical region. The HPHPi server has a file repository for storing parse trees. The file repository is shared across all the threads. When parsing a file, the file repository was locked exclusively and no other thread can check out a file even for execution only. While this may not have been a serious problem, this optimization makes it possible to parse multiple files in parallel.
- AST node specialization. HPHPi walks the parse tree when executing PHP code (called eval in HPHPi). Many of the AST nodes in the parse tree are generic. As a simple example, the IncOpExpression node represents 4 flavors of pre/post increment/decrement. Most of the time, the operand is a simple variable rather than a complex l-value expression. So a parse tree optimization phase is added where 4 specialized AST nodes IncVariableExpression, VariableIncExpression, DecVariableExpression and VariableDecExpression are generated to replace the more generic IncOpExpression node when possible. This not only speeds up most of the increment/decrement operations, but also reduces the AST node size (and thus, memory footprint) because the specialized AST nodes no longer need to store extra metadata that indicates pre/post or increment/decrement.
- Cache a pointer to built-in function metadata. In HPHPi, a simple function call still involves a hash table lookup. The function name is used as the key to look up the function table to get the function metadata. This lookup can be avoided for built-in functions. At the optimization phase, a specialized AST node BuiltinFunctionCallExpression is used to replace the more generic SimpleFunctionCallExpression node. The built-in function is looked up at optimization time and the result is cached in the new AST node, which has better runtime performance.
- Precompute static values. Scalar values include various literals (for example, integer literals, string literals) and arrays that contain only scalar values. A new AST node ScalarValueExpression is added to cache the precomputed scalar value. Most importantly, scalar arrays will become ScalarValueExpressions so that we don't spend time building scalar arrays at run time.
- Preprocess AST nodes. For AST nodes that are frequently eval'ed, we can preprocess the AST nodes at the optimization phase. For example, BinaryOpExpression has two child AST nodes. Many times one of the child nodes is a ScalarValueExpression and it is overkill to do a virtual C++ call of eval to get its value. We can directly fetch the value that is cached in the child node if preprocessing tells us that the child node is a ScalarValueExpression.
- Type prediction. The HipHop compiler does type inference to deduce the type of a variable at compile time so the more specific type can be used when generating C++ code, resulting in better performance. A similar idea is applied here. For example, the type of $a in ++$a is very likely an integer so we can bypass the call to an overloaded operator in C++ with a faster code path when the type is correctly predicted.
- Improve various hash table lookups. In some cases, HPHPi used C++ std::map or a similar map to store the lookup table (for example, the function table). They are replaced by HipHop's StringIMap or StringMap to speed up lookup because often the key is already a StaticString, which has the hash value precomputed. In addition, StaticStrings are made unique in HPHPi so a raw char * pointer comparison can short circuit the more expensive call to strcmp or strcasecmp.
- Make common function calls faster. When calling a function dynamically, the arguments are packed into an array because the caller does not know the C++ signature of the callee. Each compiled function has a C++ i-proxy (invoke proxy) to unpack the argument array and make the real call in C++. Packing arguments into an array is an expensive operation for the simple purpose of packing a vector of arguments because the array needs to be constructed/destructed. The HipHop compiler has an optimization to avoid the use of an argument array when the number of arguments does not exceed 6. Each compiled function also has an ifa-proxy (invoke few args proxy) that passes through the arguments directly to the callee. HPHPi lacked this optimization but it is now added. Because most function calls involve no more than 6 arguments, it helps to make common function calls faster.
- Running unit tests in server mode. The HPHPi optimizations move many computations from run time to parse time. Depending upon the unit test, it can either speed up or slow down the test. If the unit test spends most of the time parsing files, it can get slower because of the extra compile time optimizations. Facebook’s internal tools team has developed a server mode unit test framework. The client-side test driver sends each unit test to an HPHPi server and the test result is passed back to the client. This can significantly speed up unit tests because the HPHPi server can run multiple tests in parallel, reusing the parse trees among different threads.
We got a big performance win by moving to running our unit tests from within a server instead of spinning up a new instance of HPHPi to run each test.
The benchmark we used to measure the result of HPHPi improvement was bench.php, which is a set of synthetic benchmark programs that the HipHop team has collected. We ran comparisons between my HipHop git commits on August 9, 2011 and October 11, 2011.
Before the optimization, bench.php took 72.33 seconds to execute. After we added the optimization, execution took 33.5 seconds, a 53.6% increase in speed. The execution times vary even on the same machine so these numbers are rough and not cast in stone.
Minghui is a software engineer on the HipHop team.