Spider PHP Asynchronous MySQL Library

View Project Source: https://github.com/jessecascio/spider

Spider is an asynchronous PHP MySQL library that allows MySQL queries to run concurrently. The inspiration for Spider came from working with large star schema databases. A star schema has a wide (many-column) central fact table holding ids that point to the surrounding dimension tables. Pulling data from a star schema requires either complex queries with numerous JOIN statements, or separate queries against each referenced table, followed by programmatic analysis and aggregation of the results.

With an asynchronous MySQL library, we can run multiple queries in parallel and process the data either as each query returns or once all queries have completed.

Architecture

Spider pulls data from a data source using multiple PHP processes and stores the results in temporary storage, either an RDBMS or a cache. The client using the Spider library does not run the queries itself; it starts the PHP processes, assigns them queries, and monitors them while the queries execute and the results are written to temporary storage. As the queries return data, the Spider library applies any data formatting via custom callbacks, and once everything has completed it notifies the client that the data is available.
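The spawn-and-monitor flow described above can be sketched roughly as follows. This is a minimal illustration, not Spider's actual API: the function names spawnWorkers and collect, the JSON payload, and the in-memory array standing in for temporary storage are all assumptions made for the example, and each "query" is simulated by a child process that simply sleeps. It assumes PHP 7.4 or later run from the command line.

```php
<?php
// Hypothetical sketch, not Spider's actual API: function names and the JSON
// payload are invented for illustration. Needs PHP 7.4+ (array form of proc_open).

/** Start one child PHP process per job; each child simulates a query. */
function spawnWorkers(array $jobs): array
{
    $workers = [];
    foreach ($jobs as $name => $delayMs) {
        // Stand-in for "run a query": wait, then print a JSON result to stdout.
        $code = sprintf(
            'usleep(%d * 1000); echo json_encode(["job" => %s, "delay_ms" => %d]);',
            $delayMs,
            var_export((string) $name, true),
            $delayMs
        );
        $pipes = [];
        $proc = proc_open([PHP_BINARY, '-r', $code], [1 => ['pipe', 'w']], $pipes);
        if (!is_resource($proc)) {
            throw new RuntimeException("Could not start worker for job '$name'");
        }
        stream_set_blocking($pipes[1], false); // let the parent poll without blocking
        $workers[$name] = [$proc, $pipes[1]];
    }
    return $workers;
}

/** Poll the workers; pass each finished result through $onResult and keep it. */
function collect(array $workers, callable $onResult): array
{
    $storage = []; // in-memory stand-in for the temporary storage
    $buffers = array_fill_keys(array_keys($workers), '');
    while ($workers) {
        foreach ($workers as $name => [$proc, $stdout]) {
            $buffers[$name] .= stream_get_contents($stdout);
            if (proc_get_status($proc)['running']) {
                continue; // still working, check again on the next pass
            }
            $buffers[$name] .= stream_get_contents($stdout); // drain what is left
            fclose($stdout);
            proc_close($proc);
            $row = json_decode($buffers[$name], true);
            $storage[$name] = is_array($row) ? $onResult($row) : null;
            unset($workers[$name]);
        }
        usleep(10000); // 10 ms pause between polling passes
    }
    return $storage;
}

// Usage: two simulated queries run concurrently; the callback formats each result.
$results = collect(
    spawnWorkers(['orders' => 80, 'users' => 30]),
    fn (array $row) => strtoupper($row['job'])
);
print_r($results);
```

The parent never does the work itself: it only starts the children, polls them, and runs the callback as each one finishes, so total wall time is close to the slowest job rather than the sum of all jobs.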

Src Folder

Connection - Holds the MySQL connection for the data source, i.e. where the data is pulled from. Additional sources can be added by implementing the Connection decorator.

Storage - Wrapper for the temporary data storage. Storage backends for both MySQL and Memcached are currently available; more can be added by implementing the Storage decorator.

Component - Contains the core Spider classes. Config holds the configuration options. Logger is a factory wrapper class for Monolog. Nest handles spawning the new PHP processes. Web is the controller which starts, monitors, and stops the whole process.

bin - Holds the script that each spawned PHP process runs to execute a query and store the result set in temporary storage.
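To illustrate what adding another temporary storage backend might involve, here is a minimal sketch of a file-based store. The TemporaryStorage interface, its method names, and the FileStorage class are hypothetical and invented for this example; Spider's real Storage type may declare a different contract, so check the project source before implementing one.

```php
<?php
// Hypothetical contract: Spider's real Storage type may declare different methods.
interface TemporaryStorage
{
    public function store(string $key, array $rows): void;
    public function fetch(string $key): ?array;
    public function delete(string $key): void;
}

/** Illustrative backend that keeps each result set as a JSON file on disk. */
final class FileStorage implements TemporaryStorage
{
    private string $dir;

    public function __construct(string $dir)
    {
        if (!is_dir($dir) && !mkdir($dir, 0700, true) && !is_dir($dir)) {
            throw new RuntimeException("Cannot create storage directory: $dir");
        }
        $this->dir = $dir;
    }

    public function store(string $key, array $rows): void
    {
        // LOCK_EX keeps concurrent writers from interleaving their output.
        file_put_contents($this->path($key), json_encode($rows), LOCK_EX);
    }

    public function fetch(string $key): ?array
    {
        $path = $this->path($key);
        if (!is_file($path)) {
            return null; // nothing stored under this key
        }
        $rows = json_decode(file_get_contents($path), true);
        return is_array($rows) ? $rows : null;
    }

    public function delete(string $key): void
    {
        $path = $this->path($key);
        if (is_file($path)) {
            unlink($path);
        }
    }

    private function path(string $key): string
    {
        // Hash the key so any string maps to a safe file name.
        return $this->dir . DIRECTORY_SEPARATOR . md5($key) . '.json';
    }
}

// Usage: a worker process would store its rows; the client would fetch them later.
$storage = new FileStorage(sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'spider_sketch');
$storage->store('orders_query', [['id' => 1, 'total' => 25]]);
print_r($storage->fetch('orders_query'));
$storage->delete('orders_query');
```

Because the worker processes and the client are separate PHP processes, whatever backend is used has to live outside any single process's memory, which is why this sketch writes to disk rather than to an array.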

Examples

See the project's GitHub repo for examples: https://github.com/jessecascio/spider