P A R A L L E L U N I V E R S E ====================================================== Usage Guide (Version 5.5-3.1) Compatible with MySQL 5.5 and Percona 5.5 servers Chapter 1. Parallel Query Parallel Query processes tables in parallel with multiple threads for fast execution. Chapter 2. Parallel Network Query (Distributed Query) Parallel Network Query joins local tables of the server with remote tables of other servers in the network where remote table processing is delegated to host servers. Chapter 3. Parallel Network Query to Speed up Slow Server Parallel Network Query is used to speed up slow server by distributing tables of the query to multiple servers for processing which effectively aggregates server resources (such as disk I/O bandwidth and CPU/cores). ====================================================== ====================================================== Chapter 1. Parallel Query ====================================================== Parallel Query processes tables in parallel with multiple threads for fast execution. 1. Start Parallel Universe with either paralleluniversed or mysqld_safe with --mysqld option. Use a different data dir if another MySQL server is already running, see MySQL 5.5 Manual 5.6.3 Running Multiple MySQL Instances on Unix. (paralleluniverse-5.5-3.1-* is the installation dir) $ cd /usr/local/paralleluniverse-5.5-3.1-* $ bin/mysqld_safe --no-defaults --mysqld=paralleluniversed --lc-messages-dir=/usr/local/paralleluniverse-5.5-3.1-*/share --datadir=/usr/local/paralleluniverse-5.5-3.1-*/data "or your MySQL data dir" --socket=/tmp/paralleluniverse.sock --pid-file=/tmp/paralleluniverse.pid --port=3307 "your other server parameters" & 2. Connect to Parallel Universe with MySQL client and confirm that you are connected to the right server. $ bin/mysql --socket=/tmp/paralleluniverse.sock -u root Server version: Parallel Universe 5.5-3.1 3. Change the prompt(optional) and select a database. mysql> prompt "Parallel Universe> " (without quotes) PROMPT set to 'Parallel Universe> ' Parallel Universe> use my_database; 4. Run explain on the query you want to parallel process. Parallel Universe> explain select * from t1,t2,t3,t4 where my_select_cond; +----+-------------+-------+- | id | select_type | table | +----+-------------+-------+- | 1 | SIMPLE | t1 | | 1 | SIMPLE | t2 | ... | 1 | SIMPLE | t3 | | 1 | SIMPLE | t4 | +----+-------------+-------+- 5. Enable parallel processing for the last 2 tables in the table join order and run explain again(optional to confirm parallel processing). Parallel Universe> set @parallel_table_list="my_database.t3,my_database.t4"; (default=null) Parallel Universe> explain select * from t1,t2,t3,t4 where my_select_cond; +----+-------------+-------+- -+------------------------------------+ | id | select_type | table | | Extra | +----+-------------+-------+- -+------------------------------------+ | 1 | SIMPLE | t1 | | | | 1 | SIMPLE | t2 | ... | | | 1 | SIMPLE | t3 | | Processed in parallel | | 1 | SIMPLE | t4 | | Processed in parallel | +----+-------------+-------+- -+------------------------------------+ 6. Run the query. Parallel Universe> select * from t1,t2,t3,t4 where my_select_cond; Performance gain may not be obtained until more tables are parallel processed. Not all queries can benefit from parallel processing. 7. Add next table to the list and run the query. Parallel Universe> set @parallel_table_list="my_database.t2,my_database.t3, my_database.t4"; Parallel Universe> select * from t1,t2,t3,t4 where my_select_cond; 8. Continue adding tables until reaching performance plateau beyond which only marginal gain is obtained. 9. Assign null to the list to disable parallel processing. Parallel Universe> set @parallel_table_list=""; Query OK, 0 rows affected (0.00 sec) 10.Or set the maximum number of parallel tables to 1 to disable parallel processing. You can limit the number of tables to be processed in parallel with this user variable. (regardless of what's in @parallel_table_list) Parallel Universe> set @max_parallel_tables=1; (default=0, unlimited) 11.Shut down the server. Parallel Universe> quit $ bin/mysqladmin --socket=/tmp/paralleluniverse.sock -u root shutdown ====================================================== Parallel Query Requirements ====================================================== * Tables processed in parallel must be the last consecutive tables in table join order of the query. * Tables which are part of nested outer join or outer join with more than one inner table are excluded. * Temporary tables created by the query are excluded. * Tables in subselect(subquery) are excluded. ====================================================== Parallel Query Performance Tuning ====================================================== A user can fine tune 2 new server parameters in addition to existing MySQL 5.5 and Percona 5.5 server parameters. * @inter_table_buffer_size user variable which specifies the size of buffer which stores records from previous tables in parallel processing. The sum of used fields in the query and overhead must fit in this buffer therefore if the sum exceeds this value then the query will fail. (server error #1722: ER_INTER_TABLE_BUFFER_OVERFLOW) Parallel Universe> set @inter_table_buffer_size=16384; (=default=16KB) * --exclusive_key_buffer_size server command line option (applicable only to MyISAM tables) which specifies the size of exclusive key buffer allocated for every MyISAM table processed in parallel. $ bin/mysqld_safe ... --exclusive_key_buffer_size=8M & (=default=8,388,608) ====================================================== Chapter 2. Parallel Network Query (Distributed Query) ====================================================== Parallel Network Query joins local tables of the server with remote tables of other servers in the network where remote table processing is delegated to host servers. (remote tables represented as Federated tables) select * from , # 1st group of tables (g1) , # 2nd group of tables (g2) . . . < local tables @ base server > # Last group of tables (gn) where query_condition; 1. Start all servers with either paralleluniversed or mysqld_safe with --mysqld option. Use a different data dir if another MySQL server is already running, see MySQL 5.5 Manual 5.6.3 Running Multiple MySQL Instances on Unix. (paralleluniverse-5.5-3.1-* is the installation dir) $ cd /usr/local/paralleluniverse-5.5-3.1-* $ bin/mysqld_safe --no-defaults --mysqld=paralleluniversed --lc-messages-dir=/usr/local/paralleluniverse-5.5-3.1-*/share --datadir=/usr/local/paralleluniverse-5.5-3.1-*/data "or your MySQL data dir" --socket=/tmp/paralleluniverse.sock --pid-file=/tmp/paralleluniverse.pid --port=3307 --federated --skip-name-resolve "your other server parameters" & 2. Connect to the base server with MySQL client and confirm that you are connected to the right server. $ bin/mysql --socket=/tmp/paralleluniverse.sock -u root . . . Your MySQL connection id is 1 Server version: Parallel Universe 5.5-3.1 . . . 3. Change the prompt(optional). mysql> prompt "Parallel Universe> " (without quotes) PROMPT set to 'Parallel Universe> ' 4. Create table links. Tables (link via Federated table) Aux1 Aux2 . . . Base g1 (local) g1 -> Aux1:g1 g1 -> Aux1:g1 g2 -> Aux2:g2 g2 (local) g2 -> Aux2:g2 . . . gn -> Base:gn gn -> Base:gn gn (local) Example with 3 Servers with 1 Table each Aux1 Aux2 Base (192.168.0.1) (192.168.0.2) (192.168.0.3) db1.t1 (local) db1.t1 -> Aux1:db1.t1 db1.t1 -> Aux1:db1.t1 db2.t2 -> Aux2:db2.t2 db2.t2 (local) db2.t2 -> Aux2:db2.t2 db3.t3 -> Base:db3.t3 db3.t3 -> Base:db3.t3 db3.t3 (local) 5. Enable parallel processing for all tables of the query. Parallel Universe> set @parallel_table_list="db1.t1,db2.t2,db3.t3"; 6. Run explain on the query to determine table join order. Parallel Universe> explain select * from db1.t1,db2.t2,db3.t3 where query_condition; +----+-------------+--------+- | id | select_type | table | +----+-------------+--------+- | 1 | SIMPLE | db3.t3 | | 1 | SIMPLE | db1.t1 | ... | 1 | SIMPLE | db2.t2 | +----+-------------+--------+- 7. Redefine servers according to the table join order. Aux1 Aux2 Base db3.t3 db1.t1 db2.t2 (192.168.0.3) (192.168.0.1) (192.168.0.2) 8. Login to the new base server (192.168.0.2) and enable parallel processing for all tables of the query. 9. Specify the numbers of tables to be processed by auxiliary servers. ----- First n tables of table join | order to be processed by | auxiliary server 1 | | --- Next m tables of table join | | order to be processed by | | auxiliary server 2 | | | | Remaining tables to | | be processed by base server v v Parallel Universe> set @table_count_list="1,1"; This assigns first table of table join order to auxiliary server 1 and next table to auxiliary server 2 and the remaining table to base server. Tables processed by each server must be consecutive in the table join order and must not be Federated tables. (see the following note) 10.Specify auxiliary servers in the table join order. Aux. -------------------- Server | 1 | | Aux. | --- Server | | 2 | | v v Parallel Universe> set @server_list="192.168.0.3:3307,192.168.0.1:3307"; auxiliary server is specified by IP address:Port(and optional :Socket, default=TCP Port 63306). Socket is used for inter server transmission of partial query result. 11.Specify your account info at auxiliary servers. Parallel Universe> set @multi_server_user="john_doe"; Parallel Universe> set @multi_server_password="john_doe_password"; You must have an account at all auxiliary servers, to create an account at auxiliary servers, Parallel Universe> grant all on *.* to john_doe identified by 'john_doe_password'; 12.Run explain on the query (optional to confirm Parallel Network Query). Parallel Universe> explain select * from db1.t1,db2.t2,db3.t3 where query_condition; +---+-----------+--------+ +-----------------------------------------------+ |id |select_type| table | | Extra | +---+-----------+--------+ +-----------------------------------------------+ | 1 | SIMPLE | db3.t3 | | Processed in parallel @192.168.0.3:3307:63306 | | 1 | SIMPLE | db1.t1 | ... | Processed in parallel @192.168.0.1:3307:63306 | | 1 | SIMPLE | db2.t2 | | Processed in parallel | +---+-----------+--------+ +-----------------------------------------------+ auxiliary servers are identified by IP Address:Port:Socket. 13.Run the query. Parallel Universe> select * from db1.t1,db2.t2,db3.t3 where query_condition; 14.Assign null to the table count list to disable Parallel Network Query. Parallel Universe> set @table_count_list=""; (=default) OR Parallel Universe> set @max_parallel_tables=1; to disable Parallel Query which automatically disables Parallel Network Query. NOTE Tables processed by each server must be consecutive in the table join order and must not be Federated tables. If this condition is not met then move or replicate tables from one server to another replacing corresponding Federated tables at the destination server. If tables are moved then corresponding Federated tables need to be created at the source server. Example (Federated tables not shown), table join order is t1,t2,t3. Server 1 Server 2 t1 t2 t3 Solution 1) Move or replicate t1 at Server1 and use Server 2 as the base. t1 moved OR t1 replicated Server 1 Server 2 Server 1 Server 2 t1 t1 t1 (unused) t2 t2 t3 t3 Parallel Universe> set @table_count_list="2"; Parallel Universe> set @server_list="Server 1"; Solution 2) Move or replicate t3 at Server1 and use Server 1 as the base. t3 moved OR t3 replicated Server 1 Server 2 Server 1 Server 2 t1 t1 t2 t2 t3 t3 t3 (unused) Parallel Universe> set @table_count_list="1"; Parallel Universe> set @server_list="Server 2"; ====================================================== Parallel Network Query Requirements ====================================================== * All tables of the query must be present at each server as local table or Federated table. * The query must yield the identical execution plan at all servers. * The group of tables to be processed by each server must all be parallel tables except the first auxiliary server. For the first auxiliary server, the last n tables of the group must be parallel tables where n is at least 1 and up to the entire group. * Parallel table may not be part of nested outer join or outer join with more than one inner table. * Parallel table may not be a temporary table created by the query. * Parallel table may not be a table in subselect(subquery). * Tables processed by each server must be consecutive in the table join order of query and must not be Federated tables. (Federated tables participate in the optimization phase of query execution but not in the execution phase.) * Servers must open TCP ports assigned to the port and the socket. * The servers must be all 64 bit Parallel Universe 5.5-3.1 servers or all 32 bit Parallel Universe 5.5-3.1 servers. ====================================================== Parallel Network Query Performance Tuning ====================================================== A user can fine tune following server parameters in addition to existing MySQL 5.5 and Percona 5.5 server parameters. * @inter_table_buffer_size user variable which specifies the size of buffer which stores records from previous tables in parallel processing. The sum of used fields in the query and overhead must fit in this buffer therefore if the sum exceeds this value then the query will fail. (server error #1722: ER_INTER_TABLE_BUFFER_OVERFLOW) Parallel Universe> set @inter_table_buffer_size=16384; (default, 16KB) * --exclusive_key_buffer_size server command line option (applicable only to MyISAM tables) which specifies the size of exclusive key buffer allocated for every MyISAM table processed in parallel. $ bin/mysqld_safe ... --exclusive_key_buffer_size=8M & (default, 8,388,608) * @inter_server_socket_size user variable which specifies the size of inter server socket which transmits partial query result to the next server. Parallel Universe> set @inter_server_socket_size=16384; (default, 16KB) * @inter_server_socket_timeout user variable specifies the wait time for inter sever socket I/O, this value may need increase when encountering server ERROR 1723 (HY000): PARALLEL/NETWORK QUERY Unable to ... may require longer inter server socket timeout. Parallel Universe> set @inter_server_socket_timeout=10; (default, in seconds) * @inter_server_socket_retry user variable specifies the number of retries for interrupted inter server socket I/O, increase this count when encountering server ERROR 1723 (HY000): PARALLEL/NETWORK QUERY Unable to ... may require larger inter server socket retry. Parallel Universe> set @inter_server_socket_retry=19; (default) ====================================================== Parallel Network Query Debugging and Test ====================================================== * For debugging and test, you may want to run auxiliary servers on the same host hardware as the base server, see MySQL 5.5 Manual 5.6.3 Running Multiple MySQL Instances on Unix. Parallel Universe> set @server_list="127.0.0.1:3308,127.0.0.1:3309:63307"; IP Address of 127.0.0.1 refers to localhost but ports and sockets must be distinct. In this example, 1st aux. server is using the default socket (TCP Port 63306). You must also have user@localhost account at all auxiliary servers (which is not covered by user without '@' account), create an account at auxiliary servers. Parallel Universe> grant all on *.* to john_doe@localhost identified by 'john_doe_password'; * Start servers with log file, base server's log file includes errors from auxiliary servers. Use appropriate file name (base.log, aux1.log, aux2.log and so on). $ bin/mysqld_safe --no-defaults --mysqld=paralleluniversed . . . --log=~/base.log & ====================================================== Chapter 3. Parallel Network Query to Speed up Slow Server ====================================================== Parallel Network Query is used to speed up slow server by distributing tables of the query to multiple servers for processing which effectively aggregates server resources (such as disk I/O bandwidth and CPU/cores). 1. Run explain on the query to determine table join order. Parallel Universe> explain select * from where query_condition; +----+-------------+--------+- | id | select_type | table | +----+-------------+--------+- | 1 | SIMPLE | t1 | | 1 | SIMPLE | t2 | | 1 | SIMPLE | t3 | ... | 1 | SIMPLE | t4 | | 1 | SIMPLE | t5 | | 1 | SIMPLE | t6 | +----+-------------+--------+- 2. Divide ordered tables into groups. t1 --- group 1 processed by | Auxiliary Server 1 | (192.168.0.1) t2 --- t3 --- group 2 processed by | Auxiliary Server 2 | (192.168.0.2) t4 --- t5 --- group 3 processed by | Base Server | (192.168.0.3) t6 --- 3. Start auxiliary servers with either paralleluniversed or mysqld_safe with --mysqld option. Use a different data dir if another MySQL server is already running, see MySQL 5.5 Manual 5.6.3 Running Multiple MySQL Instances on Unix. (paralleluniverse-5.5-3.1-* is the installation dir) $ cd /usr/local/paralleluniverse-5.5-3.1-* $ bin/mysqld_safe --no-defaults --mysqld=paralleluniversed --lc-messages-dir=/usr/local/paralleluniverse-5.5-3.1-*/share --datadir=/usr/local/paralleluniverse-5.5-3.1-*/data "or your MySQL data dir" --socket=/tmp/paralleluniverse.sock --pid-file=/tmp/paralleluniverse.pid --port=3307 --federated --skip-name-resolve "your other server parameters" & 4. Replicate or move table groups to auxiliary servers. A) Replicate all groups at auxiliary servers. OR B) Replicate one group at auxiliary servers. The rest of groups are created as Federated tables (links to base server). OR C) Move one group to auxiliary servers. The rest of groups are created as Federated tables (links to appropriate server). Case A) Aux1 Aux2 Base t1 (local) t1 (local) t1 (local) t2 (local) t2 (local) t2 (local) t3 (local) t3 (local) t3 (local) t4 (local) t4 (local) t4 (local) t5 (local) t5 (local) t5 (local) t6 (local) t6 (local) t6 (local) Case B) Aux1 Aux2 Base t1 (local) t1 -> Base:t1 t1 (local) t2 (local) t2 -> Base:t2 t2 (local) t3 -> Base:t3 t3 (local) t3 (local) t4 -> Base:t4 t4 (local) t4 (local) t5 -> Base:t5 t5 -> Base:t5 t5 (local) t6 -> Base:t6 t6 -> Base:t6 t6 (local) Case C) Aux1 Aux2 Base t1 (local) t1 -> Aux1:t1 t1 -> Aux1:t1 t2 (local) t2 -> Aux1:t2 t2 -> Aux1:t2 t3 -> Aux2:t3 t3 (local) t3 -> Aux2:t3 t4 -> Aux2:t4 t4 (local) t4 -> Aux2:t4 t5 -> Base:t5 t5 -> Base:t5 t5 (local) t6 -> Base:t6 t6 -> Base:t6 t6 (local) 5. Enable parallel processing for all tables of the query. Parallel Universe> set @parallel_table_list="mydb.t1,mydb.t2,mydb.t3, mydb.t4,mydb.t5,mydb.t6"; 6. Specify numbers of tables to be processed by auxiliary servers. ----- First 2 tables of table join | order to be processed by | aux. server 1 | | --- Next 2 tables of table join | | order to be processed by | | aux. server 2 | | | | Remaining tables to | | be processed by base server v v Parallel Universe> set @table_count_list="2,2"; 7. Specify auxiliary servers in the table join order. Aux. -------------------- Server | 1 | | Aux. | --- Server | | 2 | | v v Parallel Universe> set @server_list="192.168.0.1:3307,192.168.0.2:3307"; auxiliary server is specified by IP address:Port(and optional :Socket, default=TCP Port 63306). Socket is used for inter server transmission of partial query result. 8. Specify your account info at auxiliary servers. Parallel Universe> set @multi_server_user="john_doe"; Parallel Universe> set @multi_server_password="john_doe_password"; You must have an account at all auxiliary servers, to create an account at auxiliary servers, Parallel Universe> grant all on *.* to john_doe identified by 'john_doe_password'; 9. Run explain on the query (optional to confirm Parallel Network Query). Parallel Universe> explain select * from t1,t2,t3,t4,t5,t6 where query_condition; +---+-----------+--------+ +-----------------------------------------------+ |id |select_type| table | | Extra | +---+-----------+--------+ +-----------------------------------------------+ | 1 | SIMPLE | t1 | | Processed in parallel @192.168.0.1:3307:63306 | | 1 | SIMPLE | t2 | | Processed in parallel @192.168.0.1:3307:63306 | | 1 | SIMPLE | t3 | ... | Processed in parallel @192.168.0.2:3307:63306 | | 1 | SIMPLE | t4 | | Processed in parallel @192.168.0.2:3307:63306 | | 1 | SIMPLE | t5 | | Processed in parallel | | 1 | SIMPLE | t6 | | Processed in parallel | +---+-----------+--------+ +-----------------------------------------------+ auxiliary servers are identified by IP Address:Port:Socket. 10.Run the query. Parallel Universe> select * from t1,t2,t3,t4,t5,t6 where query_condition; 11.Assign null to the table count list to disable Parallel Network Query. Parallel Universe> set @table_count_list=""; (=default) OR Parallel Universe> set @max_parallel_tables=1; to disable Parallel Query which automatically disables Parallel Network Query. ====================================================== Parallel Network Query to Speed up Slow Server Requirements ====================================================== * All tables of the query must be present at each server as local table or Federated table. * The query must yield the identical execution plan at all servers. * The group of tables to be processed by each server must all be parallel tables except the first auxiliary server. For the first auxiliary server, the last n tables of the group must be parallel tables where n is at least 1 and up to the entire group. * Parallel table may not be part of nested outer join or outer join with more than one inner table. * Parallel table may not be a temporary table created by the query. * Parallel table may not be a table in subselect(subquery). * Tables processed by each server must be consecutive in the table join order of query and must not be Federated tables. (Federated tables participate in the optimization phase of query execution but not in the execution phase.) * Servers must open TCP ports assigned to the port and the socket. * The servers must be all 64 bit Parallel Universe 5.5-3.1 servers or all 32 bit Parallel Universe 5.5-3.1 servers. ====================================================== Parallel Network Query to Speed up Slow Server Performance Tuning ====================================================== A user can fine tune following server parameters in addition to existing MySQL 5.5 and Percona 5.5 server parameters. * @inter_table_buffer_size user variable which specifies the size of buffer which stores records from previous tables in parallel processing. The sum of used fields in the query and overhead must fit in this buffer therefore if the sum exceeds this value then the query will fail. (server error #1722: ER_INTER_TABLE_BUFFER_OVERFLOW) Parallel Universe> set @inter_table_buffer_size=16384; (default, 16KB) * --exclusive_key_buffer_size server command line option (applicable only to MyISAM tables) which specifies the size of exclusive key buffer allocated for every MyISAM table processed in parallel. $ bin/mysqld_safe ... --exclusive_key_buffer_size=8M & (default, 8,388,608) * @inter_server_socket_size user variable which specifies the size of inter server socket which transmits partial query result to the next server. Parallel Universe> set @inter_server_socket_size=16384; (default, 16KB) * @inter_server_socket_timeout user variable specifies the wait time for inter sever socket I/O, this value may need increase when encountering server ERROR 1723 (HY000): PARALLEL/NETWORK QUERY Unable to ... may require longer inter server socket timeout. Parallel Universe> set @inter_server_socket_timeout=10; (default, in seconds) * @inter_server_socket_retry user variable specifies the number of retries for interrupted inter server socket I/O, increase this count when encountering server ERROR 1723 (HY000): PARALLEL/NETWORK QUERY Unable to ... may require larger inter server socket retry. Parallel Universe> set @inter_server_socket_retry=19; (default) ====================================================== Parallel Network Query to Speed up Slow Server Debugging and Test ====================================================== * For debugging and test, you may want to run auxiliary servers on the same host hardware as the base server, see MySQL 5.5 Manual 5.6.3 Running Multiple MySQL Instances on Unix. Parallel Universe> set @server_list="127.0.0.1:3308,127.0.0.1:3309:63307"; IP Address of 127.0.0.1 refers to localhost but ports and sockets must be distinct. In this example, 1st aux. server is using the default socket (TCP Port 63306). You must also have user@localhost account at all auxiliary servers (which is not covered by user without '@' account), create an account at auxiliary servers. Parallel Universe> grant all on *.* to john_doe@localhost identified by 'john_doe_password'; * Start servers with log file, base server's log file includes errors from auxiliary servers. Use appropriate file name (base.log, aux1.log, aux2.log and so on). $ bin/mysqld_safe --no-defaults --mysqld=paralleluniversed . . . --log=~/base.log &