For those of you who haven’t heard, bulk processing in a nutshell can be described more or less of a small multiple processes that is being ran at one single process rather than individually. In this writing, I’m about to show you one simple case in which bulk query to database can enhance performance compared to doing several queries one after another.
There are a few process in this example that we will run through:
- Create postgres DB instance
- Create users table, and seed some data into the table(100 data)
- Create a simple Golang program. In the program we will get the users by given multiple id. One function will get it one query by one query, the other one will fetch the users data given by multiple id at once.
Create Postgres DB Instance
We’re going to use docker for the sake of simplicity in creating disposable DB instance. Following the basic instruction in the following link: https://hub.docker.com/_/postgre
you can run the written command (with some minor adjustment)
docker run --name local-postgres -e POSTGRES_PASSWORD=password -p 5432:5432 -d postgres
We’re going to change the container name to be local-postgres, the password to be password, and declare port mapping 5432 to 5432.
Creating Table and Data Seeding
After our DB container is running, we need to create the table and put some data for our experiment. You can download / see the seeder SQL query here:
It will create users table with two columns (id auto increment primary key, name varchar), then we’ll insert 100 random data. More or less the table example will be like this kind of format:
Go Program to Do the Benchmarking
Since the DB is all ready, now we’re going to create simple go program to conduct the experiment.
Disclaimer: The code written for this experiment is written to the point for the sake of simplicity.
The code flow will works like these:
- Init the DB function (we’re using sqlx library for all the DB init and operation, https://github.com/jmoiron/sqlx)
- Hardcode some random id contains 10 and 20 id’s that we will query from the database
- Run function that will get the name based on 10/20 multiple id’s one by one and print the elapsed time. It will loop for each id and do query WHERE id = $currentID.
- Run function that will get the name based on 10/20 multiple id’s bulk in one query, and print the elapsed time. It will use query WHERE id = ANY ($multipleID) ~ equivalent to WHERE IN query.
- Point 3 and 4 will be ran 10 times each to get the average elapsed time
The source code can be in this link:
The result based on above scenario can be seen on table below (data is in miliseconds ms):
As we can see that the bulk process on average got 8.4 times faster than the single query for 10 data, and even 13.7 times faster than single query for 20 data.
Since the bulk query performance does better performance than looping multiple data and do query for each of them, it’s better to use bulk query if you want to get multiple data at once. Single query iterated through a loop is slower due to its flow to call and use DB connection over and over again while the bulk performance wins due to one call is all that it needs to get the data. However, this might not be working on all DB calling cases, It is more suitable for the case when one single simple and fast query is being done for multiple different parameters, then you can group it into batch process to get this kind of improvement.