Disk seeks are evil, so let’s avoid them, pt. 4

Continuing in the theme from previous posts, I’d like to examine another case where we can eliminate all disk seeks from a MySQL operation and therefore get two orders-of-magnitude speedup. The general outline of these posts is:

B-trees do insertion disk seeks. While they’re at it, they piggyback some other work on the disk seeks. This piggyback work requires disk seeks regardless.
TokuDB’s Fractal Tree indexes don’t do insertion disk seeks. If we also get rid of the piggyback work, we end up with no disk seeks, and a two order of magnitude improvement.

So it’s all about finding out which piggyback work is important (important enough to pay a huge performance penalty for), and which isn’t.

This blog post is about one of the most straightforward operations: ad-hoc insertions with a primary key. Since the difference I’ve identified between B-tree indexes and Fractal tree indexes is the disk seeks on insertions, it may seem that there’s little to look at in this case. But the semantics of insertion into a primary key is as follows: if the primary key being inserted does not already exist in the table, insert the element, otherwise return an error notifying the user that a duplicate key exists.

So, if we have a table:

create table foo (a int, b int, primary key (a))

1	create table foo (a int, b int, primary key (a))

and we run:

insert into foo values (1000,1);

1	insert into foo values (1000,1);

Before inserting (1000,1) into the table, the storage engine must first check to see if any row in the table where a=1000 exists. If so, return an error stating that there is a duplicate key. The requirement to return an error is VERY expensive: it requires a disk seek.

The only way a disk based storage engine can verify if a key exists is to look up the element. This is the same reasoning used for unique secondary keys here. If the primary key is ad-hoc, the lookup requires disk seeks. And disk seeks are evil.

Once again, B-trees already pay for the disk seek just for the insertion. Because the leaf node must be in memory to do the
insertion, verifying uniqueness essentially comes for free. So B-trees suffer from disk seeks, whether you do duplicate checking or not.

So how can users achieve fast performance with a Fractal tree based storage engine — as compared to B-trees, which will be slow regardless? Remove this requirement of returning an error if a duplicate is key found. It is way too expensive. MySQL has syntax to perform other actions instead of returning an error:

replace into
insert ignore
insert … on duplicate key update

The question is, do these commands do piggyback work that requires disk seeks? If they do, performance will be slow for ANY disk based storage engine. If not, then fractal tree data structures will be achieve up a speedup of up to two orders of magnitude. In next week’s post, I examine these cases.

MySQL 5.7
End of Life

Compare Percona to Leading Database Solutions

Software
Downloads

Product
Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Disk seeks are evil, so let’s avoid them, pt. 4

Related

Related Blog Articles

RECOMMENDED ARTICLES

Why MariaDB Is “Better” Than MySQL

Did MyDumper LIKE Triggers?

How to Improve Database Performance: The Ultimate Guide

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7 End of Life

Compare Percona to Leading Database Solutions

Software Downloads

Product Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Disk seeks are evil, so let’s avoid them, pt. 4

Related

Share This Post!

Want to get weekly updates listing the latest blog posts?

Related Blog Articles

RECOMMENDED ARTICLES

Why MariaDB Is “Better” Than MySQL

Did MyDumper LIKE Triggers?

How to Improve Database Performance: The Ultimate Guide

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7
End of Life

Software
Downloads

Product
Documentation