Boosting Old Technology – Can Caching Help?

I have written a few times recently about some older Supermicro servers I am working with.

Since the setup I built has 4 x 1TB hard drives (the spinning kind) and no space for an SSD to use as cache, I bought a used 160GB Fusion-IO ioDrive on eBay.

I had been running Debian on the server. The drivers/software for the Fusion-IO under Linux were a bit restrictive, requiring a specific distro and kernel. I also could not find any free/cheap caching software under Linux that actually worked – there is some open source stuff out there, it just did not work for me.

The Hardware/Software Setup

Why the switch from Debian to Windows?

  1. Fusion-IO setup under Windows is much easier
  2. Availability of Intel’s caching software
  3. Being able to easily run any version of MySQL

The server getting the upgrade powers a database – nothing else. The hard drives use the onboard Intel RAID controller, operating in RAID 10, which, based on my previous testing, gave the best write performance while providing redundancy.

The workload is 90% write, with no users sitting waiting for writes to complete. Write performance still needs to be quick to keep up with the transaction volume. Users do not want slow reads either – of course.

Ideally I would have a Fusion ioDrive big enough to hold all the database files, but that is not financially possible.

Currently the live data in the database is 110+ GB. Technically that would fit on the 160GB Fusion ioDrive, but as the database grows there could be issues.

The Intel Cache Acceleration Software provides read caching only; it operates in write-through mode to populate the cache. Write-back is not available in the software, but I would probably not use it anyway due to the increased risk of data loss in the event of a crash or power failure.

Let's find out what a PCI-based SSD can do caching those spinning disks. All tests were done on the server with no users connected. The database was restored for each test, which automatically populated the cache when caching was enabled.

110+ Gig Database Restore

  • 2 hours, 57 minutes – Restore to RAID 10 (no caching)
  • 3 hours, 10 minutes – Restore to RAID 10 (with caching)
  • 2 hours, 7 minutes – Restore to Fusion-IO storage (no caching)

Sample Reads

I conducted a number of read tests, accessing various databases and running reports and user queries. The same queries were run for each test after a fresh database restore. Some queries make full use of indexes, some do not.

  • 22.54 seconds – Read from RAID 10 (no caching)
  • 22.62 seconds – Read from RAID 10 (with caching)
  • 21.57 seconds – Read from Fusion-IO storage (no caching enabled)

Surprised by those numbers? I sure am – underwhelming to say the least. Time to do some research into why the performance is so bad when it should be so much better, both when using the cache and when reading directly from the Fusion-IO card.

Testing The Disks

RAID 10

A previous benchmark on this exact same hardware, but with a different version of Windows, showed a maximum write speed of 138 MB/sec. Now, running the same benchmark software on the same hardware with the new version of Windows, it maxes out at only 58 MB/sec. Things get more and more strange.
RAID 10 - Slow

Fusion-IO ioDrive

I have no previous benchmark for this one, but the current graph shows speeds that I would expect from this PCI card. An impressive maximum write speed of 811 MB/sec and a maximum read speed of 847 MB/sec.
Fusion-IO ioDrive benchmark

With a write speed increase of 1360% over the spinning disks, why is the MySQL restore only 39% faster when restoring to the Fusion-IO disk?

Super Micro IPMI Firmware – X7DWT-INF

I am managing a rack of Super Micro servers; the motherboard model is X7DWT-INF with built-in IPMI using an AOC-SIMSO+ card.

The servers are old – quite old, but still perform their designated tasks just fine.

The rack contains a mix of Windows & Linux servers.

I wanted to convert one of the Linux servers over to Windows 2008 (the latest Windows OS supported by the board) to test out the performance of a Fusion-IO ioDrive (also old technology).

Despite having installed all the operating systems on this rack of Super Micro boards, getting Windows 2008 installed via IPMI was quite difficult. The system would install until the graphical interface showed up – then I lost all mouse or keyboard control.

I tried everything I could think of with no luck. Maybe there was newer firmware available than what was installed on this machine? The AOC-SIMSO+ had firmware 1.60 installed; after hours of searching I found that 1.64 was available, but I could not find any release notes anywhere.

I don't know what the difference in the firmware is, but it seems to have fixed my problem. Just for my own future reference I am leaving a copy of the 1.64 build here… with such old computers Super Micro has sort of pushed the firmware to the back of their site, so it is difficult to locate.

I’m still looking for the latest firmware for the motherboard itself…

Is Windows Slowing Down Your Internet?

This post does not apply to most people, but if you have high speed Internet – it might. My daily driver is a 300 Mbps fiber line. It is fast and I can saturate the link but it takes multiple connections to do it (quite a few connections actually).

If I download a single file via the web or FTP I might get 30 Mbps. That is fairly quick, but it is a lot less than 300.

If I download two files at once from the same remote server, both files will come down at 30 Mbps, for a combined speed of 60 Mbps.

If you are in the same situation, the problem is probably related to your TCP receive window. The TCP window size, combined with the round-trip latency (in ms) to the source server, determines your maximum download speed per connection.
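
As a rough illustration (the numbers here are made up for the example, not measurements from my connection), the limit works out like this:

maximum speed ≈ TCP receive window / round-trip time
256 KB window / 70 ms RTT ≈ 3.7 MB/s ≈ 30 Mbps per connection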

I found a great article that explains both how to calculate your maximum download speed potential – AND how Windows may actually be limiting your download speed. Yes, it is true!

After reading the article, titled How Windows is Killing Internet Download Speeds, I learned that Windows was indeed limiting my connection speed.

I saw major speed increases when downloading from a server that was far away from me. Why Windows limits your connection speed is unknown to me, but simply by executing this:

netsh interface tcp set heuristics disabled

I was able to remove the limit and now download at full speeds.
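
If you want to check the state of that setting before or after making the change, this should display it (I am going from memory on the exact output, so treat it as a starting point):

netsh interface tcp show heuristics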

Windows Display Scaling

I needed a new laptop (have not had one for years) and a nice Dell was just announced at CES 2015 – the XPS 13 2015 model.

It has a super thin design, three processors to choose from, and a super high resolution display option with a maximum resolution of 3200 x 1800. The base model has a resolution of 1920 x 1080.

I got the new laptop a couple of days ago. It defaults to a resolution of 3200 x 1800, which requires Windows desktop display scaling at 200% – otherwise everything on the screen is so small you can't read anything unless you are Superman.

At this point I don't see the advantage of a high resolution display if you then have to boost everything up by 200%. Essentially you are giving back the extra resolution through the enlargement process. What happens if you try to enlarge a photo? You lose detail – you can't create something out of nothing.

When an application does not specifically support display scaling it ends up looking very blurry. My main text editor for development is Sublime Text; the latest non-beta release is version 2… it does not support display scaling, so the text is blurry and impossible to work with (scaling is supported in the newer beta builds).

I tried explaining this to a friend but he did not get the point and suggested I may need glasses. So here are a couple of screenshots to demonstrate the problem, without using display scaling. A Firefox window is on the screen, sized to 1024 x 768 in both screenshots so you have a point of reference.

1920 x 1080 No Display Scale


This resolution may actually be a little small on a 13 inch screen for some. I still have 20/20 vision despite my age and decades of 8+ hours of computer usage every day so the resolution seems about right for me.

3200 x 1800 No Display Scale

Notice how much smaller the Firefox window is in the second screenshot – and look at the desktop icons and the icons in the toolbar on the bottom. On a small 13 inch screen they are impossible to see, even with 20/20 vision.

There is nothing wrong with the product itself; the issue is putting a high resolution display in such a small screen. If you take a high resolution display and need to zoom everything up 200% just to see it, why bother with the high resolution in the first place?

I paid a premium to get this model laptop, but in the end it is not worth it. A native screen resolution of anything higher than 1920 x 1080 on a 13 inch display just seems wasted.

MySQL replication woes – error 1236

I have been using MySQL replication for years, specifically multi-master, where you can both read from and write to either server.

To make my situation even worse, I do it across the global Internet. The added latency and/or outages between sites can cause 'issues' that you might not see if your servers were connected locally with high speed Ethernet.

This weekend one of my servers lost a hard drive – one of the NEW WD Red drives I just wrote about a few weeks ago. The drives have been in production for less than two months and one has failed already.

Using Linux software RAID, the data is OK and the machine is still humming along while we wait for a replacement drive to be installed.

One thing that did not survive the hard drive crash is the MySQL replication. The other server (the one where no hard disk crashed) actually started showing this error after the replication stopped:

Got fatal error 1236 from master when reading data from binary log: 'binlog truncated in the middle of event; consider out of disk space on master; the first event 'mysql-bin.006259' at 2608901, the last event read from '/var/log/mysql/mysql-bin.006259' at 2608901, the last byte read from '/var/log/mysql/mysql-bin.006259' at 2609152.'

The message suggests the master may be out of disk space, but that is not the case here. The problem most likely comes from the server going down without being properly shut down (a big NO NO).

So if you get this error, or any other replication error that references a binlog file and a position, how can you find out what the problem is?

If you have been running MySQL replication for any length of time, you have seen times where the replication stops and you need to know why.

To view the transaction that is breaking your replication, head over to the master server and use the mysqlbinlog utility to see what is going on.
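
For example, using the binlog file and position from the error above (substitute your own), something like this dumps the events around the problem area:

mysqlbinlog --start-position=2608901 /var/log/mysql/mysql-bin.006259 | less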

In my case I was greeted with this message.

# Warning: this binlog is either in use or was not closed properly.

Essentially it is saying the file is messed up; with no valid content in that file the replication is stuck. To get it started again you will need to give your slave(s) a new instruction, telling them to move on to the next binlog. I advanced to the next binlog like this:

STOP SLAVE;
CHANGE MASTER TO MASTER_LOG_FILE = 'mysql-bin.006260';
CHANGE MASTER TO MASTER_LOG_POS = 0;
START SLAVE;

With that the slave can start up again, reading from the next binlog which contains valid content.
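
A quick way to confirm replication is actually flowing again is to check the slave status – the Slave_IO_Running and Slave_SQL_Running fields should both say Yes:

SHOW SLAVE STATUS\G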

Crisis averted – all we need now is a new hard drive and the server should be a happy camper once more.

apt-config: /usr/local/lib/libz.so.1: no version information available (required by /usr/lib/libapt-pkg.so.4.10)

I get an email with this error daily from cron on a few of my DirectAdmin servers. I had no idea what it meant or even what that cron job was supposed to be doing.

After searching around on the net there are a lot of people with the same issue.

Some of the suggested fixes do not sound very nice, like making some hacks to custombuild and then rebuilding every piece of software on the system. That simply sounds dangerous!

I found a blog post where the author deletes the multiple copies that exist on his system and then creates a link to the correct version.

That sounded much less dangerous, so I gave it a try and it worked. In my case I only had /usr/local/lib/libz.so.1.2.3, not /usr/local/lib/libz.so.1.2.3.4, so my solution looked more like this:

rm /usr/local/lib/libz.so.1
ln -s /usr/lib/libz.so.1.2.3 /usr/local/lib/libz.so.1
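
To double check the fix, the symlink and the original warning are easy to verify – any apt command that loads libapt-pkg will do, apt-config dump is just a convenient one:

ls -l /usr/local/lib/libz.so.1
apt-config dump > /dev/null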

Compiling DNSDB Exim on Debian Wheezy Directadmin

About 1.5 years ago I did a post with easy instructions for compiling a custom build of Exim on DirectAdmin.

Since then I upgraded to Debian Wheezy and Exim has been upgraded to 4.84. The step-by-step instructions don’t work anymore as a result.

Here is an updated version of those instructions.

First, ensure you have the required dependencies.

apt-get install libdb5.1-dev libperl-dev libsasl2-dev

Change all occurrences of 4.84 to the version you want to use. The sample pulls Exim from one particular mirror; you may need to look up a URL to another working mirror if this one goes down.

wget http://exim.mirrorcatalogs.com/exim/exim4/exim-4.84.tar.gz
tar xvzf exim-4.84.tar.gz
cd exim-4.84/Local
wget http://www.directadmin.com/Makefile
perl -pi -e 's/^EXTRALIBS/#EXTRALIBS/' Makefile
perl -pi -e 's/HAVE_ICONV=yes/HAVE_ICONV=no/' Makefile
perl -pi -e 's/^#LOOKUP_DNSDB=yes/LOOKUP_DNSDB=yes/' Makefile
cd ..
make
make install

The above commands download the unmodified source for Exim, extract it, download a Makefile from the DirectAdmin servers, use Perl one-liners to adjust the Makefile, then compile and install the fresh Exim build.

The binary that is created is /usr/sbin/exim-4.84-1, so we must copy it over the existing exim binary.

/etc/init.d/exim stop
cp -f /usr/sbin/exim-4.84-1 /usr/sbin/exim
chmod 4755 /usr/sbin/exim
/etc/init.d/exim start

To verify you have a working Exim with DNSDB compiled in, do the following:

exim -bV

Exim 4.84
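
The full exim -bV output should also list dnsdb among the compiled-in lookup types on the "Lookups (built-in):" line. A quick check, assuming your output follows the usual format:

exim -bV | grep -i dnsdb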

Testing NO RAID vs RAID 5 vs RAID 10

I am setting up a new server with four 2.5 inch hot swap 1TB WD Red disks – model WD10JFCX. Write performance is key in this setup, as 90% of the workload is putting data to disk.

SSDs would be nice, but not cost effective for the space needed. I suspect the best option is going to be RAID 10, but let's run a few tests first.

No RAID

Testing with a single drive gives us a maximum write speed of 74 MB/sec and a maximum read speed of 113 MB/sec, which is not very impressive.

Single Drive

RAID 5

Next up, a four disk RAID 5 setup – the write performance is BAD! You do gain quite a bit of read performance, maxing out at just under 300 MB/sec, but the write performance takes a major hit vs. a single drive, coming in at only 31 MB/sec thanks to the parity overhead on every write.

RAID 5

RAID 10

Now let's check out RAID 10, which should be the fastest configuration of the bunch. The downside is you need at least four hard drives and you lose 50% of your capacity to redundancy.

As expected, the write performance is about double what we get from a single drive. Maximum write comes in at 138 MB/sec and maximum read is 178 MB/sec. The read did not quite double, but it is still a clear improvement over a single drive.

If your workload is mostly reads, the RAID 5 configuration actually performs better. I suspect the more drives you have in your RAID 5 array, the faster the reads would get, since you have more spindles to read from.

RAID 10

For our write intensive workload we will be going with RAID 10, which is very slow vs. SSD, but the cost of SSDs would be over $1,000 for the same space. The 1TB WD Red drives we are using currently cost about $75 each. By using RAID 10 we get the fastest write performance of these options while keeping redundancy.

WD RE4 series benchmark.

So I was curious after doing that benchmark test on my SSDs in RAID 0.

I ran the same benchmark on a server that has a 1TB WD RE4 series disk. WD claims 128 MB/s sustained from the disk.

My benchmark of the drive confirms their claim.

WD RE4 Benchmark

The drive maxed out at about 127 MB/s on writes and 138 MB/s on reads. That is fast for a spinning disk, but it is left in the dust by a single SSD and completely blown away by SSDs in RAID 0.

Single Drive vs RAID 0

Every computer nerd knows RAID 0 is quick. In my most recent computer (now about a year old) I put three SSDs operating in RAID 0 – I wanted quick.

I had an extra SSD come out of another machine, so I stuck it in there too, but as a standalone drive where I just put temporary files etc.

I was doing some reading today about different RAID levels and performance. If you need redundancy, RAID 10 seems to be the fastest option, so that is what I will be using in my next batch of servers.

I never bothered to benchmark these drives before; I just knew that RAID 0 was faster. How big is the difference actually? Let's find out.

Performance of single SSD

The single SSD maxes out at a write speed of 138 MB/sec and a read speed of 276 MB/sec. That is quite snappy. I've seen laptop hard drives that can only manage around 12 MB/sec; those laptops are painful to use.

Performance of three SSDs in RAID 0

RAID 0 with three SSDs gives a maximum write speed of 1205 MB/sec and a maximum read speed of 1552 MB/sec – incredibly fast!

Interesting how the write speed closes the gap on the read speed when you have multiple disks. Bottom line: running disks in RAID 0 makes a considerable difference! Even more interesting is that the performance increased by far more than three times – writes went up almost 9x and reads more than 5x with only three drives.

I used the free disk benchmark software from ATTO Technology to do the testing.