I can't do that, because tmpfs doesn't support O_DIRECT.
PURPOSE
======
Create a small to medium area for fast temporary storage with the ability to grow "onto" slower media. In my case, it's for MySQL temporary tables.
REQUIREMENTS
============
* Some RAM to spare
* 2*160G SSDs (or whatever you can get your hands on, or even regular rotating drives).
USAGE
=====
First we'll create a 40G file on tmpfs and associate a loop device to that file:
1. Create the ramdisk-backed loopback file:
# dd if=/dev/zero of=/dev/shm/ramdisk bs=1M count=40960
2. Create the loop device:
# losetup -f /dev/shm/ramdisk
3. (Optional) Create a logical volume with device-mapper (this is only needed to get disk statistics from iostat and /proc/partitions):
# gdisk -l /dev/loop0
Disk /dev/loop0: 83886080 sectors, 40.0 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): 0E9B6683-EB36-4EAF-8A54-2C04B3486E7E
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 83886046
Partitions will be aligned on 2048-sector boundaries
mysql> select 83886046-33;
+-------------+
| 83886046-33 |
+-------------+
| 83886013 |
+-------------+
1 row in set (0.00 sec)
# echo "0 83886013 linear /dev/loop0 34" | dmsetup -u $(uuidgen) create ramdisk
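The sector arithmetic above can also be done without a MySQL prompt. Here is a minimal sketch, assuming the `gdisk -l` output format shown above; the helper name `dm_linear_table` is made up for illustration and is not part of any tool:

```shell
#!/bin/sh
# Hypothetical helper: turn `gdisk -l` output (on stdin) into a
# device-mapper "linear" table line for the usable sector range.
# Usage: gdisk -l /dev/loop0 | dm_linear_table /dev/loop0
dm_linear_table() {
    dev=$1
    awk -v dev="$dev" '
        /^First usable sector is/ {
            first = $5; sub(/,$/, "", first)   # "34," -> "34"
            last = $NF                         # last usable sector
            # table format: <start> <length> linear <device> <offset>
            print "0", last - first + 1, "linear", dev, first
        }'
}
```

The resulting line can be piped straight into `dmsetup create`, exactly like the hand-written `echo` above.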
Then we'll need to configure the SSD:
1. Find the sector info from the SSD:
# gdisk -l /dev/sdc
Disk /dev/sdc: 311427072 sectors, 148.5 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): B9A305D6-4536-42DB-A0F9-8861168F4061
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 311427038
2. Use the numbers above to carve out 80% of the drive (note the parentheses, otherwise only the 33 gets scaled):
mysql> select ceil((311427038-33)*0.8);
+--------------------------+
| ceil((311427038-33)*0.8) |
+--------------------------+
| 249141604 |
+--------------------------+
1 row in set (0.00 sec)
echo "0 249141604 linear /dev/sdc 34" | dmsetup -u $(uuidgen) create ssd
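When taking a percentage of the usable area, the subtraction has to happen before the scaling. A small sketch in plain shell arithmetic, assuming a shell with 64-bit integers (the function name `usable_fraction` is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: number of sectors covering PERCENT of the usable
# area between FIRST and LAST (inclusive), rounded up.
# Usage: usable_fraction FIRST LAST PERCENT
usable_fraction() {
    first=$1 last=$2 percent=$3
    usable=$(( last - first + 1 ))
    # integer ceiling of usable * percent / 100
    echo $(( (usable * percent + 99) / 100 ))
}

# 80% of the SSD's usable sectors, using the gdisk numbers above
usable_fraction 34 311427038 80
```

For the SSD above this prints 249141604, which is the length to use in the dmsetup table if you want 80% of the drive.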
Result:
ls -l /dev/mapper/
total 0
crw------- 1 root root 10, 63 Feb 25 21:48 control
brw-rw---- 1 root disk 253, 1 Apr 1 09:03 ramdisk
brw-rw---- 1 root disk 253, 2 Apr 1 09:03 ssd
Next: Create a linear RAID device on these drives, with the ramdisk at the "bottom" (listed first, so it holds the lowest block addresses):
mdadm --create /dev/md0 --level=linear --raid-devices=2 --name rambacked /dev/mapper/ramdisk /dev/mapper/ssd
Remember I said the logical volumes were optional? You might as well have done this:
mdadm --create /dev/md0 --level=linear --raid-devices=2 --name rambacked /dev/loop0 /dev/sdc1
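To make the "linear" part concrete: the array is simply the first member followed by the second, so a sector number below the size of the first member lands on the ramdisk and everything above it lands on the SSD. A rough sketch of that mapping follows; it ignores md superblock overhead and size rounding, and says nothing about where the filesystem chooses to place data. The function name is made up:

```shell
#!/bin/sh
# Hypothetical illustration of linear concatenation with two members.
# Usage: member_for_sector SECTOR FIRST_MEMBER_SECTORS
member_for_sector() {
    sector=$1 first_len=$2
    if [ "$sector" -lt "$first_len" ]; then
        echo "ramdisk (offset $sector)"
    else
        echo "ssd (offset $(( sector - first_len )))"
    fi
}

# With an 83886013-sector ramdisk as the first member:
member_for_sector 1000 83886013
member_for_sector 90000000 83886013
```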
Create a filesystem:
mkfs.xfs -f -l lazy-count=1 -L rambacked /dev/md0
Mount it:
mkdir /rambacked
mount -o rw,noauto,noatime,nodiratime,logbufs=8,nobarrier /dev/md0 /rambacked
Now write 100G to it; the first 40G will be blazingly fast (vmstat output):
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 143745408 22648 53620368 0 0 0 7 0 0 0 0 100 0 0
0 0 0 143745792 22648 53620380 0 0 0 0 1028 620 0 0 100 0 0
0 0 0 143745904 22648 53620380 0 0 0 0 1007 514 0 0 100 0 0
0 0 0 143745680 22648 53620380 0 0 0 0 1607 540 0 0 100 0 0
0 0 0 143745680 22648 53620380 0 0 0 0 1096 530 0 0 100 0 0
0 0 0 143746288 22656 53620372 0 0 0 68 1016 585 0 0 100 0 0
0 0 0 143746288 22656 53620372 0 0 0 0 1008 525 0 0 100 0 0
0 0 0 143746400 22656 53620380 0 0 0 0 1033 541 0 0 100 0 0
1 1 0 143746272 22656 53620380 0 0 12 909984 1018 5932 0 1 98 1 0 <- starting dd write.
2 0 0 143746528 22656 53620392 0 0 0 2995456 1027 18285 0 4 92 4 0
1 1 0 143746528 22664 53620384 0 0 0 2945324 1013 17996 0 4 92 3 0
2 0 0 143746656 22664 53620392 0 0 0 2930912 1028 17906 0 4 92 3 0
1 0 0 143746528 22664 53620392 0 0 12 2932992 1006 17920 0 4 92 3 0
1 1 0 143746208 22664 53620404 0 0 0 3002720 1608 18326 0 4 92 4 0 <- 3 GB/s direct I/O
1 0 0 143746096 22664 53620404 0 0 0 2982210 1088 18202 0 4 92 4 0
2 0 0 143745968 22672 53620396 0 0 0 2954616 1012 18076 0 5 92 3 0
1 1 0 143746080 22672 53620404 0 0 12 2890080 1005 17667 0 5 92 3 0
1 1 0 143746208 22672 53620416 0 0 0 2911584 1028 17802 0 4 92 4 0
2 0 0 143746336 22672 53620416 0 0 0 2915680 1008 17807 0 5 92 2 0
1 1 0 143746336 22672 53620416 0 0 0 2963936 1024 18124 0 5 92 3 0
1 0 0 143746336 22680 53620408 0 0 12 2957816 1012 18104 0 4 92 3 0
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 1 0 143746464 22680 53620428 0 0 0 2944480 1019 18004 0 4 92 3 0
2 0 0 143746448 22680 53620428 0 0 0 2908608 1013 17783 0 4 92 3 0
1 1 0 143746576 22680 53620428 0 0 12 2943520 1642 18019 0 5 92 3 0
1 1 0 143746576 22680 53620428 0 0 0 2912813 1077 17815 0 5 92 3 0
1 1 0 143746688 22688 53620432 0 0 0 2882072 1010 17744 0 5 93 3 0
0 1 0 143746688 22688 53620432 0 0 0 2149280 5857 12183 0 3 94 3 0 <- spills over on SSD.
0 1 0 143746816 22688 53620444 0 0 0 265665 3368 1591 0 0 96 4 0
0 1 0 143746816 22688 53620444 0 0 0 180416 2596 1223 0 0 96 4 0 <- now 100% on SSD device (see iostat output below)
0 1 0 143746944 22688 53620444 0 0 0 170176 2522 1215 0 0 96 4 0
0 1 0 143746816 22696 53620436 0 0 0 173240 2534 1228 0 0 96 4 0
0 1 0 143746816 22696 53620444 0 0 0 171208 2536 1224 0 0 96 4 0
0 1 0 143746816 22696 53620444 0 0 0 172236 2543 1208 0 0 96 4 0
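The exact dd command is not shown in the output above. A write test along the following lines should produce a similar pattern; the file name, block size and function name are assumptions, not the original invocation. It relies on GNU dd's `oflag=`:

```shell
#!/bin/sh
# Hypothetical reconstruction of the write test (not the original command).
# Usage: write_test FILE COUNT_MIB [OFLAG]
write_test() {
    file=$1 count=$2 oflag=${3:-direct}
    # oflag=direct bypasses the page cache, so vmstat and iostat
    # show what the underlying devices are actually doing
    dd if=/dev/zero of="$file" bs=1M count="$count" oflag="$oflag"
}

# Example (as root, with the array mounted on /rambacked):
#   vmstat 1 &                              # watch the "bo" column
#   write_test /rambacked/testfile 102400   # 100G in 1M blocks
```

Direct I/O works here because the writes go through XFS on the md device, even though the first member is ultimately a file on tmpfs.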
iostat output:
Linux 2.6.18-274.el5 (staging-db-lv-2.staging.marinsw.net) 03/31/2012
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sdc 0.00 1.69 0.00 0.41 0.00 88.68 214.24 0.01 23.96 0.60 0.03
md0 0.00 0.00 0.00 1.23 0.00 282.16 229.89 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 8.48 0.00 0.02 0.02 0.00
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.60 9592.20 4.80 2182758.40 227.54 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.60 9592.20 4.80 2182758.40 227.54 1.25 0.13 0.03 25.5 <- 100% in RAM
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.60 25900.60 4.79 5893825.15 227.55 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.60 25901.20 4.79 5893973.65 227.55 3.11 0.12 0.03 67.54 <- 26000 writes per second
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 1.20 25693.20 9.60 5846630.40 227.55 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 1.20 25692.60 9.60 5846481.60 227.54 3.78 0.15 0.03 74.26
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.60 25683.40 4.80 5844178.00 227.54 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.60 25683.40 4.80 5844178.00 227.54 3.20 0.12 0.03 66.98
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sdc 0.00 0.00 0.00 2230.20 0.00 507516.80 227.57 4.15 1.86 0.34 76.84 <- spilling over on SSD.
md0 0.00 0.00 0.00 7463.40 0.00 1698205.00 227.54 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 5231.80 0.00 1190377.80 227.53 0.80 0.15 0.03 15.66
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sdc 0.00 0.00 0.00 1506.60 0.00 342835.20 227.56 5.13 3.41 0.65 98.64 <- almost exclusively on SSD.
md0 0.00 0.00 0.00 1507.80 0.00 342842.00 227.38 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 1.20 0.00 6.80 5.67 0.00 0.00 0.00 0.00
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sdc 0.00 0.00 0.00 1508.00 0.00 343145.60 227.55 5.20 3.45 0.65 98.56 <- exclusively SSD.
md0 0.00 0.00 0.00 1508.40 0.00 343244.80 227.56 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sdc 0.00 0.00 0.60 1506.00 4.80 342723.20 227.48 5.12 3.40 0.65 98.34 <- about 1500 writes per second.
md0 0.00 0.00 0.60 1505.60 4.80 342432.20 227.35 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.80 0.00 6.60 8.25 0.00 0.00 0.00 0.00
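To compare the iostat numbers with the vmstat output, the wsec/s column (512-byte sectors per second) can be converted to MiB/s. A quick sketch using integer shell arithmetic; the function name is made up, and the fractional part of the input is simply dropped:

```shell
#!/bin/sh
# Hypothetical helper: convert an iostat wsec/s value to whole MiB/s.
# Usage: sectors_to_mib WSEC_PER_S
sectors_to_mib() {
    sectors=${1%%.*}            # "5893825.15" -> "5893825"
    echo $(( sectors / 2048 ))  # 2048 sectors of 512 bytes = 1 MiB
}

sectors_to_mib 5893825.15   # RAM-backed phase (dm-1)
sectors_to_mib 342835.20    # SSD phase (sdc)
```

That works out to roughly 2877 MiB/s while writing to RAM and 167 MiB/s once the writes hit the SSD, which lines up with the "bo" column in the vmstat output (reported in KiB per second).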
Useful? Maybe. If you need a partition that is fast most of the time (as long as you stay under a certain storage limit) but can spill over onto slower storage when it has to, this is it.