.\" Hey, Emacs, edit this file in -*- nroff-fill -*- mode
.\" Copyright (c) 1997, 1998
.\" Nan Yang Computer Services Limited. All rights reserved.
.\" This software is distributed under the so-called ``Berkeley
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\" must display the following acknowledgement:
.\" This product includes software developed by Nan Yang Computer
.\" 4. Neither the name of the Company nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\" This software is provided ``as is'', and any express or implied
.\" warranties, including, but not limited to, the implied warranties of
.\" merchantability and fitness for a particular purpose are disclaimed.
.\" In no event shall the company or contributors be liable for any
.\" direct, indirect, incidental, special, exemplary, or consequential
.\" damages (including, but not limited to, procurement of substitute
.\" goods or services; loss of use, data, or profits; or business
.\" interruption) however caused and on any theory of liability, whether
.\" in contract, strict liability, or tort (including negligence or
.\" otherwise) arising in any way out of the use of this software, even if
.\" advised of the possibility of such damage.
.\" $Id: vinum.8,v 1.48 2001/01/15 22:15:05 grog Exp $
.\" $FreeBSD: src/sbin/vinum/vinum.8,v 1.33.2.10 2002/12/29 16:35:38 schweikh Exp $
45 .Nd Logical Volume Manager control program
51 .Bl -tag -width indent
52 .It Ic attach Ar plex volume Op Cm rename
54 .Ic attach Ar subdisk plex
58 Attach a plex to a volume, or a subdisk to a plex.
60 .Ic checkparity Ar plex
64 Check the parity blocks of a RAID-4 or RAID-5 plex.
72 Create a concatenated volume from the specified drives.
78 Create a volume as described in
79 .Ar description-file .
81 Cause the volume manager to enter the kernel debugger.
89 Detach a plex or subdisk from the volume or plex to which it is attached.
90 .It Ic dumpconfig Op Ar drive ...
91 List the configuration information stored on the specified drives, or all drives
92 in the system if no drive names are specified.
98 List information about volume manager state.
106 Initialize the contents of a subdisk or all the subdisks of a plex to all zeros.
107 .It Ic label Ar volume
108 Create a volume label.
115 .Op Ar volume | plex | subdisk
117 List information about specified objects.
126 List information about drives.
135 List information about subdisks.
144 List information about plexes.
153 List information about volumes.
155 Remake the device nodes in
165 Create a mirrored volume from the specified drives.
171 Move the object(s) to the specified drive.
172 .It Ic printconfig Op Ar file
173 Write a copy of the current configuration to
178 program when running in interactive mode. Normally this would be done by
182 .It Ic read Ar disk ...
185 configuration from the specified disks.
188 .Op Ar drive | subdisk | plex | volume
191 Change the name of the specified object.
193 .\".It Ic replace Ar drive newdrive
194 .\"Move all the subdisks from the specified drive onto the new drive.
196 .Ic rebuildparity Ar plex Op Fl f
200 Rebuild the parity blocks of a RAID-4 or RAID-5 plex.
208 .Op Ar volume | plex | subdisk
210 Reset statistics counters for the specified objects, or for all objects if none
216 .Ar volume | plex | subdisk
222 configuration to disk after configuration failures.
228 .\".Ar volume | plex | subdisk | disk
230 .\"Set the state of the object to
232 .It Ic setdaemon Op Ar value
233 Set daemon configuration.
237 .Op Ar volume | plex | subdisk | drive
239 Set state without influencing other objects, for diagnostic purposes only.
241 Read configuration from all vinum drives.
247 .Ar volume | plex | subdisk
249 Allow the system to access the objects.
253 .Op Ar volume | plex | subdisk
255 Terminate access to the objects, or stop
257 if no parameters are specified.
265 Create a striped volume from the specified drives.
269 is a utility program to communicate with the
274 is designed either for interactive use, when started without command line
275 arguments, or to execute a single command if the command is supplied on the
276 command line. In interactive mode,
278 maintains a command line history.
281 commands may optionally be followed by an option. Any of the following options
282 may be specified with any command, but in some cases the options are ignored.
290 .Bl -tag -width indent
295 option overrides safety checks. Use with extreme care. This option is for
296 emergency use only. For example, the command
302 even if it is open. Any subsequent access to the volume will almost certainly
304 .It Fl i Ar millisecs
311 milliseconds between copying each block. This lowers the load on the system.
315 option to specify a volume name to the simplified configuration commands
323 option is used by the list commands to display information not
324 only about the specified objects, but also about subordinate objects. For
325 example, in conjunction with the
329 option will also show information about the plexes and subdisks belonging to the
335 option is used by the list commands to display statistical information. The
337 command also uses this option to specify that it should create striped plexes.
341 option specifies the transfer size for the
350 option can be used to request more detailed information.
355 option can be used to request more detailed information than the
364 to wait for completion of commands which normally run in the background, such as
367 .Sh COMMANDS IN DETAIL
369 commands perform the following functions:
371 .Bl -tag -width indent -compact
372 .It Ic attach Ar plex volume Op Cm rename
374 .Ic attach Ar subdisk plex
379 inserts the specified plex or subdisk in a volume or plex. In the case of a
380 subdisk, an offset in the plex may be specified. If it is not, the subdisk will
381 be attached at the first possible location. After attaching a plex to a
384 reintegrates the plex.
390 renames the object (and in the case of a plex, any subordinate subdisks) to fit
393 naming convention. To rename the object to any other name, use the
397 A number of considerations apply to attaching subdisks:
400 Subdisks can normally only be attached to concatenated plexes.
402 If a striped or RAID-5 plex is missing a subdisk (for example after drive
403 failure), it should be replaced by a subdisk of the same size only.
405 In order to add further subdisks to a striped or RAID-5 plex, use the
407 (force) option. This will corrupt the data in the plex.
408 .\"No other attachment of
409 .\"subdisks is currently allowed for striped and RAID-5 plexes.
411 For concatenated plexes, the
413 parameter specifies the offset in blocks from the beginning of the plex. For
414 striped and RAID-5 plexes, it specifies the offset of the first block of the
415 subdisk: in other words, the offset is the numerical position of the subdisk
416 multiplied by the stripe size. For example, in a plex with stripe size 271k,
417 the first subdisk will have offset 0, the second offset 271k, the third 542k,
418 etc. This calculation ignores parity blocks in RAID-5 plexes.
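.Pp
The offset rule for striped and RAID-5 plexes can be illustrated with a short
sketch.
It is not part of the program, and the function name is hypothetical; it only
reproduces the arithmetic described above.
.Bd -literal -offset indent
```python
def subdisk_offsets(stripe_size_kb, subdisk_count):
    """Return the plex offset (in kB) of each subdisk in a striped plex.

    The offset is the numerical position of the subdisk multiplied by the
    stripe size. Parity blocks in RAID-5 plexes are ignored, as in the text.
    """
    return [position * stripe_size_kb for position in range(subdisk_count)]


# Stripe size 271k and three subdisks, as in the example above.
offsets = subdisk_offsets(271, 3)
print(offsets)  # [0, 271, 542]
```
.Ed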
427 Check the parity blocks on the specified RAID-4 or RAID-5 plex. This operation
428 maintains a pointer in the plex, so it can be stopped and later restarted from
429 the same position if desired. In addition, this pointer is used by the
431 command, so rebuilding the parity blocks need only start at the location where
432 the first parity problem has been detected.
438 starts checking at the beginning of the plex. If the
442 prints a running progress report.
453 command provides a simplified alternative to the
455 command for creating volumes with a single concatenated plex. The largest
456 contiguous space available on each drive is used to create the subdisks for the
461 command creates an arbitrary name for the volume and its components. The name
462 is composed of the text
464 and a small integer, for example
466 You can override this with the
468 option, which assigns the name specified to the volume. The plexes and subdisks
469 are named after the volume in the default manner.
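.Pp
As an illustration of the default naming, the following sketch derives plex and
subdisk names from a volume name.
It assumes the suffix pattern visible in the examples in this page (for
instance vinum0.p0.s0); the function name is hypothetical and the code is not
part of the program.
.Bd -literal -offset indent
```python
def default_names(volume, plexes, subdisks_per_plex):
    """Map each default plex name to the list of its default subdisk names."""
    names = {}
    for p in range(plexes):
        plex = "{}.p{}".format(volume, p)
        names[plex] = ["{}.s{}".format(plex, s)
                       for s in range(subdisks_per_plex)]
    return names


# One plex with four subdisks, as created by concat on four drives.
print(default_names("vinum0", 1, 4))
```
.Ed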
471 There is no choice of name for the drives. If the drives have already been
474 drives, the name remains. Otherwise the drives are given names starting with
477 and a small integer, for example
483 option can be used to specify that a previous name should be overwritten. The
485 is used to specify verbose output.
488 .Sx SIMPLIFIED CONFIGURATION
489 below for some examples of this
498 is used to create any object. In view of the relatively complicated
499 relationship and the potential dangers involved in creating a
501 object, there is no interactive interface to this function. If you do not
504 starts an editor on a temporary file. If the environment variable
508 starts this editor. If not, it defaults to
511 .Sx CONFIGURATION FILE
512 below for more information on the format of
517 function is additive: if you run it multiple times, you will create multiple
518 copies of all unnamed objects.
522 command will not change the names of existing
524 drives, in order to avoid accidentally erasing them. The correct way to dispose
527 drives is to reset the configuration with the
529 command. In some cases, however, it may be necessary to create new data on
531 drives which can no longer be started. In this case, use the
537 without any arguments, is used to enter the remote kernel debugger. It is only
542 option. This option will stop the execution of the operating system until the
543 kernel debugger is exited. If remote debugging is set and there is no remote
544 connection for a kernel debugger, it will be necessary to reset the system and
545 reboot in order to leave the debugger.
547 .It Ic debug Ar flags
548 Set a bit mask of internal debugging flags. These will change without warning
549 as the product matures; to be certain, read the header file
550 .Aq Pa sys/dev/vinumvar.h .
551 The bit mask is composed of the following values:
552 .Bl -tag -width indent
553 .It Dv DEBUG_ADDRESSES Pq No 1
554 Show buffer information during requests
555 .\".It Dv DEBUG_NUMOUTPUT Pq No 2
557 .\".Va vp->v_numoutput .
558 .It Dv DEBUG_RESID Pq No 4
561 .It Dv DEBUG_LASTREQS Pq No 8
562 Keep a circular buffer of last requests.
563 .It Dv DEBUG_REVIVECONFLICT Pq No 16
564 Print info about revive conflicts.
565 .It Dv DEBUG_EOFINFO Pq No 32
566 Print information about internal state when returning an
569 .It Dv DEBUG_MEMFREE Pq No 64
570 Maintain a circular list of the last memory areas freed by the memory allocator.
571 .It Dv DEBUG_REMOTEGDB Pq No 256
577 .It Dv DEBUG_WARNINGS Pq No 512
578 Print some warnings about minor problems in the implementation.
581 .It Ic detach Oo Fl f Oc Ar plex
582 .It Ic detach Oo Fl f Oc Ar subdisk
584 removes the specified plex or subdisk from the volume or plex to which it is
585 attached. If removing the object would impair the data integrity of the volume,
586 the operation will fail unless the
588 option is specified. If the object is named after the object above it (for
593 the name will be changed
594 by prepending the text
597 .Li ex-vol1.p7.s0 ) .
598 If necessary, the name will be truncated in the
602 does not reduce the number of subdisks in a striped or RAID-5 plex. Instead,
603 the subdisk is marked absent, and can later be replaced with the
607 .It Ic dumpconfig Op Ar drive ...
610 shows the configuration information stored on the specified drives. If no drive
613 searches all drives on the system for Vinum partitions and dumps the
614 information. If configuration updates are disabled, it is possible that this
615 information is not the same as the information returned by the
617 command. This command is used primarily for maintenance and debugging.
621 displays information about
623 memory usage. This is intended primarily for debugging. With the
625 option, it will give detailed information about the memory areas in use.
displays information about up to the last 64 I/O requests handled by the
633 driver. This information is only collected if debug flag 8 is set. The format
Total of 38 blocks malloced, total memory: 16460
Maximum allocs: 56, malloc table at 0xf0f72dbc
Time Event Buf Dev Offset Bytes SD SDoff Doffset Goffset
14:40:00.637758 1VS Write 0xf2361f40 91.3 0x10 16384
14:40:00.639280 2LR Write 0xf2361f40 91.3 0x10 16384
14:40:00.639294 3RQ Read 0xf2361f40 4.39 0x104109 8192 19 0 0 0
14:40:00.639455 3RQ Read 0xf2361f40 4.23 0xd2109 8192 17 0 0 0
14:40:00.639529 3RQ Read 0xf2361f40 4.15 0x6e109 8192 16 0 0 0
14:40:00.652978 4DN Read 0xf2361f40 4.39 0x104109 8192 19 0 0 0
14:40:00.667040 4DN Read 0xf2361f40 4.15 0x6e109 8192 16 0 0 0
14:40:00.668556 4DN Read 0xf2361f40 4.23 0xd2109 8192 17 0 0 0
14:40:00.669777 6RP Write 0xf2361f40 4.39 0x104109 8192 19 0 0 0
14:40:00.685547 4DN Write 0xf2361f40 4.39 0x104109 8192 19 0 0 0
11:11:14.975184 Lock 0xc2374210 2 0x1f8001
11:11:15.018400 7VS Write 0xc2374210 0x7c0 32768 10
11:11:15.018456 8LR Write 0xc2374210 13.39 0xcc0c9 32768
11:11:15.046229 Unlock 0xc2374210 2 0x1f8001
661 field always contains the address of the user buffer header. This can be used
662 to identify the requests associated with a user request, though this is not 100%
663 reliable: theoretically two requests in sequence could use the same buffer
664 header, though this is not common. The beginning of a request can be identified
669 The first example above shows the requests involved in a user request. The
670 second is a subdisk I/O request with locking.
674 field contains information related to the sequence of events in the request
679 indicates the approximate sequence of events, and the two-letter abbreviation is
680 a mnemonic for the location:
681 .Bl -tag -width Lockwait
683 (vinumstrategy) shows information about the user request on entry to
685 The device number is the
687 device, and offset and length are the user parameters. This is always the
688 beginning of a request sequence.
690 (launch_requests) shows the user request just prior to launching the low-level
692 requests in the function
693 .Fn launch_requests .
694 The parameters should be the same as in the
699 In the following requests,
701 is the device number of the associated disk partition,
703 is the offset from the beginning of the partition,
705 is the subdisk index in
708 is the offset from the beginning of the subdisk,
710 is the offset of the associated data request, and
712 is the offset of the associated group request, where applicable.
713 .Bl -tag -width Lockwait
715 (request) shows one of possibly several low-level
717 requests which are launched to satisfy the high-level request. This information
719 .Fn launch_requests .
721 (done) is called from
723 showing the completion of a request. This completion should match a request
724 launched either at stage
727 .Fn launch_requests ,
729 .Fn complete_raid5_write
735 (RAID-5 data) is called from
736 .Fn complete_raid5_write
737 and represents the data written to a RAID-5 data stripe after calculating
740 (RAID-5 parity) is called from
741 .Fn complete_raid5_write
742 and represents the data written to a RAID-5 parity stripe after calculating
745 shows a subdisk I/O request. These requests are usually internal to
747 for operations like initialization or rebuilding plexes.
749 shows the low-level operation generated for a subdisk I/O request.
751 specifies that the process is waiting for a range lock. The parameters are the
752 buffer header associated with the request, the plex number and the block number.
753 For internal reasons the block number is one higher than the address of the
754 beginning of the stripe.
756 specifies that a range lock has been obtained. The parameters are the same as
759 specifies that a range lock has been released. The parameters are the same as
771 initializes a subdisk by writing zeroes to it. You can initialize all subdisks
772 in a plex by specifying the plex name. This is the only way to ensure
773 consistent data in a plex. You must perform this initialization before using a
774 RAID-5 plex. It is also recommended for other new plexes.
776 initializes all subdisks of a plex in parallel. Since this operation can take a
777 long time, it is normally performed in the background. If you want to wait for
778 completion of the command, use the
784 option if you want to write blocks of a different size from the default value of
787 prints a console message when the initialization is complete.
789 .It Ic label Ar volume
794 style volume label on a volume. It is a simple alternative to an appropriate
797 This is needed because some
799 commands still read the disk to find the label instead of using the correct
803 maintains a volume label separately from the volume data, so this command is not
806 This command is deprecated.
812 .Op Ar volume | plex | subdisk
818 .Op Ar volume | plex | subdisk
853 is used to show information about the specified object. If the argument is
854 omitted, information is shown about all objects known to
858 command is a synonym for
863 option relates to volumes and plexes: if specified, it recursively lists
864 information for the subdisks and (for a volume) plexes subordinate to the
865 objects. The commands
869 list only volumes, plexes, subdisks and drives respectively. This is
870 particularly useful when used without parameters.
876 to output device statistics, the
878 (verbose) option causes some additional information to be output, and the
880 causes considerable additional information to be output.
885 command removes the directory
887 and recreates it with device nodes
888 which reflect the current configuration. This command is not intended for
889 general use, and is provided for emergency use only.
901 command provides a simplified alternative to the
903 command for creating mirrored volumes. Without any options, it creates a RAID-1
904 (mirrored) volume with two concatenated plexes. The largest contiguous space
905 available on each drive is used to create the subdisks for the plexes. The
906 first plex is built from the odd-numbered drives in the list, and the second
907 plex is built from the even-numbered drives. If the drives are of different
908 sizes, the plexes will be of different sizes.
914 builds striped plexes with a stripe size of 256 kB. The size of the subdisks in
915 each plex is the size of the smallest contiguous storage available on any of the
916 drives which form the plex. Again, the plexes may differ in size.
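.Pp
The way the drive list is divided between the two plexes, and the resulting
plex sizes, can be sketched as follows.
This is an approximation under stated assumptions: the function name is
hypothetical, and the drive sizes are the rounded megabyte figures from the
concatenated example in this page, so real results may differ slightly.
.Bd -literal -offset indent
```python
def mirror_layout(drives, striped=False):
    """Sketch of how the drive list is split between the two plexes.

    drives: list of (device, largest_contiguous_space) pairs in the order
    given on the command line. Returns one (devices, plex_size) pair per plex.
    """
    if len(drives) < 2:
        raise ValueError("a mirrored volume needs at least two drives")

    def plex_size(members):
        spaces = [space for _, space in members]
        if striped:
            # Every subdisk is as large as the smallest space in the plex.
            return min(spaces) * len(spaces)
        # Concatenated: each subdisk uses all the space on its drive.
        return sum(spaces)

    # Drives 1, 3, 5, ... form the first plex; drives 2, 4, 6, ... the second.
    groups = [drives[0::2], drives[1::2]]
    return [([dev for dev, _ in g], plex_size(g)) for g in groups]


# Rounded sizes in MB, borrowed from the concat example in this page.
drives = [("/dev/da1h", 414), ("/dev/da2h", 573),
          ("/dev/da3h", 573), ("/dev/da4h", 573)]
print(mirror_layout(drives))
print(mirror_layout(drives, striped=True))
```
.Ed
With these figures the concatenated plexes come to 987 MB and 1146 MB, and the
striped plexes to 828 MB and 1146 MB, which shows why the plexes may differ in
size.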
920 command creates an arbitrary name for the volume and its components. The name
921 is composed of the text
923 and a small integer, for example
925 You can override this with the
927 option, which assigns the name specified to the volume. The plexes and subdisks
928 are named after the volume in the default manner.
930 There is no choice of name for the drives. If the drives have already been
933 drives, the name remains. Otherwise the drives are given names starting with
936 and a small integer, for example
942 option can be used to specify that a previous name should be overwritten. The
944 is used to specify verbose output.
947 .Sx SIMPLIFIED CONFIGURATION
948 below for some examples of this
951 .It Ic mv Fl f Ar drive object ...
952 .It Ic move Fl f Ar drive object ...
953 Move all the subdisks from the specified objects onto the new drive. The
954 objects may be subdisks, drives or plexes. When drives or plexes are specified,
955 all subdisks associated with the object are moved.
959 option is required for this function, since it currently does not preserve the
960 data in the subdisk. This functionality will be added at a later date. In this
961 form, however, it is suited to recovering a failed disk drive.
963 .It Ic printconfig Op Ar file
964 Write a copy of the current configuration to
966 in a format that can be used to recreate the
968 configuration. Unlike the configuration saved on disk, it includes definitions
969 of the drives. If you omit
978 program when running in interactive mode. Normally this would be done by
983 .It Ic read Ar disk ...
986 command scans the specified disks for
988 partitions containing previously created configuration information. It reads
989 the configuration in order from the most recently updated to least recently
990 updated configuration.
992 maintains an up-to-date copy of all configuration information on each disk
993 partition. You must specify all of the slices in a configuration as the
994 parameter to this command.
998 command is intended to selectively load a
1000 configuration on a system which has other
1002 partitions. If you want to start all partitions on the system, it is easier to
1009 encounters any errors during this command, it will turn off automatic
1010 configuration update to avoid corrupting the copies on disk. This will also
1011 happen if the configuration on disk indicates a configuration error (for
1012 example, subdisks which do not have a valid space specification). You can turn
1013 the updates on again with the
1017 commands. Reset bit 2 (numerical value 4) of the daemon options mask to
1018 re-enable configuration saves.
1027 Rebuild the parity blocks on the specified RAID-4 or RAID-5 plex. This
1028 operation maintains a pointer in the plex, so it can be stopped and later
1029 restarted from the same position if desired. In addition, this pointer is used
1032 command, so rebuilding the parity blocks need only start at the location where
1033 the first parity problem has been detected.
1039 starts rebuilding at the beginning of the plex. If the
first checks the existing parity blocks and prints information about those
found to be incorrect before rebuilding. If the
1048 prints a running progress report.
1053 .Op Ar drive | subdisk | plex | volume
1056 Change the name of the specified object. If the
1058 option is specified, subordinate objects will be named by the default rules:
1059 plex names will be formed by appending
1061 to the volume name, and
1062 subdisk names will be formed by appending
1068 .\".Ar drive newdrive
1069 .\"Move all the subdisks from the specified drive onto the new drive. This will
1070 .\"attempt to recover those subdisks that can be recovered, and create the others
1071 .\"from scratch. If the new drive lacks the space for this operation, as many
1072 .\"subdisks as possible will be fitted onto the drive, and the rest will be left on
1073 .\"the original drive.
1078 command completely obliterates the
1080 configuration on a system. Use this command only when you want to completely
1081 delete the configuration.
1083 will ask for confirmation; you must type in the words
1086 .Bd -unfilled -offset indent
1087 .No # Nm Ic resetconfig
1089 WARNING! This command will completely wipe out your vinum
1090 configuration. All data will be lost. If you really want
1091 to do this, enter the text
1094 .No "Enter text ->" Sy "NO FUTURE"
1095 Vinum configuration obliterated
1098 As the message suggests, this is a last-ditch command. Don't use it unless you
1099 have an existing configuration which you never want to see again.
1104 .Op Ar volume | plex | subdisk
1107 maintains a number of statistical counters for each object. See the header file
1108 .Aq Pa sys/dev/vinumvar.h
1109 for more information.
1110 .\" XXX put it in here when it's finalized
1113 command to reset these counters. In conjunction with the
1117 also resets the counters of subordinate objects.
1123 .Ar volume | plex | subdisk
1126 removes an object from the
1128 configuration. Once an object has been removed, there is no way to recover it.
1131 performs a large amount of consistency checking before removing an object. The
1135 to omit this checking and remove the object anyway. Use this option with great
1136 care: it can result in total loss of data on a volume.
1140 refuses to remove a volume or plex if it has subordinate plexes or subdisks
1141 respectively. You can tell
1143 to remove the object anyway by using the
1145 option, or you can cause
1147 to remove the subordinate objects as well by using the
1149 (recursive) option. If you remove a volume with the
1151 option, it will remove both the plexes and the subdisks which belong to the
1155 Save the current configuration to disk. Normally this is not necessary, since
1157 automatically saves any change in configuration. If an error occurs on startup,
1158 updates will be disabled. When you reenable them with the
1162 does not automatically save the configuration to disk. Use this command to save
1169 .\".Ar volume | plex | subdisk | disk
1172 .\"sets the state of the specified object to one of the valid states (see
1173 .\".Sx OBJECT STATES
1176 .\"performs a large amount of consistency checking before making the change. The
1180 .\"to omit this checking and perform the change anyway. Use this option with great
1181 .\"care: it can result in total loss of data on a volume.
1183 .It Ic setdaemon Op Ar value
1185 sets a variable bitmask for the
1187 daemon. This command is temporary and will be replaced. Currently, the bit mask
1188 may contain the bits 1 (log every action to syslog) and 4 (don't update
1189 configuration). Option bit 4 can be useful for error recovery.
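.Pp
The bit arithmetic behind the mask can be sketched as follows.
The constant and function names are hypothetical and chosen for illustration
only; the bit values are those listed above.
.Bd -literal -offset indent
```python
LOG_ACTIONS = 1        # log every action to syslog
NO_CONFIG_UPDATE = 4   # don't update configuration

DESCRIPTIONS = {
    LOG_ACTIONS: "log every action to syslog",
    NO_CONFIG_UPDATE: "don't update configuration",
}


def describe(mask):
    """List the documented options that are set in the mask."""
    return [text for bit, text in sorted(DESCRIPTIONS.items()) if mask & bit]


def reenable_updates(mask):
    """Clear the bit that disables configuration updates."""
    return mask & ~NO_CONFIG_UPDATE


mask = LOG_ACTIONS | NO_CONFIG_UPDATE   # 5
print(describe(mask))
print(reenable_updates(mask))           # 1
```
.Ed
The value returned by the second function is the one that would be passed to
.Ic setdaemon
to turn configuration updates back on while leaving the other option unchanged.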
1192 .Ic setstate Ar state
1193 .Op Ar volume | plex | subdisk | drive
1196 sets the state of the specified objects to the specified state. This bypasses
1197 the usual consistency mechanism of
1199 and should be used only for recovery purposes. It is possible to crash the
1200 system by incorrect use of this command.
1204 .Op Fl i Ar interval
1207 .Op Ar plex | subdisk
starts (brings into the
1216 If no object names are specified,
1218 scans the disks known to the system for
1220 drives and then reads in the configuration as described under the
1224 drive contains a header with all information about the data stored on the drive,
1225 including the names of the other drives which are required in order to represent
1230 encounters any errors during this command, it will turn off automatic
1231 configuration update to avoid corrupting the copies on disk. This will also
1232 happen if the configuration on disk indicates a configuration error (for
1233 example, subdisks which do not have a valid space specification). You can turn
1234 the updates on again with the
command. Reset bit 2 (numerical value 4) of the daemon options mask to re-enable configuration
1241 If object names are specified,
1243 starts them. Normally this operation is only of use with subdisks. The action
1244 depends on the current state of the object:
1247 If the object is already in the
1253 If the object is a subdisk in the
1263 If the object is a subdisk in the
1265 state, the change depends on the subdisk. If it is part of a plex which is part
1266 of a volume which contains other plexes,
1268 places the subdisk in the
1270 state and attempts to copy the data from the volume. When the operation
1271 completes, the subdisk is set into the
1273 state. If it is part of a plex which is part of a volume which contains no
1274 other plexes, or if it is not part of a plex,
1280 If the object is a subdisk in the
1284 continues the revive
1285 operation offline. When the operation completes, the subdisk is set into the
1290 When a subdisk comes into the
1294 automatically checks the state of any plex and volume to which it may belong and
1295 changes their state where appropriate.
1297 If the object is a plex,
1299 checks the state of the subordinate subdisks (and plexes in the case of a
1300 volume) and starts any subdisks which can be started.
1302 To start a plex in a multi-plex volume, the data must be copied from another
1303 plex in the volume. Since this frequently takes a long time, it is normally
1304 done in the background. If you want to wait for this operation to complete (for
1305 example, if you are performing this operation in a script), use the
Copying data not only takes a long time but can also place a significant load
1310 on the system. You can specify the transfer size in bytes or sectors with the
1312 option, and an interval (in milliseconds) to wait between copying each block with
1315 option. Both of these options lessen the load on the system.
1320 .Op Ar volume | plex | subdisk
1322 If no parameters are specified,
1328 This can only be done if no objects are active. In particular, the
1330 option does not override this requirement. Normally, the
1332 command writes the current configuration back to the drives before terminating.
1333 This will not be possible if configuration updates are disabled, so
1335 will not stop if configuration updates are disabled. You can override this by
1342 command can only work if
1344 has been loaded as a KLD, since it is not possible to unload a statically
1349 is statically configured.
1351 If object names are specified,
disables access to the objects. If the objects have subordinate objects, those
subordinate objects must either already be inactive (stopped or in error), or
1359 options must be specified. This command does not remove the objects from the
1360 configuration. They can be accessed again after a
1366 does not stop active objects. For example, you cannot stop a plex which is
1367 attached to an active volume, and you cannot stop a volume which is open. The
to omit this checking and stop the object anyway. Use this option with great
1372 care and understanding: used incorrectly, it can result in serious data
1384 command provides a simplified alternative to the
1386 command for creating volumes with a single striped plex. The size of the
1387 subdisks is the size of the largest contiguous space available on all the
1388 specified drives. The stripe size is fixed at 256 kB.
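.Pp
The sizing rule can be sketched as follows.
The function names are hypothetical, a block is assumed to be 512 bytes, and
the free space of the three larger drives is an invented figure; only the
849825-block subdisk size comes from the example later in this page.
.Bd -literal -offset indent
```python
BLOCK_BYTES = 512  # assumed block size


def striped_volume_blocks(free_blocks_per_drive):
    """Return (subdisk size, plex size) in blocks for a single striped plex.

    Every subdisk is limited to the space available on all drives, that is,
    the smallest of the per-drive largest contiguous spaces.
    """
    subdisk = min(free_blocks_per_drive)
    return subdisk, subdisk * len(free_blocks_per_drive)


def blocks_to_mb(blocks):
    """Convert blocks to whole megabytes, rounding down."""
    return blocks * BLOCK_BYTES // 2**20


# First drive from the example; the other three sizes are hypothetical.
subdisk, plex = striped_volume_blocks([849825, 1173930, 1173930, 1173930])
print(subdisk, blocks_to_mb(subdisk))  # 849825 414
print(plex, blocks_to_mb(plex))        # 3399300 1659
```
.Ed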
1392 command creates an arbitrary name for the volume and its components. The name
1393 is composed of the text
1395 and a small integer, for example
1397 You can override this with the
1399 option, which assigns the name specified to the volume. The plexes and subdisks
1400 are named after the volume in the default manner.
1402 There is no choice of name for the drives. If the drives have already been
1405 drives, the name remains. Otherwise the drives are given names starting with
1408 and a small integer, for example
1409 .Dq Li vinumdrive7 .
1414 option can be used to specify that a previous name should be overwritten. The
1416 is used to specify verbose output.
1419 .Sx SIMPLIFIED CONFIGURATION
1420 below for some examples of this
1423 .Sh SIMPLIFIED CONFIGURATION
1424 This section describes a simplified interface to
1426 configuration using the
commands. These commands create convenient configurations for common
situations, but they are not as flexible as the
1436 See above for the description of the commands. Here are some examples, all
1437 performed with the same collection of disks. Note that the first drive,
1439 is smaller than the others. This has an effect on the sizes chosen for each
1442 The following examples all use the
1444 option to show the commands passed to the system, and also to list the structure
1445 of the volume. Without the
1447 option, these commands produce no output.
1448 .Ss Volume with a single concatenated plex
1449 Use a volume with a single concatenated plex for the largest possible storage
1450 without resilience to drive failures:
1452 vinum -> concat -v /dev/da1h /dev/da2h /dev/da3h /dev/da4h
1454 plex name vinum0.p0 org concat
1455 drive vinumdrive0 device /dev/da1h
1456 sd name vinum0.p0.s0 drive vinumdrive0 size 0
1457 drive vinumdrive1 device /dev/da2h
1458 sd name vinum0.p0.s1 drive vinumdrive1 size 0
1459 drive vinumdrive2 device /dev/da3h
1460 sd name vinum0.p0.s2 drive vinumdrive2 size 0
1461 drive vinumdrive3 device /dev/da4h
1462 sd name vinum0.p0.s3 drive vinumdrive3 size 0
1463 V vinum0 State: up Plexes: 1 Size: 2134 MB
1464 P vinum0.p0 C State: up Subdisks: 4 Size: 2134 MB
1465 S vinum0.p0.s0 State: up PO: 0 B Size: 414 MB
1466 S vinum0.p0.s1 State: up PO: 414 MB Size: 573 MB
1467 S vinum0.p0.s2 State: up PO: 988 MB Size: 573 MB
1468 S vinum0.p0.s3 State: up PO: 1561 MB Size: 573 MB
1471 In this case, the complete space on all four disks was used, giving a volume
1473 .Ss Volume with a single striped plex
1474 A volume with a single striped plex may give better performance than a
1475 concatenated plex, but restrictions on striped plexes can mean that the volume
1476 is smaller. It will also not be resilient to a drive failure:
1478 vinum -> stripe -v /dev/da1h /dev/da2h /dev/da3h /dev/da4h
1479 drive vinumdrive0 device /dev/da1h
1480 drive vinumdrive1 device /dev/da2h
1481 drive vinumdrive2 device /dev/da3h
1482 drive vinumdrive3 device /dev/da4h
1484 plex name vinum0.p0 org striped 256k
1485 sd name vinum0.p0.s0 drive vinumdrive0 size 849825b
1486 sd name vinum0.p0.s1 drive vinumdrive1 size 849825b
1487 sd name vinum0.p0.s2 drive vinumdrive2 size 849825b
1488 sd name vinum0.p0.s3 drive vinumdrive3 size 849825b
1489 V vinum0 State: up Plexes: 1 Size: 1659 MB
1490 P vinum0.p0 S State: up Subdisks: 4 Size: 1659 MB
1491 S vinum0.p0.s0 State: up PO: 0 B Size: 414 MB
1492 S vinum0.p0.s1 State: up PO: 256 kB Size: 414 MB
1493 S vinum0.p0.s2 State: up PO: 512 kB Size: 414 MB
1494 S vinum0.p0.s3 State: up PO: 768 kB Size: 414 MB
In this case, the size of each subdisk has been limited to the size of the
smallest drive, so the resulting volume is only 1659 MB in size.
1499 .Ss Mirrored volume with two concatenated plexes
1500 For more reliability, use a mirrored, concatenated volume:
1502 vinum -> mirror -v -n mirror /dev/da1h /dev/da2h /dev/da3h /dev/da4h
1503 drive vinumdrive0 device /dev/da1h
1504 drive vinumdrive1 device /dev/da2h
1505 drive vinumdrive2 device /dev/da3h
1506 drive vinumdrive3 device /dev/da4h
1507 volume mirror setupstate
1508 plex name mirror.p0 org concat
1509 sd name mirror.p0.s0 drive vinumdrive0 size 0b
1510 sd name mirror.p0.s1 drive vinumdrive2 size 0b
1511 plex name mirror.p1 org concat
1512 sd name mirror.p1.s0 drive vinumdrive1 size 0b
1513 sd name mirror.p1.s1 drive vinumdrive3 size 0b
1514 V mirror State: up Plexes: 2 Size: 1146 MB
1515 P mirror.p0 C State: up Subdisks: 2 Size: 988 MB
1516 P mirror.p1 C State: up Subdisks: 2 Size: 1146 MB
1517 S mirror.p0.s0 State: up PO: 0 B Size: 414 MB
1518 S mirror.p0.s1 State: up PO: 414 MB Size: 573 MB
1519 S mirror.p1.s0 State: up PO: 0 B Size: 573 MB
1520 S mirror.p1.s1 State: up PO: 573 MB Size: 573 MB
1523 This example specifies the name of the volume,
1525 Since one drive is smaller than the others, the two plexes are of different
1526 size, and the last 158 MB of the volume is non-resilient. To ensure complete
1527 reliability in such a situation, use the
1529 command to create a volume with 988 MB.
1530 .Ss Mirrored volume with two striped plexes
1531 Alternatively, use the
1533 option to create a mirrored volume with two striped plexes:
1535 vinum -> mirror -v -n raid10 -s /dev/da1h /dev/da2h /dev/da3h /dev/da4h
1536 drive vinumdrive0 device /dev/da1h
1537 drive vinumdrive1 device /dev/da2h
1538 drive vinumdrive2 device /dev/da3h
1539 drive vinumdrive3 device /dev/da4h
1540 volume raid10 setupstate
1541 plex name raid10.p0 org striped 256k
1542 sd name raid10.p0.s0 drive vinumdrive0 size 849825b
1543 sd name raid10.p0.s1 drive vinumdrive2 size 849825b
1544 plex name raid10.p1 org striped 256k
1545 sd name raid10.p1.s0 drive vinumdrive1 size 1173665b
1546 sd name raid10.p1.s1 drive vinumdrive3 size 1173665b
1547 V raid10 State: up Plexes: 2 Size: 1146 MB
1548 P raid10.p0 S State: up Subdisks: 2 Size: 829 MB
1549 P raid10.p1 S State: up Subdisks: 2 Size: 1146 MB
1550 S raid10.p0.s0 State: up PO: 0 B Size: 414 MB
1551 S raid10.p0.s1 State: up PO: 256 kB Size: 414 MB
1552 S raid10.p1.s0 State: up PO: 0 B Size: 573 MB
1553 S raid10.p1.s1 State: up PO: 256 kB Size: 573 MB
1556 In this case, the usable part of the volume is even smaller, since the first
plex has shrunk to match the smallest drive.
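The sizes reported in the listings of this section can be reproduced with a
little arithmetic. The following Python sketch is not part of vinum; it only
illustrates the rules described above. It assumes that the usable space on each
drive is the sector count shown in the generated "size ...b" entries (849825
blocks for the small drive, 1173665 for the others), that a block is 512 bytes,
and that the listing truncates sizes to whole megabytes.

```python
SECTOR = 512            # bytes per block ("b" suffix), as described in this page
MB = 1024 * 1024

def mb(sectors):
    """Size in whole megabytes, truncated (assumed to match the listing)."""
    return sectors * SECTOR // MB

def concat_size(subdisks):
    """A concatenated plex uses all the space on every subdisk."""
    return sum(subdisks)

def striped_size(subdisks):
    """A striped plex needs equal subdisks, so the smallest one sets the size."""
    return len(subdisks) * min(subdisks)

# Usable sectors on da1h..da4h, taken from the example output (assumption).
drives = [849825, 1173665, 1173665, 1173665]

print("concat :", mb(concat_size(drives)), "MB")
print("stripe :", mb(striped_size(drives)), "MB")

# The mirror examples place plex 0 on drives 0 and 2, plex 1 on drives 1 and 3.
p0 = [drives[0], drives[2]]
p1 = [drives[1], drives[3]]
print("mirror, concat plexes :", mb(concat_size(p0)), mb(concat_size(p1)))
print("mirror, striped plexes:", mb(striped_size(p0)), mb(striped_size(p1)))
```

In the mirrored cases, the smaller of the two plex sizes is the part of the
volume that is actually protected against a drive failure.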
1558 .Sh CONFIGURATION FILE
1560 requires that all parameters to the
commands be in a configuration file. Entries in the configuration file
1563 define volumes, plexes and subdisks, and may be in free format, except that each
1564 entry must be on a single line.
1566 Some configuration file parameters specify a size (lengths, stripe sizes).
1567 These values can be specified as bytes, or one of the following scale factors
1569 .Bl -tag -width indent
1571 specifies that the value is a number of sectors of 512 bytes.
1573 specifies that the value is a number of kilobytes (1024 bytes).
1575 specifies that the value is a number of megabytes (1048576 bytes).
1577 specifies that the value is a number of gigabytes (1073741824 bytes).
1579 is used for compatibility with
1581 It stands for blocks of 512 bytes.
1582 This abbreviation is confusing, since the word
is used with several different
meanings, and its use is deprecated.
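As an illustration of the scale factors, the following Python sketch converts a
size string to bytes. It is not vinum's own parser. The suffix letters k, m, g
and b are the ones used in the examples elsewhere in this page; the letter s for
sectors and the function name parse_size are assumptions made for this sketch.

```python
import re

# Multipliers in bytes. "s" for 512-byte sectors is an assumption of this sketch.
SCALE = {
    "": 1,            # plain bytes
    "s": 512,         # sectors
    "b": 512,         # blocks (deprecated spelling)
    "k": 1024,
    "m": 1024 ** 2,
    "g": 1024 ** 3,
}

def parse_size(text):
    """Convert a string such as '16m' or '849825b' to a number of bytes."""
    match = re.fullmatch(r"(\d+)([sbkmg]?)", text.strip())
    if match is None:
        raise ValueError(f"not a size: {text!r}")
    return int(match.group(1)) * SCALE[match.group(2)]

for spec in ("16777216", "16384k", "16m", "32768b"):
    print(spec, "=", parse_size(spec), "bytes")
```

All four strings in the loop describe the same number of bytes.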
1588 For example, the value 16777216 bytes can also be written as
1594 The configuration file can contain the following entries:
1596 .It Ic drive Ar name devicename Op Ar options
1597 Define a drive. The options are:
1599 .It Cm device Ar devicename
1600 Specify the device on which the drive resides.
1602 must be the name of a disk partition, for example
1606 and it must be of type
1610 partition, which is reserved for the complete disk.
1612 Define the drive to be a
1614 drive, which is maintained to automatically replace a failed drive.
1616 does not allow this drive to be used for any other purpose. In particular, it
1617 is not possible to create subdisks on it. This functionality has not been
1618 completely implemented.
1620 .It Ic volume Ar name Op Ar options
1621 Define a volume with name
1625 .It Cm plex Ar plexname
1626 Add the specified plex to the volume. If
1631 will look for the definition of the plex as the next possible entry in the
1632 configuration file after the definition of the volume.
1633 .It Cm readpol Ar policy
1641 .Cm prefer Ar plexname .
1643 satisfies a read request from only one of the plexes. A
1645 read policy specifies that each read should be performed from a different plex
1650 read policy reads from the specified plex every time.
1652 When creating a multi-plex volume, assume that the contents of all the plexes
1653 are consistent. This is normally not the case, so by default
1655 sets all plexes except the first one to the
1659 command to first bring them to a consistent state. In the case of striped and
1660 concatenated plexes, however, it does not normally cause problems to leave them
1661 inconsistent: when using a volume for a file system or a swap partition, the
1662 previous contents of the disks are not of interest, so they may be ignored.
1663 If you want to take this risk, use the
1665 keyword. It will only apply to the plexes defined immediately after the volume
1666 in the configuration file. If you add plexes to a volume at a later time, you
1667 must integrate them manually with the
1675 command with RAID-5 plexes: otherwise extreme data corruption will result if one
1678 .It Ic plex Op Ar options
1679 Define a plex. Unlike a volume, a plex does not need a name. The options may
1682 .It Cm name Ar plexname
1683 Specify the name of the plex. Note that you must use the keyword
1685 when naming a plex or subdisk.
1686 .It Cm org Ar organization Op Ar stripesize
1687 Specify the organization of the plex.
1690 .Cm concat , striped
1697 plexes, the parameter
1699 must be specified, while for
1701 it must be omitted. For type
1703 it specifies the width of each stripe. For type
1705 it specifies the size of a group. A group is a portion of a plex which
1706 stores the parity bits all in the same subdisk. It must be a factor of the plex size (in
1707 other words, the result of dividing the plex size by the stripe size must be an
1708 integer), and it must be a multiple of a disk sector (512 bytes).
1710 For optimum performance, stripes should be at least 128 kB in size: anything
1711 smaller will result in a significant increase in I/O activity due to mapping of
1712 individual requests over multiple disks. The performance improvement due to the
1713 increased number of concurrent transfers caused by this mapping will not make up
1714 for the performance drop due to the increase in latency. A good guideline for
1715 stripe size is between 256 kB and 512 kB. Avoid powers of 2, however: they tend
1716 to cause all superblocks to be placed on the first subdisk.
1718 A striped plex must have at least two subdisks (otherwise it is a concatenated
1719 plex), and each must be the same size. A RAID-5 plex must have at least three
1720 subdisks, and each must be the same size. In practice, a RAID-5 plex should
1721 have at least 5 subdisks.
1722 .It Cm volume Ar volname
1723 Add the plex to the specified volume. If no
1725 keyword is specified, the plex will be added to the last volume mentioned in the
1727 .It Cm sd Ar sdname offset
1728 Add the specified subdisk to the plex at offset
1731 .It Ic subdisk Op Ar options
1732 Define a subdisk. Options may be:
1733 .Bl -hang -width 18n
1735 Specify the name of a subdisk. It is not necessary to specify a name for a
1738 above. Note that you must specify the keyword
1740 if you wish to name a subdisk.
1741 .It Cm plexoffset Ar offset
1742 Specify the starting offset of the subdisk in the plex. If not specified,
1744 allocates the space immediately after the previous subdisk, if any, or otherwise
1745 at the beginning of the plex.
1746 .It Cm driveoffset Ar offset
1747 Specify the starting offset of the subdisk in the drive. If not specified,
1749 allocates the first contiguous
1751 bytes of free space on the drive.
1752 .It Cm length Ar length
1753 Specify the length of the subdisk. This keyword must be specified. There is no
1754 default, but the value 0 may be specified to mean
1755 .Dq "use the largest available contiguous free area on the drive" .
1756 If the drive is empty, this means that the entire drive will be used for the
1762 Specify the plex to which the subdisk belongs. By default, the subdisk belongs
1763 to the last plex specified.
1764 .It Cm drive Ar drive
1765 Specify the drive on which the subdisk resides. By default, the subdisk resides
1766 on the last drive specified.
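The rules given above for striped plexes (at least two subdisks of equal size,
a stripe size that is a multiple of 512 bytes and a factor of the plex size) can
be checked before a configuration file is written. The following Python sketch
is one way to do so. The function name and the wording of the messages are
invented for this example, and it assumes that the size of a striped plex is the
sum of its subdisks.

```python
SECTOR = 512
KB = 1024
MB = 1024 * KB

def check_striped_plex(stripe_bytes, subdisk_bytes):
    """Return a list of problems; an empty list means the rules are met."""
    problems = []
    if len(subdisk_bytes) < 2:
        problems.append("a striped plex needs at least two subdisks")
    if len(set(subdisk_bytes)) > 1:
        problems.append("all subdisks must be the same size")
    if stripe_bytes <= 0 or stripe_bytes % SECTOR != 0:
        problems.append("stripe size must be a positive multiple of 512 bytes")
    elif sum(subdisk_bytes) % stripe_bytes != 0:
        problems.append("stripe size must be a factor of the plex size")
    return problems

# Four 512 MB subdisks with a 512 kB stripe: no problems reported.
print(check_striped_plex(512 * KB, [512 * MB] * 4))

# The same subdisks with a 479 kB stripe: the plex size is not a multiple.
print(check_striped_plex(479 * KB, [512 * MB] * 4))
```

The second call shows that an odd stripe size may require subdisk lengths to be
adjusted so that the plex size remains a whole number of stripes.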
1769 .Sh EXAMPLE CONFIGURATION FILE
1771 # Sample vinum configuration file
1774 drive drive1 device /dev/da1h
1775 drive drive2 device /dev/da2h
1776 drive drive3 device /dev/da3h
1777 drive drive4 device /dev/da4h
1778 drive drive5 device /dev/da5h
1779 drive drive6 device /dev/da6h
1780 # A volume with one striped plex
1782 plex org striped 512b
1783 sd length 64m drive drive2
1784 sd length 64m drive drive4
1786 plex org striped 512b
1787 sd length 512m drive drive2
1788 sd length 512m drive drive4
1792 sd length 100m drive drive2
1793 sd length 50m drive drive4
1795 sd length 150m drive drive4
1796 # A volume with one striped plex and one concatenated plex
1798 plex org striped 512b
1799 sd length 100m drive drive2
1800 sd length 100m drive drive4
1802 sd length 150m drive drive2
1803 sd length 50m drive drive4
1804 # a volume with a RAID-5 and a striped plex
1805 # note that the RAID-5 volume is longer by
1806 # the length of one subdisk
1808 plex org striped 64k
1809 sd length 1000m drive drive2
1810 sd length 1000m drive drive4
1812 sd length 500m drive drive1
1813 sd length 500m drive drive2
1814 sd length 500m drive drive3
1815 sd length 500m drive drive4
1816 sd length 500m drive drive5
1818 .Sh DRIVE LAYOUT CONSIDERATIONS
1820 drives are currently
1822 disk partitions. They must be of type
1824 in order to avoid overwriting data used for other purposes. Use
1826 to edit a partition type definition. The following display shows a typical
1827 partition layout as shown by
1831 # size offset fstype [fsize bsize bps/cpg]
1832 a: 81920 344064 4.2BSD 0 0 0 # (Cyl. 240*- 297*)
1833 b: 262144 81920 swap # (Cyl. 57*- 240*)
1834 c: 4226725 0 unused 0 0 # (Cyl. 0 - 2955*)
1835 e: 81920 0 4.2BSD 0 0 0 # (Cyl. 0 - 57*)
1836 f: 1900000 425984 4.2BSD 0 0 0 # (Cyl. 297*- 1626*)
1837 g: 1900741 2325984 vinum 0 0 0 # (Cyl. 1626*- 2955*)
1840 In this example, partition
1844 partition. Partitions
1853 partitions. Partition
1855 is a swap partition, and partition
1857 represents the whole disk and should not be used for any other purpose.
1860 uses the first 265 sectors on each partition for configuration information, so
1861 the maximum size of a subdisk is 265 sectors smaller than the drive.
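As a worked example of this limit, the following Python sketch subtracts the
reserved sectors from a partition size. The figure of 265 sectors is taken from
the text above; the function name is invented for this sketch, and the sample
value is the size of the vinum partition in the listing above.

```python
CONFIG_SECTORS = 265   # sectors reserved for configuration, per this page

def max_subdisk_sectors(partition_sectors):
    """Largest subdisk, in sectors, that fits on a drive of the given size."""
    return max(partition_sectors - CONFIG_SECTORS, 0)

partition_g = 1900741  # size of the vinum partition in the sample listing
print(max_subdisk_sectors(partition_g), "sectors available for subdisks")
```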
1864 maintains a log file, by default
1865 .Pa /var/tmp/vinum_history ,
1866 in which it keeps track of the commands issued to
1868 You can override the name of this file by setting the environment variable
1870 to the name of the file.
1872 Each message in the log file is preceded by a date. The default format is
1873 .Qq Li %e %b %Y %H:%M:%S .
1876 for further details of the format string. It can be overridden by the
1877 environment variable
1878 .Ev VINUM_DATEFORMAT .
1879 .Sh HOW TO SET UP VINUM
1880 This section gives practical advice about how to implement a
1883 .Ss Where to put the data
1884 The first choice you need to make is where to put the data. You need dedicated
1887 They should be partitions, not devices, and they should not be partition
1889 For example, good names are
1897 both of which represent a device, not a partition, and
1899 which represents a complete disk and should be of type
1901 See the example under
1902 .Sx DRIVE LAYOUT CONSIDERATIONS
1904 .Ss Designing volumes
1907 volumes depends on your intentions. There are a number of possibilities:
You may want to join up a number of small disks to make a reasonably sized file
1911 system. For example, if you had five small drives and wanted to use all the
1912 space for a single volume, you might write a configuration file like:
1913 .Bd -literal -offset indent
1914 drive d1 device /dev/da2e
1915 drive d2 device /dev/da3e
1916 drive d3 device /dev/da4e
1917 drive d4 device /dev/da5e
1918 drive d5 device /dev/da6e
1921 sd length 0 drive d1
1922 sd length 0 drive d2
1923 sd length 0 drive d3
1924 sd length 0 drive d4
1925 sd length 0 drive d5
1928 In this case, you specify the length of the subdisks as 0, which means
1929 .Dq "use the largest area of free space that you can find on the drive" .
1930 If the subdisk is the only subdisk on the drive, it will use all available
1935 to obtain additional resilience against disk failures. You have the choice of
1938 or RAID-5, also called
1941 To set up mirroring, create multiple plexes in a volume. For example, to create
1942 a mirrored volume of 2 GB, you might create the following configuration file:
1943 .Bd -literal -offset indent
1944 drive d1 device /dev/da2e
1945 drive d2 device /dev/da3e
1948 sd length 2g drive d1
1950 sd length 2g drive d2
1953 When creating mirrored drives, it is important to ensure that the data from each
1954 plex is on a different physical disk so that
1956 can access the complete address space of the volume even if a drive fails.
Note that each plex requires as much storage as the complete volume: in this
1958 example, the volume has a size of 2 GB, but each plex (and each subdisk)
1959 requires 2 GB, so the total disk storage requirement is 4 GB.
1961 To set up RAID-5, create a single plex of type
1963 For example, to create an equivalent resilient volume of 2 GB, you might use the
1964 following configuration file:
1965 .Bd -literal -offset indent
1966 drive d1 device /dev/da2e
1967 drive d2 device /dev/da3e
1968 drive d3 device /dev/da4e
1969 drive d4 device /dev/da5e
1970 drive d5 device /dev/da6e
1973 sd length 512m drive d1
1974 sd length 512m drive d2
1975 sd length 512m drive d3
1976 sd length 512m drive d4
1977 sd length 512m drive d5
1980 RAID-5 plexes require at least three subdisks, one of which is used for storing
1981 parity information and is lost for data storage. The more disks you use, the
greater the proportion of the disk storage that can be used for data storage. In
1983 this example, the total storage usage is 2.5 GB, compared to 4 GB for a mirrored
1984 configuration. If you were to use the minimum of only three disks, you would
1985 require 3 GB to store the information, for example:
1986 .Bd -literal -offset indent
1987 drive d1 device /dev/da2e
1988 drive d2 device /dev/da3e
1989 drive d3 device /dev/da4e
1992 sd length 1g drive d1
1993 sd length 1g drive d2
1994 sd length 1g drive d3
1997 As with creating mirrored drives, it is important to ensure that the data from
1998 each subdisk is on a different physical disk so that
2000 can access the complete address space of the volume even if a drive fails.
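The storage requirements quoted for the mirrored and RAID-5 examples can be
checked with the following Python sketch. It is only an illustration of the
arithmetic in the text; the function names are invented, and sizes are given in
megabytes.

```python
def mirror_raw(volume_mb, plexes=2):
    """Raw storage needed when every plex holds a full copy of the volume."""
    return volume_mb * plexes

def raid5_usable(subdisk_mb, subdisks):
    """Data capacity of a RAID-5 plex: one subdisk's worth holds parity."""
    if subdisks < 3:
        raise ValueError("a RAID-5 plex needs at least three subdisks")
    return subdisk_mb * (subdisks - 1)

def raid5_raw(subdisk_mb, subdisks):
    """Raw storage occupied by a RAID-5 plex."""
    return subdisk_mb * subdisks

print("mirror, 2 GB volume   :", mirror_raw(2048), "MB raw")
print("RAID-5, 5 x 512 MB    :", raid5_usable(512, 5), "MB usable,",
      raid5_raw(512, 5), "MB raw")
print("RAID-5, 3 x 1 GB      :", raid5_usable(1024, 3), "MB usable,",
      raid5_raw(1024, 3), "MB raw")
```

All three layouts provide a 2 GB volume, using 4 GB, 2.5 GB and 3 GB of disk
space respectively.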
2004 to allow more concurrent access to a file system. In many cases, access to a
2005 file system is limited by the speed of the disk. By spreading the volume across
2006 multiple disks, you can increase the throughput in multi-access environments.
2007 This technique shows little or no performance improvement in single-access
2010 uses a technique called
2012 or sometimes RAID-0, to increase this concurrency of access. The name RAID-0 is
2013 misleading: striping does not provide any redundancy or additional reliability.
2014 In fact, it decreases the reliability, since the failure of a single disk will
2015 render the volume useless, and the more disks you have, the more likely it is
2016 that one of them will fail.
2018 To implement striping, use a
2021 .Bd -literal -offset indent
2022 drive d1 device /dev/da2e
2023 drive d2 device /dev/da3e
2024 drive d3 device /dev/da4e
2025 drive d4 device /dev/da5e
2027 plex org striped 512k
2028 sd length 512m drive d1
2029 sd length 512m drive d2
2030 sd length 512m drive d3
2031 sd length 512m drive d4
2034 A striped plex must have at least two subdisks, but the increase in performance
2035 is greater if you have a larger number of disks.
2037 You may want to have the best of both worlds and have both resilience and
2038 performance. This is sometimes called RAID-10 (a combination of RAID-1 and
2039 RAID-0), though again this name is misleading. With
2041 you can do this with the following configuration file:
2042 .Bd -literal -offset indent
2043 drive d1 device /dev/da2e
2044 drive d2 device /dev/da3e
2045 drive d3 device /dev/da4e
2046 drive d4 device /dev/da5e
2047 volume raid setupstate
2048 plex org striped 512k
2049 sd length 512m drive d1
2050 sd length 512m drive d2
2051 sd length 512m drive d3
2052 sd length 512m drive d4
2053 plex org striped 512k
2054 sd length 512m drive d4
2055 sd length 512m drive d3
2056 sd length 512m drive d2
2057 sd length 512m drive d1
2060 Here the plexes are striped, increasing performance, and there are two of them,
2061 increasing reliability. Note that this example shows the subdisks of the second
2062 plex in reverse order from the first plex. This is for performance reasons and
2063 will be discussed below. In addition, the volume specification includes the
2066 which ensures that all plexes are
2070 .Ss Creating the volumes
2071 Once you have created your configuration files, start
2073 and create the volumes. In this example, the configuration is in the file
2075 .Bd -literal -offset 2n
2076 # vinum create -v configfile
2077 1: drive d1 device /dev/da2e
2078 2: drive d2 device /dev/da3e
2081 5: sd length 2g drive d1
2083 7: sd length 2g drive d2
2084 Configuration summary
2086 Drives: 2 (4 configured)
2087 Volumes: 1 (4 configured)
2088 Plexes: 2 (8 configured)
2089 Subdisks: 2 (16 configured)
2091 Drive d1: Device /dev/da2e
2092 Created on vinum.lemis.com at Tue Mar 23 12:30:31 1999
2093 Config last updated Tue Mar 23 14:30:32 1999
2094 Size: 60105216000 bytes (57320 MB)
2095 Used: 2147619328 bytes (2048 MB)
2096 Available: 57957596672 bytes (55272 MB)
2099 Drive d2: Device /dev/da3e
2100 Created on vinum.lemis.com at Tue Mar 23 12:30:32 1999
2101 Config last updated Tue Mar 23 14:30:33 1999
2102 Size: 60105216000 bytes (57320 MB)
2103 Used: 2147619328 bytes (2048 MB)
2104 Available: 57957596672 bytes (55272 MB)
2108 Volume mirror: Size: 2147483648 bytes (2048 MB)
2112 Read policy: round robin
2114 Plex mirror.p0: Size: 2147483648 bytes (2048 MB)
2117 Organization: concat
2118 Part of volume mirror
2119 Plex mirror.p1: Size: 2147483648 bytes (2048 MB)
2122 Organization: concat
2123 Part of volume mirror
2125 Subdisk mirror.p0.s0:
2126 Size: 2147483648 bytes (2048 MB)
2128 Plex mirror.p0 at offset 0
2130 Subdisk mirror.p1.s0:
2131 Size: 2147483648 bytes (2048 MB)
2133 Plex mirror.p1 at offset 0
2140 to list the file as it configures. Subsequently it lists the current
2141 configuration in the same format as the
2144 .Ss Creating more volumes
2145 Once you have created the
2149 keeps track of them in its internal configuration files. You do not need to
2150 create them again. In particular, if you run the
2152 command again, you will create additional objects:
2154 # vinum create sampleconfig
2155 Configuration summary
2157 Drives: 2 (4 configured)
2158 Volumes: 1 (4 configured)
2159 Plexes: 4 (8 configured)
2160 Subdisks: 4 (16 configured)
2162 D d1 State: up Device /dev/da2e Avail: 53224/57320 MB (92%)
2163 D d2 State: up Device /dev/da3e Avail: 53224/57320 MB (92%)
2165 V mirror State: up Plexes: 4 Size: 2048 MB
2167 P mirror.p0 C State: up Subdisks: 1 Size: 2048 MB
2168 P mirror.p1 C State: up Subdisks: 1 Size: 2048 MB
2169 P mirror.p2 C State: up Subdisks: 1 Size: 2048 MB
2170 P mirror.p3 C State: up Subdisks: 1 Size: 2048 MB
2172 S mirror.p0.s0 State: up PO: 0 B Size: 2048 MB
2173 S mirror.p1.s0 State: up PO: 0 B Size: 2048 MB
2174 S mirror.p2.s0 State: up PO: 0 B Size: 2048 MB
2175 S mirror.p3.s0 State: up PO: 0 B Size: 2048 MB
2178 As this example (this time with the
2180 option) shows, re-running the
has created two new plexes, for a total of four, each with a new subdisk. If you want to add other
2183 volumes, create new configuration files for them. They do not need to reference
2186 already knows about. For example, to create a volume
2189 .Pa /dev/da1e , /dev/da2e , /dev/da3e
2192 you only need to mention the other two:
2193 .Bd -literal -offset indent
2194 drive d3 device /dev/da1e
2195 drive d4 device /dev/da4e
2204 With this configuration file, we get:
2206 # vinum create newconfig
2207 Configuration summary
2209 Drives: 4 (4 configured)
2210 Volumes: 2 (4 configured)
2211 Plexes: 5 (8 configured)
2212 Subdisks: 8 (16 configured)
2214 D d1 State: up Device /dev/da2e Avail: 51176/57320 MB (89%)
2215 D d2 State: up Device /dev/da3e Avail: 53220/57320 MB (89%)
2216 D d3 State: up Device /dev/da1e Avail: 53224/57320 MB (92%)
2217 D d4 State: up Device /dev/da4e Avail: 53224/57320 MB (92%)
2219 V mirror State: down Plexes: 4 Size: 2048 MB
2220 V raid State: down Plexes: 1 Size: 6144 MB
2222 P mirror.p0 C State: init Subdisks: 1 Size: 2048 MB
2223 P mirror.p1 C State: init Subdisks: 1 Size: 2048 MB
2224 P mirror.p2 C State: init Subdisks: 1 Size: 2048 MB
2225 P mirror.p3 C State: init Subdisks: 1 Size: 2048 MB
2226 P raid.p0 R5 State: init Subdisks: 4 Size: 6144 MB
2228 S mirror.p0.s0 State: up PO: 0 B Size: 2048 MB
2229 S mirror.p1.s0 State: up PO: 0 B Size: 2048 MB
2230 S mirror.p2.s0 State: up PO: 0 B Size: 2048 MB
2231 S mirror.p3.s0 State: up PO: 0 B Size: 2048 MB
2232 S raid.p0.s0 State: empty PO: 0 B Size: 2048 MB
2233 S raid.p0.s1 State: empty PO: 512 kB Size: 2048 MB
2234 S raid.p0.s2 State: empty PO: 1024 kB Size: 2048 MB
2235 S raid.p0.s3 State: empty PO: 1536 kB Size: 2048 MB
2238 Note the size of the RAID-5 plex: it is only 6 GB, although together its
2239 components use 8 GB of disk space. This is because the equivalent of one
2240 subdisk is used for storing parity data.
2241 .Ss Restarting Vinum
2242 On rebooting the system, start
2250 This will start all the
2252 drives in the system. If for some reason you wish to start only some of them,
2256 .Ss Performance considerations
2257 A number of misconceptions exist about how to set up a RAID array for best
2258 performance. In particular, most systems use far too small a stripe size. The
2259 following discussion applies to all RAID systems, not just to
block I/O system issues requests of between 0.5 kB and 128 kB; a
typical mix is somewhere around 8 kB. You can't stop any striping system from
2266 breaking a request into two physical requests, and if you make the stripe small
2267 enough, it can be broken into several. This will result in a significant drop
in performance: the decrease in transfer time per disk is outweighed by an
increase in latency that is an order of magnitude greater.
2271 With modern disk sizes and the
2273 I/O system, you can expect to have a
2274 reasonably small number of fragmented requests with a stripe size between 256 kB
2275 and 512 kB; with correct RAID implementations there is no obvious reason not to
2276 increase the size to 2 or 4 MB on a large disk.
2278 When choosing a stripe size, consider that most current UFS file systems have
2279 cylinder groups 32 MB in size. If you have a stripe size and number of disks
2280 both of which are a power of two, it is probable that all superblocks and inodes
2281 will be placed on the same subdisk, which will impact performance significantly.
2282 Choose an odd number instead, for example 479 kB.
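The effect of a power-of-two layout can be seen by working out which subdisk
holds the start of each cylinder group. The following Python sketch is a
simplified model, not a description of vinum's internals: it assumes that
stripes are assigned to subdisks in rotation, that the file system starts at
offset 0 of the volume, and that cylinder groups begin at exact multiples of
32 MB.

```python
KB = 1024
CYLINDER_GROUP = 32 * 1024 * KB   # 32 MB, as stated in the text

def subdisk_for(offset, stripe, ndisks):
    """Index of the subdisk holding the given byte offset (rotation assumed)."""
    return (offset // stripe) % ndisks

def cylinder_group_subdisks(stripe, ndisks, groups=16):
    """Subdisk holding the start of each of the first cylinder groups."""
    return [subdisk_for(i * CYLINDER_GROUP, stripe, ndisks)
            for i in range(groups)]

print("256 kB stripe, 4 disks:", cylinder_group_subdisks(256 * KB, 4))
print("479 kB stripe, 4 disks:", cylinder_group_subdisks(479 * KB, 4))
```

With the 256 kB stripe every cylinder group starts on subdisk 0; with the
479 kB stripe the starting points are spread over several subdisks.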
2284 The easiest way to consider the impact of any transfer in a multi-access system
2285 is to look at it from the point of view of the potential bottleneck, the disk
2286 subsystem: how much total disk time does the transfer use?
2288 everything is cached, the time relationship between the request and its
2289 completion is not so important: the important parameter is the total time that
2290 the request keeps the disks active, the time when the disks are not available to
2291 perform other transfers. As a result, it doesn't really matter if the transfers
2292 are happening at the same time or different times. In practical terms, the time
2293 we're looking at is the sum of the total latency (positioning time and
2294 rotational latency, or the time it takes for the data to arrive under the disk
2295 heads) and the total transfer time. For a given transfer to disks of the same
2296 speed, the transfer time depends only on the total size of the transfer.
2298 Consider a typical news article or web page of 24 kB, which will probably be
2299 read in a single I/O. Take disks with a transfer rate of 6 MB/s and an average
2300 positioning time of 8 ms, and a file system with 4 kB blocks. Since it's 24 kB,
2301 we don't have to worry about fragments, so the file will start on a 4 kB
2302 boundary. The number of transfers required depends on where the block starts:
2303 it's (S + F - 1) / S, where S is the stripe size in file system blocks, and F is
2304 the file size in file system blocks.
2307 Stripe size of 4 kB. You'll have 6 transfers. Total subsystem load: 48 ms
2308 latency, 2 ms transfer, 50 ms total.
2310 Stripe size of 8 kB. On average, you'll have 3.5 transfers. Total subsystem
2311 load: 28 ms latency, 2 ms transfer, 30 ms total.
2313 Stripe size of 16 kB. On average, you'll have 2.25 transfers. Total subsystem
2314 load: 18 ms latency, 2 ms transfer, 20 ms total.
2316 Stripe size of 256 kB. On average, you'll have 1.08 transfers. Total subsystem
2317 load: 8.6 ms latency, 2 ms transfer, 10.6 ms total.
Stripe size of 4 MB. On average, you'll have 1.005 transfers. Total subsystem
load: 8.04 ms latency, 2 ms transfer, 10.04 ms total.
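The transfer counts in this list follow from the formula (S + F - 1) / S. The
following Python sketch evaluates it for the 24 kB file, 4 kB blocks and 8 ms
positioning time used in the text. The function names are invented for this
example, and only the latency part of the load is calculated.

```python
def average_transfers(stripe_kb, file_kb=24, block_kb=4):
    """Average number of physical transfers: (S + F - 1) / S, in blocks."""
    s = stripe_kb / block_kb      # stripe size in file system blocks
    f = file_kb / block_kb        # file size in file system blocks
    return (s + f - 1) / s

def latency_ms(stripe_kb, positioning_ms=8.0):
    """Total positioning latency for the average number of transfers."""
    return average_transfers(stripe_kb) * positioning_ms

for stripe_kb in (4, 8, 16, 256):
    print(f"{stripe_kb:4d} kB stripe: "
          f"{average_transfers(stripe_kb):.3f} transfers, "
          f"{latency_ms(stripe_kb):.1f} ms latency")
```

Other stripe sizes can be tried by passing a different value in kilobytes.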
2323 It appears that some hardware RAID systems have problems with large stripes:
2324 they appear to always transfer a complete stripe to or from disk, so that a
2325 large stripe size will have an adverse effect on performance.
2327 does not suffer from this problem: it optimizes all disk transfers and does not
2328 transfer unneeded data.
2330 Note that no well-known benchmark program tests true multi-access conditions
2331 (more than 100 concurrent users), so it is difficult to demonstrate the validity
2332 of these statements.
2334 Given these considerations, the following factors affect the performance of a
2339 Striping improves performance for multiple access only, since it increases the
2340 chance of individual requests being on different drives.
2342 Concatenating UFS file systems across multiple drives can also improve
2343 performance for multiple file access, since UFS divides a file system into
2344 cylinder groups and attempts to keep files in a single cylinder group. In
2345 general, it is not as effective as striping.
2347 Mirroring can improve multi-access performance for reads, since by default
2349 issues consecutive reads to consecutive plexes.
2351 Mirroring decreases performance for all writes, whether multi-access or single
2352 access, since the data must be written to both plexes. This explains the
2353 subdisk layout in the example of a mirroring configuration above: if the
2354 corresponding subdisk in each plex is on a different physical disk, the write
2355 commands can be issued in parallel, whereas if they are on the same physical
2356 disk, they will be performed sequentially.
2358 RAID-5 reads have essentially the same considerations as striped reads, unless
2359 the striped plex is part of a mirrored volume, in which case the performance of
2360 the mirrored volume will be better.
2362 RAID-5 writes are approximately 25% of the speed of striped writes: to perform
2365 must first read the data block and the corresponding parity block, perform some
2366 calculations and write back the parity block and the data block, four times as
2367 many transfers as for writing a striped plex. On the other hand, this is offset
2368 by the cost of mirroring, so writes to a volume with a single RAID-5 plex are
2369 approximately half the speed of writes to a correctly configured volume with two
2374 configuration changes (for example, adding or removing objects, or the change of
2375 state of one of the objects),
2377 writes up to 128 kB of updated configuration to each drive. The larger the
2378 number of drives, the longer this takes.
2380 .Ss Creating file systems on Vinum volumes
2381 You do not need to run
2383 before creating a file system on a
2389 option to state that the device is not divided into partitions. For example, to
2390 create a file system on volume
2392 enter the following command:
2394 .Dl "# newfs -v /dev/vinum/mirror"
2396 A number of other considerations apply to
2401 There is no advantage in creating multiple drives on a single disk. Each drive
2402 uses 131.5 kB of data for label and configuration information, and performance
2403 will suffer when the configuration changes. Use appropriately sized subdisks instead.
2405 It is possible to increase the size of a concatenated
2407 plex, but currently the size of striped and RAID-5 plexes cannot be increased.
2408 Currently the size of an existing UFS file system also cannot be increased, but
2409 it is planned to make both plexes and file systems extensible.
2411 .Sh STATE MANAGEMENT
2412 Vinum objects have the concept of
2416 for more details. They are only completely accessible if their state is
2418 To change an object state to
2422 command. To change an object state to
2426 command. Normally other states are created automatically by the relationship
2427 between objects. For example, if you add a plex to a volume, the subdisks of
2428 the plex will be set in the
2430 state, indicating that, though the hardware is accessible, the data on the
2431 subdisk is invalid. As a result of this state, the plex will be set in the
2434 .Ss The `reviving' state
2435 In many cases, when you start a subdisk the system must copy data to the
2436 subdisk. Depending on the size of the subdisk, this can take a long time.
2437 During this time, the subdisk is set in the
2439 state. On successful completion of the copy operation, it is automatically set
2442 state. It is possible for the process performing the revive to be stopped and
2443 restarted. The system keeps track of how far the subdisk has been revived, and
2446 command is reissued, the copying continues from this point.
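.Pp
As a minimal sketch, assuming a hypothetical subdisk named
.Ar mirror.p1.s0
whose revive was interrupted, and assuming that the revive is resumed with the
.Ic start
command, a sequence like the following should continue the copy and then
display the state of the subdisk:
.Bd -literal -offset indent
# vinum start mirror.p1.s0
# vinum ls mirror.p1.s0
.Ed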
2448 In order to maintain the consistency of a volume while one or more of its plexes
2451 writes to subdisks which have been revived up to the point of the write. It may
2452 also read from the plex if the area being read has already been revived.
2454 The following points are not bugs, and they have good reasons for existing, but
2455 they have been shown to cause confusion. Each is discussed in the appropriate
2462 disk partitions and must have the partition type
2464 This is different from
2466 which expects partitions of type
2470 is an invitation to shoot yourself in the foot: with
2472 you can easily overwrite a file system.
2474 will not permit this.
2476 For similar reasons, the
2478 command will not accept a drive on partition
2482 is used by the system to represent the whole disk, and must be of type
2484 Clearly there is a conflict here, which
2486 resolves by not using the
2490 When you create a volume with multiple plexes,
2492 does not automatically initialize the plexes. This means that the contents are
2493 not known, but they are certainly not consistent. As a result, by default
2495 sets the state of all newly-created plexes except the first to
2497 In order to synchronize them with the first plex, you must
2501 to copy the data from a plex which is in the
2503 state. Depending on the size of the subdisks involved, this can take a long
2506 In practice, people aren't too interested in what was in the plex when it was
2507 created, and other volume managers cheat by setting them
2511 provides two ways to ensure that newly created plexes are
2515 Create the plexes and then synchronize them with
2518 Create the volume (not the plex) with the keyword
2522 to ignore any possible inconsistency and set the plexes to be
2526 Some of the commands currently supported by
2528 are not really needed. For reasons which I don't understand, however, I find
2529 that users frequently try the
2533 commands, though especially
2535 outputs all sorts of dire warnings. Don't use these commands unless you have a
2536 good reason to do so.
2538 Some state transitions are not very intuitive. In fact, it's not clear whether
2539 this is a bug or a feature. If you find that you can't start an object in some
2540 strange state, such as a
2542 subdisk, try first to get it into
2548 commands. If that works, you should then be able to start it. If you find
2549 that this is the only way to get out of a position where easier methods fail,
2550 please report the situation.
2552 If you build the kernel module with the
2553 .Fl D Ns Dv VINUMDEBUG
2554 option, you must also build
2557 .Fl D Ns Dv VINUMDEBUG
2558 option, since the size of some data objects used by both components depends on
2559 this option. If you don't do so, commands will fail with the message
2560 .Sy Invalid argument ,
2561 and a console message such as the following will be logged:
2563 .It "vinumioctl: invalid ioctl from process 247 (vinum): c0e44642"
2566 This error may also occur if you use an old version of the KLD or of the userland program.
2570 command has a particularly emetic syntax. Once it was the only way to start
2572 but now the preferred method is with
2575 should be used for maintenance purposes only. Note that its syntax has changed,
2576 and the arguments must be disk slices, such as
2578 not partitions such as
2583 .Bl -tag -width /dev/vinum/control -compact
2585 directory with device nodes for
2588 .It Pa /dev/vinum/control
2591 .It Pa /dev/vinum/plex
2592 directory containing device nodes for
2595 .It Pa /dev/vinum/sd
2596 directory containing device nodes for
2601 .Bl -tag -width VINUM_DATEFORMAT
2602 .It Ev VINUM_HISTORY
2603 The name of the log file, by default
2604 .Pa /var/log/vinum_history .
2605 .It Ev VINUM_DATEFORMAT
2606 The format of dates in the log file, by default
2607 .Qq Li %e %b %Y %H:%M:%S .
2609 The name of the editor to use for editing configuration files, by default
2618 .Pa http://www.vinumvm.org/vinum/ ,
2619 .Pa http://www.vinumvm.org/vinum/how-to-debug.html .
2621 .An Greg Lehey Aq grog@lemis.com
2625 command first appeared in
2627 The RAID-5 component of
2629 was developed for Cybernet Inc.\&
2630 .Pq Pa www.cybernet.com
2631 for its NetMAX product.