OCFS2 is a POSIX-compliant shared-disk cluster file system for Linux capable of providing both high performance and high availability. Cluster-aware applications can make use of parallel I/O for higher performance. OCFS2 is commonly used to host Oracle Real Application Clusters (RAC) databases on Linux clusters.
The steps below show how to create an OCFS2 filesystem on top of a multipathed SAN LUN and mount it on the nodes of a Linux cluster.
  1. Identify the nodes that will be part of your cluster.
  2. Export/zone the LUNs on the SAN end and check whether they are accessible on all the hosts of the cluster (fdisk -l or multipath -ll).
  3. If you need multipathing, configure multipath and the multipathing policy based on your requirements. For Linux multipath setup, refer to Red Hat’s multipath guide.
  4. Create the OCFS2 configuration file (/etc/ocfs2/cluster.conf) on all the cluster nodes.
  5. The example below shows a sample cluster.conf for a 3-node cluster. If you have a heartbeat IP configured on these cluster nodes, use the heartbeat IP for OCFS2 cluster communication and specify the hostname (without the FQDN). Copy the same file to all the hosts in the cluster.

    [root@oracle-cluster-1 ~]# cat /etc/ocfs2/cluster.conf
    node:
            ip_port = 7777
            ip_address = 203.21.2.101
            number = 0
            name = oracle-cluster-1
            cluster = ocfs2

    node:
            ip_port = 7777
            ip_address = 203.21.2.102
            number = 1
            name = oracle-cluster-2
            cluster = ocfs2

    node:
            ip_port = 7777
            ip_address = 203.21.2.103
            number = 2
            name = oracle-cluster-3
            cluster = ocfs2

    cluster:
            node_count = 3
            name = ocfs2

    [root@oracle-cluster-1 ~]#

  6. On each node, check the status of the OCFS2 cluster service and stop “o2cb” if it is already running.

    # service o2cb status
    # service o2cb stop

  7. On each node, load the OCFS2 module.

    # service o2cb load

  8. Bring the OCFS2 cluster service online on all the nodes.

    # service o2cb online
  9. Now your OCFS2 cluster is ready.
  10. Format the SAN LUN device from any one of the cluster nodes.

    # mkfs.ocfs2 -b 4k -C 32k -L oraclerac /dev/mapper/mpath0

    -b : Block size (values are 512, 1K, 2K and 4K bytes per block)
    -C : Cluster size (values are 4K, 8K, 16K, 32K, 64K, 128K, 256K, 512K and 1M)
    -L : Label


    Note : Replace /dev/mapper/mpath0 with your device name.
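
    Optionally, the new filesystem can be confirmed with the mounted.ocfs2 utility shipped with ocfs2-tools, which lists OCFS2 volumes along with their labels and UUIDs (an extra check, not part of the original steps):

    # mounted.ocfs2 -d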
  11. Update /etc/fstab on all the nodes in the cluster with the mount point.

    For example : /dev/mapper/mpath0 /u01 ocfs2 _netdev 0 0
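
    The /u01 mount point directory is assumed to already exist on each node; if it does not, create it before mounting (a small step not shown in the original write-up):

    # mkdir -p /u01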

  12. Mount the /u01 volume using the mount command.

    # mount /u01

  13. Enable the ocfs2 and o2cb services at runlevels 3, 4 and 5.

    # chkconfig --level 345 o2cb on ; chkconfig --level 345 ocfs2 on

  14. The /u01 filesystem setup on a SAN LUN is done.
  15. You can now configure an Oracle RAC database on this filesystem.

Here are explanations of some of the columns in the NetApp sysstat command output.

Cache age : The age in minutes of the oldest read-only blocks in the buffer cache. Data in this column indicates how fast read operations are cycling through system memory; when the filer is reading very large files, the buffer cache age will be very low. If reads are random, the cache age will also be low. If you have a performance problem where read performance is poor, this number may indicate that you need a larger-memory system or that you should analyze the application to reduce the randomness of the workload.

Cache hit : This is the WAFL cache hit rate percentage, i.e. the percentage of times WAFL tried to read a data block from disk and found the data already cached in memory. A dash in this column indicates that WAFL did not attempt to load any blocks during the measurement interval.

CP Ty : Consistency Point (CP) type is the reason that a CP started in that interval. The CP types are as follows:

  • - No CP started during the sampling interval (no writes happened to disk at this point in time)
  • number Number of CPs started during the sampling interval
  • B Back-to-back CPs (CP generated CP) (the filer is having a tough time keeping up with writes)
  • b Deferred back-to-back CPs (CP generated CP) (the back-to-back condition is getting worse)
  • F CP caused by full NVLog (one half of the NVRAM log was full, and so was flushed)
  • H CP caused by high water mark (rare to see; one side of the NVRAM log was half full, so the filer decided to flush it to disk)
  • L CP caused by low water mark
  • S CP caused by snapshot operation
  • T CP caused by timer (filer data is flushed to disk every 10 seconds)
  • U CP caused by flush
  • : Continuation of a CP from the previous interval (a CP is still in progress across 1-second intervals)

The type character is followed by a second character which indicates the phase of the CP at the end of the sampling interval. If the CP completed during the sampling interval, this second character will be blank. The phases are as follows:

  • 0 Initializing
  • n Processing normal files
  • s Processing special files
  • f Flushing modified data to disk
  • v Flushing modified superblock to disk

CP util : The Consistency Point (CP) utilization, i.e. the percentage of time spent in a CP. 100% time in CP is a good thing: it means that all of the time dedicated to writing data was actually used. 75% means that only 75% of the time allocated to writing data was utilized, and the other 25% was wasted. A good CP percentage is at or near 100%.

Netapp sysstat is like vmstat and iostat rolled into one command. It reports filer performance statistics such as CPU utilization, the amount of disk traffic, and tape traffic. When run without options, sysstat prints a new line every 15 seconds with a basic set of information. You have to use Control-C (^c) or set the interval count (-c count) to stop sysstat. For more detailed information, use the -u option. For information specific to one particular protocol, you can use the other options. I’ll list them here.

  • -f FCP statistics
  • -i iSCSI statistics
  • -b SAN (blocks) extended statistics
  • -u extended utilization statistics
  • -x extended output format. This includes all available output fields. Be aware that this produces output that is longer than 80 columns and is generally intended for “off-line” types of analysis and not for “real-time” viewing.
  • -m Displays multi-processor CPU utilization statistics. In addition to the percentage of the time that one or more CPUs were busy (ANY), the average (AVG) is displayed, as well as the individual utilization of each processor. This is only useful on multi-processor systems and won’t work on single-processor machines.
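
For example, to print the extended utilization statistics every 5 seconds for 10 iterations and then stop, an invocation along these lines can be used (the filer> prompt is just a placeholder for your storage system’s console):

filer> sysstat -u -c 10 5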

You can use the NetApp SIO tool to benchmark NetApp systems. SIO is a client-side workload generator that works with any target. It generates I/O load and does basic statistics to see how any type of storage performs under certain conditions.
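
As a rough, hypothetical illustration: an SIO run generating a 100% write, 100% random, 4k-block workload against a 500 MB file on an NFS mount for 120 seconds with 4 threads could look like the line below. The binary name differs per platform (for example sio_ntap_linux) and the mount path here is made up:

# sio_ntap_linux 0 100 4k 500m 120 4 /mnt/netapp_vol/testfile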

Snapmirror is a licensed utility in Netapp used to transfer data across filers. Snapmirror works at the volume level or the qtree level. Snapmirror is mainly used for disaster recovery and replication.

Snapmirror needs a source and a destination filer. (When the source and destination are the same filer, the snapmirror happens on the local filer itself; this is used when you have to replicate volumes within a filer. If you need DR capabilities for a volume within a single filer, you have to try SyncMirror.)

Synchronous SnapMirror is a SnapMirror feature in which the data on one system is replicated on another system at, or near, the same time it is written to the first system. Synchronous SnapMirror synchronously replicates data between single or clustered storage systems situated at remote sites using either an IP or a Fibre Channel connection. Before Data ONTAP saves data to disk, it collects written data in NVRAM. Then, at a point in time called a consistency point, it sends the data to disk.

When the Synchronous SnapMirror feature is enabled, the source system forwards data to the destination system as it is written in NVRAM. Then, at the consistency point, the source system sends its data to disk and tells the destination system to also send its data to disk.

This section guides you quickly through Snapmirror setup and commands.

1) Enable Snapmirror on the source and destination filers

source-filer> options snapmirror.enable
snapmirror.enable            on
source-filer>
source-filer> options snapmirror.access
snapmirror.access            legacy
source-filer>
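
The capture above only reads the current option values. If snapmirror.enable is off, it can be switched on with the same options command, run on both the source and the destination filer (a minimal sketch):

source-filer> options snapmirror.enable on
destination-filer> options snapmirror.enable on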

2) Snapmirror Access

Make sure the destination filer has snapmirror access to the source filer. The destination filer’s name or IP address should be in /etc/snapmirror.allow on the source filer. Use wrfile to add entries to /etc/snapmirror.allow.

source-filer> rdfile /etc/snapmirror.allow
destination-filer
destination-filer2
source-filer>
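
To append a destination filer’s entry without retyping the whole file, wrfile can be used with the -a flag on the source filer (a sketch using the example hostname above):

source-filer> wrfile -a /etc/snapmirror.allow destination-filer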

3) Initializing a Snapmirror relationship

Volume snapmirror : Create a destination volume on the destination NetApp filer, of the same size as the source volume or greater. For volume snapmirror, the destination volume should be in restricted mode. For example, let us consider that we are snapmirroring a 100G volume; we create the destination volume and make it restricted.

destination-filer> vol create demo_destination aggr01 100G
destination-filer> vol restrict demo_destination

Volume SnapMirror creates a Snapshot copy before performing the initial transfer. This copy is referred to as the baseline Snapshot copy. After performing an initial transfer of all data in the volume, VSM (Volume SnapMirror) sends to the destination only the blocks that have changed since the last successful replication. When SnapMirror performs an update transfer, it creates another new Snapshot copy and compares the changed blocks. These changed blocks are sent as part of the update transfer.

Snapmirror is always destination-filer driven, so the snapmirror initialize has to be done on the destination filer. The command below starts the baseline transfer.

destination-filer> snapmirror initialize -S source-filer:demo_source destination-filer:demo_destination
Transfer started.
Monitor progress with ‘snapmirror status’ or the snapmirror log.
destination-filer>

Qtree Snapmirror : For qtree snapmirror, you should not create the destination qtree. The snapmirror command automatically creates the destination qtree, so just creating a volume of the required size is good enough.

Qtree SnapMirror determines changed data by first looking through the inode file for inodes that have changed, and then looking through the changed inodes of the interesting qtree for changed data blocks. The SnapMirror software then transfers only the new or changed data blocks from this Snapshot copy that are associated with the designated qtree. On the destination volume, a new Snapshot copy is then created that contains a complete point-in-time copy of the entire destination volume, but that is associated specifically with the particular qtree that has been replicated.

destination-filer> snapmirror initialize -S source-filer:/vol/demo1/qtree destination-filer:/vol/demo1/qtree
Transfer started.
Monitor progress with ‘snapmirror status’ or the snapmirror log.

4) Monitoring the status : Snapmirror data transfer status can be monitored either from the source or the destination filer. Use “snapmirror status” to check the status.

destination-filer> snapmirror status
Snapmirror is on.
Source                          Destination                          State          Lag Status
source-filer:demo_source        destination-filer:demo_destination   Uninitialized  –   Transferring (1690 MB done)
source-filer:/vol/demo1/qtree   destination-filer:/vol/demo1/qtree   Uninitialized  –   Transferring (32 MB done)
destination-filer>

5) Snapmirror schedule : This is the schedule used by the destination filer for updating the mirror. It informs the SnapMirror scheduler when transfers will be initiated. The schedule field can either contain the word sync to specify synchronous mirroring or a cron-style specification of when to update the mirror. The cron-style schedule contains four space-separated fields.

If you want to sync the data on a scheduled frequency, you can set that in the destination filer’s /etc/snapmirror.conf. The time settings are similar to Unix cron. You can set a synchronous snapmirror schedule in /etc/snapmirror.conf by specifying “sync” instead of the cron-style frequency, as shown after the listing below.

destination-filer> rdfile /etc/snapmirror.conf
source-filer:demo_source        destination-filer:demo_destination - 0 * * *  # This syncs every hour
source-filer:/vol/demo1/qtree   destination-filer:/vol/demo1/qtree - 0 21 * * # This syncs every day at 9:00 pm
destination-filer>
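
For a synchronous relationship, the schedule field is replaced with the word sync. An illustrative entry, assuming Synchronous SnapMirror is licensed and using the example volumes above, would be:

source-filer:demo_source        destination-filer:demo_destination - sync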

6) Other Snapmirror commands

  • To break a snapmirror relationship – do snapmirror quiesce and then snapmirror break.
  • To update snapmirror data – do snapmirror update.
  • To resync a broken relationship – do snapmirror resync.
  • To abort a running transfer – do snapmirror abort.

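As a quick sketch of how these are typically invoked on the destination filer, using the example volumes from this post (shown individually, not as a sequence to run back to back):

destination-filer> snapmirror quiesce demo_destination
destination-filer> snapmirror break demo_destination
destination-filer> snapmirror update demo_destination
destination-filer> snapmirror resync -S source-filer:demo_source demo_destination
destination-filer> snapmirror abort demo_destination
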
Snapmirror does provide multipath support. More than one physical path between a source and a destination system might be desired for a mirror relationship. Multipath support allows SnapMirror traffic to be load balanced between these paths and provides for failover in the event of a network outage.
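
Multipath connections are defined in the destination filer’s /etc/snapmirror.conf by naming a connection and then using that name in place of the source filer in the relationship line. The sketch below is only an assumption-laden example (the connection name and IP addresses are made up); verify the exact syntax against the Data ONTAP Data Protection guide for your release:

demo-multi = multi (10.10.10.1,10.10.20.1) (10.10.10.2,10.10.20.2)
demo-multi:demo_source        destination-filer:demo_destination - 0 * * *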

To read about how to tune the performance and speed of NetApp snapmirror or snapvault replication transfers and adjust the transfer bandwidth, see Tuning Snapmirror & Snapvault replication data transfer speed.