Deduplication refers to the elimination of redundant data in storage. In the deduplication process, duplicate data is deleted, leaving only one copy of the data to be stored; however, an index of all data is still retained should that data ever be required. Deduplication reduces the required storage capacity, since only unique data is stored.
NetApp supports deduplication at the block level: only unique blocks in the flexible volume are stored, and a small amount of additional metadata is created during the dedup process. NetApp deduplication allows duplicate 4KB blocks anywhere in the flexible volume to be deleted, keeping a single unique copy.
The core enabling technology of deduplication is fingerprints. These are unique digital signatures for every 4KB data block in the flexible volume.
When deduplication runs for the first time on a flexible volume with existing data, it scans all blocks in the flexible volume and creates a fingerprint database, which contains a sorted list of fingerprints for all used blocks in the volume. After the fingerprint file is created, the fingerprints are checked for duplicates; when a match is found, a byte-by-byte comparison of the blocks is done first to make sure the blocks are truly identical. If they are, the block pointer is updated to reference the existing data block, the duplicate data block is released, and the inode is updated.
NetApp deduplication commands:

  1. Enable the deduplication (a_sis) license.

    fractal-design> license add <a_sis license key>

  2. If you have a newly created flex volume, follow this step to enable A-SIS deduplication:

    fractal-design> sis on /vol/demovol
    Deduplication for “/vol/demovol” is enabled.
    Already existing data could be processed by running “sis start -s /vol/demovol”

  3. If you have an existing flex volume with data already in it, follow this step to deduplicate the existing data:

    fractal-design> sis start -s /vol/demovol

  4. Check the status of deduplication.
    fractal-design> vol status demovol
    Volume          State   Status          Options
    demovol         online  raid_dp, flex   nosnap=on
                            sis
    Containing aggregate: 'aggr0'
    fractal-design>

    fractal-design> sis status /vol/demovol
    Path            State   Status      Progress
    /vol/demovol    Enabled Idle        Idle for 00:02:12
    fractal-design>
  5. Check the storage space saved due to deduplication
    fractal-design> df -s /vol/demovol
    Filesystem      used    saved   %saved
    /vol/demovol/   9316052 0       0%
    fractal-design>

  6. If you need to run deduplication again at a later point in time on this volume, just run “sis start /vol/demovol”.
  7. Deduplication runs can be scheduled per volume using the “sis config” command, as shown below.
  8. Done.
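If you want deduplication to run automatically, a schedule can be set per volume. The line below is a minimal sketch using the 7-mode “sis config -s” day-list@hour schedule format; the volume name and the nightly 11 PM schedule are only illustrations.

    fractal-design> sis config -s sun-sat@23 /vol/demovol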

More NetApp blog posts at: http://unixfoo.blogspot.com/search/label/netapp

OCFS2 is a POSIX-compliant shared-disk cluster file system for Linux capable of providing both high performance and high availability. Cluster-aware applications can make use of parallel I/O for higher performance. OCFS2 is mostly used to host Oracle Real Application Clusters (RAC) databases on Linux clusters.
The steps below show how to create an OCFS2 filesystem on top of a multipathed SAN LUN and mount it across a Linux cluster.
  1. Identify the nodes that will be part of your cluster.
  2. Export/zone the LUNs on the SAN end and check that they are accessible on all the hosts of the cluster (fdisk -l or multipath -ll).
  3. If you need multipathing, configure multipath and the multipathing policy based on your requirements. For Linux multipath setup, refer to Red Hat's multipath guide.
  4. Create the OCFS2 configuration file (/etc/ocfs2/cluster.conf) on all the cluster nodes.
  5. Below is a sample cluster.conf for a three-node cluster. If you have a heartbeat IP configured on these cluster nodes, use the heartbeat IP for OCFS2 cluster communication and specify the hostname without the FQDN. Copy the same file to all the hosts in the cluster.

    [root@oracle-cluster-1 ~]# cat /etc/ocfs2/cluster.conf
    node:
            ip_port = 7777
            ip_address = 203.21.2.101
            number = 0
            name = oracle-cluster-1
            cluster = ocfs2

    node:
            ip_port = 7777
            ip_address = 203.21.2.102
            number = 1
            name = oracle-cluster-2
            cluster = ocfs2

    node:
            ip_port = 7777
            ip_address = 203.21.2.103
            number = 2
            name = oracle-cluster-3
            cluster = ocfs2

    cluster:
            node_count = 3
            name = ocfs2

    [root@oracle-cluster-1 ~]#

  6. On each node, check the status of the OCFS2 cluster service and stop “o2cb” if it is already running.

    # service o2cb status
    # service o2cb stop

  7. On each node, load the OCFS2 module.

    # service o2cb load

  8. Make the OCFS2 service online on all the nodes.

    # service o2cb online
  9. Now your OCFS2 cluster is ready.
  10. Format the SAN LUN device from any one of the cluster nodes.

    # mkfs.ocfs2 -b 4k -C 32k -L oraclerac /dev/mapper/mpath0

    -b : Block size (values are 512, 1K, 2K and 4K bytes per block)
    -C : Cluster size (values are 4K, 8K, 16K, 32K, 64K, 128K, 256K, 512K and 1M)
    -L : Label


    Note : Replace /dev/mapper/mpath0 with your device name.
  11. Update /etc/fstab on all the nodes in the cluster with the mount point.

    For example: /dev/mapper/mpath0  /u01  ocfs2  _netdev  0 0

  12. Mount the /u01 volume using the mount command.

    # mount /u01

  13. Enable the o2cb and ocfs2 services at boot (runlevels 3, 4 and 5).

    # chkconfig --level 345 o2cb on ; chkconfig --level 345 ocfs2 on

  14. The /u01 OCFS2 filesystem on the SAN LUN is now set up; a quick verification example is shown below.
  15. You can now configure an Oracle RAC database on this filesystem.
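As a quick sanity check after mounting, the commands below confirm the filesystem is visible on a node; run them on each node. The device and mount point names follow the example above, and mounted.ocfs2 (part of the ocfs2-tools package) lists which cluster nodes currently have the device mounted.

    # mount | grep ocfs2
    # df -h /u01
    # mounted.ocfs2 -f /dev/mapper/mpath0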

Here are explanations of some of the columns in the output of the NetApp sysstat command.

Cache age : The age in minutes of the oldest read-only blocks in the buffer cache. This column indicates how fast read operations are cycling through system memory; when the filer is reading very large files, the buffer cache age will be very low. If reads are random, the cache age will also be low. If you have a performance problem where read performance is poor, this number may indicate that you need a system with more memory, or that the application should be analyzed to reduce the randomness of its workload.

Cache hit : This is the WAFL cache hit rate percentage: the percentage of times WAFL went to read a data block and found it already cached in memory instead of having to go to disk. A dash in this column indicates that WAFL did not attempt to load any blocks during the measurement interval.

CP Ty : Consistency Point (CP) type is the reason that a CP started in that interval. The CP types are as follows:

  • - No CP started during the sampling interval (no writes were committed to disk during this period)
  • number Number of CPs started during sampling interval
  • B Back to back CPs (CP generated CP) (The filer is having a tough time keeping up with writes)
  • b Deferred back to back CPs (CP generated CP) (the back to back condition is getting worse)
  • F CP caused by full NVLog (one half of the nvram log was full, and so was flushed)
  • H CP caused by high water mark (rare to see; one side of the NVRAM log reached its high water mark, so the filer decided to flush to disk)
  • L CP caused by low water mark
  • S CP caused by snapshot operation
  • T CP caused by timer (every 10 seconds filer data is flushed to disk)
  • U CP caused by flush
  • : continuation of a CP from the previous interval (a CP was still in progress at the end of the interval; common when sampling at 1-second intervals)

The type character is followed by a second character which indicates the phase of the CP at the end of the sampling interval. If the CP completed during the sampling interval, this second character will be blank. For example, “Tn” means a timer-triggered CP was still processing normal files when the interval ended. The phases are as follows:

  • 0 Initializing
  • n Processing normal files
  • s Processing special files
  • f Flushing modified data to disk
  • v Flushing modified superblock to disk

CP util : The Consistency Point (CP) utilization, the percentage of time spent in a CP. 100% time in CP is a good thing: it means that all of the time dedicated to writing data was actually used. 75% means that only 75% of the time allocated to writing data was utilized, so 25% of that time was wasted. A good CP percentage is at or near 100%.

You can use the NetApp SIO tool to benchmark NetApp systems. SIO is a client-side workload generator that works with any target. It generates I/O load and collects basic statistics to show how any type of storage performs under certain conditions; a sample invocation is shown below.
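The line below is a minimal example run, assuming the sio_ntap_linux binary downloaded from NetApp and a scratch file on the filesystem being measured. The argument order (read %, random %, block size, file size, run time in seconds, threads, target file) is taken from the tool's usage message, so verify it against the version you download. This example drives 100% random reads with 4KB blocks against a 2GB file for 60 seconds using 4 threads.

    # ./sio_ntap_linux 100 100 4k 2g 60 4 /u01/sio_testfile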

NetApp sysstat is like vmstat and iostat rolled into one command. It reports filer performance statistics such as CPU utilization, disk traffic, and tape traffic. When run without options, sysstat prints a new line of basic information every 15 seconds. Use Control-C (^c) or set the iteration count (-c count) to stop sysstat after a while. For more detailed information, use the -u option. For information specific to one particular protocol, you can use the other options listed below; a sample invocation follows the list.

  • -f FCP statistics
  • -i iSCSI statistics
  • -b SAN (blocks) extended statistics
  • -u extended utilization statistics
  • -x extended output format. This includes all available output fields. Be aware that this produces output that is longer than 80 columns and is generally intended for “off-line” types of analysis and not for “real-time” viewing.
  • -m Displays multi-processor CPU utilization statistics. In addition to the percentage of time that one or more CPUs were busy (ANY), the average (AVG) is displayed, as well as the individual utilization of each processor. This is only useful on multi-processor systems; it won't work on single-processor machines.
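A couple of typical invocations are shown below (the prompt, interval and count values are just illustrative): the first prints extended utilization statistics every 5 seconds for 10 iterations and then exits, while the second prints the full extended output every second until you interrupt it with ^c.

    filer> sysstat -u -c 10 5
    filer> sysstat -x 1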
