Cloudibee

linux - storage - virtualization
Deduplication refers to the elimination of redundant data in storage. In the deduplication process, duplicate data is deleted, leaving only one copy of the data to be stored. An index of all data is still retained, however, so the data can be located whenever it is required. Deduplication reduces the required storage capacity because only unique data is stored.
NetApp supports deduplication, in which only unique blocks in the flexible volume are stored; the dedup process creates a small amount of additional metadata. NetApp deduplication allows duplicate 4KB blocks anywhere in the flexible volume to be deleted, keeping a single unique copy.
The core enabling technology of deduplication is fingerprints. These are unique digital signatures for every 4KB data block in the flexible volume.
When deduplication runs for the first time on a flexible volume with existing data, it scans the blocks in the flexible volume and creates a fingerprint database, which contains a sorted list of all fingerprints for used blocks in the flexible volume. After the fingerprint database is created, the fingerprints are checked for duplicates. When a duplicate fingerprint is found, a byte-by-byte comparison of the blocks is done first to make sure that the blocks are indeed identical. If they are identical, the block’s pointer is updated to reference the already existing data block, the duplicate data block is released, and the inode is updated.
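The scan, fingerprint, and verify steps described above can be sketched in a few lines of Python. This is only an illustration of the idea, not NetApp's implementation: SHA-256 stands in for the real fingerprint function, an in-memory byte string stands in for the flexible volume, and the names `deduplicate`, `stored` and `pointers` are made up for this example.

```python
import hashlib

BLOCK_SIZE = 4096  # 4KB blocks, as described above


def deduplicate(data: bytes):
    """Return (stored_blocks, pointers) for a byte string split into 4KB blocks."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

    # Fingerprint database: a sorted list of (fingerprint, block index) pairs.
    # SHA-256 is an assumption used only for illustration.
    fingerprints = sorted(
        (hashlib.sha256(block).hexdigest(), index)
        for index, block in enumerate(blocks)
    )

    stored = []                      # unique blocks that are kept
    pointers = [None] * len(blocks)  # logical block -> index into `stored`
    candidates = []                  # stored indexes sharing the current fingerprint
    previous = None

    for fingerprint, index in fingerprints:
        if fingerprint != previous:
            candidates = []
            previous = fingerprint
        # A matching fingerprint is only a hint: confirm byte by byte.
        for stored_index in candidates:
            if stored[stored_index] == blocks[index]:
                pointers[index] = stored_index  # share the existing block
                break
        else:
            # No identical block is stored yet, so keep this one.
            stored.append(blocks[index])
            pointers[index] = len(stored) - 1
            candidates.append(len(stored) - 1)

    return stored, pointers
```

For example, three 4KB blocks where the first and third are identical end up as two stored blocks, and the original data can still be rebuilt by following the pointers.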
 Netapp Deduplication commands:

  1. Enable dedup (asis) license.

    fractal-design> license add <a_sis_license_code>

  2. If you have a new flex volume which was just created, follow this step to enable ASIS deduplication.

    fractal-design> sis on /vol/demovol
    Deduplication for "/vol/demovol" is enabled.
    Already existing data could be processed by running "sis start -s /vol/demovol"

  3. If you have an existing flex volume with data in it, follow this step.

    fractal-design> sis start -s /vol/demovol

  4. Check the status of deduplication.

    fractal-design> vol status demovol
    Volume          State   Status          Options
    VolArchive      online  raid_dp, flex   nosnap=on
                            sis
    Containing aggregate: 'aggr0'
    fractal-design>

    fractal-design> sis status /vol/demovol
    Path            State   Status      Progress
    /vol/demovol    Enabled Idle        Idle for 00:02:12
    fractal-design>

  5. Check the storage space saved due to deduplication.

    fractal-design> df -s /vol/demovol
    Filesystem      used    saved   %saved
    /vol/demovol/   9316052 0       0%
    fractal-design>

  6. If you have to run deduplication on this volume at a later point in time, just run “sis start /vol/demovol”.
  7. Deduplication can be scheduled using the “sis config” command.
  8. Done.
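If you collect the output of step 5 from several volumes, the savings can be recomputed from the raw columns. Below is a minimal sketch that assumes the “df -s” layout shown above, sizes reported in KB, and that the percentage is saved / (used + saved). The function name `parse_df_s` and the dictionary keys are made up for this example.

```python
def parse_df_s(output: str):
    """Parse `df -s` output into a list of dicts (sizes assumed to be in KB)."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Skip prompts, blank lines and the header row.
        if len(fields) != 4 or not fields[0].startswith("/vol/"):
            continue
        path, used, saved, _percent = fields
        used, saved = int(used), int(saved)
        total = used + saved
        rows.append({
            "path": path,
            "used_kb": used,
            "saved_kb": saved,
            # Assumption: %saved is saved space over used plus saved space.
            "percent_saved": 100.0 * saved / total if total else 0.0,
        })
    return rows
```

Feeding it the transcript from step 5 returns one row for /vol/demovol/ with 9316052 KB used and 0 KB saved.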

More netapp blog posts at : http://unixfoo.blogspot.com/search/label/netapp

This post lists the commands that are used most often and come in handy when managing, monitoring, or troubleshooting a NetApp filer in 7-mode.

  • sysconfig -a : shows hardware configuration with more verbose information
  • sysconfig -d : shows information about the disks attached to the filer
  • version : shows the NetApp Data ONTAP OS version.
  • uptime : shows the filer uptime
  • dns info : this shows the dns resolvers, the number of hits and misses and other info
  • nis info : this shows the nis domain name, yp servers etc.
  • rdfile : Like “cat” in Linux, used to read contents of text files.
  • wrfile : Creates/Overwrites a file. Similar to “cat > filename” in Linux
  • aggr status : Shows the aggregate status
  • aggr status -r : Shows the raid configuration, reconstruction information of the disks in filer
  • aggr show_space : Shows the disk usage of the aggregate, WAFL reserve, overheads etc.
  • vol status : Shows the volume information
  • vol status -s : Displays the spare disks on the filer
  • vol status -f : Displays the failed disks on the filer
  • vol status -r : Shows the raid configuration, reconstruction information of the disks
  • df -h : Displays volume disk usage
  • df -i : Shows the inode counts of all the volumes
  • df -Ah : Shows “df” information of the aggregate
  • license : Displays/adds/removes licenses on a netapp filer
  • maxfiles : Displays and adds more inodes to a volume
  • aggr create : Creates aggregate
  • vol create : Creates volume in an aggregate
  • vol offline : Offlines a volume
  • vol online : Onlines a volume
  • vol destroy : Destroys and removes a volume
  • vol size [+|-] : Resize a volume in netapp filer
  • vol options : Displays/Changes volume options in a netapp filer
  • qtree create : Creates qtree
  • qtree status : Displays the status of qtrees
  • quota on : Enables quota on a netapp filer
  • quota off : Disables quota
  • quota resize : Resizes quota
  • quota report : Reports the quota and usage
  • snap list : Displays all snapshots on a volume
  • snap create : Create snapshot
  • snap sched : Schedule snapshot creation
  • snap reserve : Display/set snapshot reserve space in volume
  • /etc/exports : File that manages the NFS exports
  • rdfile /etc/exports : Read the NFS exports file
  • wrfile /etc/exports : Write to NFS exports file
  • exportfs -a : Exports all the filesystems listed in /etc/exports
  • cifs setup : Setup cifs
  • cifs shares : Create/displays cifs shares
  • cifs access : Changes access of cifs shares
  • lun create : Creates iscsi or fcp luns on a netapp filer
  • lun map : Maps lun to an igroup
  • lun show : Show all the luns on a filer
  • igroup create : Creates netapp igroup
  • lun stats : Show lun I/O statistics
  • disk show : Shows all the disks on the filer
  • disk zero spares : Zeros the spare disks
  • disk_fw_update : Upgrades the disk firmware on all disks
  • options : Display/Set options on netapp filer
  • options nfs : Display/Set NFS options
  • options timed : Display/Set NTP options on netapp.
  • options autosupport : Display/Set autosupport options
  • options cifs : Display/Set cifs options
  • options tcp : Display/Set TCP options
  • options net : Display/Set network options
  • ndmpcopy : Initiates ndmpcopy
  • ndmpd status : Displays status of ndmpd
  • ndmpd killall : Terminates all the ndmpd processes.
  • ifconfig : Displays/Sets IP address on a network/vif interface
  • vif create : Creates a VIF (bonding/trunking/teaming)
  • vif status : Displays status of a vif
  • netstat : Displays network statistics
  • sysstat -us 1 : begins a 1 second sample of the filer’s current utilization (ctrl-c to end)
  • nfsstat : Shows nfs statistics
  • nfsstat -l : Displays nfs stats per client
  • nfs_hist : Displays nfs histogram
  • statit : begins/ends a performance workload sampling [-b starts / -e ends]
  • stats : Displays stats for every counter on netapp. Read stats man page for more info
  • ifstat : Displays Network interface stats
  • qtree stats : displays I/O stats of qtree
  • environment : display environment status on shelves and chassis of the filer
  • storage show <disk|shelf|adapter> : Shows storage component details
  • snapmirror initialize : Initializes a snapmirror relation
  • snapmirror update : Manually updates a snapmirror relation
  • snapmirror resync : Resyncs a broken snapmirror
  • snapmirror quiesce : Quiesces a snapmirror relation
  • snapmirror break : Breaks a snapmirror relation
  • snapmirror abort : Abort a running snapmirror
  • snapmirror status : Shows snapmirror status
  • lock status -h : Displays locks held by filer
  • sm_mon : Manage the locks
  • storage download shelf : Installs the shelf firmware
  • software get : Download the Netapp OS software
  • software install : Installs OS
  • download : Updates the installed OS
  • cf status : Displays cluster status
  • cf takeover : Takes over the cluster partner
  • cf giveback : Gives back control to the cluster partner
  • reboot : Reboots a filer
If you need more details on any of these commands, refer to the NetApp Data ONTAP administration manual on the NOW site.

More netapp blog posts at : http://linux.cloudibee.com/tag/netapp/

Here are some explanations of the columns of the netapp sysstat command.

Cache age : The age in minutes of the oldest read-only blocks in the buffer cache. Data in this column indicates how fast read operations are cycling through system memory; when the filer is reading very large files, buffer cache age will be very low. The cache age will also be low if reads are random. If read performance is poor, this number may indicate that you need a system with more memory, or that you should analyze the application to reduce the randomness of the workload.

Cache hit : This is the WAFL cache hit rate percentage. This is the percentage of times where WAFL tried to read a data block from disk and the data was found already cached in memory. A dash in this column indicates that WAFL did not attempt to load any blocks during the measurement interval.

CP Ty : Consistency Point (CP) type is the reason that a CP started in that interval. The CP types are as follows:

  • - No CP started during sampling interval (no writes happened to disk at this point of time)
  • number Number of CPs started during sampling interval
  • B Back to back CPs (CP generated CP) (The filer is having a tough time keeping up with writes)
  • b Deferred back to back CPs (CP generated CP) (the back to back condition is getting worse)
  • F CP caused by full NVLog (one half of the nvram log was full, and so was flushed)
  • H CP caused by high water mark (rare to see this; the filer was halfway full on one side of the nvram logs, so it decided to write to disk).
  • L CP caused by low water mark
  • S CP caused by snapshot operation
  • T CP caused by timer (every 10 seconds filer data is flushed to disk)
  • U CP caused by flush
  • : continuation of CP from previous interval (a CP is still in progress across the 1 second intervals)

The type character is followed by a second character which indicates the phase of the CP at the end of the sampling interval. If the CP completed during the sampling interval, this second character will be blank. The phases are as follows:

  • 0 Initializing
  • n Processing normal files
  • s Processing special files
  • f Flushing modified data to disk
  • v Flushing modified superblock to disk
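When reading through a long sysstat capture, it can help to decode the CP ty field automatically. The sketch below maps the two characters to the descriptions listed above. The function name `decode_cp_ty` is made up for this example, and treating any leading digits as the CP count is an assumption.

```python
CP_TYPES = {
    "-": "no CP started during sampling interval",
    "B": "back to back CPs",
    "b": "deferred back to back CPs",
    "F": "CP caused by full NVLog",
    "H": "CP caused by high water mark",
    "L": "CP caused by low water mark",
    "S": "CP caused by snapshot operation",
    "T": "CP caused by timer",
    "U": "CP caused by flush",
    ":": "continuation of CP from previous interval",
}

CP_PHASES = {
    "0": "initializing",
    "n": "processing normal files",
    "s": "processing special files",
    "f": "flushing modified data to disk",
    "v": "flushing modified superblock to disk",
}


def decode_cp_ty(field: str):
    """Return (reason, phase); phase is None when the second character is blank."""
    field = field.strip()
    if not field:
        raise ValueError("empty CP ty field")

    # Assumption: leading digits are the number of CPs started in the interval.
    digits = len(field) - len(field.lstrip("0123456789"))
    if digits:
        reason = f"{int(field[:digits])} CPs started during sampling interval"
        phase_char = field[digits:]
    else:
        reason = CP_TYPES.get(field[0], "unknown CP type")
        phase_char = field[1:]

    phase = CP_PHASES.get(phase_char, "unknown phase") if phase_char else None
    return reason, phase
```

For example, `decode_cp_ty("Tf")` describes a timer CP that was still flushing modified data to disk at the end of the interval.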

CP util : The Consistency Point (CP) utilization, the % of time spent in a CP. 100% time in CP is a good thing: it means that all of the CPU time dedicated to writing data was actually used. 75% means that only 75% of the time allocated to writing data was used, so 25% of that time was wasted. A good CP percentage is at or near 100%.

You can use the Netapp SIO tool to benchmark netapp systems. SIO is a client-side workload generator that works with any target. It generates I/O load and collects basic statistics to show how any type of storage performs under certain conditions.

Netapp sysstat is like vmstat and iostat rolled into one command. It reports filer performance statistics such as CPU utilization, the amount of disk traffic, and tape traffic. When run without options, sysstat prints a new line of basic information every 15 seconds. Use control-C (^c) to stop it, or set the interval count (-c count) so that sysstat stops after that many intervals. For more detailed information, use the -u option. For information specific to one particular protocol, use the other options. I’ll list them here.

  • -f FCP statistics
  • -i iSCSI statistics
  • -b SAN (blocks) extended statistics
  • -u extended utilization statistics
  • -x extended output format. This includes all available output fields. Be aware that this produces output that is longer than 80 columns and is generally intended for “off-line” types of analysis and not for “real-time” viewing.
  • -m Displays multi-processor CPU utilization statistics. In addition to the percentage of the time that one or more CPUs were busy (ANY), the average (AVG) is displayed, as well as the individual utilization of each processor. This is only handy on multiprocessor systems. Won’t work on single processor machines.
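If you script your sysstat runs, a small helper can assemble the command line from the options listed above. This is only a sketch: the function name `build_sysstat_command` and the mode keys are made up for this example, and only the flags mentioned in this post (-c, -f, -i, -b, -u, -x, -m and the trailing interval) are covered.

```python
SYSSTAT_MODES = {
    "fcp": "-f",             # FCP statistics
    "iscsi": "-i",           # iSCSI statistics
    "san": "-b",             # SAN (blocks) extended statistics
    "utilization": "-u",     # extended utilization statistics
    "extended": "-x",        # extended output format
    "multiprocessor": "-m",  # multi-processor CPU utilization
}


def build_sysstat_command(mode=None, count=None, interval=None):
    """Build a sysstat command string; the mode keys are hypothetical labels."""
    parts = ["sysstat"]
    if count is not None:
        if count < 1:
            raise ValueError("count must be at least 1")
        parts += ["-c", str(count)]  # stop after this many intervals
    if mode is not None:
        parts.append(SYSSTAT_MODES[mode])
    if interval is not None:
        if interval < 1:
            raise ValueError("interval must be at least 1 second")
        parts.append(str(interval))  # seconds between samples
    return " ".join(parts)
```

For example, `build_sysstat_command("utilization", count=10, interval=1)` gives "sysstat -c 10 -u 1", which takes ten 1 second samples and then stops.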
