Posts Tagged ‘netapp’

How to assign permissions for a Domain User * Note: this didn't work straight away for us, so we had to create a local user and authenticate that way (see the Local User section below)

vserver cifs users-and-groups local-group add-members -vserver %Vserver% -group-name BUILTIN\Administrators -member-names domain\username

Verify permissions 

vserver cifs users-and-groups local-group show-members

cifs share access-control show

vserver security file-directory show -vserver %Vserver% -path /CIFS/Folder

How to assign permissions for a Local User

vserver cifs users-and-groups local-user create -vserver syg-svm03 -user-name CIFSSERVER\adminlocal -full-name "adminlocal"
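ONTAP will prompt for the local user's password when it is created. Once the local user exists, the same add-members and show-members commands from above can be used with the local account to actually grant and then verify the rights, for example (vserver and account names as used in this post):

vserver cifs users-and-groups local-group add-members -vserver syg-svm03 -group-name BUILTIN\Administrators -member-names CIFSSERVER\adminlocal

vserver cifs users-and-groups local-group show-members -vserver syg-svm03 -group-name BUILTIN\Administrators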


Unlock the diag user and set a password

NETAPP::> security login unlock -username diag

NETAPP::> security login password -username diag

Go into Privileged Mode

NETAPP::> set -privilege advanced

Change to the diag privilege level and enter the systemshell (it logs in as the diag user, using the password set above)

NETAPP::> set diag

NETAPP::> systemshell local

 

Once in the systemshell you can telnet as normal:

NETAPP%>telnet mail.domain.com 25
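If the connection is good you should see the telnet client connect and the mail server's SMTP banner. As a rough illustration only (the IP address and banner text are hypothetical and will vary by mail server):

Trying 10.0.0.25...
Connected to mail.domain.com.
Escape character is '^]'.
220 mail.domain.com ESMTP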

To break out

CTRL+C and then CTRL+D

Relock the diag account when you are finished

NETAPP::> security login lock -username diag


After adding new disks to a controller you might come across an error about the maximum RAID group size.

The default RAID group size depends on the size of disk, but this will typically be 16. FYI, the maximum RAID group size is 28.

There does not seem to be an easy way to do this via the GUI.

Open the cluster shell and enter the following to get into the aggregate options

aggr

 

Show your existing maxraidsize

 

aggr show -fields maxraidsize

Change the Value

aggr modify -aggregate aggrname -maxraidsize 18
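To confirm the change took effect, re-run the show command, optionally limited to the one aggregate:

aggr show -aggregate aggrname -fields maxraidsize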

 


 

7-Mode

  1. Configure SSL on the storage system by issuing the following command:
    filerprompt> secureadmin setup ssl

    Enter the following information when prompted:
    Country Name (2 letter code) [US]:
    State or Province Name (full name) [California]:
    Locality Name (city, town, etc.) [Santa Clara]:
    Organization Name (company) [Your Company]:
    Organization Unit Name (division):
    Common Name (fully qualified domain name) [filer.domain]:
    Administrator email:
    Days until expires [5475] :
    Key length (bits) [512] :


    After entering the requested values, the following message is displayed:
    [rc:info]: Starting SSL with new certificate.
  2. After configuring SSL on the storage system, issue the following command to enable HTTPS access:
    filerprompt> options httpd.admin.ssl.enable on

Disabling HTTP access:
To disable HTTP access to FilerView, issue the following command:
filerprompt> options httpd.admin.enable off

 

Cluster Mode

cluster1::> vserver services web modify -vserver %vservername% -name portal -ssl-only true
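To confirm that the portal service is now HTTPS-only, the web service configuration can be checked afterwards (a quick sketch using the same placeholder vserver name):

cluster1::> vserver services web show -vserver %vservername%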
 

When using NetApp SnapMirror to replicate volumes containing datastores to another site/NetApp, you can back this up with VMware's Site Recovery Manager to be able to spin up the environment in a disaster. Using NetApp mirroring has advantages such as cloning physical and virtual RDMs, as well as coping with deduplication.

You should already have SnapMirror and SRM enabled; if not, see this guide:

http://vknowledge.net/2012/07/14/srm-tutorial-part-5-configure-netapp-snapmirror/

The important thing is then to create (or edit) a Protection Group and add only the datastores from the replicated Datastore Groups above, instead of adding the individual machines. This will then add all of the VMs on those datastores to the Protection Group in SRM.


You can log in here, https://signin.netapp.com/oamext/login.html, for Upgrade Advisor for Data ONTAP upgrades; it should also have phone-home based firmware recommendations for your drives.

Drive Firmware

Use the following to find the current firmware:

*> storage show disk -x

To do the disk firmware upgrade in the background, check that the following is enabled:

options raid.background_disk_fw_update.enable
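If that option comes back as off, it can be turned on in the usual 7-Mode options form (the same pattern as the options commands used elsewhere in this post):

options raid.background_disk_fw_update.enable on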

Disk Firmware Update Location : http://mysupport.netapp.com/NOW/download/tools/diskfw/

Download the .LOD files to \\fascontrollerip\c$\etc\disk_fw

Once the disk firmware files are in place, the system should detect them within about two minutes and begin updating any eligible drives.

Shelf Firmware 

Check that this setting is enabled:

options shelf.fw.ndu.enable
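As with the disk firmware option above, if it is off it can be enabled the same way (NDU meaning the shelf firmware is applied non-disruptively):

options shelf.fw.ndu.enable on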

 

 

Disk Shelf Update Location : http://mysupport.netapp.com/NOW/download/tools/diskshelf/

Download the .SFW files to \\fascontrollerip\c$\etc\shelf_fw

 

storage download shelf


Recently we had an issue where an aggregate on a NetApp grew by 10% over the weekend. A check on the volumes showed nothing had grown, which suggested something at the aggregate layer.

 

aggr status -v
 
df -Agr

We called NetApp Support who confirmed it was a known bug

Summary

 Deduplication identifies and removes duplicate blocks in a volume, storing only unique blocks.

Deduplication requires the use of a certain amount of metadata, including a ‘fingerprint‘ summary to keep track of data in the blocks. When the data in the blocks changes frequently, the fingerprints become stale.

When the sis start command is running, any stale fingerprint metadata is normally detected and removed. If the deletion of stale fingerprint metadata fails, the stale fingerprint data will linger and consume space in the volume, and can significantly slow deduplication processing.

Issue Description

When the sis start command is running on a flex volume, the deduplication subsystem of Data ONTAP performs in several phases:

  • Fingerprint gathering
  • Fingerprint sorting
  • Fingerprint compressing
  • Block sharing

Normally, if the fraction of stale fingerprints in the database increases to greater than 20 percent, an additional ‘fingerprint checking’ phase is also performed, which cleans up the data. However, there is an issue in some releases of Data ONTAP (Data ONTAP 8.1, 8.1.1 and 8.1.2 and P/D-patch derivatives) that might cause the percentage to be calculated incorrectly, such that the checking phase is never performed. For more information, see BUG ID: 657692.

Symptom

The stale fingerprints in the fingerprint database are not deleted; the excess data lingers and consumes space in the volume.

As more stale fingerprints accumulate, the increasing size of the fingerprint metadata increases the deduplication workload on the system, with the sorting and merging phases running for a long time. In aggravated cases, storage clients might experience a slow response.

This issue is more likely to be observed on a volume where there is a lot of file delete activity.

Diagnosis:
To determine if a flex volume on a storage system is experiencing this issue, the output of two administrative commands can be examined for numeric values from which a calculation can be made. The commands are:

  • sis check  -c <vol>
  • sis status -l <vol>

Note: Run the sis check command from the advanced/diag privilege level (priv set advanced).

For Example:
The output of sis check -c for a volume includes the following lines:
Checking fingerprint  ...  18115836411 records
Number of Segments: 3
Number of Records:  18003077302, 53607122, 59151987
Checking fingerprint.vvol  ...  56538330 records
Checking fingerprint.vvol.delta  ...  2665604040 records


The important value is in the first line, the total of checked records, 18115836411, which will be called ‘TOTALCHECKED’ here.

In the output of sis status -l for the same volume, the following line is included:
Logical Data:                    3509 GB/49 TB (7%)

The important value is displayed first, the size of the logical data, 3509.

Take the logical-data size (in gigabytes) and apply the following calculation, which yields the number of storage blocks occupied by the logical data.
LOGICALBLOCKS = (LOGICALSIZE * 1024 * 1024) / 4
In this case, (3509 * 1024 * 1024) / 4 = 919863296 is the LOGICALBLOCKS value.

To calculate the percentage of stale fingerprints, take the total of checked records from the sis check -c output and use it in the following equation:
PERCENTSTALE = ((TOTALCHECKED - LOGICALBLOCKS) * 100) / LOGICALBLOCKS

In this case, ((18115836411 - 919863296) * 100) / 919863296 gives a PERCENTSTALE result of 1869.

As the result, 1869 is much larger than 20. The conclusion is that the triggering of sis check at 20 percent stale did not occur, and thus the volume and storage system are experiencing the issue.
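If you'd rather not do the arithmetic by hand, a minimal shell sketch of the same calculation (plugging in the example figures above; substitute your own TOTALCHECKED and logical-data size in GB) is:

TOTALCHECKED=18115836411       # total checked records from 'sis check -c'
LOGICALSIZE_GB=3509            # logical data size in GB from 'sis status -l'
LOGICALBLOCKS=$(( LOGICALSIZE_GB * 1024 * 1024 / 4 ))
PERCENTSTALE=$(( (TOTALCHECKED - LOGICALBLOCKS) * 100 / LOGICALBLOCKS ))
echo "PERCENTSTALE = $PERCENTSTALE"    # 1869 for this example; anything over 20 indicates the bug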

Workaround

A cleanup of the fingerprint database on a volume impacted by this issue is accomplished by running the following command:
sis start -s <vol>

This is resource intensive and a very long-running process as it deletes (entirely) the old Fingerprint Database to reclaim volume space and then builds a brand new copy of the Fingerprint Database.

If the workload imposed on the storage system by running sis start -s is extremely large, a NetApp Support Engineer can guide the user to use the following advanced-mode command on the impacted volume:
sis check -d <vol>

Note: Dedupe operations for any new data will not be performed while 'sis check -d' is running, so expect to use more space in the volume until this command finishes.

In addition, the ‘sis check -d’ command requires an amount of free space in the volume greater than or equal to twice the size of the Fingerprint Database files. You can estimate the size of the Fingerprint Database by running ‘sis check -c‘ and adding the number of records in three files, then multiplying by 32 bytes which is the size of each record. To estimate the amount of free space required, in bytes, use this formula:
Records in [fingerprint.vvol + fingerprint.vvol.delta + fingerprint.vvol.delta.old (if present)] * 32 bytes = Fingerprint Database size (the free space needed is twice this)

Ensure that there is sufficient free space prior to running ‘sis check -d’.
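As a rough worked example using the record counts from the 'sis check -c' output above (your own figures will differ):

(56538330 + 2665604040) records * 32 bytes = 87108555840 bytes, roughly 81 GiB of Fingerprint Database
Free space required = 2 * 81 GiB, so around 162 GiB should be free before running 'sis check -d'.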

Note: 'sis check -d' is invalid on a SnapVault secondary.

Solution

Users should upgrade to Data ONTAP release 8.1.2P4 or later.

After upgrading to a release with the fix, running deduplication twice (sis start) on each volume will automatically remove these stale fingerprints.
Note: If there is no new data added to the volume, deduplication will not go through all its phases, including the phase responsible for cleaning up stale fingerprints (Verify Phase). Deletes on the volume would not cause deduplication to initiate. Data deletions from the volume will definitely create stale fingerprint metadata in the volume.

The first deduplication job post upgrade might take longer than expected. Subsequent operations will complete in normal operating times. This process of removing the stale fingerprints will temporarily consume additional space in both the deduplication-enabled FlexVol volumes and their containing aggregates.

Also, to confirm that a controller running a Data ONTAP version with the fix is not seeing this issue, check the sis logs. They should contain the following two lines:
<timestamp> /vol/<volname> Begin (sis check)
<timestamp> /vol/<volname> Sis Verify Phase 1

 

You can run the formula above to check whether this is affecting you, i.e. whether PERCENTSTALE is over 20.

Workaround

Run these on all of the volumes that come back with PERCENTSTALE over 20; run them one at a time to avoid high I/O on the SAN.

 

priv set advanced 
 
sis check -c /vol/volume
 
sis status -l /vol/volume
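For any volume confirmed as affected, the actual cleanup from the KB text above is then the resource-intensive fingerprint database rebuild (one volume at a time):

sis start -s /vol/volume

and its progress can be watched with:

sis status -l /vol/volume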

Fix

Upgrade Data ONTAP to a release with the fix (8.1.2P4 or later).

 

 


When your LUN usage shows usage that does not match your datastore usage, you will need to perform a reclaim on the LUN. Remember the LUN will need to be thin provisioned with Space Reservation disabled on the NetApp. To check your LUN is supported, log in to an ESXi host and run:

esxcli storage core device vaai status get

On the volume it needs to show:

Delete Status: supported

Reclaim can be done manually by running the following commands (e.g. via PuTTY/SSH) on a VMware host attached to the datastore:

 

cd /vmfs/volumes/datastorename/
 
vmkfstools -y %percenttoclear%

 

It is recommended to increase %percenttoclear% from, say, 10% up to 80% in 10% steps.

There’s a nice script here : https://kallesplayground.wordpress.com/2014/04/03/storage-reclamation-part1/ to automate the increase
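For reference, a minimal sketch of that stepping approach looks like this when run in the ESXi shell (the datastore name is a placeholder; the linked script is more complete):

cd /vmfs/volumes/datastorename/
for pct in 10 20 30 40 50 60 70 80; do
    vmkfstools -y $pct
done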

****WARNING**** Only do one reclaim at a time; it sharply increases the IOPS on the SAN, so it is best done out of hours.

We had to do a few of these out of hours at around 3am. I actually automated this as a scheduled task using PuTTY's plink.exe, which meant it ran in the very early hours of the morning! I couldn't use the Kallesplayground .sh script, as this didn't work when sent from plink; the script would have to be on the local ESXi box, which isn't practical across the multiple hosts our current cluster runs.

FYI, you will see that you have to manually enter the password. A more recommended solution would be to use public key authentication with PuTTY:

https://www.virten.net/2014/02/howto-esxi-ssh-public-key-authentication/

 

plink.exe %esxboxwithsshenabled% -ssh -batch -l root -pw %rootpasswordofesxbox% -m C:\Scripts\filewithabovelinesin.sh > c:\log.txt
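For clarity, the file passed with -m just needs to contain the reclaim commands from earlier, for example (datastore name and percentage are placeholders):

cd /vmfs/volumes/datastorename/
vmkfstools -y 60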

 

**** Update: in ESXi 5.5 this has changed to

esxcli storage vmfs unmap -l %datastorename% -n %reclaimunit%

 

 


NetApp has a GUI for DFM; however, my experience of it is that it is slow and clunky (compared to a Nimble SAN) and it is hard to find real-time slowdown causes at the SAN I/O level. vCenter vCOps is good for monitoring, but it doesn't have the intelligence needed to be that accurate; it purely guesstimates based on unusual spikes and usage. Perfstat logging is the best metric, but trying to keep a continual capture running until the problem repeats itself is difficult.

Log in to the NetApp via SSH and run the following on each controller:

 

sysstat -ux 1

Keep an eye on the Disk Usage metric.

If you see disk usage over 80%, you might want to drill down deeper to the volume or LUN level to find out what's causing it, using:

 

stats show volume::total_ops
 
stats show lun::total_ops
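To narrow things down once you have a suspect, the same counter can be queried for a single volume instance (volname is a placeholder; 'stats list instances volume' should list the valid instance names):

stats show volume:volname:total_ops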
