
Monitoring Processes on Windows


We get a lot of questions here at the Splunk Microsoft Practice – not just on our apps (which are awesome starting points for common Microsoft workloads), but also on how to do specific things in Windows. One of the things I recently got asked was “how do I get a top-10 type report of processes on a system and who is running them?” This should be fairly straightforward. After all, Microsoft provides a perfmon object called “Process” – maybe I can just monitor that. Unfortunately, the owner is not available. Ditto with WMI. Once I’ve exhausted the built-in methods of getting information, I turn to my favorite tool – PowerShell.

There are two methods of getting the list of processes on a system. Get-Process is the de facto standard for getting a process list from PowerShell, but I prefer the WMI approach – Get-WmiObject -class win32_process. The reason for the choice is that the objects you get back have a bunch of useful methods on them, one of which is GetOwner(), which retrieves the owner of the process – just what we are looking for. You can always get the list of things you can do by piping the command to Get-Member. For example:

Get-WmiObject -class win32_process | Get-Member

In order to get the owner information into the objects, we have to do a little work. Joel Bennett assisted with this small scriptlet:

Get-WmiObject -class win32_process |
    Add-Member -MemberType ScriptProperty -PassThru -Name Username -Value {
        $ud = $this.GetOwner();
        $user=$ud.Domain+"\"+$ud.User;
        if ($user -eq "\") { "SYSTEM" } else { $user }
    }

Although I have split this over multiple lines for readability, you should type it all on one line. What this does is add a “Username” property to each object in the pipeline; the value is obtained by calling GetOwner() on the object. There is a special case when the process does not have an owner, and in this case, we set the owner to “SYSTEM”.

You will notice an awful lot of properties being returned when you run this command. We will fix that when we start importing it into Splunk. Speaking of which, how do we do that? We turn to one of my favorite addons – SA-ModularInput-PowerShell. You can download it from Splunkbase. This addon persists a PowerShell scripting host for running scripts and gathering the results. Any objects that are output by our script are converted into key-value pairs and sent on to Splunk. You need to install the .NET 4.5 framework and WinRM 3.0 as well as the Splunk Universal Forwarder for Windows.

Since the SA-ModularInput-PowerShell addon does not define any scripts, you need to add your script to the inputs.conf of an app. Our script would appear like this:

[powershell://Processes]
script = Get-WmiObject -class win32_process | Add-Member -MemberType ScriptProperty -PassThru -Name Username -Value { $ud = $this.GetOwner();  $user=$ud.Domain+"\"+$ud.User;  if ($user -eq "\") { "SYSTEM" } else { $user } }|select ProcessId, Name, Username, Priority, ReadOperationCount, WriteOperationCount, CreationDate, Handle, VirtualSize, WorkingSetSize, UserModeTime, ThreadCount
schedule = 0,15,30,45 * * ? * *
source = PowerShell
sourcetype = PowerShell:Process

The script is fairly self-evident, but we have added a Select to limit the properties that are sent on to Splunk. I’ve picked some interesting ones around memory usage, thread counts and IOPS. The schedule will be recognizable as a cron-style scheduler – this one fires every 15 seconds. SA-ModularInput-PowerShell is based on Quartz.NET – a well-known open-source scheduling system for the .NET framework.

Once the data is flowing into Splunk (check the splunkd.log file if it isn’t), we need a search that will get us the processes at any given time. Here is my search:

sourcetype=PowerShell:Process |
    stats count as Polls,
        latest(Name) as Name,
        latest(Username) as Username,
        latest(Priority) as Priority,
        max(ReadOperationCount) as ReadOperationCount,
        max(WriteOperationCount) as WriteOperationCount,
        latest(Handle) as Handle,
        max(VirtualSize) as VirtualSize,
        latest(WorkingSetSize) as WorkingSetSize,
        latest(UserModeTime) as UserModeTime,
        max(ThreadCount) as ThreadCount by host,ProcessId,CreationDate

Again, run this all together on the same line – it’s just split up for readability. We need the CreationDate field because a ProcessId can be recycled on a given host. By utilizing the host, ProcessId and CreationDate, we get a unique key to identify each process. I normally place useful searches like this in a macro – either by editing my macros.conf file or in the Manager. I’ve named my macro “all-windows-processes”.
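For reference, the macros.conf version would look something like this (the stanza name is the macro name; the definition is just the search above on one line):

[all-windows-processes]
definition = sourcetype=PowerShell:Process | stats count as Polls, latest(Name) as Name, latest(Username) as Username, latest(Priority) as Priority, max(ReadOperationCount) as ReadOperationCount, max(WriteOperationCount) as WriteOperationCount, latest(Handle) as Handle, max(VirtualSize) as VirtualSize, latest(WorkingSetSize) as WorkingSetSize, latest(UserModeTime) as UserModeTime, max(ThreadCount) as ThreadCount by host,ProcessId,CreationDate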

So, what about that top-ten report? Well, it depends on how you measure the top ten. Here are some interesting searches using that macro:

Top 10 Processes run by users that have the largest virtual memory footprint

`all-windows-processes` | search Username!="SYSTEM" | sort 10 -VirtualSize

Top 10 Processes that have the largest amount of disk activity

`all-windows-processes` | eval DiskActivity = ReadOperationCount + WriteOperationCount | sort 10 -DiskActivity

Top 10 Users that are running the most processes

`all-windows-processes` | stats count by Username,host | sort 10 -count

Top 10 longest running user processes

`all-windows-processes` | search Username!="SYSTEM" | sort 10 -Polls

Hopefully, this gives you some ideas on what you can do to monitor processes on your Windows systems, and if you are wondering how to monitor something on your Windows systems, let us know at microsoft@splunk.com or use Ask an Expert – just look for my picture.


Catching Errors in PowerShell


I’ve recently been writing a lot of PowerShell for the SA-ModularInput-PowerShell addon. It’s amazingly flexible at capturing data that is embedded in the .NET framework, and many Microsoft products provide convenient access to their monitoring counters via PowerShell. This modular input can replace perfmon, regmon, WMI and all the other things we used to use for monitoring Windows boxes. However, sometimes bad things happen. Scripts don’t work as expected. In the Splunk world, permissions, connectivity and other problems make diagnosing scripted inputs difficult. I can run the script myself and get the right stuff, but when I put it in an inputs.conf file, it breaks.

One way to get some diagnostics in there is to ensure the script throws exceptions when necessary and then use a wrapper script to capture those exceptions and produce log events from them. We use this a lot within new apps, and if you have signed up for the Splunk App for SQL Server Beta Program, you will know that all our PowerShell scripts are wrapped in this manner. You can download and view the script on Github, so I am not going to reproduce it here.

This script traps errors. Along the way, it writes out two events for you. The first (with sourcetype=PowerShell:ScriptExecutionSummary) contains an Identity field (more on that later), plus InvocationLine and TerminatingError fields. The more important one from a diagnostics point of view is the second event (with sourcetype=PowerShell:ScriptExecutionErrorRecord), which has a ParentIdentity (matching the Identity field from the first event so you can correlate the two events) and all the error information as fields. Just in case that wasn’t enough, it adds timing information to the ScriptExecutionSummary so you can see how long your script is running.
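As a sketch of that correlation (using the sourcetypes and field names described above), something like this ties each error record back to its invocation:

sourcetype=PowerShell:ScriptExecutionErrorRecord | join ParentIdentity [ search sourcetype=PowerShell:ScriptExecutionSummary | rename Identity as ParentIdentity ] | table _time, ParentIdentity, InvocationLine, TerminatingError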

Using this script is easy. In your addon, create a bin directory for your PowerShell scripts and place the downloaded script there as “Invoke-MonitoredScript.ps1”. Let’s take a look at the normal running of a script and the wrapped version. Here is our normal inputs.conf stanza for a typical script, taken from the addon for Microsoft SQL Server:

[powershell://DBInstances]
script = & "$SplunkHome\etc\apps\TA-SQLServer\bin\dbinstances.ps1"
schedule = 0 */5 * ? * *
index = mssql
sourcetype = MSSQL:Instance:Information
source = Powershell

Now let’s take a look at the modified version for producing the error information:

[powershell://DBInstances]
script = & "$SplunkHome\etc\apps\TA-SQLServer\bin\Invoke-MonitoredScript.ps1" -Command ".\dbinstances.ps1"
schedule = 0 */5 * ? * *
index = mssql
sourcetype = MSSQL:Instance:Information
source = Powershell

The script you want to run is not affected – only the execution of the script is adjusted. Now you will be able to see any errors that are produced within the monitored script. I have added an Errors dashboard that shows the errors I get combined with the parent invocation information to show timing as well.

Audit File Access and Change in Windows


One of the bigger problems that we come across is auditing of file systems – specifically, you want to know who read, modified, deleted or created files in a shared area. This is not an unreasonable task, but it is different in every single operating system. Windows has built-in facilities for doing this. We just need to do a few things to get the information into Splunk.

• Object Access Auditing needs to be turned on
• The Shared Folder needs to have auditing enabled
• You need to collect and interpret events from the system

To turn on object access auditing, you need to alter the local security policy. This can be done centrally via a group policy object or it can be done on the local machine. You may even have this turned on already. To turn on object access auditing using the local security policy, follow this process:

1. Open up Administrative Tools -> Local Security Policy, or run secpol.msc
2. Open Local Policies -> Audit Policy
3. Right-click on “Audit object access” and select Properties
4. Ensure “Success” and “Failure” are both checked
5. Click on OK, then close the Local Security Policy window.

You can do a similar thing in group policy – create a new group policy object, edit it, open Computer Configuration and find the Local Security Policy settings, then adjust as described above, save it and then apply it to some machines in the normal manner. Once it is distributed (which happens roughly every 90 minutes by default), your selected systems will have object access auditing forced on.
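If you would rather script this step, recent versions of Windows can enable the same audit category from an elevated prompt with auditpol:

auditpol /set /category:"Object Access" /success:enable /failure:enable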

The next piece is to turn on auditing for a specific folder (and all its sub-folders and files). You normally do this for only a select few places and users, since the information generated is very chatty. For each folder, follow this process:

1. Open up the File Explorer by right-clicking and selecting Run As Administrator.
2. Browse to the folder you want to turn auditing on.
3. Right-click on the folder and select Properties.
4. Select the Security Tab.
5. Click on Advanced, then Auditing.
6. Click on Add
7. Enter the name of the users you wish to audit (Everyone is usually a good choice!), click on Find Now to ensure the name is registered, then click on OK
8. Check the Successful and Failed boxes, then click on OK
9. Close the windows by clicking OK

Remember that the process changes slightly between versions of Windows Server; the exact paths may differ, but the items will be called the same thing.

You should be able to see audit information in your Security event log. The final step is to make that information appear in a Splunk instance. I generally install the Splunk Universal Forwarder on the host and deploy Splunk_TA_windows to it. This is an essential add-on that collects the Windows Security Event Log by default for you.
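If you are configuring the input by hand instead, the equivalent inputs.conf stanza on the forwarder is simply:

[WinEventLog://Security]
disabled = 0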

Once you are gathering the data, you will see four distinct event codes produced. On NT5 systems (Windows Server 2003 and prior), event codes 560 (open object) and 562 (close object) are produced. On NT6 systems (Windows Server 2008 and later), codes 4656 (open object) and 4658 (close object) are created. Here is an example of Event Code 4656:

A handle to an object was requested.
Subject:
   Security ID:  SHELL\ahall
   Account Name:  ahall
   Account Domain:  SHELL
   Logon ID:  0x1ff76
Object:
   Object Server:  Security
   Object Type:  File
   Object Name:  C:\Finance\Accounts.xlsx
   Handle ID:  0x994678
Process Information:
   Process ID:  0xff1
   Process Name:  C:\Program Files\Microsoft Office\Office15\EXCEL.EXE
Access Request Information:
   Transaction ID:  {00000000-0000-0000-0000-000000000000}
   Accesses:  READ_CONTROL
     SYNCHRONIZE
     ReadData
     ReadEA
     ReadAttributes
   Access Mask:  0x120089
   Privileges Used for Access Check: -
   Restricted SID Count: 0

The person accessing the resource, the resource itself and the program used to access it are all available. In addition, the Logon ID is available. If you have Account Logon Audit turned on, then a logon EventCode (528, 540, 4624) will have been logged from the same machine with the same Logon ID. In addition, you can see how long the file was opened by looking for a corresponding close from the same host with the same Handle ID.
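As a sketch of that logon correlation (assuming the standard field extractions from the add-on, such as Logon_ID and Account_Name):

eventtype=windows-fileaudit EventCode=4656 | join host, Logon_ID [ search sourcetype=WinEventLog:Security (EventCode=528 OR EventCode=540 OR EventCode=4624) ] | table _time, host, Logon_ID, Account_Name, Object_Name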

On my search head, I have defined a new event type called “windows-fileaudit” – this is defined in eventtypes.conf, but you can also define it in the Manager. Add this to your eventtypes.conf:

[windows-fileaudit]
search = sourcetype=WinEventLog:Security (EventCode=560 OR EventCode=562 OR EventCode=4656 OR EventCode=4658)

As an example, let’s find all the accesses to the C:\Finance area on host FINANCE, who opened the files and how long they had them open for.

eventtype=windows-fileaudit host=FINANCE Object_Type="File" Object_Name="C:\\Finance\\*" | eval CodeType=if(EventCode==560 OR EventCode==4656,"Open","Close") | transaction host Handle_ID startswith="CodeType=Open" endswith="CodeType=Close" | table _time Security_ID Object_Name Process_Name duration

One word of warning in closing. The object access audit security log events are extremely chatty, so you may want to look at methods of controlling what gets indexed, and perhaps set up a small free version of Splunk to allow you to discover how much data will be logged before moving the data over to your main Splunk index.

Statistics and Windows Perfmon


Sometimes, things that you expect to be trivial are less so, and you learn by experience about the pitfalls you may fall into. One such thing is Windows Perfmon. In order to save valuable license space, the Splunk Perfmon implementation squashes zero values. In other words, a zero value is not logged. This is normally not a big deal – after all, if you are recording a time chart of the % Processor Time, you might do something like this:

index=perfmon counter="% Processor Time" instance="_Total" | timechart avg(Value) by host

When you turn this into a chart, you can specify that null values be rendered as zero and you have a nice chart.

But is it correct?

Let’s say you are monitoring your perfmon counters every minute. At each minute interval, the splunk-perfmon.exe process wakes up and polls for the counter value. If it is not zero, it emits an event with the value. Let’s look at an example. Let’s say your counter is the number of current connections to the IIS process. We monitor this every 60 seconds and get our results. Now, when we do our timechart, we put these values into time-span buckets. Maybe our bucket is every 5 minutes. If the bucket is full, then the value reported by avg(Value) is correct. Similarly, if the bucket is empty, then the value is null, which is handled properly by the nullValueMode on the chart. But what if the bucket is partially full?

In our example, let’s say we get the following samples: {0,3,2,0,0} for our five one-minute intervals. The average for this set is 1 (a total of 5 connections divided by the 5 sample entries). But zero-values are squashed (i.e. not emitted), so what the timechart sees in the bucket is {3,2} for an average of 2.5. This is way off what we expect. The unfortunate thing here is that we don’t know why the zero is squashed – it could be because the value is zero, but it could also be because the server is down. When doing statistical analysis, zero is relevant, so we need to fix that.

Fortunately, we have a good way to correct this. Go on to your Splunk Universal Forwarder and edit the file %SPLUNK_HOME%\bin\scripts\splunk-perfmon.path. This is a text file and contains the following:

$SPLUNK_HOME\bin\splunk-perfmon.exe

Basically, our path file just executes the normal executable. However, splunk-perfmon.exe can also take arguments, and one is of interest to us. Change this file to:

$SPLUNK_HOME\bin\splunk-perfmon.exe -showzero

The showzero argument tells splunk-perfmon.exe to emit zero values. Now you can do statistical analysis on your perfmon data. This not only includes averages, but statistics like the 95th percentile.
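For example, with zeros present, a 95th percentile over the same counter becomes meaningful:

index=perfmon counter="% Processor Time" instance="_Total" | timechart perc95(Value) by host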

There are a couple of obvious caveats here:
1. This is a system-wide change. All the perfmon data from all apps will now record zero values.
2. This will increase your license usage. How much? That all depends on how many zero values you are getting.

Ultimately, the decision rests with you – do you do statistics on your perfmon data? If so, you need to make this change. If your needs are a little less statistical (maybe correlation with the Windows Event Logs), then you probably don’t need this change.

SharePoint, PowerShell and Network Latency


I was listening to Todd Klindt and Shane Young at TechEd North America this year (the link goes to the recording of the session). The session was on the basics of SharePoint 2013 administration. During the session, Todd mentioned an interesting metric – all the servers in a SharePoint farm should have no more than 1ms of network latency between them. I, of course, had what I thought was a perfectly reasonable question – how do you go about monitoring that? The answer was a little less than satisfactory to me, as I am somewhat of a monitoring guy: use ping, but only when you suspect latency is an issue.

Problems rarely happen at convenient times when I can measure latency in this way. Normally, it occurs when I like to be asleep, which is why I like automated collection of metrics. How do we automate this collection and display it in a reasonable manner? A little scripting later and the use of my favorite tools – I now have a solution. Let’s take a look at the script first.

$LocalIPAddresses = Get-LocalIPAddress
Get-NetworkConnections `
| Where-Object { $LocalIPAddresses -contains $_.LocalAddress -and $_.State -eq "ESTABLISHED" } `
| Select-Object -Unique RemoteAddress `
| Foreach-Object { Test-Connection -Count 1 -ComputerName $_.RemoteAddress -ErrorAction SilentlyContinue } `
| Select-Object Address,ResponseTime

Ok – I cheated a little bit here. There are a couple of cmdlets that I failed to include, and I will get to them shortly. Firstly, let’s take a look at the basics. My first action is to get a list of local IP addresses. I want to record the latency of the connections that this computer makes outbound, so it makes sense to only log connections originating from one of these IP addresses. Then I get a list of network connections (more on this later), filter out those connections where this computer is not the origin, de-duplicate the resulting list, then use Test-Connection to ping the other computer, and finally record the address and the response time.

The list of local IP addresses comes from the WMI Win32_NetworkAdapterConfiguration class, and my function for getting it is simple:

function Get-LocalIPAddress
{
    $AdapterSet = Get-WmiObject -Class Win32_NetworkAdapterConfiguration
    $IPAddressSet = @()
    foreach ($adapter in $AdapterSet) {
        foreach ($ipaddress in $adapter.IPAddress) {
            $IPAddressSet += $ipaddress
        }
    }
    $IPAddressSet
}

Each network adapter has a list of associated IPv4 and IPv6 addresses, so we loop over each network adapter, get the list of addresses, and then add each address individually. The Get-NetworkConnections function uses the netstat -no command to get a list of current connections. The code is slightly longer, so I’ve put it on Github (along with the complete script) – feel free to download it. When you run the command, you get something akin to the following:

Address      : 216.221.226.22
ResponseTime : 302

Address      : 64.78.56.109
ResponseTime : 68

Address      : 165.254.99.48
ResponseTime : 95

Address      : 74.112.184.86
ResponseTime : 65

Address      : fe80::b51b:8187:4746:9b7e
ResponseTime : 0
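For the curious, the heart of Get-NetworkConnections is just parsing that netstat output into objects. Here is a simplified sketch (the full version on Github handles more edge cases):

function Get-NetworkConnections
{
    # Skip the netstat banner and column header, then split each
    # remaining line into fields. This sketch only handles TCP lines.
    netstat -no | Select-Object -Skip 4 | ForEach-Object {
        $fields = ($_ -replace '^\s+', '') -split '\s+'
        if ($fields.Count -ge 5 -and $fields[0] -eq 'TCP') {
            New-Object PSObject -Property @{
                LocalAddress  = $fields[1] -replace ':\d+$', ''   # strip the port
                RemoteAddress = $fields[2] -replace ':\d+$', ''
                State         = $fields[3]
                ProcessId     = $fields[4]
            }
        }
    }
}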

Our next task is to get the data into Splunk on a regular basis. For this, I turn to the SA-ModularInput-PowerShell add-on, which runs PowerShell scripts in a host process on a regular basis. All I need to do is to add the following to the inputs.conf of my app:

[powershell://NetworkLatency]
script = & "$SplunkHome\etc\apps\MyApp\bin\get-networkconnections.ps1"
schedule = 0 */5 * ? * *
sourcetype = MSWindows:NetworkLatency
source = Powershell

As you can see, I run this every 5 minutes so that I can get continuous information on network latency throughout the day. I have a lookup in an upcoming app that provides a list of SQL Servers that my SharePoint servers communicate with, so now I can monitor just the latency values that matter with this search:

sourcetype=MSWindows:NetworkLatency | lookup SPServer Address OUTPUT Type | where Type=="SQLServer" | timechart avg(ResponseTime) by Address

In closing, if it can go wrong, it’s worth monitoring. If you monitor it, you can correlate the changes with other things going on within your systems. This allows you to get to the root cause of a failure much quicker, which allows you to reduce the time your service is off the air.

PowerShell Profiles and Add-Path


I often blog about Splunk, but that’s not the only thing that is on my mind. One of the more common things on my mind is PowerShell and how it has affected how I do my work. It’s been hugely impactful. However, it does require a little bit of forethought in terms of setting up your environment. When you first get started with PowerShell, you double-click on the little PS icon and get a perfectly suitable environment for doing basic tasks. However, it can be improved. I used to be a Linux administrator and used the Korn shell for my work. In order to set up my environment, I used a .kshrc file. Similarly, PowerShell has a profile that you can use to customize your environment.

First things first – you need to create a place for your environment. This does it:

PS> mkdir ~\Documents\WindowsPowerShell

Now that you are there, you can edit your profile. You can see what file is going to be edited using:

PS> $profile
C:\Users\Adrian\Documents\WindowsPowerShell\Microsoft.PowerShell_profile.ps1

There is a different file for the Integrated Scripting Environment (ISE), so you can have one profile for your PS> prompt and another for the ISE. What you put in there is up to you. One of the things that I do is to set up my development environment. I edit the XML and configuration files with Notepad++ (which you can download for free from their web site). However, that is not added to the PATH by default. I’ve added a short cmdlet for processing the PATH (called Add-Path) and then I use that to alter the path. It’s probably not the best way of doing this (and PowerShell purists can correct me if they like, or point me to http://poshcode.org). But – like many scripting languages – there are many ways of completing the same task, and this is mine. You can also find this function on my Github at https://gist.github.com/adrianhall/956311662fc2a218d9fa

function Add-Path {
  <#
    .SYNOPSIS
      Adds a Directory to the Current Path
    .DESCRIPTION
      Add a directory to the current path.  This is useful for
      temporary changes to the path or, when run from your
      profile, for adjusting the path within your powershell
      prompt.
    .EXAMPLE
      Add-Path -Directory "C:\Program Files\Notepad++"
    .PARAMETER Directory
      The name of the directory to add to the current path.
  #>

  [CmdletBinding()]
  param (
    [Parameter(
      Mandatory=$True,
      ValueFromPipeline=$True,
      ValueFromPipelineByPropertyName=$True,
      HelpMessage='What directory would you like to add?')]
    [Alias('dir')]
    [string[]]$Directory
  )

  PROCESS {
    $Path = $env:PATH.Split(';')

    foreach ($dir in $Directory) {
      if ($Path -contains $dir) {
        Write-Verbose "$dir is already present in PATH"
      } else {
        if (-not (Test-Path $dir)) {
          Write-Verbose "$dir does not exist in the filesystem"
        } else {
          $Path += $dir
        }
      }
    }

    $env:PATH = [String]::Join(';', $Path)
  }
}

Add-Path -Directory "C:\Program Files (x86)\Notepad++"
Set-Alias edit notepad++.exe

Add-Path -Directory "C:\Program Files\Splunk\bin"
Add-Path -Directory "C:\Program Files (x86)\PuTTY"

My profile gives me access to the handy Add-Path cmdlet, adds a few directories to my path to set up ssh (I use PuTTY), Splunk and my editor, and then sets up an alias so I can edit files with an “edit” command. Of course, I do other things in my PowerShell prompt, such as setting up GitHub and remote access privileges for my remote instances, giving me the ability to run “connect <host>” where the host is picked up from a CSV file and transitioned to an IP address – all designed to make my working time as productive as possible. Ultimately, what you put in your profile will depend on how you work and what you need.
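As an illustration of that last one, a connect helper can be as simple as this sketch (the hosts.csv inventory, with Name and IPAddress columns, is hypothetical):

function connect {
    param([string]$HostName)
    # Look the host up in a CSV inventory and open a PuTTY session;
    # putty.exe resolves because Add-Path put it on the PATH above.
    $entry = Import-Csv ~\Documents\hosts.csv | Where-Object { $_.Name -eq $HostName }
    if ($entry) { putty.exe $entry.IPAddress }
    else { Write-Warning "Unknown host: $HostName" }
}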

Do you have an idea of something you put in your profile, or a favorite tool you can’t do without? Let me know via Twitter at @splunk_ahall.

Monitoring Local Administrators on Remote Windows Systems


One of our field people asked me if we could use the Splunk App for Active Directory to monitor Local Administrators on a list of hosts. The Splunk App for Active Directory monitors domain administrators (mostly through the SA-ldapsearch application, which provides custom commands for retrieving LDAP groups, and the Windows Security Event Log, which provides change monitoring through the audit configuration on a domain controller). So I put my thinking cap on and came up with this WMI methodology.

Firstly, a bit of background. WMI is the Windows Management Instrumentation – a sub-system within Windows that allows remote and local users to query the internals of the Windows OS. Most Splunkers use this to get things like the Win32_BIOS information, remote perfmon and event logs and similar things. We are going to use this for getting the contents of the local users and groups table.

WMI is split into classes, and the class we want is called Win32_GroupUser. The MSDN documentation for the class has more detail. We need a simple entry in a wmi.conf file like this:

[WMI:LocalAdmins]
interval = 3600
disabled = 0
wql = SELECT * FROM Win32_GroupUser

This will give us an entry per user within a group. Each event looks like this:

20130726124002.422783
GroupComponent=\\SQL12\root\cimv2:Win32_Group.Domain="SQL12",Name="Administrators"
PartComponent=\\SQL12\root\cimv2:Win32_UserAccount.Domain="SQL12",Name="Administrator"
wmi_type=UserGroup

This is from my SQL server SQL12, so you can see all the entries are for the local machine. We can now do searches with this. The main one is to provide the list of users on each host that are in the Administrators group:

sourcetype="WMI:LocalAdmins" Name="Administrators" |
rex field=_raw "PartComp.*?,Name=\"(?<UserName>[^\"]+)\"" |
dedup host,Name,UserName |
transaction host,Name |
table host,UserName

For each host that is reporting, you will see a line with the name of the host and the list of local administrators. You can even monitor this remotely through WMI by adding a server list to the wmi.conf stanza.
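For example, to poll a set of remote machines from a single collection point, add a server attribute to the stanza (the hostnames here are placeholders):

[WMI:LocalAdmins]
interval = 3600
disabled = 0
wql = SELECT * FROM Win32_GroupUser
server = SQL12, SQL13, DC01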

Want to monitor more Windows stuff? Let us know what you want to monitor and it will make its way into a future blog post!

Detecting Your Hypervisor from within a Windows Guest OS


Let’s face it – most of our applications run on hypervisors – Microsoft Hyper-V, VMware and Citrix XenServer seem to be the top contenders. This makes our technology stacks that much more complex, since we have added a layer of abstraction between the application and the bare metal. Instead of a stack that includes Compute, Storage, OS, and Application, we’ve added Hypervisor to the mix. How do we correlate what is happening on the compute platform with what is happening at the application level? How do we understand which other applications are running on the same hypervisor? One common case is memory management: an application runs out of memory even though the hypervisor has memory that has not been allocated to the guest. The memory metrics from the hypervisor’s perspective don’t reflect that the application is under memory stress, because the hypervisor has no visibility into how guests use the allocated memory – it can’t see the difference between cache memory, paged memory, and pooled memory.

The key to all of this is understanding our correlation points. In the case of hypervisors, the most obvious correlation points are the MAC address of the guest OS and the type of Hypervisor that the guest is running on. For this work, we will turn to my favorite data input workhorse, the SA-ModularInput-PowerShell addon. With this addon, we can write small PowerShell scripts that run on a regular basis to capture the information. Since the SA-ModularInput-PowerShell is based on PowerShell 3.0, we have a couple of thousand PowerShell cmdlets to choose from. Normally, we will be monitoring the guest OS, so let’s get this correlation information from there.

Let’s start with getting the MAC address of the guest OS. One of the many cmdlets in the PowerShell 3.0 set is the Get-NetAdapter cmdlet. This returns an object per “real” interface. The command I use is:

Get-NetAdapter | Select-Object -Property Name,MacAddress,LinkSpeed

Here is an example output from my VMware server:

Name                            MacAddress                      LinkSpeed
----                            ----------                      ---------
Ethernet 8                      8A-AF-38-2E-D8-D1               1 Gbps
Ethernet 5                      0A-75-BB-0D-CF-D7               1 Gbps
Ethernet 7                      9A-0F-47-69-CD-D8               1 Gbps
Ethernet 6                      EE-2D-D1-D5-58-75               1 Gbps

This is all good information that we will need to accomplish the first task in our list. If you want a correlation between the IP address and the network adapter, then you can add ifIndex to the list of properties and use the following command to get the list of IP addresses:

Get-NetIPAddress | Where-Object PrefixOrigin -ne "WellKnown"

Our network adapter information does not include the hypervisor information. For this, we need WMI information – in this case, the Win32_ComputerSystem class. This has a property called Manufacturer that follows a standard format:

Get-WmiObject -query 'select * from Win32_ComputerSystem'
Domain              : bd.splunk.com
Manufacturer        : Xen
Model               : HVM domU
Name                : BD-XD7-01
PrimaryOwnerName    : Windows User
TotalPhysicalMemory : 1069137920

This gives you a bunch of useful information – so much so that I do this query on all my Windows systems. For our purposes, note the Manufacturer line. This is a standard value:

Manufacturer Value   Hypervisor
------------------   ----------
Xen                  Citrix XenServer
VMware, Inc.         VMware ESXi
Microsoft Hyper-V    Microsoft Hyper-V

If the host is not housed on a Hypervisor, then the manufacturer will be a PC manufacturer like "Lenovo", "Dell, Inc." or "Hewlett-Packard". Now that we have that, we can add the hypervisor information to our network adapter information to get a combined lookup:

Get-NetAdapter | `
    Select-Object Name,MacAddress,LinkSpeed | `
    Add-Member -MemberType NoteProperty -PassThru -Name HWManufacturer -Value (gwmi -query 'Select * From Win32_ComputerSystem').Manufacturer

Even better, we can correlate the hypervisor, IP Address and Mac Address together for a great correlation lookup:

Get-NetIPAddress | Where-Object PrefixOrigin -ne "WellKnown" | `
    Select-Object IPAddress,AddressFamily, `
        @{n='MacAddress';e={(Get-NetAdapter -InterfaceIndex $_.ifIndex).MacAddress}}, `
        @{n='Manufacturer';e={(Get-WmiObject -query 'SELECT * FROM Win32_ComputerSystem').Manufacturer}}

This syntax may be a little unusual to the PowerShell novice. It is known as a computed property and allows you to use the results of other cmdlets (or indeed any PowerShell script) as a value in the object that is created.

Now that we have our little script ready, we can run it on a regular basis – say, at 2am each day – by adding a stanza (the name is mine) to an inputs.conf file:

[powershell://NetAdapter]
script = Get-NetIPAddress | Where-Object PrefixOrigin -ne "WellKnown" | Select-Object IPAddress,AddressFamily, @{n='MacAddress';e={(Get-NetAdapter -InterfaceIndex $_.ifIndex).MacAddress}}, @{n='Manufacturer';e={(Get-WmiObject -query 'SELECT * FROM Win32_ComputerSystem').Manufacturer}}
schedule = 0 0 2 * ? *
sourcetype = PowerShell:NetAdapter

Yes – that script line needs to be typed all on the same line. You will get four fields in each event – an IP address, address family (IPv4 or IPv6), a MAC address and a manufacturer. Now you can create a lookup within Splunk for easy correlations:

sourcetype=PowerShell:NetAdapter | stats values(MacAddress) as MacAddress, values(Manufacturer) as Manufacturer by host,IPAddress | outputlookup HostIPInformation

Turn this search into a saved search and run it every 24 hours to keep the lookup current. Finally, we need to use this information. Let’s say you have a search that outputs an IP address and you want to know if it’s on a hypervisor – how about something like this:

`mysearch` | lookup HostIPInformation IPAddress as src_ip OUTPUT Manufacturer,MacAddress | eval IsHypervisor=case(match(Manufacturer,"VMware"),"true",match(Manufacturer,"Xen"),"true",match(Manufacturer,"Hyper-V"),"true",true(),"false")

You can use this information to correlate the applications running on the guest OS to the hypervisor it is in by using the Splunk App for VMware or the Splunk App for Server Virtualization.


Splunk Universal Forwarders and the Domain User


One of the things that you have to decide right up front on Windows is how to run the Universal Forwarder. For most situations, running as the Local System account is adequate and provides access to all necessary resources. Other times, you need to run as a domain user, either because of local security policies or because what you are monitoring requires a domain account. For example, SharePoint, SQL Server and remote WMI access all require a domain account. I’ve blogged about how to make the necessary security changes using GPO before, but GPO has some drawbacks. The most notable one is that you cannot have different group policies managing the user rights, because the last group policy applied will overwrite the earlier ones.

As a result, many organizations decide to leave the user rights assignment to the local security policy, which means you now have to go through all of your Windows hosts that require a domain account to run Splunk and update the local security policy. What we all need is a scripted method of doing all the changes necessary to install the Splunk Universal Forwarder so we can install to hundreds of hosts using a remoting method like PowerShell.

Fortunately, Microsoft likes large enterprises and has provided tools to allow us to do this. We first need to create a single system with the right local security policy. Just log on to your favorite test machine and do the changes to the local security policy. Then open up a PowerShell prompt as the Administrator and run the following command:

secedit /export /cfg splunk-lsp.inf /areas USER_RIGHTS

Secedit is a useful command that exports and imports the security configuration. This command will create a small text file for us to edit. Before we edit the exported file, we need to know the Security Identifier (or SID) of the user that will run Splunk, normally specified as DOMAIN\user – in my case, it’s BD\sp-domain. I can find the SID by using this PowerShell snippet:

$user = New-Object System.Security.Principal.NTAccount("BD\sp-domain")
$user.Translate([System.Security.Principal.SecurityIdentifier]).Value

This will produce a string starting with S- and with a whole lot of numbers after it. We will need this number to recognize our user in the inf file we created in the first step. Our next step is to edit the splunk-lsp.inf file so that it only includes the local security rights we are interested in. Here is my resulting file:

[Unicode]
Unicode=yes

[Privilege Rights]
SeTcbPrivilege = *S-1-5-21-2882450500-3417635276-1240590811-1206
SeChangeNotifyPrivilege = *S-1-1-0,*S-1-5-19,*S-1-5-20,*S-1-5-21-2882450500-3417635276-1240590811-1206,*S-1-5-32-544,*S-1-5-32-545,*S-1-5-32-551,*S-1-5-90-0
SeBatchLogonRight = *S-1-5-21-2882450500-3417635276-1240590811-1206,*S-1-5-32-544,*S-1-5-32-551,*S-1-5-32-559
SeServiceLogonRight = *S-1-5-21-2882450500-3417635276-1240590811-1206,*S-1-5-80-0
SeSystemProfilePrivilege = *S-1-5-32-544,*S-1-5-80-3139157870-2983391045-3678747466-658725712-1809340420
SeAssignPrimaryTokenPrivilege = *S-1-5-19,*S-1-5-20,*S-1-5-21-2882450500-3417635276-1240590811-1206

You will note that this file has six privileges, not five as per the Splunk installation manual. That’s because there is not a one-to-one relationship between the privileges displayed in the Local Security Policy and the underlying security policy settings. You can read all about the other policy decisions in the file C:\Windows\inf\defltsv.inf.

Now that you have the security policy file, you have one more task before bulk installation. You have to add the designated user to your local administrators group. This can be done through a GPO but you can do this with the following PowerShell ADSI command:

([ADSI]"WinNT://${env:COMPUTERNAME}/Administrators,group").Add("WinNT://BD/sp-domain")

Now you can create an installer script for your Splunk Universal Forwarder. Most organizations have a software repository that is mounted automatically. I mount mine at S:\ and the Splunk stuff is in the S:\Splunk area. My installer script is called “installad.ps1”, and here it is:

secedit /import /cfg S:\Splunk\splunk-lsp.inf /db C:\splunk-lsp.sdb
secedit /configure /db C:\splunk-lsp.sdb
Remove-Item C:\splunk-lsp.sdb
([ADSI]"WinNT://${env:COMPUTERNAME}/Administrators,group").Add("WinNT://BD/sp-domain")
msiexec.exe /i splunkforwarder.msi AGREETOLICENSE=Yes DEPLOYMENT_SERVER="sp-deploy:8089" LOGON_USERNAME="BD\sp-domain" LOGON_PASSWORD="changeme" INSTALL_SHORTCUT=0 /quiet

With a little planning and preparation, you can deploy the Splunk Universal Forwarder across your domain in a very automated fashion.
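As a sketch of that final step (assuming PowerShell remoting is enabled on the targets, and that you keep a hosts.txt list of computer names), the rollout can be a one-liner:

Invoke-Command -ComputerName (Get-Content hosts.txt) -FilePath .\installad.ps1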

PowerShell version 2


By now, you are probably aware that I love PowerShell as a method of getting data off Windows. It’s your one-stop method for getting all sorts of nice things. However, our SA-ModularInput-PowerShell module had certain limitations. Most notably, it only worked with .NET 4.5 and CLR4 – aka PowerShell v3. This was great for one-off scripts where you weren’t loading any plug-ins, but Microsoft applications such as SharePoint 2010 and Exchange 2007 require PowerShell v2 support because their plug-ins are distributed for .NET Framework 3.5.

I’m happy to announce that one of our PowerShell MVPs – Joel Bennett – has updated the Splunk Addon for Microsoft PowerShell to support .NET 3.5 and CLR 2.

There are a couple of common gotchas. The first is in handling PowerShell snap-ins via the Add-PsSnapIn cmdlet: if the cmdlet is run twice in a row, an error occurs. The reason is that our resident PowerShell host runs continually. The major performance advantage of SA-ModularInput-PowerShell over a standard scripted input is that you aren’t spinning up a PowerShell executable every time – it’s always running. That also means that any snap-in you load stays in memory.

You can ignore errors by utilizing the -ErrorAction parameter, like this:

Add-PsSnapIn Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

The second common problem is that you need to end your pipeline with Select-Object so that the modular input knows what to log and what not to log. There are a lot of properties and methods on a typical PowerShell object, and most of them are not interesting. For example, check out this simple usage:

[powershell2://test-service-health]
script = Get-Service | Select-Object Name, DisplayName, Status
schedule = 0 0/5 * ? * *

In this input, we are grabbing the services that are running on the local machine. However, if we don’t select the properties we want, the call will fail: one of the properties is a ServiceHandle, which is not available, so we get an error instead. The rule of thumb is to always end your pipeline with a Select-Object that picks out the things you are interested in.

My final advice is on errors. We now have two PowerShell hosts, each with different requirements. When installed on a standard Windows Server 2008R2 host with no updates, only PowerShell2.exe will be running because the .NET Framework 4.5 is not available, and you will see errors in splunkd.log pertaining to the inability to start PowerShell.exe. In a similar manner, when installed on a standard Windows Server 2012 host, only PowerShell.exe will be running because the .NET Framework 3.5 is not available. All our logs are available in the _internal index, so you can do a search for “index=_internal powershell” to find all the problems with PowerShell scripts.

Finally, check out my other posts on using PowerShell!

Exporting Large Results Sets to CSV


You want to get data out of Splunk, so you do the search you want and create the table you want in the search app. The results are hundreds of thousands of rows, which is good. So you click on the Export button, download the results to CSV and, when you open the file, you see only 50,000 rows. Is this a common problem? Not really – a result set that large is usually best kept in Splunk for analysis. However, there are times when such a large export is required, and you really don’t want to log on to the Splunk server to get it. So how do you proceed?

I recently bumped into this problem myself while working on a new app. When developing a new app, we don’t work on production data – that’s a bad idea. However, we have an event generator that allows us to replay log files into our test environment so that we have a large data set to work with. In my case, this was a ULS log set from a SharePoint farm. One day’s logs can be several hundred megabytes of data. However, the production data is in California and I was in Australia.

Fortunately, we do have tools available to do this. One of these is the RESTful interface to the backend of the search head. This is great for developers (and if you are one of these developers, then head over to dev.splunk.com for information on our SDK interfaces for .NET, Java, Python, Ruby and JavaScript). It’s also great for these larger export jobs and automation.

My tool of choice for this is curl. This is standard issue on Linux systems, but there are downloads available for Windows as well, which is my platform of choice. (Note: if you are doing this in PowerShell, you will need to remove the alias for curl, which maps to Invoke-WebRequest – not the same thing at all!) So how do we do this? First off, figure out your search. In my case, I want a particular sourcetype for one day – let’s say 2 days ago. So here is my search:

index=mssharepoint sourcetype=MSSharePoint:2013:ULSAudit host=SP-APP01 | table index,host,source,sourcetype,_raw

Note that I am explicitly setting the fields I want and putting the results into a table. I want to store the results of this search into a file called sp-app01.csv. The REST endpoint we are going to use is the /search/jobs/export endpoint, and you use it like this:

curl -k -u admin:mypassword https://myhost:8089/services/search/jobs/export --data-urlencode search='search index=mssharepoint sourcetype=MSSharePoint:2013:ULSAudit host=SP-APP01 | table index,host,source,sourcetype,_raw' -d output_mode=csv -d earliest_time='-2d@d' -d latest_time='-1d@d' -o sp-app01.csv

If you leave off the -o option you will get the output streamed to your console – given the amount of data you are grabbing, this is not optimal. If you include the -o, then you get a nicely formatted display of progress.

  % Total    % Received % Xferd    Time    Time     Time  Current
                                   Total   Spent    Left  Speed
100  369M    0  369M    0   187 --:--:--  0:11:42 --:--:--     0

The numbers update during the process, telling you how much has been downloaded and the speed at which the data is coming across. But 12 minutes later, I have a 369MB file with well over 500,000 lines of data. However, the events in the CSV are in reverse chronological order (newest first). If you are doing this for event generation, then this isn’t the end of the line: you need to remove the header, reverse the lines and then add the header back in. I find the head, tail and tac utilities that are provided on Linux systems useful here:

head -1 sp-app01.csv > header.txt
tail -n +2 sp-app01.csv > sp-app01.xxx
tac sp-app01.xxx > sp-app01.yyy
cat header.txt sp-app01.yyy > sp-app01.csv

If you don’t have a Linux system handy, then you can do the same things using the GNU tools for Windows. Now that file is ready for event generation.
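If you would rather stay in PowerShell, the same reshuffle can be sketched like this (it reads the whole file into memory, so mind very large exports):

$lines  = Get-Content sp-app01.csv
$header = $lines[0]                        # save the CSV header
$body   = $lines[1..($lines.Count - 1)]    # everything after the header
[array]::Reverse($body)                    # oldest events first
Set-Content sp-app01.csv -Value (,$header + $body)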

Use this facility when you want to export a large amount of data from Splunk across the network.

Monitoring Scheduled Tasks with PowerShell


I did the unthinkable yesterday. I combed through my posts for non-spam comments. I apologize to everyone whom I didn’t answer – we get a lot of comment spam that I have to wade through when I do this. However, there were a couple of requests in there for future topics and I’ll try and cover those requests in the next few weeks.

The first request was for monitoring scheduled tasks. I’m going to read this as “given a Windows host, how do you determine what scheduled tasks are enabled and whether they are failing or succeeding?”. That’s a tall order, so I looked to my favorite tool – PowerShell – for the answer.

PowerShell v3 has a bunch of cmdlets that manage scheduled tasks. The first – Get-ScheduledTask – gets a list of scheduled tasks along with some information about them. Looking at the Get-Member results, we see the following:

PS> Get-ScheduledTask | Get-Member

   TypeName: Microsoft.Management.Infrastructure.CimInstance#Root/Microsoft/Windows/TaskScheduler/MSFT_ScheduledTask

Name                      MemberType     Definition
----                      ----------     ----------
Clone                     Method         System.Object ICloneable.Clone()
Dispose                   Method         void Dispose(), void IDisposable.Dispose()
Equals                    Method         bool Equals(System.Object obj)
GetCimSessionComputerName Method         string GetCimSessionComputerName()
GetCimSessionInstanceId   Method         guid GetCimSessionInstanceId()
GetHashCode               Method         int GetHashCode()
GetObjectData             Method         void GetObjectData(System.Runtime.Serialization.SerializationInfo info, Sys...
GetType                   Method         type GetType()
ToString                  Method         string ToString()
Actions                   Property       CimInstance#InstanceArray Actions {get;set;}
Author                    Property       string Author {get;set;}
Date                      Property       string Date {get;set;}
Description               Property       string Description {get;set;}
Documentation             Property       string Documentation {get;set;}
Principal                 Property       CimInstance#Instance Principal {get;set;}
PSComputerName            Property       string PSComputerName {get;}
SecurityDescriptor        Property       string SecurityDescriptor {get;set;}
Settings                  Property       CimInstance#Instance Settings {get;set;}
Source                    Property       string Source {get;set;}
TaskName                  Property       string TaskName {get;}
TaskPath                  Property       string TaskPath {get;}
Triggers                  Property       CimInstance#InstanceArray Triggers {get;set;}
URI                       Property       string URI {get;}
Version                   Property       string Version {get;set;}
State                     ScriptProperty System.Object State {get=[Microsoft.PowerShell.Cmdletization.GeneratedTypes...

You can see from this that it's just getting the information from WMI (CIM is the new WMI in PowerShell v3 and above). Thus, we can easily get a list of the scheduled tasks using the following script:

Get-ScheduledTask | Where State -ne "Disabled" | Select TaskName,TaskPath,Source,Description,Author,State,URI,Version

That gets us the first part of the problem. Now we need the second part: how do we know when the tasks ran, and what was the status of the last run? There is another cmdlet for this: Get-ScheduledTaskInfo. We can run it by using the following script:

Get-ScheduledTask | Where State -ne "Disabled" | Get-ScheduledTaskInfo | Select TaskName,TaskPath,LastRunTime, LastTaskResult,NextRunTime,NumberofMissedRuns

To actually implement a monitor for scheduled tasks, I would schedule these differently. My inputs.conf (using the handy SA-ModularInput-PowerShell) would look like this:

[powershell://scheduled-tasks]
script = Get-ScheduledTask | Where State -ne "Disabled" | Select TaskName,TaskPath,Source,Description,Author,State,URI,Version
schedule = 0 30 2 ? * *
source = PowerShell
sourcetype = Windows:ScheduledTask

[powershell://scheduled-taskinfo]
script = Get-ScheduledTask | Where State -ne "Disabled" | Get-ScheduledTaskInfo | Select TaskName,TaskPath,LastRunTime, LastTaskResult,NextRunTime,NumberofMissedRuns
schedule = 0 45 * ? * *
source = PowerShell
sourcetype = Windows:ScheduledTaskInfo

The first input stanza runs at 2:30am local time and the second input stanza runs every 60 minutes. Our list of scheduled tasks won’t change very much, so let’s create a lookup to enhance our work. This will turn a host, TaskName and TaskPath into the associated information. The search to run is this:

sourcetype=Windows:ScheduledTask |
    stats latest(Source) as Source,
        latest(Description) as Description,
        latest(Author) as Author,
        latest(State) as State,
        latest(URI) as URI,
        latest(Version) as Version
        by TaskName,TaskPath,host |
    outputlookup WindowsScheduledTask.csv

As normal, enter this all on one line. Turn this into a lookup (either through the manager or via the configuration files) and you are ready to go.
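If you go the configuration file route, the transforms.conf definition is minimal (the stanza name is the lookup name used in the searches below):

[WindowsScheduledTask]
filename = WindowsScheduledTask.csv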

There are three things we can do with the scheduled task information. Each will require its own search.

  1. Show Failed Tasks
  2. Show Missed Tasks
  3. Show Last Run Time of all Tasks

The two interesting ones are the failed tasks and missed tasks. Failed tasks can be found by looking at the LastTaskResult. The LastTaskResult is 0 on success and an error code otherwise. Run this search over the last 60 minutes:

sourcetype=Windows:ScheduledTaskInfo LastTaskResult!=0 |
    lookup WindowsScheduledTask host,TaskName,TaskPath OUTPUT Source,Description,Author,URI,Version |
    table host,TaskName,TaskPath,Description,Author,URI,LastRunTime,NextRunTime

The missed tasks search uses the NumberOfMissedRuns instead:

sourcetype=Windows:ScheduledTaskInfo NumberOfMissedRuns!=0 |
    lookup WindowsScheduledTask host,TaskName,TaskPath OUTPUT Source,Description,Author,URI,Version |
    table host,TaskName,TaskPath,Description,Author,URI,NumberOfMissedRuns,LastRunTime,NextRunTime

I mentioned earlier that the Get-ScheduledTask series of cmdlets use CIM/WMI underneath. However, they apparently only work on NT 6.2 and above, also known as Windows Server 2012 or Windows 8. Unfortunately, this is one area of Microsoft land that changes frequently. For earlier versions, there is a WMI interface (Win32_ScheduledJob) that can be used, but it provides different information. Also, there is a log file that is maintained by the scheduler (C:\Windows\Tasks\SchedLgU.txt). However, the log file has an issue – it is exactly 32KB in size, and the system locks it and overwrites the contents constantly; once it gets to the end, it starts at the beginning of the file again. This is good for diagnosis, but not good for monitoring purposes. Hopefully Microsoft will maintain the PowerShell cmdlets "as is" for future versions of Windows!

Monitoring Windows Shares with Splunk and PowerShell


I sometimes get emails after blog posts. One of the (fair) criticisms is that I sometimes do something in PowerShell that can be quite legitimately done via another data input like WMI. While this is true for simple cases, it’s not always true. Take the request, for example, of monitoring network shares. There are three parts to this. Firstly, producing a monitor of the share itself; secondly, producing a monitor of the permissions on the share; and finally, monitoring the file accesses utilizing that share. I’ve already blogged about the last one. Let’s take a look at the first two.

You can actually monitor the share itself using WMI. Network Shares are exposed via a WMI class Win32_Share. However, I wanted to go a little further – I wanted to show that we can expose the shares in a way that allows us to monitor changes to the shares. As is most often the case, I’m going to use the SA-ModularInput-PowerShell data input for this purpose. This modular input has a key feature we are going to use – the ability to save state between script executions.

Let’s take a look at the code for getting the data first. It’s relatively simple:

Gwmi Win32_Share | Where Type -eq 0 | Select Name,Path,Status,MaximumAllowed,AllowMaximum

The Share Type 0 is a “standard Windows share” and not something else (like an admin connection or a printer share). If we were using a plain WMI data input, we could express this as a WQL query and it would be logged every X minutes. However, we want to go further – we want to pick up changes as well. To do this, we utilize the LocalStorage module that is distributed with the SA-ModularInput-PowerShell addon. The basics are simple. First, we set up a LocalStorage hash to use:

$State = Import-LocalStorage 'Win32_Share.xml' -DefaultValue (New-Object PSObject -Property @{ S = @{} })

Note the default value – we are setting up a hash that will be persistently stored on each server and will be used to store the current settings. At the end of our script, we want to ensure that we store any updates we made:

$State | Export-LocalStorage 'Win32_Share.xml'

In between these statements we can handle all the stuff we need. Here is the complete script:

$State = Import-LocalStorage "Win32_Share.xml" -DefaultValue (New-Object PSObject -Property @{ S = @{} })

$shares = (Get-WmiObject -Class Win32_Share | Where-Object Type -eq 0 | Select-Object Name,Path,Status,MaximumAllowed,AllowMaximum)
foreach ($share in $shares) {
    $Emit = $false

    if (-not $State.S.ContainsKey($share.Name)) {
        $Emit = $true
    } else {
        $cache = $State.S.Get_Item($share.Name)
        if (($cache.Path -ne $share.Path) -or 
            ($cache.Status -ne $share.Status) -or
            ($cache.MaximumAllowed -ne $share.MaximumAllowed) -or
            ($cache.AllowMaximum -ne $share.AllowMaximum)) {
            $Emit = $true
        }
    }

    if ($Emit -eq $true) {
        Write-Output $share
        $State.S.Set_Item($share.Name, $share)
    }
}

$State | Export-LocalStorage "Win32_Share.xml"

What we are basically doing here is saying “if the share does not exist in our cache, or anything has changed about the share compared to the cache, then output the share to the pipeline and store the new share in the cache”. To run this, you will need to add a stanza to inputs.conf. I’ve added this script to my script repository in TA-windows-local/bin, so here is my stanza from that same app:

[powershell://Win32_Share]
script = . "$SplunkHome\etc\apps\TA-windows-local\bin\win32_share.ps1"
schedule = 0 0/5 * ? * *
index = win
sourcetype = Windows:Win32_Share

The output looks like this:

Name=Drivers
Path=C:\Drivers
Status=OK
AllowMaximum=True

There are a couple of improvements we could make to this script. Firstly, adding a “last emitted time” to the event and storing it in the cache would allow us to add a condition that says “if the share has not been emitted in the last 24 hours, then emit the event”. This allows us to restrict the search window to the last 24 hours when utilizing this data source. Secondly, we could do a second pass – over the cache instead of the shares – and see if any cache entries are not in the share list. This would allow us to detect share deletions as well. A sketch of the first improvement follows.
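This fragment assumes the $share, $State and $Emit variables from the script above, plus a hypothetical LastEmitted property stored with each cached entry:

# Inside the foreach loop: re-emit if the cached copy is older than 24 hours.
if (-not $Emit) {
    $cache = $State.S.Get_Item($share.Name)
    if (((Get-Date) - [DateTime]$cache.LastEmitted) -gt (New-TimeSpan -Hours 24)) {
        $Emit = $true
    }
}
# When emitting, stamp the object before writing it to the cache:
$share | Add-Member -MemberType NoteProperty -Name LastEmitted -Value (Get-Date) -Force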

Next week, I will cover the second part of this problem – getting the permissions for each share. Until then, keep those ideas for Windows data inputs coming!

Monitoring Windows File Share Permissions with Splunk and PowerShell


I stopped my last blog post on Windows File Shares noting that there was still more to do. Monitoring Windows File Shares is a three part puzzle:

  1. Accesses
  2. Share Changes
  3. Permission Changes

We have already handled the first two, so this blog post is all about the final one – monitoring permission changes.

Let’s first consider how one would do this generically. As with the file shares, there is a WMI class for monitoring permissions, but it’s harder to use. You need to do it on a per-share basis, like this:

gwmi Win32_LogicalShareSecuritySetting -Filter "Name='$shareName'"

The Win32_LogicalShareSecuritySetting is a complex beast. Fortunately, we only need to know a couple of things. The most important one is the security descriptor. You can get the security descriptor like this:

$ss = gwmi Win32_LogicalShareSecuritySetting -Filter "Name='$shareName'"
$sd = $ss.InvokeMethod('GetSecurityDescriptor',$null,$null)

Once you have the security descriptor, the ACLs are in a property called DACL (which is actually an array – one for each entry in the ACL), and the user or group is embedded in another property inside the DACL called Trustee. If you need more information on this object, I suggest reading the excellent blog post by Andrew Buford.
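To make that concrete, here is a hedged sketch of walking the DACL (the property names come from the Win32_ACE and Win32_Trustee classes; depending on how you invoke GetSecurityDescriptor, you may first need to unwrap a Descriptor property on the return value):

# Sketch: one output object per access control entry in the DACL
foreach ($ace in $sd.DACL) {
    New-Object PSObject -Property @{
        Trustee    = "{0}\{1}" -f $ace.Trustee.Domain, $ace.Trustee.Name
        SID        = $ace.Trustee.SIDString
        AccessMask = $ace.AccessMask   # bitmask of rights (read, change, full control)
        AceType    = $ace.AceType      # 0 = access allowed, 1 = access denied
    }
}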

To aid me in this, I created a short script. You can download it from GitHub. It contains two cmdlets that are central to this process – Get-NetShare encapsulates the WMI call for obtaining the list of network shares. I use this to feed the Get-NetShareSecurity cmdlet, which turns each share into a set of permission objects. Now I can do the following:

Get-NetShare | Get-NetShareSecurity

There is more going on within the script though, as it is meant to be run as part of the SA-ModularInput-PowerShell addon. Specifically, it encapsulates the logic from last week for emitting the shares only when they change. I've made a few changes – I've added a checksum field so that I only have to store and compare the checksum. I've also added a type – is it a new share, an updated share, or just a periodic emission? Finally, I've handled deletions by checking the cache against the current list of shares.
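For illustration, a checksum like that can be computed along these lines (Get-ObjectChecksum is a hypothetical helper of my own, not part of the downloadable script):

# Hypothetical helper: reduce an object to a single checksum string
function Get-ObjectChecksum {
    param($InputObject)
    $text = ($InputObject | ConvertTo-Csv -NoTypeInformation) -join "`n"
    $md5  = [System.Security.Cryptography.MD5]::Create()
    $hash = $md5.ComputeHash([System.Text.Encoding]::UTF8.GetBytes($text))
    [System.BitConverter]::ToString($hash) -replace '-', ''
}

# Cache the checksum instead of the whole object
$checksum = Get-ObjectChecksum $share
if ($State.S[$share.Name] -ne $checksum) {
    Write-Output $share
    $State.S[$share.Name] = $checksum
}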

I do pretty much the same thing for the permissions. In the file share example, the share name is the primary key. In the permissions example, we have to construct a primary key – I’ve used the share name and the Security ID (SID) of the user or group as the primary key. Other than that, it’s exactly the same code.

One final note – since this script is outputting two different types of data, I leverage a feature of the SA-ModularInput-PowerShell that allows me to set the source type within the object. The property for this is called SplunkSourceType. You can use Add-Member to add this to the objects you are emitting.
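For example (the permission sourcetype name here is my own choice – use whatever matches your props.conf):

# Tag each emitted object with the sourcetype it should be indexed under
$share | Add-Member -MemberType NoteProperty -Name SplunkSourceType `
    -Value 'Windows:Win32_Share' -PassThru
$permission | Add-Member -MemberType NoteProperty -Name SplunkSourceType `
    -Value 'Windows:Win32_SharePermission' -PassThru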

If you are going to .conf 2013 next week, feel free to stop by the Microsoft booth on the third floor in the Apps Showcase and chat with me about Microsoft, PowerShell and getting data into Splunk.

Windows Host Monitoring in Splunk 6


Splunk 6 is out! While the most flashy and awesome of features rightly got their day in the sun at the recent .conf 2013, there is lots to love in there for Windows admins as well. I’m going to spend the next few weeks explaining in detail what some of those things are and how you can make use of them. First up – Windows Host Monitoring is one of a bevy of new data inputs available in the Splunk 6 Universal Forwarder.

When I design a new Microsoft-related app, inevitably I need some indication of what sort of host is being monitored. Such information includes the hardware (memory and processor) and the operating system. I also need an indication of the services running so I can do service monitoring, disk space utilization and so on. All of this information used to be gathered via either PowerShell or WMI, depending on my mood at the time I wrote the app. It was inconsistent at best and didn't allow for generic monitoring of Windows hosts.

In Splunk 6, we fixed that. There is a new modular input that is distributed with the Splunk 6 Universal Forwarder called WinHostMon. You can configure it like this within inputs.conf:

[WinHostMon://computer]
interval = -1
type = computer
index = windows

The interval of -1 indicates that the host monitor should gather the information once (per reboot) and then never again, so you have to take account of this in your searches and lookups. There is a whole series of things you can retrieve, but only one type per stanza. When it sends an event, the event looks like this:

Type=Computer
Name="DC"
Domain="splk.com"
Manufacturer="Microsoft Corporation"
Model="Virtual Machine"

Repeat the stanza for type=operatingSystem (with a new name) to get the OS information.
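For example, a second stanza along these lines would pick up the OS details (the stanza name is arbitrary):

[WinHostMon://operatingSystem]
interval = -1
type = operatingSystem
index = windows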

I don’t want to be searching over all-time every time I want to join an event stream to get windows host information – that’s bad. The normal thing to do would be to turn this into a lookup – even at 100K hosts, a lookup won’t be terrible. If you do have 100K endpoints then you already have methods of dealing with large lookups! My search to convert the events into a lookup is as follows:

index=windows (Type=Computer OR Type=OperatingSystem)
| stats latest(_time) as _time, latest(OS) as OS, latest(Architecture) as Architecture, latest(Version) as Version, latest(BuildNumber) as BuildNumber, latest(Name) as Name, latest(Domain) as Domain, latest(Manufacturer) as Manufacturer, latest(Model) as Model by host
| inputlookup append=T WinHosts.csv
| sort _time
| stats latest(_time) as _time, latest(OS) as OS, latest(Architecture) as Architecture, latest(Version) as Version, latest(BuildNumber) as BuildNumber, latest(Name) as Name, latest(Domain) as Domain, latest(Manufacturer) as Manufacturer, latest(Model) as Model by host
| outputlookup WinHosts.csv

The host is our primary key here. If we were gathering the information every X hours, we would not need the second stats call – we could pipe the first stats command straight to outputlookup. Since we have old events to consider, we build a table of the new events that have come in within the last X hours, append the old results, and then re-run the stats command to get the very latest information before writing it back out to the CSV file that backs our lookup. Set this up as a saved search. You can then expose the lookup file as a lookup via a transforms.conf entry. If you want this available in other apps, don't forget to export both the lookup CSV file and the transform entry.
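A minimal transforms.conf entry to back that lookup might look like this (the lookup name winhosts is my choice):

[winhosts]
filename = WinHosts.csv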

There are other WinHostMon methods for getting service status, processes, installed drivers and installed applications on a single host. This modular input basically unifies all of our differing methods for getting the data under one roof without the need for dealing with multiple WMI calls or installing additional apps like the SA-ModularInput-PowerShell app. As such, it should become part of your toolbox.

Not all apps will work with a Splunk 6 universal forwarder as yet (most notably, all the Microsoft apps are still relying on Splunk 5) so they won’t take advantage of the new universal forwarder features. However, you can bet on us utilizing this functionality in all future development.


Windows Event Logs in Splunk 6


Quite a while ago I wrote a blog post entitled The Splunk App for Active Directory and How I tamed the Security Log. It detailed how to limit the amount of data that was going into the Splunk index through filtering. I included two techniques – firstly, filtering by event code so that you didn’t include the events you didn’t want; and secondly, filtering the explanatory text on the end of each event. Splunk 6 makes this so much easier that the prior blog post is not even relevant any more.

Let's say you don't want firewall events. From the previous blog post, event IDs 5156 and 5157 detail the firewall connection accept and deny messages. Previously, we had to add a props.conf stanza to initiate a filtering action that was defined in transforms.conf – it was complicated. In Splunk 6, everything is done in inputs.conf. Here is a new inputs.conf stanza for you:

[WinEventLog:Security]
disabled = false
blacklist = 5156-5157

There are two new parameters you can specify – the first, shown here, is a blacklist of the event IDs you don't want to index. You can use ranges (as I did here), comma-separated event IDs, or comma-separated ranges. The second parameter is a whitelist, which is useful when the events you want to discard outnumber the ones you want to keep; it follows the same format.
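For example, a whitelist that keeps only a handful of logon-related events and discards everything else (the event IDs here are illustrative, not a recommendation):

[WinEventLog:Security]
disabled = false
whitelist = 4624-4634,4672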

The second facility I wrote about was suppressing the explanatory text. Splunk 6 makes this easier as well. Let’s take a look at a typical windows event prior to the text suppression:

10/14/2013 08:29:33 AM
LogName=Security
SourceName=Microsoft Windows security auditing.
EventCode=4634
EventType=0
Type=Information
ComputerName=SP-SQL.bd.splunk.com
TaskCategory=Logoff
OpCode=Info
RecordNumber=3544
Keywords=Audit Success
Message=An account was logged off.

Subject:
	Security ID:		BD\a-ahall
	Account Name:		a-ahall
	Account Domain:		BD
	Logon ID:		0x5886A

Logon Type:			3

This event is generated when a logon session is destroyed. It may be positively correlated with a logon event using the Logon ID value. Logon IDs are only unique between reboots on the same computer.

You see that "This event is generated…" text – that's the explanatory text. It's the same for every single event. Since these events get generated every 10-15 minutes for every single user on your domain controllers and they are 100+ bytes each, you can see how they add up. And that's just one example – every single security event has similar explanatory text. In Splunk 6, you can add a new parameter to your inputs.conf stanza to suppress the Message field:

[WinEventLog:Security]
disabled = 0
suppress_text = 1

Now when you get those events, this is what they look like:

10/14/2013 08:43:07 AM
LogName=Security
SourceName=Microsoft Windows security auditing.
EventCode=4672
EventType=0
Type=Information
ComputerName=SP-SQL.bd.splunk.com
TaskCategory=Special Logon
OpCode=Info
RecordNumber=3546
Keywords=Audit Success
Message=

You will note that there is no Message text at all. This is fine for some logs (usually custom service logs) where the message is not important. However, if you want some of the message but not all of it – as with the Security log – you will still need a transform like the one in my earlier post (or the sketch below). Since all stanzas with the same name are merged together, be careful about where you set the suppress_text parameter. In particular, do not set suppress_text on WinEventLog:Security, as it will strip the important contextual information out of the security log.
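If you want to drop only the boilerplate and keep the useful part of the message, a SEDCMD in props.conf is one way to do it – a sketch, assuming the explanatory text always begins with "This event is generated" and sits on the last line of the event:

[WinEventLog:Security]
SEDCMD-strip_explanation = s/This event is generated.*$//g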

These two changes can make your windows event log gathering more efficient, but as always – be careful of what you throw away.

Fixing Windows Time Problems for Splunk


I've just been bitten. It happens to all of us eventually: the case of the dreaded time sync problem. I had a Universal Forwarder sending my indexer a whole bunch of data, but my searches were not seeing it because of a time synchronization error – my Universal Forwarder was a little ahead of my indexer, enough to be a problem.

Of course, tracking this down is difficult, and there are various techniques you can use. My favorite is using the metrics.log file on the universal forwarder to see if data is being sent. You might also use the “All Time” approach, although I don’t recommend that if you have a lot of data.

So, how do you fix time sync issues? The short version – NTP is your friend. Let's go through all the bits you have to do.

Firstly, if your host is running in a virtual machine, you need to turn off time sync in the guest. The guest is probably getting its time from the underlying hypervisor. Install NTP on your hypervisor and/or turn off time sync within the guest. If you are on VMware, you can do this on Windows with the following:

& 'C:\Program Files\VMware\VMware Tools\VMwareToolboxCmd.exe' timesync disable

If you are on a Hyper-V based hypervisor, you need to open up VM Settings, then go to Management and then Integration Services. Uncheck the Time Synchronization box.
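If you prefer PowerShell over clicking through VM Settings, the Hyper-V module that ships with Windows Server 2012 can do the same thing from the host (the VM name here is illustrative):

# Turn off the Time Synchronization integration service for one guest
Disable-VMIntegrationService -VMName 'SP-SQL' -Name 'Time Synchronization'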

Now that you have the hypervisor behaving and not setting the time for you, you need to set up NTP on all your hosts. On Linux this is relatively easy – edit /etc/ntp.conf to include a server list and then run ntpdate with the new server list. For example, I use pool.ntp.org as my time source, so I have “server pool.ntp.org” in my /etc/ntp.conf file and then I do the following:

ntpdate pool.ntp.org

On Windows there is a little more work to do. Open up a PowerShell window as Administrator (right-click and use Run As Administrator) and enter the following commands:

w32tm /config /manualpeerlist:pool.ntp.org /syncfromflags:MANUAL
restart-service w32time
w32tm /resync

W32tm is a Microsoft tool for managing the NTP peer list from the command line. Your clock in the system tray will update within a few seconds. Make sure you use the same server list as your Linux boxes. If you need to specify multiple servers, you can space-separate them inside quotes, like this:

w32tm /config /manualpeerlist:"0.pool.ntp.org 1.pool.ntp.org" /syncfromflags:MANUAL
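Once the resync completes, you can confirm that Windows is actually tracking your peers:

w32tm /query /status    # stratum, last successful sync time and source
w32tm /query /peers     # state of each configured peer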

Don’t forget to restart any Splunk searches you were running after you have set the time. With time all set and properly synchronized, you can go about your merry splunking.

New Features for Perfmon in Splunk 6


Continuing our delve into the new Microsoft features introduced in Splunk Enterprise 6, let's take a look at performance metrics. We added a few really cool features here, dealing with zero values, object name matching, and multikv output.

The first has to do with how we record zero values. By default, the perfmon data input does not record zero values. This has implications if you want to do things like 95th percentile as I discussed in this blog post. In that blog post, I suggested you might want to (shudder) alter the splunk-perfmon.path file to provide a command line argument. The developers obviously read my blog and decided to make it easier. Now you can just add a parameter to your inputs.conf stanza, like this:

[perfmon://MemoryStats]
object = Memory
counters = *
instances = *
interval = 60
showZeroValue = 1

The important bit is that last parameter. If it is set to 1, then the zeros are recorded for each counter/instance combination. The default is the old behavior (i.e. showZeroValue = 0).

Our second feature is the ability to include regular expressions in the object. Let’s take an example of Microsoft SQL Server. If you create a named instance, the SQL Server will maintain counters that look like this:

MSSQL$INSTANCE:Transactions

The INSTANCE is the instance name. This presents a problem when you are trying to provide a generic add-on that reads performance data for Microsoft SQL Server. Our developers came to the rescue again – we can now specify a regular expression. If the object name matches exactly, the regular expression is never consulted. However, if there is no exact match, the object name is treated as a regular expression and all objects matching it are added to the monitoring list. You can do things like this:

[perfmon://MSSQL:Transactions]
object = MSSQL[^:]*:Transactions
counters = *
instances = *
interval = 60
showZeroValue = 1

Our final update to perfmon collection deals with an output change. Normally, you will get events that look like this:

10/15/2013 13:39:04.044 -0700
collection=Processor
object=Processor
counter="% Processor Time"
instance=_Total
Value=2.2639598290484009

This is good, but what if you have a lot of perfmon to gather – thousands of events? This is not exactly the most efficient method of storing the data. Let’s take a look at an example for an alternate mechanism:

[perfmon://Memory]
object = Memory
counters = *
interval = 60
mode = multikv
showZeroValue = 1

That new parameter – mode – switches between multikv and simple outputs. A multikv entry looks like this:

0	33.85326809566461	7734845440	7975034880	14696304640	0	0	3.982737423019366	29.870530672645245	1.991368711509683	1.991368711509683	1.991368711509683	0	299855872	94121984	0	312774	180313	33556298	66895872	89628672	290975744	0	9474048	10956800	8876032	66895872	54.26557951373551	7553560	7376	0	1615249408	1789952	31371264	6088224768	0	14400

Ok – it isn’t the most readable event in the world. But try this:

sourcetype=PerfmonMk:Memory | table Available_MBytes, Committed_Bytes, "Demand_Zero_Faults/sec"

All of a sudden, the information in this table is decoded for you. The format is much more compressed and that makes it take up less license room. Note that the source type has changed (it’s now prefixed with PerfmonMk instead of Perfmon), but other than that – it’s good. What’s more, if you do this to something with instances (for example, LogicalDisk), you still get one event per instance and the first element is the instance name:

C:	82.221877626972017	100744	0	0.893482164153651	0.0089348216415365105	0	0	0.893482164153651	0.0089348216415365105	0.00092000519619113603	0	0.00092000519619113603	9.8839878234777032	0	9.8839878234777032	64775.70259994347	0	64775.70259994347	6553.6000000000004	0	6553.6000000000004	97.408979418055637	1.9767975646955405

All of this makes it very possible to get more performance data to correlate against your other event data and store it in the most efficient way possible. Of course, you do need to upgrade your Universal Forwarder to 6.0, so be aware of that small wrinkle.

Splunking Windows PowerShell Commands


This year's user conference was another great one, and we got a ton of questions from you during the event. Some of them I couldn't answer at the time – I'm making up for that in between blog posts about new features. The first one was "Is there any way I can splunk what PowerShell commands are being executed on a server?"

There are two pieces to this puzzle: first, can I turn on an audit log that includes all the PowerShell commands executed on the system; and second, how do I get that log into Splunk? We handle the first piece through group policy. Open up the Group Policy Management console and take yourself to:

Computer Configuration\Administrative Templates\Windows Components\Windows PowerShell

In this group policy container there is a setting called “Turn On Module Logging”. It’s either enabled or disabled – enable it to turn on logging. You also need to set the list of modules that are logged. Wildcards are allowed, so feel free to set this to *. Apply your group policy change to the list of servers that you want to log and wait for the change to propagate (or run GPUPDATE /FORCE on the target systems).
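If you would rather test on a single machine before touching group policy, you can flip the same registry keys the policy writes – a sketch, run from an elevated PowerShell prompt:

# Sketch: enable module logging locally (the same keys group policy would set)
$base = 'HKLM:\SOFTWARE\Policies\Microsoft\Windows\PowerShell\ModuleLogging'
New-Item -Path "$base\ModuleNames" -Force | Out-Null
New-ItemProperty -Path $base -Name EnableModuleLogging -Value 1 -PropertyType DWord -Force | Out-Null
# A value literally named '*' with data '*' means "log every module"
New-ItemProperty -Path "$base\ModuleNames" -Name '*' -Value '*' -PropertyType String -Force | Out-Null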

Now that you have module logging turned on, the PowerShell commands appear in a Windows Event Log called “Microsoft-Windows-PowerShell/Operational” – you will most certainly want to install a Splunk 6 Universal Forwarder on each server that you are targeting to read this event log. You can do this by utilizing the following inputs.conf stanza:

[WinEventLog://Microsoft-Windows-PowerShell/Operational]
disabled = false

Push that out to your target servers and you will start getting events like the following back:

10/23/2013 10:20:43 AM
LogName=Microsoft-Windows-PowerShell/Operational
SourceName=Microsoft-Windows-PowerShell
EventCode=4103
EventType=4
Type=Information
ComputerName=EX-BES10.bd.splunk.com
User=a-ahall
Sid=S-1-5-21-2882450500-3417635276-1240590811-1179
SidType=1
TaskCategory=Executing Pipeline
OpCode=To be used when operation is just executing a method
RecordNumber=133
Keywords=None
Message=ParameterBinding(Get-Service): name="Name"; value="SplunkForwarder"


Context:
        Severity = Informational
        Host Name = ConsoleHost
        Host Version = 3.0
        Host ID = e6323c96-aa4d-48c3-87a1-b97e01c63afa
        Engine Version = 3.0
        Runspace ID = b2be7033-a9e5-43c1-b356-fedb9ccd34cf
        Pipeline ID = 20
        Command Name = Get-Service
        Command Type = Cmdlet
        Script Name = 
        Command Path = 
        Sequence Number = 42
        User = BD\a-ahall
        Shell ID = Microsoft.PowerShell

From this, you can see everything you need to determine what was run, who ran it, what machine it was run from, and when. You will need to do the normal field extractions to get at this information – remember that this is a multi-line event, so make sure the extractions in your props.conf use the (?ms) modifiers to handle multi-line regular expressions.
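A hedged example of such an extraction in props.conf (the sourcetype stanza assumes the default WinEventLog naming, and the field name is mine):

[WinEventLog:Microsoft-Windows-PowerShell/Operational]
EXTRACT-command_name = (?ms)Command\sName\s=\s(?<command_name>\S+)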

As for the cmd prompt – sorry, there is no equivalent log for that.

Installing the Splunk 6.0 Universal Forwarder on Windows


I’m currently working on getting all the Splunk apps that I am responsible for upgraded so that they use the Splunk 6 Universal Forwarder. Naturally, that means a whole slew of installs on Windows Server in various configurations. I bumped into a small hitch while I was doing Microsoft SharePoint. SharePoint requires me to run the Universal Forwarder as a domain user.

No problem, you say. Just follow the instructions in the excellent documentation, or one of my many blog posts. However, I came across a hitch. You see, the Splunk 6 Universal Forwarder installer checks a few things to ensure you are installing properly. As a result, if you are installing Splunk 6 to run as a domain user, you must run the installer as an Administrator.

The problem is, of course, how do you do this? I'm not using an automated solution (like a PowerShell script on start-up, or System Center Configuration Manager); I'm just using the regular MSI. Double-clicking it doesn't run the MSI as an Administrator. If you right-click the file and open Properties > Compatibility, you will note that "Run this program as an administrator" is greyed out. I even tried running PowerShell as an Administrator and then running msiexec by hand (I got a very weird error when doing this).

Finally, I turned to our engineering team – after all, they must have tested this! It turns out they did, with the lowly cmd prompt. The best method on Windows Server 2012 (where cmd.exe is hidden) is to start up a PowerShell prompt running as Administrator, then run cmd.exe inside of the PowerShell prompt. Now you can just run that .msi file directly and the right thing will happen.

Of course, you will want to add on some command-line arguments. Mine are:

msiexec /i splunkforwarder.msi AGREETOLICENSE=Yes DEPLOYMENT_SERVER="DEPLOY:8089" LOGON_USERNAME="SPLK\splunk" LOGON_PASS="xxxxxx" /quiet

Everything else will be defaulted, and it will look to a machine called DEPLOY on the local domain for the configuration. I configure “DEPLOY” as a CNAME in DNS on my local Active Directory domain controller.

I mentioned earlier that you can do this via PowerShell. I don’t like to install Splunk universal forwarders via Group Policy Software Installation for two reasons – firstly, it requires a reboot of the server which isn’t really required by the installer, and secondly, I have to produce a transforms file (or MST file) instead of providing just the command line arguments. Instead, I have a PowerShell script that I can run on each server that will install the Splunk Universal Forwarder if required. I get this to run as part of the start-up process. If I need to install Splunk on another server that is already running, I can do it without a reboot easily using the same script.
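As a teaser, the core of the idea fits in a few lines – a minimal sketch, assuming the MSI sits on an accessible share (the path is illustrative):

# Sketch: install the Universal Forwarder only if the service is absent
if (-not (Get-Service -Name SplunkForwarder -ErrorAction SilentlyContinue)) {
    $msi = '\\fileserver\installs\splunkforwarder.msi'
    $msiArgs = '/i "{0}" AGREETOLICENSE=Yes DEPLOYMENT_SERVER="DEPLOY:8089" /quiet' -f $msi
    Start-Process -FilePath msiexec.exe -ArgumentList $msiArgs -Wait
}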

But the full version of that script – that's another blog post in the works.
