Windows IP Address Monitoring

I’ve come across a couple of reasons to have a correlation between the IP Address (at a point in time) and a hostname. For most normal cases, you can use the nslookup script to do a reverse lookup. However, sometimes it’s a good idea to get the list and store it in a lookup. Unfortunately, this is actually a difficult piece of information to get outside of the DNS system. Here are my favorite ways of doing it.

Firstly, there is the PowerShell way. This uses the SA-ModularInput-PowerShell addon to run a PowerShell command to get the list for you. Here is the function you need:

function Get-LocalIPAddress
{
    # Enumerate every network adapter configuration via WMI and gather
    # the IP addresses (IPv4 and IPv6) assigned to each adapter.
    $AdapterSet = Get-WmiObject -Class Win32_NetworkAdapterConfiguration
    $IPAddressSet = @()
    foreach ($adapter in $AdapterSet) {
        if ($adapter.IPAddress -ne $null) {
            foreach ($ipaddress in $adapter.IPAddress) {
                $IPAddressSet += $ipaddress
            }
        }
    }
    $IPAddressSet
}

With this function, you can create a data input script like this:

$HostInfo = New-Object PSObject
$HostInfo | Add-Member -PassThru -MemberType NoteProperty IPAddressList -Value ((Get-LocalIPAddress) -join ",")

Put the function and the script together in a *.ps1 file in your bin directory and use the SA-ModularInput-PowerShell scheduling feature to run the script once a day. You will get one event per day per host with the list of IP Addresses.
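As a concrete illustration, the scheduling stanza might look like this – a sketch only: the app folder, script name and 02:30 daily schedule are my assumptions, and the schedule uses the add-on’s cron-like syntax with a leading seconds field (the same syntax you will see later in this blog for the Active Directory replication input):

[powershell://IPAddressList]
script = & "$SplunkHome\etc\apps\TA-windows-iplist\bin\ip-address-list.ps1"
schedule = 0 30 2 ? * *
sourcetype = PowerShell:IPAddressList
disabled = false

Once you have this, you can set up a saved search like this: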

sourcetype=PowerShell:IPAddressList 
| eval IPAddress=split(IPAddressList,",") 
| mvexpand IPAddress 
| table IPAddress,host 
| outputlookup IPAddressToHost
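For outputlookup (and the lookup command used later) to refer to IPAddressToHost by name, the lookup needs a definition. A minimal transforms.conf sketch:

[IPAddressToHost]
filename = IPAddressToHost.csv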

This is nice if you are doing other things with PowerShell, and it has the advantage of a limited search range – you only need to search over the last 24 hours (assuming you are running the script once a day) in order to generate the lookup. However, it does require another add-on on your endpoints, plus PowerShell 3 and .NET Framework 4.5 on your systems, which is not always feasible. For that reason we have another way – one using just the regular data inputs provided with the Splunk Universal Forwarder. You might be forgiven here for guessing WMI. WMI has a great WQL command for retrieving the list, and you can try it for yourself:

Get-WmiObject -query 'SELECT * FROM Win32_NetworkAdapterConfiguration WHERE IPEnabled="TRUE"'

However, you will note here that the IP Address List field returns an array. The Splunk WMI reader can’t deal with arrays today. Fortunately, WMI isn’t the only place where this information is stored. It’s also stored in the registry in the HKLM hive. You can set up a data input as follows:

[WinRegMon://IPAddresses]
hive=HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Services\\Tcpip\\Parameters\\Interfaces\\?.*
baseline=1
baseline_interval=86400
disabled=false

Note that the syntax I have provided works with Splunk 6 Universal Forwarders. If you have an earlier version, you will need to convert this to the older regmon-filters.conf syntax. Every day, the WinRegMon will do a baseline crawl through this section. An example event is here:

11/13/2013 07:57:16.548
registry_type="baseline"
key_path="\REGISTRY\MACHINE\SYSTEM\CURRENTCONTROLSET\SERVICES\TCPIP\PARAMETERS\INTERFACES\{0A44AA72-2BC4-400F-B345-E13CA9DA76C1}\DhcpIPAddress"
data_type="REG_SZ"
data="172.20.254.12"

There are a couple of interesting things here. Firstly, you will note that there is a GUID in the key_path – this is a unique ID for the interface, so it’s definitely something we are interested in. Secondly, note the timestamp. The baseline timestamp is not the time when the baseline is captured, but rather the last time the key has been updated. You may see duplicate baseline events for the same key with the same timestamp if the key has not been updated. In the case of IP addresses, that is certainly possible. To turn this into a lookup, we need to do some work. The first task is to extract the GUID, which is conveniently located between two curly-braces in the key_path field. The second is to get a link between host, GUID and IP Address, which we can do with a simple stats call. Finally, we need to bring in the existing lookup table since the events may be generated way in the past. Here is how I do it:

sourcetype=WinRegistry registry_type="baseline" key_path="*IPAddress"
| rex field=key_path "{(?<GUID>[^}]+)}"
| stats latest(_time) as _time,latest(data) as IPAddress by host,GUID
| inputlookup append=T IPAddressToHost
| sort _time
| stats latest(_time) as _time,latest(IPAddress) as IPAddress by host,GUID
| outputlookup IPAddressToHost

You can use this version of the lookup the same way as the PowerShell version, but it has two extra fields – the _time field is the time that the interface last updated its IP Address, and the GUID is the internal GUID of the interface. You can use it wherever you have an IP address like this:

sourcetype=iis | lookup IPAddressToHost IPAddress as src_ip OUTPUTNEW host AS src_host

Decoding IIS Logs

Everyone (just about) knows that there is a table of status codes that HTTP/1.1 defines. However, IIS gives you two more status codes in the log files. The HTTP/1.1 status is stored in sc_status (and it is automagically decoded for you in Splunk 6). There is also an extended code called sc_substatus and a Win32 error code. How can you really decode these, especially since the sc_win32_status seems to have really large numbers?

Let’s start with the sc_status and sc_substatus codes. These are normally written together as a decimal number. So, for instance, 401.1 means an sc_status of 401 and an sc_substatus of 1. The sc_status codes follow a pattern: 1xx are informational, 2xx indicate success, 3xx indicate redirection, 4xx indicate client errors, and 5xx indicate server errors. Here is the big table:

Status Code   Meaning
100 Continue
101 Switching Protocols
200 Client Request Succeeded
201 Created
202 Accepted
203 Non-authoritative information
204 No content
205 Reset content
206 Partial content
301 Moved Permanently
302 Moved Temporarily
303 See Other
304 Not modified
307 Temporary redirect
400 Bad Request
401.1 Access Denied (Logon Failed)
401.2 Access Denied (Logon Failed due to server configuration)
401.3 Access Denied (Unauthorized due to ACL on resource)
401.4 Access Denied (Authorization failed by filter)
401.5 Access Denied (Authorization failed by ISAPI/CGI application)
401.7 Access Denied (By IIS6 URL authorization policy on web server)
403.1 Forbidden (Execute Access)
403.2 Forbidden (Read Access)
403.3 Forbidden (Write Access)
403.4 Forbidden (SSL Required)
403.5 Forbidden (128-bit SSL Required)
403.6 Forbidden (IP Address Rejected)
403.7 Forbidden (Client Certificate Required)
403.8 Forbidden (Site access denied)
403.9 Forbidden (Too many users)
403.10 Forbidden (Invalid configuration)
403.11 Forbidden (Password change)
403.12 Forbidden (Mapper Denied Access)
403.13 Forbidden (Client certificate revoked)
403.14 Forbidden (Directory listing denied)
403.15 Forbidden (Client Access Licenses exceeded)
403.16 Forbidden (Client certificate is untrusted)
403.17 Forbidden (Client certificate is expired)
403.18 Forbidden (Cannot execute URL in current application pool)
403.19 Forbidden (Cannot execute CGIs in current application pool)
403.20 Forbidden (Passport logon failed)
404.1 Not Found (Website not accessible on the requested port)
404.2 Not Found (Web service extension lockdown policy)
404.3 Not Found (MIME map policy)
404.4 Not Found (No Handler in IIS7)
404.5 Request Filtering (URL Sequence)
404.6 Request Filtering (Verb)
404.7 Request Filtering (File extension)
404.8 Request Filtering (Hidden namespace)
404.9 Request Filtering (Hidden File Attribute)
404.10 Request Filtering (Header is too long)
404.11 Request Filtering (URL double escaped)
404.12 Request Filtering (High-bit characters)
404.13 Request Filtering (Content length is too long)
404.14 Request Filtering (URL is too long)
404.15 Request Filtering (Query string is too long)
405 Method not allowed
406 Browser does not accept the media type
407 Proxy authentication required
412 Precondition failed
413 Request entity too large
414 Request-URI too long
415 Unsupported media type
416 Requested range not satisfiable
417 Expectation failed
500.12 Web Server is restarting
500.13 Web server is too busy
500.15 You can’t have Global.asa
500.16 UNC authorization credentials are incorrect
500.18 URL authorization store cannot be opened
500.100 Internal ASP error
501 Header values specify a configuration that is not implemented
502.1 CGI application timeout
502.2 Error in CGI application
503 Service unavailable
504 Gateway timeout
505 HTTP version not supported

There are a lot of codes there. Most of the sub-status detail sits under 401 (authentication and authorization), 403 (access restrictions) and 404 (not-found and request-filtering failures), so you can get really granular about why a particular request failed. This aids in debugging when things go wrong.

For the sc_win32_status, fortunately, there are only a few you need to know:

Win32 Code Meaning
2148074252  The logon attempt failed
2148074254  No credentials are available in the security package

You will normally see sc_status=401 sc_win32_status=2148074254 on the first access during an integrated authentication to an IIS Web site. This will prompt the browser to pop up a window saying “Enter your credentials”. Once you submit those credentials, you will get another sc_status=401 but with sc_win32_status=2148074252 instead when those credentials cannot be verified. You can look up any other sc_win32_status codes at MSDN.

Which brings us to the question that caused me to write this blog post. Can I provide a report that shows the top failed logons into IIS with integrated authentication? Since the integrated authentication does something like this:

  1. Client sends “GET /” command
  2. Server returns sc_status=401 Authorization Required
  3. Client sends “GET /” with an Authorization header
  4. Server returns sc_status=200 with the page details

One cannot just use sc_status=401 for failed logons. You have to use:

sourcetype=iis sc_status=401 sc_win32_status=2148074252
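That, finally, answers the original question. As a hedged sketch of the full report – the cs_username field is an assumption that depends on the cs-username column being enabled in your IIS W3C logging – the top failed logons look like this:

sourcetype=iis sc_status=401 sc_win32_status=2148074252
| top limit=20 cs_username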

Detecting Attachments in Microsoft Exchange Server 2013

One of the common recurring themes I get is how to detect attachments and log those attachments in Splunk. Let me get the obvious piece of this out of the way first – you cannot log the attachment names or contents without a Transport Agent. This is a special piece of code that is deployed on all your Exchange Servers that intercepts the messages as they go through the system and does something to them. You will normally see a transport agent deployed for anti-virus scanning, for example.

However, logging the fact that there is an attachment is relatively easy. You can create a Transport Rule that stamps and audits messages carrying attachments. To create the rule, you need to log into the Exchange Control Panel (ECP) using the path https://your-exchange-server/ecp. Follow this process:

  1. Browse to Mail Flow, then Rules
  2. Click on the + sign to create a new rule
  3. Fill in the relevant information. I like:
    • Predicate: When the attachment size is greater than or equal to 10Kb
    • Action: Add a Message Header “X-Attachment” value “yes”
  4. Check the box next to Audit and select a Level
  5. Select Test and send notifications (near the bottom)
  6. Click on Save

In small environments, this rule will propagate quickly. However, the larger the environment, the slower the process; bear this in mind – I’ve seen some systems take up to 24 hours to propagate a change. I’ve picked a value for attachment size of 10Kb because a lot of organizations like to add a graphic as a signature. This graphic is technically an attachment.

Now that your rule is in place, how can you tell? The logging is done in the Message Tracking logs and looks like this:

2013-11-22T17:11:42.270Z,,EX-MBX01,,,,,AGENT,AGENTINFO,9543417331719,,b692227ef81649f5083608d0b5bd0c4f,a-ahall@bd.splunk.com,,19574,1,,,test-attachment,ahall@testlab.local,whayes@testlab.local,,Undefined,,,,S:AMA=SUM|v=0|action=|error=|atch=1;S:AMA=EV|engine=M|v=0|sig=1.163.371.0|name=|file=;S:TRA=ETR|ruleId=ff1d0524-bcae-4c75-9dd1-3b56728aa029|st=11/22/2013 4:57:23 PM|action=SetAuditSeverity|action=SetHeader|sev=3|mode=AuditAndNotify

If you are using the Splunk App for Exchange, then much of this is decoded for you automatically. The ruleId is a GUID that identifies your rule – you can retrieve it with the Get-TransportRule cmdlet. You can also see the action (SetHeader in our case) and the audit severity (1 is low, 2 is medium and 3 is high). The extractions within the Splunk App for Exchange also give you the sender, recipients and subject fields. To get a list of people who have sent attachments you can do the following search:

sourcetype=MSExchange:*:MessageTracking ruleId="ff1d0524-bcae-4c75-9dd1-3b56728aa029"|table _time,sender,recipients,subject,total_bytes

Replace the ruleId with the ruleId of the transport rule that you have created and you have your report. Who sends the most attachments?

sourcetype=MSExchange:*:MessageTracking ruleId="ff1d0524-bcae-4c75-9dd1-3b56728aa029"|top sender
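If you need to look up the ruleId in the first place, the Get-TransportRule cmdlet mentioned earlier will show it – the rule name below is a placeholder for whatever you named your rule when you created it:

PS#> Get-TransportRule "Log Attachments" | Format-List Name,Guid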

Given the flexibility of Transport Rules, this makes the possibilities of reporting endless.

If you are running Exchange Server 2007 or 2010, things are a little different, and I’ll cover them in a different blog post.

Logging DMVs from Microsoft SQL Server with PowerShell

Some systems are easy to monitor and diagnose – just Splunk the log file or performance counter and you are pretty much done. Others take a little more work. Take, for example, Microsoft SQL Server. Many of the best bits of management information are stored in Dynamic Management Views, or DMVs. Getting to them is not so straightforward.

In order to get those nuggets, we need to do some pre-work. Firstly, install a Splunk Universal Forwarder on the SQL Server. Then fire up the SQL Server Management Studio and add the LOCAL SYSTEM account to the sysadmin role. This will allow the local machine access to all the information you need to monitor any database within the instances that are installed. If you have multiple instances on the server then make sure you add the LOCAL SYSTEM to the sysadmin role on each instance. Finally, push or install the SA-ModularInput-PowerShell to the Splunk Universal Forwarder. This will allow you to grab the information you need.
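If you prefer to script the sysadmin grant rather than clicking through Management Studio, here is a hedged sketch using the SQLPS module that ships with SQL Server – the instance name is a placeholder, and on SQL Server 2012 and later you can use ALTER SERVER ROLE instead of the older stored procedure:

PS#> Import-Module SQLPS -DisableNameChecking
PS#> Invoke-Sqlcmd -ServerInstance "localhost" -Query "EXEC sp_addsrvrolemember @loginame = N'NT AUTHORITY\SYSTEM', @rolename = N'sysadmin'"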

Now that we have the pre-work out of the way, we can start concentrating on the basics. In the Splunk App for SQL Server, I have a PowerShell module that simplifies the SQL Server access. For instance, it has a command to list the instances on the server:

PS#> Import-Module .\\Common.psm1
PS#> Get-SQLInstanceInformation

This will list out the instances, most notably a field called ServerInstance. You can feed this to another cmdlet to get database information:

PS#> Get-SQLInstanceInformation | Get-SQLDatabases

We want to get access to the Dynamic Management Views. These are accessed via SQL statements. To assist with this, I have another module called SQL.psm1. For instance, in the Splunk App for SQL Server, I include an indexhealth.ps1 script. This runs a DMV query to find out if any indices are suggested for any databases within the instance. Here is the basic process:

PS#> $conns = Get-SQLInstanceInformation | `
    Where-Object { $_.ServiceState -eq "Running" } | `
    Open-SQLConnection
PS#> $conns | Invoke-SQLQuery -SourceType "MSSQL:DMV:Indexes" -Query $query
PS#> $conns | Close-SQLConnection

As you can see, it’s a three part process. Firstly, we open up connections to each of the running instances. Secondly, we execute our SQL query to retrieve the information. The Invoke-SQLQuery is a wrapper around Invoke-SQLCmd that also formats the objects to be Splunk-friendly. Finally, we close the connections. You can place this in a *.ps1 script and use the PowerShell modular input to execute it.

You can find both modules mentioned in the TA-SQLServer bin directory when you install Splunk App for SQL Server.

The real power here is the DMV-related SQL query. For the index health, here is the query:

SELECT
    DB_NAME(s.database_id) AS [DatabaseName],
    OBJECT_NAME(s.[object_id]) AS [ObjectName],
    i.name AS [IndexName],i.index_id,
    user_seeks + user_scans + user_lookups AS [Reads],
    user_updates AS [Writes],
    i.type_desc AS [IndexType],
    i.fill_factor AS [FillFactor]
FROM sys.dm_db_index_usage_stats AS s
INNER JOIN sys.indexes AS i ON s.[object_id] = i.[object_id]
WHERE i.index_id = s.index_id AND i.index_id != 0
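Putting the pieces together, a complete input script might look like the following sketch. The module paths assume the modules sit next to the script in the bin directory (as they do in TA-SQLServer), and $PSScriptRoot requires PowerShell 3 – which the modular input needs anyway:

# indexhealth.ps1 (sketch) - run via the SA-ModularInput-PowerShell scheduler
Import-Module "$PSScriptRoot\Common.psm1"
Import-Module "$PSScriptRoot\SQL.psm1"

# The DMV query - paste the full query from above into this here-string
$query = @"
SELECT DB_NAME(s.database_id) AS [DatabaseName],
       OBJECT_NAME(s.[object_id]) AS [ObjectName],
       i.name AS [IndexName], i.index_id
FROM sys.dm_db_index_usage_stats AS s
INNER JOIN sys.indexes AS i ON s.[object_id] = i.[object_id]
WHERE i.index_id = s.index_id AND i.index_id != 0
"@

# Open one connection per running instance, run the query, then clean up
$conns = Get-SQLInstanceInformation | `
    Where-Object { $_.ServiceState -eq "Running" } | `
    Open-SQLConnection
$conns | Invoke-SQLQuery -SourceType "MSSQL:DMV:Indexes" -Query $query
$conns | Close-SQLConnection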

You can find a wide array of DMV-related SQL scripts on sites such as MSSQLTips.com, which should give you plenty of queries to get you started.

Now, go forth and monitor that SQL Server!

Active Directory Replication and Windows Server 2012 R2

If you have upgraded your Active Directory domain to Windows Server 2012 R2 and use the Splunk App for Active Directory, you may have noticed that the replication statistics script doesn’t work the same way as on older versions of Windows. Specifically, the ad-repl-stats.ps1 script takes forever to run and consumes just about as much memory as you can give it. This is because of a change in the implementation of the System.DirectoryServices.ActiveDirectory API that Microsoft provides. In prior releases of Windows Server, the API was lazy – data was only filled in when it was requested. In Windows Server 2012 R2, those same objects fill in their data at instantiation, so when we read the replication status object, all the replicated objects are loaded immediately, causing a major performance impact.

Fortunately, we’ve got all the facilities to correct this. As part of the PowerShell v3 release, we also got our hands on some new PowerShell cmdlets for managing Active Directory. These are contained in the RSAT-ADDS Windows Feature (which you will need to install on each domain controller). I created a new script called replication-stats.ps1 with the following contents:

Import-Module ActiveDirectory -ErrorAction SilentlyContinue

# Enumerate every inbound replication partnership on this DC, across all partitions
Get-ADReplicationPartnerMetaData -Target $env:ComputerName -PartnerType Inbound -Partition * | %{
    # The Partner property is the DN of the source DC's NTDS Settings object;
    # strip the leading RDN to get the server object and read its DNS name
    $src_host = Get-ADObject -Filter * `
        -SearchBase $_.Partner.Replace("CN=NTDS Settings,","") `
        -SearchScope Base -Properties dNSHostName

    # Emit one Splunk-friendly object per partnership, with the same
    # field names as the old ad-repl-stats.ps1 script
    New-Object PSObject -Property @{
        LastAttemptedSync = $_.LastReplicationAttempt
        LastSuccessfulSync = $_.LastReplicationSuccess
        Result = $_.LastReplicationResult
        transport = $_.IntersiteTransportType
        naming_context = $_.Partition
        type = "ReplicationEvent"
        usn = $_.LastChangeUsn
        src_host = $src_host.dNSHostName
    }
}

The primary source of information is Get-ADReplicationPartnerMetaData, which provides details of the replication partnerships on the current host. We convert the partner into the source host using Get-ADObject. The output now has exactly the same fields as the old ad-repl-stats.ps1 script. To run it, we need to schedule it using our SA-ModularInput-PowerShell add-on (which you will also need to install on each domain controller). Replace the scripted input for ad-repl-stats.ps1 with the following in inputs.conf:

[powershell://Replication-Stats]
script = & "$SplunkHome\etc\apps\ad-repl-stats\bin\replication-stats.ps1"
schedule = 30 */5 * ? * *
index = msad
source = Powershell
sourcetype = MSAD:NT6:Replication
disabled = false

Once you push out the change (including the required SA-ModularInput-PowerShell) and restart the forwarder (if you are installing SA-ModularInput-PowerShell), you will get the replication data flowing within five minutes. This will enable the replication status report to work for your Windows Server 2012 R2 servers again.
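To verify that the events are arriving, a quick sanity check – this assumes the index and sourcetype from the stanza above:

index=msad sourcetype=MSAD:NT6:Replication | stats latest(_time) AS last_seen BY host, src_host | convert ctime(last_seen)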

This change will be built into a future version of the Splunk App for Active Directory; for those who need it now, my advice is to create a new app with just this data input in it. Disable the ad-repl-stats.ps1 scripted input in the regular TA as well. This will enable a smooth upgrade when this data input is integrated into the Splunk App for Active Directory.

Working with Active Directory on Splunk Universal Forwarders

Have you ever installed a Splunk Universal Forwarder and seen one or more of your Active Directory domain controllers have high CPU utilization as a result? Have you ever wondered how the Splunk Universal Forwarder translates the Security ID effortlessly into a real name you can read? In this blog post, I’m going to tell you exactly how we do the things we do with Active Directory and how you can improve the performance or reduce the load on your domain controllers.

There are two Windows pieces on the Universal Forwarder that deal with Active Directory. The first is known as admon – it emits information about your Active Directory Domain Services objects, both as a “dump” of the entire tree and as a monitor for ongoing changes. Admon uses the common ldap_* API calls that Microsoft provides to get both a baseline of the Active Directory database and the ongoing changes to that database.

The most common effect is that a large amount of memory is utilized during the baselining of the Active Directory domain. This is normally a problem in the context of the Active Directory app. In the case of this app, the admon is turned on for every single domain controller. In the case of a large database, you could see the splunk-admon.exe process balloon to several gigabytes on every single domain controller. However, we don’t actually need a copy of the admon changes on every single domain controller. We really only need it on one. If you are turning on admon purely for the Active Directory app, then you only need changes – not the baseline. For those larger sites, I recommend the following:

  • Turn OFF the admon data input within the TA-DomainController-*
  • Enable the admon data input on just ONE domain controller (you can even add a domain controller just for this task)

To turn off the admon data input, edit the TA-DomainController-*\local\inputs.conf (depending on which version of the addon you use) to include the following:

[admon://NearestDC]
disabled = true

Then, on the one domain controller that you wish to enable the admon, add the following to a local\inputs.conf file that is on that domain controller:

[admon://ADMonitoring]
targetDc = MyDC
monitorSubtree = 1
baseline = 0
index = msad
disabled = false

The targetDc value should be the NetBIOS host name for the domain controller that you are running admon on. We set baseline = 0 to not collect the baseline since we don’t need it. This will prevent admon from ballooning in memory utilization because of the large data capture. I know that some sites like fault tolerance in data collection, so feel free to add another domain controller to collect the duplicate data.

The other – not so obvious – place that touches the domain controller is in the Windows Event Logs. Windows Event Logs consist of two parts – a structured data piece and a localized template for showing the “friendly” version. You can see the structured data as XML in the Event Viewer – just click on an event to open it up, click on the Details tab and select XML View. Sometimes – particularly in the Security event log – you will see a Security ID in the meta-data (not in the message block). It’s something that starts with S-1-5-21-…, and it is this that gets translated by the WinEventLog modular input on the universal forwarder.

To do this, we call LookupAccountSidW() – an API call that looks up a record based on the SID and returns details like the domain and the sAMAccountName of the object that it represents. There is an optional parameter to this API call – lpSystemName – that allows us to pass in a domain controller name. In the WinEventLog configuration, you can set the domain controller that this call queries using the evt_dc_name parameter. What is interesting is what happens when you don’t set the evt_dc_name parameter (which is the default). In this case, the SID resolution is first attempted by the local system. If the local system does not find the SID, then we move on to a domain controller trusted by the local system. Inevitably, this will first be the domain controller that the computer is logged into and, failing that, the closest domain controller (usually within the AD site) that holds a copy of the global catalog.

There are also some other code paths that crack open internal portions of the event data to do translations. In these cases, we first check whether we are bound to a domain controller (so the lookup can be directed there); if the initial GUID or SID lookup fails, we use DsCrackNames(). Again, if evt_dc_name points to a copy of the global catalog, we can translate anything in the forest; otherwise you are limited to the local domain.

When you first install the Splunk_TA_windows, there are a large number of security events to ingest already – usually numbering in the millions. That means that millions of SID translations are required before the windows event log catches up, causing a high load on the domain controller being queried for SID translations. You can alleviate this problem by not reading the backlog (utilizing the current_only parameter). In addition, you can specify that the local DC is used for SID translations (by utilizing the evt_dc_name parameter). For instance, let’s say I have a domain controller called DC1, I can use the following:

[WinEventLog://Security]
evt_dc_name = \\DC1
current_only = 1

Since we are talking domain controllers here, what about the Active Directory app? The Active Directory app doesn’t actually use the SID translation. The username and domain of both the actor (who makes the changes) and the recipient (which object is changed) are embedded into the event without the need for SID translation. We can actually turn off the SID translation using the evt_resolve_ad_obj parameter like this:

[WinEventLog://Security]
evt_dc_name = \\DC1
current_only = 1
evt_resolve_ad_obj = 0

You can make these changes in the Splunk_TA_windows\local\inputs.conf (creating it if the file does not exist). Doing this allows you to keep all the critical events from your domain controller without adding significant load to that or other domain controllers.

Forwarding Windows Event Logs to another host

Let’s face it – sometimes, it just isn’t possible to install the Universal Forwarder on all hosts.  Mistrust of new software, proof of concepts and security concerns all play into the decision to install a Universal Forwarder or not.  What do you do when you can’t install a Universal Forwarder?  In this article, we will discuss how to configure a Microsoft Windows host to forward the Windows Event Logs somewhere else.

Throughout this article, we will refer to the “source” when we mean the system that is generating the logs in the first place, and we will refer to the “collector” when we mean the system where you are centralizing the logs.

Step 1: Configure WinRM

Your first step will be to configure remote management, most especially remote Windows event log management, on the systems.  On each source and on the collector, you will want to type the following at an elevated PowerShell command prompt:

winrm quickconfig
# Allow the collector's machine account to read logs by adding it to the
# local Administrators group (COLLECTOR$ is the collector's computer account)
$computer = (Get-WmiObject win32_computersystem).Name.ToUpper()
$collector = 'DOMAIN/COLLECTOR$'
$adsi = [ADSI]"WinNT://$computer/administrators,group"
$adsi.add("WinNT://$collector,group")

Make sure you replace DOMAIN and COLLECTOR with appropriate values for your environment.  Also:

  • If you are using Windows Firewall, then add a Windows Firewall exception for Remote Event Log Management on each source computer
  • Create a domain account (let’s call it “DOMAIN\LogAdmin”) and add it to the Event Log Readers group on each source computer.

Configuring the collector is similar:

wecutil qc
winrm set winrm/config/client @{TrustedHosts="SOURCE"}

This last line adds the source to the list of systems that are allowed to use NTLM authentication when communicating with the collector via WinRM.

Step 2: Create an Event Subscription

Our second step is to create the subscription to transfer the logs from the source to the collector.  On the collector:

  1. Run Event Viewer as an Administrator
  2. Click Subscriptions in the console tree
  3. On the Actions menu, click “Create Subscription”
  4. Fill in the Subscription Name and Description with appropriate values
  5. In the Destination Log box, select the log file where collected events are stored.
  6. Click Add and select the source
  7. Click Select Events to display the Query Filter – specify which events are to be collected
  8. Click OK

Normally, the Destination Log will be “ForwardedEvents”, but you can create new ones or just munge all the logs together.  My recommendation is to create a new log for each channel.  For instance, if your source is “SOURCE” and you are collecting the Security events, then create a log “SOURCE-Security”.  You can then use the Universal Forwarder inputs.conf settings to read that log and set the sourcetype and host to the appropriate thing to ensure your apps don’t see any difference.

Step 3: Advanced Settings

You probably saw the Advanced button.  This is actually fairly important.  There are two things you can do.  The first is to configure event delivery optimization.  With “Normal” optimization, the collector uses a pull delivery mode, batches 5 items at a time and sets a batch timeout of 15 minutes.  When using “Minimize Bandwidth” optimization, a push delivery and a delivery timeout of 6 hours is used.  The final option is “Minimize Latency” – it also uses push and a batch timeout of 30 seconds.  Select the appropriate setting for the particular scenario.
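If none of the three presets fits, wecutil can switch a subscription to a custom delivery mode. A sketch, assuming a subscription named SOURCE-Security – check wecutil ss /? on your system for the exact option names:

wecutil ss SOURCE-Security /cm:Custom /dmi:100 /dmlt:30000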

The other element you can configure is the user and password.  We created a domain account called LogAdmin earlier – enter its credentials here; doing so is the recommended setting.

Step 4: Install a Universal Forwarder

You can now install a Windows Universal Forwarder on the collector.  If you have followed the instructions here, then you have created a new log file for the events from the source and you can see those events flowing in by utilizing the Event Viewer on the collector.  Now it’s time to configure the Universal Forwarder.  Use the following inputs.conf entry:

[WinEventLog://SOURCE-Security]
sourcetype = WinEventLog:Security
host = SOURCE
disabled = false

Push this to the Universal Forwarder and restart the service.  You should see the events from your source computer appearing in the appropriate Splunk instance.

If you have thrown all of your sources’ Security logs into a single log, then you need to do some post-processing on the indexer to ensure that the host field is set properly.  Start with the following props.conf entry:

[WinEventLog:Security]
TRANSFORMS-sethost = Set-Host-By-ComputerName

then, in transforms.conf:

[Set-Host-By-ComputerName]
REGEX = (?m)ComputerName=(.*?)\n
FORMAT = host::$1
DEST_KEY = MetaData:Host

A Final Note

The features that the Universal Forwarder provides – secure and timely delivery of logs – should not be underestimated here.  The Universal Forwarder is the best method of collecting Windows Event Logs for delivery to Splunk, and best practice is to install a Universal Forwarder on each host.  You do not save any bandwidth, CPU or memory by setting up log forwarding – quite the opposite is true.  You will use more bandwidth when using log forwarding, and my own experiments show more CPU and memory usage for log forwarding as well.  However, this mechanism is still better than using WMI to collect logs (which stores the logs in a different format).

This isn’t the last word on this subject either.  There are quite a few methods of forwarding logs, including HTTPS transport for secure delivery and custom configuration of the push/pull subscription model.   Those subjects are left as future research.

Which Microsoft Servers are inactive?

What can you tell me about my environment?  It’s a common enough question, and Splunk seems to be able to answer them all.  The latest was this: can you give me a list of all the servers that are inactive?  Inactive, for the purposes of this post, means that a server is bound to the domain but has not logged into the domain in some period of time.

One of my favorite tools for answering these questions is the SA-ldapsearch commands.  Fortunately for us, Active Directory contains the timestamp.  Unfortunately for us, it contains two timestamps.  The first is called “lastLogon” and contains the time stamp at which the system in question last connected to THIS domain controller.  The second is called “lastLogonTimeStamp” and contains the time stamp at which the system in question last connected to ANY domain controller.  This is a very important distinction (and the customer actually queried me on this one, so I had to go check).  The important point from Microsoft’s documentation is that lastLogonTimeStamp is replicated lazily and can be 9-14 days behind the real events; use the Windows Event Log to find real-time logon information.  Fortunately, we don’t need real time, so the laziness is OK for us.

We can start with a fairly basic ldapsearch command, such as this:

|ldapsearch domain=splk search="(&(operatingSystem=*Server*)(objectCategory=computer))" attrs="cn,operatingSystem,lastLogonTimestamp" | table cn,operatingSystem,lastLogonTimestamp

The lastLogonTimestamp is returned as a formatted date string – not something we can do time arithmetic on directly.  Fortunately, Splunk provides a strptime() function.  This is like the regular Python strptime() function, but it can be used in the search pipeline to convert textual dates into epoch times that Splunk can work with.  You specify the format as a series of % codes.  Here is how we can do the conversion:

|ldapsearch domain=splk search="(&(operatingSystem=*Server*)(objectCategory=computer))" attrs="cn,operatingSystem,lastLogonTimestamp"|eval llt=strptime(lastLogonTimestamp,"%Y/%m/%d %T %Z") | eval inactiveTime=now() - llt

We now have a field called inactiveTime that contains the number of seconds that the system has been inactive.  We can easily filter out the systems that are active and concentrate on the ones that are inactive:

|ldapsearch domain=splk search="(&(operatingSystem=*Server*)(objectCategory=computer))" attrs="cn,operatingSystem,lastLogonTimestamp"|eval llt=strptime(lastLogonTimestamp,"%Y/%m/%d %T %Z") | eval inactiveTime=now() - llt | where inactiveTime > (14*86400) | table cn,operatingSystem,lastLogonTimestamp,inactiveTime
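To make the report easier to read, you can convert the seconds into days and sort the most stale systems to the top (the leading ... stands for the ldapsearch and eval pipeline from the search above):

... | where inactiveTime > (14*86400) | eval inactiveDays=round(inactiveTime/86400,1) | sort - inactiveDays | table cn,operatingSystem,lastLogonTimestamp,inactiveDays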

Note that I use 14 days as the time because of that lazy replication employed by Active Directory.  Anything less is unreliable.

Universal Forwarders and the Splunk App for Active Directory

About once a week I respond to a call or online question asking about the Splunk App for Active Directory.   Specifically, these questions ask one of two things.  The first is “can I collect the Active Directory data remotely?,” and the second is “What user shall I run the Universal Forwarder as?”  The cliff notes version is that you should not collect Active Directory data remotely, and you should install the Universal Forwarder as the system local user.  If you want more information, read on.

Let’s start with the first question – can you collect the Active Directory data remotely?  Technically, the answer is yes; in reality, it is ill-advised from a security point of view and difficult to do, requiring many changes that are not supported.  Let’s look a bit at the pieces that you may want to collect:

  • Windows Security Logs
  • DNS Debug Logs
  • Performance Counters for Active Directory and DNS Operations
  • PowerShell results for host information
  • PowerShell results for DNS information
  • PowerShell results for Domain Services Replication information

You could set up a log subscription for the Windows Security logs and collect those logs on a remote system.  This accounts for about 90% of all the log gathering that the Splunk App for Active Directory does, so it’s a great option.  You will have to add a transform on the collecting host to ensure the host field is set appropriately – if you don’t, the events will appear to be from the collecting host instead of the generating host.  Do not use syslog, snare or WMI to collect the Windows Event Logs – the format is different and you would have to do an extraordinary amount of work to get the same field extractions that we rely on to produce the dashboards.

The DNS Debug Logs are generated within the SYSTEM32 directory and are locked when they are being written.  You cannot collect these logs remotely as locked files cannot be opened remotely.  Locally, you can use the monitorNoHandle method of reading the file.  Fortunately, you only need them for producing certain reports within the app.  If you can avoid those reports, you can also decide to forgo the collection of the DNS Debug data.

The performance counters can be collected via WMI.  However, the names of the counters are different, so the dashboards will remain unpopulated without some work.  Since most organizations use something else (for example, Systems Center Operations Manager) for performance management, this may not be a significant loss.

This leaves us with the PowerShell scripts.  As many of you will know, PowerShell includes the ability to run commands remotely.  However, the scripts are not designed to run remotely.  When you run a script remotely, it serializes the results and sends them back to you.  Unfortunately, when deserialization occurs on the collection end, the data is not in exactly the same format.  In addition, you need to have domain or enterprise administrator privileges to run many of the functions that touch the Active Directory Domain Controller functionality – things like replication information and the contents of the DNS Server.  A good PowerShell expert might be able to alter the scripts, but how are you going to get around the requirements for the wide-ranging security requirements that remote execution requires?

This brings us to the next question – who should I run the Universal Forwarder as?  As you might have guessed from the above comments, you have only two choices, and one of those is really not a choice.  Your two choices are the system local user and a domain administrator.  If you choose the domain administrator, then the Splunk Administrator can execute any PowerShell script anywhere in the domain.  This is bad – really bad.   I strongly discourage you from even thinking this is a good idea.  It isn’t.  Worse – if you are running the Splunk Universal Forwarder on a forest root node, the user needs to be an Enterprise Administrator.  This is even worse than the domain administrator, giving the Splunk Administrator the ability to do anything on any computer within the enterprise.  Don’t do it.

That leaves us with the system local user.  Choosing this option will allow the universal forwarder to access all the information it needs, but only on the local system.  It can’t access that same information across the network.  This is perfect – you isolate the capabilities to just the system you are monitoring.  Of course, this requires that you have a Universal Forwarder on the domain controller, but that’s better all round for a security setting.

This begs the question – what is the impact of the Universal Forwarder on my domain controller?  There are two phases of collection – baselining and ongoing service.  During baselining, you can expect memory utilization about the same size as your directory and a high CPU load.  If your directory is large, then this can take some time and be a concern.  Once the baselining is complete, you can expect a 40-80MB resident set on the memory side and 3-5% CPU utilization, depending on how busy your domain controller is.  I always recommend the following actions:

  1. Turn off baselining
  2. Turn off the admon data input on all but 1-2 systems (depending on redundancy needs) per domain.

You can read all about these recommendations in more detail in another blog post.  If you follow these rules, your Splunk App for Active Directory installation will go smoothly.

Introducing the Cisco Security Suite for Splunk 6

I know.  I normally blog about Microsoft stuff.  Recently, however, I’ve been helping out on another project – updating the Cisco Security Suite to be compatible with Splunk 6.  The Cisco Security Suite is the most downloaded app on Splunkbase behind the *Nix and Windows apps and exposes Cisco specific information about your Cisco specific security devices.

We had many aims for this project, aside from just upgrading everything to work with Splunk 6.  We wanted it to use the Technology Add-ons that you may already have from a deployment of Enterprise Security.  If you were considering an upgrade to Enterprise Security in the future (and you should – it’s awesome), then we wanted the data you have already collected to seamlessly integrate with that product.  We also wanted to let you explore the data on your own with data models provided by the Common Information Model app.  Finally, but by no means the least of it, we wanted to work with Cisco to ensure we were advising on the best practices for data collection and that we were supporting all the latest versions of the software that you may have installed.

That’s a tall order, and one release will not be enough to get it done.  However, the Cisco Security Suite v3.0, which is available now on Splunkbase, handles that task for your Cisco ASA, PIX and FWSM firewalls plus your Cisco WSA web proxy appliances.   You will need a couple of additional components from Splunkbase.  If you have a Cisco ASA, PIX or FWSM firewall then you will need the Splunk Technology Add-on for Cisco ASA.  This is an Enterprise Security 3 compatible add-on for reading the firewall data from those devices.  If you want to explore the data via a data model, then you will want the Common Information Model app, which turns data gathered by ES3 compatible add-ons into data models.

[Screenshot: Cisco Security Suite for Splunk 6]

If you are just starting out with the Cisco Security Suite, then installation is relatively painless:

  1. Install the Cisco Security Suite from your Splunk Interface
  2. If required, also install the Splunk Add-on for Cisco ASA from your Splunk Interface
  3. Copy the contents of Splunk_CiscoSecuritySuite/appserver/addons to your main apps directory
  4. Restart your Splunk server
  5. Configure the Technology Add-ons so that they receive data
  6. Enjoy!

You will note that we don’t need to install five different apps any more – everything is distributed as a package for you.  The hard part is configuring the Technology Add-ons and your devices.  If you have followed the instructions above, then two views are available to you under Cisco Security Suite -> Documentation – one walks you through configuring your Cisco firewall and the other walks you through configuring your Cisco WSA appliances.

If you are upgrading from an earlier version, then things are a little more complicated.  As a first step, you need to remove all the Splunk_Cisco* apps from your Splunk installation.  Yes – this means you won’t be collecting data for a while.  Then you will want to upgrade your Splunk installation to Splunk 6.0.2 – our latest version.  Finally, you will want to follow the steps above.

However, you are not finished there.  You will have some data that is tagged the old way – sourcetype=cisco_asa, for example – and the new data will be tagged the new way – sourcetype=cisco:asa, for example.  The subtle difference can make all the difference in your usage.  To fix this, you need to do some adjustments to the apps.

Let’s start with the field extractions.  We have new field extractions, and they are located in props.conf under the stanza [cisco:asa] – you want that to be [cisco_asa], so just copy it and change the stanza name; everything else stays the same.  If you are editing the configuration files, then you need to copy the props.conf within the Splunk_TA_cisco-asa and SA-cisco-asa apps from default to local before editing.  Doing this through the Manager is exceedingly complex – I recommend editing the files by hand just this once.

Once you have the field extractions working for your old sourcetype, you can add the old sourcetype to the correct event types.  Each eventtype starts with cisco- and includes a descriptive name.  For example, eventtype=cisco-firewalls specifies a search that contains all the firewall data, so you will want to add sourcetype=”cisco_asa” to that eventtype.  This is a lot easier to do within the Manager, but hand-editing the configuration files works as well.
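As an illustration only – the stanza names below are real, but the search string is a placeholder; copy the actual definitions from the default files rather than from here:

# props.conf (local) in Splunk_TA_cisco-asa - clone the shipped [cisco:asa]
# extractions under the legacy sourcetype name
[cisco_asa]
# ...copy every attribute from the default [cisco:asa] stanza here...

# eventtypes.conf (local) - when you override an eventtype, the whole search
# is replaced, so copy the shipped search and extend it with the old sourcetype
[cisco-firewalls]
search = <shipped search> OR sourcetype="cisco_asa"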

This is not the final version of the Cisco Security Suite v3.0.  Over the coming weeks, we intend to add back the Cisco IPS and ESA device support that was in the earlier versions, and add new Cisco security devices to the list of devices that we produce dashboards for.  The Cisco Security Suite is community supported, so feel free to post on Splunk Answers if you run into trouble – we will try our best to assist.

Correlating Windows and VMware Host Information

When you install a new virtual host on VMware, you get to give it any name you want. The name has nothing to do with what is running on the host. How can we go from the Windows information to the VMware information? We’re here to help.

Let’s take a look at the VMware side of things for a moment. If you have the Splunk App for VMware installed, then you likely already have this information. The sourcetype is “vmware:inv:vm” and there is one event for every virtual host in there. Since we need a common field on which to correlate, I’m going to choose the network interface MAC Address. The “vmware:inv:vm” event is JSON data, so we need to use the spath command to extract the right information:

sourcetype=vmware:inv:vm macAddress 
    | spath moid 
    | spath output=mac path=changeSet.guest.net{}.macAddress 
    | spath output=vm_name path=changeSet.name 
    | stats values(mac) as mac by vm_name,host,moid 
    | mvexpand mac

What you will get is a table that provides the hypervisor and the name of the VM on that hypervisor for any given MAC address. However, we want to go further. We want to also correlate that information to the host information that is coming in from the Windows hosts. The Windows Universal Forwarder has a WinHostMon data input that we can use to provide this information. Since I generally recommend that all Windows hosts get the Splunk_TA_windows, I would also recommend you place this input definition in the local/inputs.conf file for this TA.

[WinHostMon://networkAdapter]
interval = 86400
sourcetype = WinHostMon:NetworkAdapter
type = networkAdapter

When this runs, you will get the MAC address of the network adapter together with the Windows host field.

03/04/2014 09:58:07.486 
Type=NetworkAdapter 
Name="Intel(R) 82579LM Gigabit Network Connection" 
ComputerName=ACME-001 
Manufacturer="Intel" 
ProductName="Intel(R) 82579LM Gigabit Network Connection" 
Status="OK" 
MACAddress="00:50:56:BE:73:A9"

We can set up a similar table for the Windows side as follows:

sourcetype=WinHostMon:NetworkAdapter 
    | stats values(MACAddress) as MACAddress by host 
    | mvexpand MACAddress

The only gotcha is that you will find that the MACAddress on the Windows side of things is one thing (00:50:56:BE:73:A9) and on the VMware side is something slightly different (00:50:56:be:73:a9). You need to choose one of them and convert the other. Fortunately, it’s a simple conversion. I chose altering the Windows side of things as follows:

sourcetype=WinHostMon:NetworkAdapter
    | eval mac=lower(MACAddress)
    | stats values(mac) as mac by host
    | mvexpand mac

Now we can correlate the two together. My preferred form of this is to generate a combined lookup that has the host, mac, vCenter host name and vm_name in the lookup:

(sourcetype=WinHostMon:NetworkAdapter) OR (sourcetype=vmware:inv:vm macAddress)
    | spath moid 
    | spath output=vm_mac path=changeSet.guest.net{}.macAddress 
    | spath output=vm_name path=changeSet.name 
    | eval mac=coalesce(MACAddress,vm_mac)
    | mvexpand mac
    | eval mac=lower(mac)
    | eval vcenter=if(sourcetype="vmware:inv:vm",host,null())
    | eval w=if(sourcetype="WinHostMon:NetworkAdapter",host,null())
    | eval hostSystem=vcenter+"-"+moid
    | stats values(vm_name) as vm_name,values(hostSystem) as hostSystem,values(w) as winhost by mac

So, what can you do with this information? Well, the Splunk App for VMware has a nice drill-down that allows you to get into the host details from anywhere. In Simple XML you can add a <drilldown> target of /app/splunk_for_vmware/vm_detail?selectedVirtualMachine=VCENTER-MOID. I’ve included a hostSystem field for this purpose. All you need to do is add a lookup to the end of your search:

...your-search... | lookup vmnethost winhost AS host OUTPUT hostSystem

Now you can do the drilldown linkage within Simple XML. Clicking on your table takes you right into the virtual machine details page for your Windows server in the context of VMware. From there, you can start exploring the data in a VMware context.
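In Simple XML that linkage might look like the following sketch – it assumes your table panel includes the hostSystem field returned by the lookup:

<drilldown>
  <link>/app/splunk_for_vmware/vm_detail?selectedVirtualMachine=$row.hostSystem$</link>
</drilldown>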

Correlating Cisco ESA with Microsoft Exchange for Message Tracking

One of the great features of the Splunk App for Microsoft Exchange is that you can track messages to the edge. It doesn’t matter what type of devices we go through, we get to see the messages and what hops they go through. Doing that requires some knowledge of the data flow and the construction of appropriate searches.

Let’s take an example of the inbound message flow. To track an inbound message, we use a macro – msgtrack-inbound-messages. The comments in the macros.conf file tell us that we need to have a table that has the date/time, message-id, cs-ip, sender, sender-domain, recipient-count, list-of-recipients and message-size; it then goes on to show the Microsoft Exchange version of that search. How would we alter this to include Cisco ESA data? The Cisco Email Security Appliance is a very common, industry-leading security appliance for anti-malware protection, so it is one of the devices we would have to include.

The best way to include Cisco ESA (or for that matter any email security) data is to construct a search that provides the table for just that data stream. Then you can include the data rather easily in the overall definition. Cisco ESA logs are problematic because you don’t get just one event per message like you would with, say, a firewall. You get many events. In addition, there isn’t just one identifying ID that you can correlate on – you can package multiple messages inside of an inbound connection, handle processing across multiple IDs and then send on to the Exchange system with multiple connections. That’s a lot of IDs to handle. The logs look like this sample:

Wed Feb 12 19:48:37 2014 Info: New SMTP ICID 1000000201 interface Data2 (192.0.2.44) address 203.0.113.83 reverse dns host unknown verified no
Wed Feb 12 19:48:37 2014 Info: ICID 1000000201 ACCEPT SG UNKNOWNLIST match sbrs[0.0:10.0] SBRS 5.1
Wed Feb 12 19:48:37 2014 Info: ICID 1000000201 TLS success protocol TLSv1 cipher AES128-SHA
Wed Feb 12 19:48:37 2014 Info: Start MID 500000014 ICID 1000000201
Wed Feb 12 19:48:37 2014 Info: MID 500000014 ICID 1000000201 From: 
Wed Feb 12 19:48:37 2014 Info: MID 500000014 ICID 1000000201 RID 0 To: 
Wed Feb 12 19:48:37 2014 Info: MID 500000014 ICID 1000000201 RID 1 To: 
Wed Feb 12 19:48:38 2014 Info: MID 500000014 Message-ID ''
Wed Feb 12 19:48:38 2014 Info: MID 500000014 Subject 'FW: Some Subject Matter'
Wed Feb 12 19:48:38 2014 Info: MID 500000014 ready 52076 bytes from 
Wed Feb 12 19:48:38 2014 Info: LDAP: Masquerade query Rewrites-Inbound MID 500000014 address bill.jones@example.com to bill.jones@example.com
Wed Feb 12 19:48:38 2014 Info: LDAP: Masquerade query Rewrites-Inbound MID 500000014 address reginald.brown@example.com to reginald.brown@example.com
Wed Feb 12 19:48:38 2014 Info: LDAP: Masquerade query Rewrites-Inbound MID 500000014 address amy.johnson@example.com to amy.johnson@example.com
Wed Feb 12 19:48:38 2014 Info: MID 500000014 rewritten to MID 500000031 by LDAP rewrite
Wed Feb 12 19:48:38 2014 Info: MID 500000031 ICID 0 From: 
Wed Feb 12 19:48:38 2014 Info: LDAP: Reroute query AD.routing MID 500000014 RID 0 address bill.jones@example.com to [('bill.jones@internal.example.com', '')]
Wed Feb 12 19:48:38 2014 Info: MID 500000031 ICID 0 RID 0 To: 
Wed Feb 12 19:48:38 2014 Info: LDAP: Reroute query AD.routing MID 500000014 RID 1 address amy.johnson@example.com to [('amy.johnson@internal.example.com', '')]
Wed Feb 12 19:48:38 2014 Info: MID 500000031 ICID 0 RID 1 To: 
Wed Feb 12 19:48:38 2014 Info: Message finished MID 500000014 done
Wed Feb 12 19:48:38 2014 Info: MID 500000031 attachment 'image001.jpg'
Wed Feb 12 19:48:38 2014 Info: MID 500000031 attachment 'image002.jpg'
Wed Feb 12 19:48:38 2014 Info: MID 500000031 Custom Log Entry: Attachment Names: image001.jpg, image002.jpg
Wed Feb 12 19:48:38 2014 Info: MID 500000031 Custom Log Entry: Attachment Sizes: 1736, 1525
Wed Feb 12 19:48:38 2014 Info: MID 500000031 Custom Log Entry: Attachment Types: image/jpeg, image/jpeg
Wed Feb 12 19:48:38 2014 Info: ICID 1000000201 close
Wed Feb 12 19:48:38 2014 Info: New SMTP DCID 70000094 interface 192.0.2.44 address 192.0.2.89 port 25
Wed Feb 12 19:48:38 2014 Info: DCID 70000094 STARTTLS command not supported
Wed Feb 12 19:48:38 2014 Info: Delivery start DCID 70000094 MID 500000027 to RID [0]
Wed Feb 12 19:48:38 2014 Info: Message done DCID 70000094 MID 500000027 to RID [0] 
Wed Feb 12 19:48:38 2014 Info: MID 500000031 matched all recipients for per-recipient policy DEFAULT in the inbound table
Wed Feb 12 19:48:39 2014 Info: Delivery start DCID 70000094 MID 500000026 to RID [0]
Wed Feb 12 19:48:39 2014 Info: Message done DCID 70000094 MID 500000026 to RID [0] 
Wed Feb 12 19:48:39 2014 Info: Delivery start DCID 70000094 MID 500000033 to RID [0]
Wed Feb 12 19:48:39 2014 Info: Message done DCID 70000094 MID 500000033 to RID [0] 
Wed Feb 12 19:48:40 2014 Info: MID 500000031 interim verdict using engine: CASE spam negative
Wed Feb 12 19:48:40 2014 Info: MID 500000031 using engine: CASE spam negative
Wed Feb 12 19:48:40 2014 Info: Delivery start DCID 70000094 MID 500000049 to RID [0]
Wed Feb 12 19:48:40 2014 Info: MID 500000031 interim AV verdict using Sophos CLEAN
Wed Feb 12 19:48:40 2014 Info: MID 500000031 antivirus negative 
Wed Feb 12 19:48:40 2014 Info: MID 500000031 Outbreak Filters: verdict negative
Wed Feb 12 19:48:40 2014 Info: MID 500000031 queued for delivery
Wed Feb 12 19:48:40 2014 Info: Message done DCID 70000094 MID 500000049 to RID [0] 
Wed Feb 12 19:48:40 2014 Info: Delivery start DCID 70000094 MID 500000031 to RID [0, 1]
Wed Feb 12 19:48:40 2014 Info: Message done DCID 70000094 MID 500000031 to RID [0, 1] 
Wed Feb 12 19:48:40 2014 Info: MID 500000031 RID [0, 1] Response '2.6.0  Queued mail for delivery'
Wed Feb 12 19:48:40 2014 Info: Message finished MID 500000031 done
Wed Feb 12 19:48:40 2014 Info: DCID 70000094 close

In this example, you can see one Inbound Connection ID (ICID), one Delivery Connection ID (DCID), multiple Recipient IDs (RIDs), two Message IDs (MIDs), and a clear conversion from one MID to the other in the middle of the transaction. Let’s assume for a moment that you have used MV_ADD in transforms.conf to extract ICID, DCID and MID as multi-value fields (a sketch of such a transform follows the search below). In addition, you will note (I’ve left them in) that there are other messages interleaved in the process – if there are queued messages waiting to go, the Cisco ESA will transmit those down the same DCID to the Exchange environment. Let’s take a look at my search:

sourcetype=cisco:esa
| eventstats values(cs_ip) AS cs_ip BY icid
| eventstats values(ss_ip) AS ss_ip BY dcid
| stats values(mid) AS tmpMID
     values(icid) AS icid
     values(sender) AS sender
     values(recipient) AS recipient
     values(total_bytes) AS total_bytes
     values(message_id) AS message_id
     values(cs_ip) AS cs_ip
     values(ss_ip) AS ss_ip
     values(dcid) AS dcid BY mid
| eval recipient_count=mvcount(recipient)
| eval mid=tmpMID
| mvexpand mid
| eventstats values(tmpMID) AS tmp BY mid
| eval msg=mvjoin(tmp, " ")
| rex field=sender "@(?<sender_domain>.*)"
| stats values(icid) AS icid
     values(sender) AS sender
     values(sender_domain) AS sender_domain
     values(recipient) AS recipient
     max(total_bytes) AS total_bytes
     max(recipient_count) AS recipient_count
     values(message_id) AS message_id
     values(dcid) AS dcid
     values(tmp) as mid
     values(cs_ip) AS cs_ip
     values(ss_ip) AS ss_ip BY msg

I’m betting most of you would have reached for the transaction command here. In concept, that would be the most expedient way of doing this. Transaction (and others like append and join) are commonly used by folks new to Splunk Search because they work in a similar way to the SQL commands many people have dealt with in the past. But events are streams of data and really need to be treated differently. Transaction is one of a handful of commands that are not streamable. A streamable command gets pushed down to the Splunk indexers, whereas a non-streamable command runs on the search head. In large-scale environments you will have multiple indexers but only one search head doing the work, which makes transaction a big performance destroyer. There are other non-streamable commands and some non-streamable eval statements – you can expect append and join to be big performance killers, for example.
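
For comparison, the transaction-based equivalent is seductively short – something like this – but every matching event must be hauled back to the search head before any correlation happens:

sourcetype=cisco:esa
| transaction mid icid dcid mvlist=true

One line instead of a dozen, and fine on a laptop demo; at scale, the indexers are reduced to dumb event pumps while one search head does all the work.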

The search itself is mostly simple eval/stats. However, the major magic here is in eventstats. Eventstats is an underutilized streaming command that alters the event stream as it passes. In this case, the first eventstats command adds an accumulator for any IP addresses used by the inbound connection and the second eventstats command does the same for the IP addresses used on the outbound connection. The third eventstats is where our magic happens. It accumulates all the Internal Message IDs related to this single message. We then concatenate those all together to give us a unique field on which to generate our final table.

When looking at a search that I’ve been given, I always go back to the pipeline. I’ll start from the very first command and slowly add commands to the pipeline, re-running the resulting pipeline at each step and adding the fields being generated along the way to my view. This allows me to really understand what the pipeline is doing.

You can extend these ideas such that all the big transaction macros in the Splunk App for Microsoft Exchange are streamable. This will result in major performance gains when you have complicated environments. Don’t let transaction kill your performance.

Splunk on Windows, Clustering and IPv6

We had fun this week in our Seattle office setting up clustering for Splunk on Windows on a pure-IPv6 network. IPv6 has been gaining acceptance for quite a number of years now – more outside the US than within – and I am one of those optimists who expects we will soon reach the tipping point where IPv6 adoption becomes the norm rather than the exception.

We had a set of four systems. On our indexer tier were three systems – one cluster master and two cluster slaves. We also had a separate search head. Each of these systems was running Windows Server 2008R2 and had the latest version of Splunk Enterprise 6 installed. The requirement was that this lab be able to communicate on a pure-IPv6 network. We aren’t going to go into the details of how to configure your environment for IPv6 here as there are whole books devoted to the subject, but Microsoft TechNet is a good starting point.

Let’s first of all talk about the systems themselves. You need both the IPv4 and IPv6 stacks installed on the servers. The splunkweb frontend and the backend splunkd process talk to each other on 127.0.0.1, so IPv4 is needed for the local loopback. We don’t need to talk to the physical network using IPv4, so it doesn’t need to be configured there, but the stack does have to be present.

Secondly, we needed to configure splunkweb to listen on the IPv6 port. We do this by editing the web.conf file on each server as follows:

[settings]
listenOnIPv6 = yes

You can place this setting in the $SPLUNK_HOME\etc\system\local\web.conf file – it will not get overwritten on upgrade then. We need to do something similar to make splunkd listen on the IPv6 port. We do this by editing the server.conf file on each server:

[general]
serverName = my-server-name
listenOnIPv6 = yes

Once you restart everything, you will be able to access each individual machine by its IPv6 address or DNS name. However, don’t restart just yet because we want to set up clustering as well. We do the cluster setup within server.conf as well. Let’s start with the cluster master:

[clustering]
mode = master
replication_factor = 2
search_factor = 2
pass4SymmKey = changeme

You will need to understand the replication factor and search factor parameters as you would for any clustering requirement and you should definitely refer to the documentation or your friendly local Splunk expert for a good discussion on that subject. Now we need to set up the other indexers. Here is the addition to the server.conf:

[replication_port://9887]

[clustering]
master_uri = https://[2001:xxxx:xxxx::1]:8089
mode = slave
pass4SymmKey = changeme

Note that the IPv6 address I am using in the master_uri is the globally routable IPv6 address of the cluster master. You can get the IPv6 addresses of the cluster master using ipconfig on that server.
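
For example, running this in a cmd window on the cluster master lists candidate addresses – use the globally routable one, not the link-local fe80:: address:

ipconfig | findstr /i "IPv6"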

Finally, our search heads will require a slightly different version:

[clustering]
master_uri = https://[2001:xxxx:xxxx::1]:8089
mode = searchhead
pass4SymmKey = changeme

This allows the search head to direct searches to the right cluster peers based on their status.

I have a final word on the Windows Firewall. Turn it off unless you have a really good reason to have it on – it takes up resources that could be doing Splunk work. If you do need the Windows Firewall on, ensure you open ports 8089 and 9887 for connections from the other Splunk Enterprise servers and port 8000 generally for your web traffic, plus any port on which you are listening for connections (such as 9997 for connections from your Universal Forwarders).
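
If you do keep the firewall on, rules along these lines open those ports (the rule names are mine; adjust the ports to your deployment):

netsh advfirewall firewall add rule name="Splunk Management" dir=in action=allow protocol=TCP localport=8089
netsh advfirewall firewall add rule name="Splunk Replication" dir=in action=allow protocol=TCP localport=9887
netsh advfirewall firewall add rule name="Splunk Web" dir=in action=allow protocol=TCP localport=8000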

By the way, this process will also work handily on other operating systems running Splunk Enterprise. Just ensure you have the IPv4 localhost enabled.

What’s new in Microsoft Apps

Splunk is exhibiting at the Microsoft Exchange Conference this week. If you are in town, please stop by booth #805 in the Eastside to see us. To coincide with this conference, we are releasing a whole slew of new apps and add-ons. Here are some of the highlights:

The Splunk App for Microsoft Exchange has undergone a huge makeover and now includes complementary functionality from the Active Directory Domain Services and Windows realm. We can correlate across those three platforms to see new and unique things. Want to understand how a Windows update affected the performance of your Exchange hosts? Now you have the information available to you. Want to arrange the app panels in ways that are useful to you? We have a new dashboard builder feature that can do that. Fed up with dashboards for functionality for which you are no longer gathering data? You can turn them off. We’ve also added a better install experience where we detect what data you have and allow for easy rebuilds of the lookup tables. With this release, the Splunk App for Microsoft Exchange becomes a premium offering – one that carries a commitment to continual improvement and support – and a free trial is included. If you have already installed a prior version of the Splunk App for Microsoft Exchange, please be aware: this is a new app which will require separate installation. Both the old and the new apps can co-exist and use the same data.

The Windows Infrastructure pieces have now been separated into the Splunk App for Windows Infrastructure. This app includes the components for Active Directory and Windows management and will remain free. We have brought over some of the exceptional usability features from the new Splunk App for Microsoft Exchange, such as the Dashboard Builder and First Run Experience. Also included in this app are several UI improvements and correlated reports to handle the challenges of this environment. The Splunk App for Windows Infrastructure replaces both the Splunk App for Windows and the Splunk App for Active Directory.

We have not forgotten the add-ons. The ldapsearch commands included in the Splunk Support for Active Directory have been updated and now properly support UTF-8 transfer of information, making them useful in foreign language environments. In addition, the Splunk Technology Addon for Windows has been revised with more CIM coverage, allowing for easier ingest of data used in Enterprise Security or PCI Compliance scenarios.

We are showing off all these enhancements on the show floor at the Microsoft Exchange Conference. Come take a tour of these features with us.

Running two Universal Forwarders on Windows

We get quite a few requests on how to run two Splunk Universal Forwarders on the same Windows host. Why would you do this? The primary reason is that you have a lab environment and want to compare one version of Splunk to another during an evaluation of a new version. You may also have two sets of files you need to ingest into Splunk and the files have differing access permissions such that Splunk needs to run as different users. It’s really an edge case and definitely not something you want to generally do in production.

In Linux, this is a fairly simple process – just install to a different directory and change the ports and you are done. So what about Windows? The Service Manager kicks off the Splunk processes, so it’s not quite as simple. There are a few extra steps needed to tell the Service Manager about the new locations.

WARNING: WE ARE DISCUSSING AN UNSUPPORTED CONFIGURATION.

Don’t expect a sympathetic ear from our support guys when you are using this configuration. They may be sympathetic, but they won’t be able to assist. Most importantly, DO NOT RUN THIS IN PRODUCTION!

Also, there are limitations. You can only run one copy of the driver-related modular inputs (regmon, netmon and perhaps the most serious – MonitorNoHandle). This means these inputs can only appear in one of the universal forwarders. You will get weird and completely random errors and crashes if you break this rule.

So, now we have that out of the way, how do you do it?

Step 1 – Install the Splunk Universal Forwarder as normal.

Install your first Splunk Universal Forwarder just as you would normally. Go through the GUI or use your silent installer to install the first one. We will be adjusting it as needed. Since we are going to be moving it, you may want to specify an alternate directory for installation.

Step 2 – Stop your Splunk Universal Forwarder.

All these changes need to be made without the Splunk Universal Forwarder running.

Step 3 – Move the installation directory (if necessary).

If you are altering an existing environment for the additional forwarder, you are going to have to move it to a new location. This is simply a Move-Item in PowerShell. If you specified an alternate directory during the installation, this is done already.
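
Assuming the default installation directory and the new location used throughout this post, the move is one line in an elevated PowerShell prompt:

Move-Item "C:\Program Files\SplunkUniversalForwarder" "C:\Program Files\SplunkUniversalForwarder2"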

Step 4 – Change the Splunk Launch Configuration.

The Splunk launch configuration is stored in $SPLUNK_HOME\etc in a file called splunk-launch.conf. There are two lines you need to alter:

SPLUNK_HOME=C:\Program Files\SplunkUniversalForwarder2
SPLUNK_SERVER_NAME=SplunkForwarder2

The SPLUNK_HOME points to your new directory. The SPLUNK_SERVER_NAME is the name of the Service within the Services control panel.

Step 5 – Create the new Service

Open up an elevated cmd prompt (using Run As Administrator) and type the following:

sc create SplunkForwarder2 binpath= "\"C:\Program Files\SplunkUniversalForwarder2\bin\splunkd.exe\" service"

Note that the path is the path to our new directory and the name of the service is the name we set in the splunk-launch.conf file.

Keep this elevated cmd prompt around – we are going to continue using it.

Step 6 – Delete the old service

In that same cmd prompt, type the following:

sc delete SplunkForwarder

This removes the old service so you can install the second instance.

Step 7 – Change the Admin Port

As with any TCP-based service, you can’t have two processes listening on the same port, so the second instance needs its own management port. We can change the one setting required in that same elevated cmd window:

cd "C:\Program Files\SplunkUniversalForwarder2\bin"
.\splunk.exe set splunkd-port 8090

Step 8 – Start the Service

Since we have the elevated cmd window open:

.\splunk.exe start

Just as normal, this will start the service for you.

Step 9 – Install your second Splunk Universal Forwarder as normal.

Now that we have created the alternate Splunk Universal Forwarder configuration, we can install the regular Universal Forwarder just as we would normally.

I have just a few final notes. Firstly, you will remember that the maximum throughput of a universal forwarder is set in the limits.conf file and defaults to 256KBps. This is a per-instance limit, so two universal forwarders can consume up to 512KBps. You will also see twice the memory footprint and twice the CPU consumption.
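
That ceiling lives in the [thruput] stanza of limits.conf; a minimal sketch, placed in etc\system\local\limits.conf of the instance you want to change:

[thruput]
# maximum indexing throughput in kilobytes per second; 0 removes the limit
maxKBps = 256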

Finally – THIS IS UNSUPPORTED.

NO, REALLY, I AM NOT KIDDING – UNSUPPORTED.

Did I mention it’s unsupported?


Detecting Windows XP Systems with Splunk

Windows XP is dead! Soon after Windows XP shipped, Microsoft introduced the Trustworthy Computing Initiative – a kind of “security first” thinking that has been the hallmark of Microsoft for the last decade. Prior to the security focus, Microsoft operating systems were well known as a leaky sieve for viruses. Now, 12 years later, Windows XP is finally ready to be dropped. Well, to be honest – that happened a few years back. But many people are holding on to their XP installs for one reason or another. Now it’s time to give them up.

How can you tell who is connecting to your facilities with Windows XP systems? There are a variety of ways depending on whether they are work computers (bound to the domain) or home computers (coming in for email, for example). Let’s look at a few ways.

The most common will be the home computer. Most enterprises have refreshed their work computers at least once in the last decade so they will have a new operating system on them. We can’t directly tell if a computer is Windows XP, but we can check the information sources we have available. For example, we may want to check the user agent that Outlook Web Access is providing us. The Splunk App for Microsoft Exchange provides this as a data feed and has extracted the various pieces.

eventtype=client-owa-usage | lookup useragent cs_user_agent OUTPUT os,osvariant,osversion | search osvariant="Windows NT" (osversion="5.1*" OR osversion="5.2*") | stats count by cs_username

The major piece here is the user agent lookup script provided with the Splunk App for Microsoft Exchange. This allows you to turn the user agent into its separate fields. We only care about two of those fields. The search outputs the list of usernames logging in with Windows NT 5.1 or 5.2 – 5.1 is Windows XP, and 5.2 covers Windows XP x64 and Windows Server 2003. You can use this same technique with other web sites utilizing not just IIS logs but other logs like Apache.
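
Here is a minimal sketch of the same idea against Apache access logs. It assumes the app’s useragent lookup is shared outside the app and that your sourcetype extracts the agent string into a useragent field:

sourcetype=access_combined
| lookup useragent cs_user_agent AS useragent OUTPUT os,osvariant,osversion
| search osvariant="Windows NT" (osversion="5.1*" OR osversion="5.2*")
| stats count by clientip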

The other piece here will be the work computer. The domain controller will store the current operating system in the computer object when a computer boots up and binds to Active Directory. We can query Active Directory with a properly configured SA-ldapsearch like this:

| ldapsearch domain=SHELL search="(&(|(operatingSystem=*XP*)(operatingSystem=*5.2*))(objectCategory=computer))" attrs="CN,operatingSystem" | table CN,operatingSystem

Here, SHELL is my NetBIOS domain name. You could also use your DNS domain name. We are looking for two versions – Windows XP and Windows NT 5.2 – depending on the particular service pack, you could see both. We then get the name of the computer and the operating system it is running. You could also find out who is logged in to this computer by using the Splunk App for Windows Infrastructure. This provides a user logon field in the eventtype msad-successful-user-logons:

eventtype=msad-successful-user-logons | stats latest(user) as user by src_nt_domain,src_nt_host

The src_nt_host field is the name of the computer connecting. You can now put these two together with an ldapfilter command to add on the operating system:

eventtype=msad-successful-user-logons | stats latest(_time) as _time,latest(user) as user by src_nt_domain,src_nt_host | ldapfilter domain=$src_nt_domain$ search="(&(CN=$src_nt_host$*)(objectCategory=computer))" attrs="operatingSystem" | search (operatingSystem="*XP*" OR operatingSystem="*NT 5.2*") | table user,src_nt_domain,src_nt_host,operatingSystem

This now provides the user that was last logged into the Windows XP host.

Windows Print Monitoring in Splunk 6

Splunk 6 has been out almost six months and I have not yet finished covering all the new Windows features. Let’s continue doing that by looking at print monitoring. If you have ever wanted to do chargeback reporting for print jobs but lacked the data, then this is for you. The Windows Print Monitor is a new data input in the Splunk 6 Universal Forwarder (ok – it’s also available on Splunk Enterprise).

The idea of this is fairly simple. Install a Splunk 6 Universal Forwarder on your print servers, set up the data input and you will get data. There are two types of data you can get. The first is inventory-type information: the printers, the ports they are attached to (on Windows, servers always connect to printers via ports – even on the network), and the drivers running them. You can monitor this information by enabling stanzas in inputs.conf like this:

[WinPrintMon://printer]
type=printer
interval=600
baseline=1
disabled=0

[WinPrintMon://driver]
type=driver
interval=600
baseline=1
disabled=0

[WinPrintMon://port]
type=port
interval=600
baseline=1
disabled=0

You will note that there is an interval, measured in seconds, which controls how often the input checks for changes. Print servers are generally not busy, so a short interval is usually fine; however, the process does iterate over all the printer objects on the server each time, so tune it for your environment. Since I generally recommend you install the Splunk Add-on for Windows on all Windows hosts, you can put this in the local inputs.conf of that add-on. You will get events like this from these inputs:

04/21/2014 13:51:59.486
operation=set
type=Printer
ComputerName=ops-sys-001
printer=HP LaserJet M3035 mfp PCL6
share=
port=IPAddress
driver=HP LaserJet M3035 mfp PCL6
comment=None
location=
separate_file=
print_processor=hpzppwn7
data_type="RAW"
parameters=
status="normal"
attributes=979
priority=6
default_priority=2
jobs=8
average_PagePerMinute=73

Note that the type field is Printer. The other type values for inventory are PrintDriver and PrintPort. You can use these events to create inventory-type lookups of all the information. When doing this, I would not include the dynamic fields – jobs or average_PagePerMinute – as these would cause unnecessary updates to your lookup file. We’ve discussed generating lookup files in prior blog posts, so I won’t go over the details here, but a quick sketch follows.
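
Here is that pattern applied to the printer inventory – the lookup name is mine, and the fields come from the sample event above:

sourcetype=WinPrintMon type=Printer
| stats latest(driver) AS driver, latest(port) AS port, latest(share) AS share BY ComputerName, printer
| outputlookup printerInventory.csv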

The (to my mind) more interesting input is the Job input. You define it like this:

[WinPrintMon://jobs]
type=job
interval=60
baseline=0
disabled=0

This writes out an event per print job and they look something like this:

04/21/2014 13:52:19.486
operation=add
type=PrintJob
printer=HP LaserJet M3035 mfp PCL6
ComputerName=ops-sys-001
machine=ops-sys-001
user=adrian
document=wallstreetjournal.htm
notify_name=adrian
JobId=35
data_type="NT EMF 1.008"
print_processor=hpzppwn7
parameters=
driver_name="HP LaserJet M3035 mfp PCL6
status=printing
priority=5
total_pages=9
size_bytes=397039
submitted_time=04/21/2014 13:51:40.131
page_printed=7

Producing a report of the number of pages printed by each user is fairly easy:

sourcetype=WinPrintMon type=PrintJob operation=add | stats sum(page_printed) by user

However, let’s go a little bit further. Let’s say that we have generated a lookup table that has three fields, like this:

host,printer,cost
ops-sys-001,HP LaserJet M3035 mfp PCL6,0.02
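
For the lookup command in the next search to find this table, you also need a lookup definition; a minimal transforms.conf sketch, assuming the table above is saved as printerCost.csv in the app’s lookups directory:

[printerCost]
filename = printerCost.csv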

We can now augment our search as follows:

sourcetype=WinPrintMon type=PrintJob operation=add
    | lookup printerCost host,printer OUTPUT cost
    | eval cost=if(isnull(cost),0.01,cost)
    | eval job_cost=cost * page_printed
    | stats sum(page_printed) as pagecount, sum(job_cost) as total_cost by user

Note our use of the eval to set a default value for cost, just in case one is not specified. We are almost at an ideal search. However, this isn’t suitable as a chargeback report because we want this by department. To turn a user into a department, you can use another lookup or you can use Active Directory:

sourcetype=WinPrintMon type=PrintJob operation=add
    | lookup printerCost host,printer OUTPUT cost
    | eval cost=if(isnull(cost),0.01,cost)
    | eval job_cost=cost * page_printed
    | stats sum(page_printed) as pagecount, sum(job_cost) as total_cost by user
    | rex field=user "^(?<src_nt_domain>[^\\]+)\\(?<src_user>.*)"
    | ldapfilter domain=$src_nt_domain$ search="(sAMAccountName=$src_user$)" attrs="department"
    | eval department=if(isnull(department),"UNKNOWN",department)
    | stats sum(pagecount) as pagecount, sum(total_cost) as total_cost by department

The additional work separates out the username we get from the WinPrintMon into domain and user, then we use these to create an Active Directory Filter to retrieve the department. Finally, just as in the case of the page cost, we provide a default for the case when the department is not known and do the calculation. You may be wondering why we do stats twice – once at the user level and once at the department level. This is for efficiency. Do you want hundreds of thousands of ldap queries going to Active Directory, or just a couple of hundred? I suspect the latter.

You should now have the capabilities in your hand for providing chargeback reporting of print jobs to departments. With these capabilities, you can provide accounting with a report on a regular basis (as a CSV), provide individual reports for departments as needed and look at individual users.

Upgrading Windows Inputs from Splunk 5.x to Splunk 6.x

If you are a long-time Splunker, you might have your environment on an older Splunk version and not yet have taken the plunge to Splunk 6. One of the common questions we get during upgrades is “how do I upgrade all my add-ons?” In Splunk 6, we made some fairly major changes to the Windows inputs, converting perfmon gathering and Windows event log gathering to modular inputs. For example, this means that perfmon is configured in inputs.conf instead of perfmon.conf, and the Windows event log stanzas get an additional couple of slashes inside inputs.conf. How do you slowly upgrade all your universal forwarders from Splunk 5 to Splunk 6 without duplicating data, while keeping only one copy of the TA around?

Splunk distributes a free Technology Add-on for Windows and most people use this as the basis for Windows data collection. Let’s use this as the example.

Step 1: Configure the Splunk Technology Add-on for Windows for Splunk 5.x Compatibility

Since we are starting with our Universal Forwarders at Splunk 5.x, we need to support Splunk 5.x in the Splunk Technology Add-on for Windows. This involves just two steps:

  • Comment out all the Splunk 6.x stanzas in default/inputs.conf
  • Create the Splunk 5.x stanzas in local/inputs.conf

We comment out the Splunk 6.x stanzas because Splunk 5.x doesn’t understand them. To do this, you need to edit the Splunk_TA_windows/default/inputs.conf file. I know this is not normally done and you will have to be careful when upgrading the TA during the process, but it’s really for the best. There will be several stanzas that start with perfmon:// and three that start with WinEventLog:// – you need to comment out the entire stanza. For example:

# [WinEventLog://Security]
# disabled = 1
# start_from = oldest
# current_only = 0
# checkpointInterval = 5

Now we can move our attention to creating the Splunk 5.x stanzas. The latest Splunk Technology Add-on for Windows does not have any Splunk 5.x stanzas in its default inputs.conf, but there is an inputs.conf.example that contains them. Here is what your local inputs.conf should look like:

[WinEventLog:Application]
disabled = 0
start_from = oldest
current_only = 0
checkpointInterval = 5

[WinEventLog:Security]
disabled = 0
start_from = oldest
current_only = 0
checkpointInterval = 5

[WinEventLog:System]
disabled = 0
start_from = oldest
current_only = 0
checkpointInterval = 5

[script://$SPLUNK_HOME\bin\scripts\splunk-perfmon.path]
interval = 3600
disabled = 0
source = PerformanceMonitor
queue = winparsing

Note that I have not specified any indices here. You really want to create some indices for these and specify them here. Personally, I send most Windows event logs to an index named “winevents”, all my perfmon data to “perfmon”, and my security log to a dedicated “security” index. Also, don’t forget that at this stage you are still configuring perfmon counter gathering in perfmon.conf – not inputs.conf; that conversion comes later.
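
For example, routing the security log to its own index is one extra line in the stanza:

[WinEventLog:Security]
disabled = 0
start_from = oldest
current_only = 0
checkpointInterval = 5
index = security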

Now that you have the Splunk Technology Add-on for Windows configured for Splunk 5.x, you can push it out with the deployment server.

Step 2: Upgrade your Universal Forwarders to Splunk 6.0.3 or later

Now that you have your Splunk instances working nicely on Splunk 5.x, you can start upgrading your Universal Forwarder installations to the latest and greatest version. I prefer a PowerShell install, but other people have used System Center Configuration Manager (SCCM) or even GPO installations. However you do it, you can upgrade at your leisure. The Splunk 6 Universal Forwarder will upgrade all your Splunk 5.x Windows inputs configurations to Splunk 6.x on the fly when the Universal Forwarder restarts, so you don’t have to worry about them.
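
As a sketch of the PowerShell route – the MSI filename here is illustrative, and AGREETOLICENSE is the Splunk installer’s silent-acceptance property:

Start-Process msiexec.exe -ArgumentList '/i splunkforwarder-6.0.3-x64-release.msi AGREETOLICENSE=Yes /quiet' -Wait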

Step 3: Transition to a Splunk 6.x Technology Add-on

Once the transition to the Splunk 6 Universal Forwarder is complete, you can transition to a configuration that supports just Splunk 6.x. This allows you to move to a more normal version of the Splunk_TA_windows that can be downloaded and directly implemented. Here are the steps:

  • Back out the changes to default/inputs.conf – un-comment the Splunk 6 compatible stanzas
  • Enable the Windows Event Logs in local/inputs.conf
  • Convert your perfmon.conf to local/inputs.conf

The first step is easy – especially easy if you kept around a copy of the original default/inputs.conf. You want to uncomment everything that you previously commented out. The second step is equally easy – just replace the local/inputs.conf with the following:

[WinEventLog://Application]
disabled = 0

[WinEventLog://Security]
disabled = 0

[WinEventLog://System]
disabled = 0

Finally, it’s likely that you made some changes to perfmon.conf, so you have a local/perfmon.conf file that needs to be converted. To do this, copy the stanzas from perfmon.conf into local/inputs.conf, then edit each stanza header to use a double slash and lower-case perfmon. The original perfmon.conf stanza would look like this:

[PERFMON:CPUTime]
counters = % Processor Time;% User Time
disabled = 1
instances = _Total
interval = 10
object = Processor

The new inputs.conf stanza looks like this:

[perfmon://CPUTime]
counters = % Processor Time;% User Time
disabled = 1
instances = _Total
interval = 10
object = Processor

The stanza header is the only thing that changed!

Once you have made your changes, you can push out the updated Splunk_TA_windows via your deployment server.

Now you have no reason not to go ahead and upgrade your Windows infrastructure to the Splunk 6 Universal Forwarder.

Fixing Scripted Inputs in Tiered Deployments

The Splunk App for Microsoft Exchange has a useful lookup named ad_username. It takes the various forms under which you can log on to a domain (like DOMAIN\user and user@domain.com) and normalizes them. Further, it then takes all the user aliases and normalizes them, so adrian.hall is the same as ahall and that is the same as adrian. It’s really useful when you are trying to deal with domain accounts from a support perspective – you don’t have to know how they logged in, only what their official username is.

AD_Username is a scripted input written in Python and lives in the bin directory of the application directory. It relies on two files in the local directory called domain_aliases.csv and active_directory.csv. In a single-box environment, the ad_username.py script finds these files, loads them and uses them to do its job. Normalization works as expected. All is good in the world.

But what happens in a multi-tier environment where the indexers are separated from the search heads? The search head pushes the ad_username.py script into a replication bundle and passes that to the indexers to execute as part of the search pipeline. The lookup then happens on the indexer, not the search head. Unfortunately, the dependent CSV files aren’t on the remote indexers, so the script can’t do its job effectively – and since it requires those files, it can break completely. At best, you get wrong results.

Fortunately, the fix is relatively easy. One of the configuration files you can use is distsearch.conf, whose job is to tell Splunk how to handle distributed searches. One of the things we do in the Splunk App for Windows Infrastructure, for example, is to blacklist the tSessions lookup (which can run to gigabytes of data) from being replicated, which improves performance. However, you can also whitelist files, and that is the technique we use here to ensure that dependent files are replicated properly.
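
For reference, that tSessions exclusion looks something like this – the key name is arbitrary, and the pattern is a sketch using Splunk’s ... wildcard:

[replicationBlacklist]
excludetsessions = ...tSessions.csv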

Try this simple stanza in the distsearch.conf:

[replicationWhitelist]
ad_username = ...(domain_aliases|active_directory).csv

Place this in Splunk_for_Exchange/local/distsearch.conf (for 2.x releases) or splunk_app_microsoft_exchange/local/distsearch.conf (for 3.x releases). This will add the domain_aliases.csv and active_directory.csv files to the replication bundle for ANY search that is executed within the context of the app. Now you can utilize the ad_username lookup as normal. In addition, panels that rely on ad_username (normally under the User Behavior menu) will work as intended.

Of course, you could just upgrade to the latest (v3.0.1) version of the Splunk App for Exchange and get the same effect. The Splunk App for Exchange now has a free downloadable trial, so trying this functionality is easier than ever.

Controlling 4662 Messages in the Windows Security Event Log

You’ve just installed the Splunk App for Windows Infrastructure, or its friend the Splunk App for Exchange. You’ve followed all the instructions, placed the Universal Forwarders on the domain controllers, and configured everything according to the documentation. Now your license is blowing up because you are getting too many EventCode=4662 events in the Windows Security Event Log. How did this happen?

Security EventCode 4662 is an abused event code. It is used for directory access, like this:

An operation was performed on an object. 
Subject : 
    Security ID: NT AUTHORITY\SYSTEM 
    Account Name: EXCH2013$ 
    Account Domain: SPL 
    Logon ID: 0x177E5B394
Object: 
    Object Server: DS 
    Object Type: domainDNS 
    Object Name: DC=spl,DC=com 
    Handle ID: 0x0 
Operation: 
    Operation Type: Object Access 
    Accesses: Control Access 
    Access Mask: 0x100
    Properties: Control Access 
        Replicating Directory Changes
        domainDNS 
Additional Information: 
    Parameter 1: - 
    Parameter 2:

These are logged all the time and the more complicated your environment, the more of them you will see. They are also logged for other reasons, like when admon first starts – you’ll get one per record that admon reads, resulting in a large number of 4662 events that will quiet down after a while. Personally, I don’t see a whole lot of value in these messages. You can review another blog post for information on how to control the storm of events from admon initialization. Unfortunately, we need 4662 events for their other – rarer – purpose. That’s an event like this:

An operation was performed on an object. 
Subject : 
    Security ID: SPL\Administrator 
    Account Name: Administrator 
    Account Domain: SPL 
    Logon ID: 0x133857101 
Object: 
    Object Server: DS 
    Object Type: groupPolicyContainer 
    Object Name: CN={BFE075D4-186E-4762-A534-E993DEA898E0}CN=Policies,CN=System,DC=spl,DC=com 
    Handle ID: 0x0 
Operation: 
    Operation Type: Object Access 
    Accesses: Write Property 
    Access Mask: 0x20 
    Properties: Write Property 
        Default Property Set flags 
        groupPolicyContainer 
Additional Information: 
    Parameter 1: - 
    Parameter 2:

We need this one as it deals with a change to a group policy – something we report on within the Splunk App for Windows Infrastructure. However, group policy is the only time we need EventCode 4662. This allows us to filter out the other things – things we don’t need.

Sometimes your security policies require AD access monitoring, but most of the time it’s just noise. How do you log what is required but throw away what isn’t? Fortunately, Splunk Universal Forwarder v6.1 came to the rescue: we added a feature to blacklist and whitelist events based on a regular expression. In the case of the Security Windows Event Log, we need something like this:

[WinEventLog://Security]
blacklist1=EventCode=4662 Message="Object Type:\s+(?!groupPolicyContainer)"

The blacklist is a set of key=regex pairs. The keys are event log properties like “EventCode” and “TaskCategory” – i.e. the event log keys, not the Splunk field names. In this case we are blacklisting EventCode 4662, but only when the Object Type is not groupPolicyContainer. You can do the same for the NT5 (Windows Server 2003) world by using EventCode=566. For more information on this use of regular expressions (negative lookahead), see the tutorial at http://www.regular-expressions.info/lookaround.html

So, taking all the advice from this blog together, here is our suggested WinEventLog:Security stanza. It’s fairly simple:

[WinEventLog://Security]
disabled=0
current_only=1
blacklist1=EventCode=4662 Message="Object Type:\s+(?!groupPolicyContainer)"
blacklist2=EventCode=566 Message="Object Type:\s+(?!groupPolicyContainer)"

Place this in your Splunk_TA_windows\local\inputs.conf file and push it out to your domain controllers. You should get all the regular Security Event Log entries, but the 566 and 4662 codes are filtered to only provide information on group policy containers. Don’t forget to also follow our advice on admon usage to further reduce the data you store.

Of course, you will have to upgrade your Universal Forwarder to the latest version (v6.1.1 at the time of writing), but the gains for your license usage will be worth it. In addition, this will not reduce the load on your domain controller – we will still do all the queries we need to do to turn SIDs and GUIDs into real names. However, they will no longer hit your license. Just ensure your log rotation settings for your security log are set appropriately.
