Emergency Support for IT and Ops

Imagine going to the ER and the attending doctor is not able to ask you what happened. All the doctor has to work with is a myriad of tests to run on you to narrow down the problem. This is the environment IT and Operations Support teams frequently work in when they have no visibility into configuration changes in their environment.


MAGIC is often a perception of some desirable outcome without the understanding or appreciation of the underlying work required to produce the desired result. We all want MAGIC without having to do the work. Without tackling the fundamental challenges, MAGIC and all the amazing things that can come from it cannot happen. Understanding your IT and Operations infrastructure configuration is one of these core principles.


To learn more about how SIFF can help empower your configuration and change management, watch our 3-minute video to find out “What the #%&$ changed?!”

Log4Shell: Finding Where You Are Vulnerable

I’m sure by now you are well aware of the Log4j 2 vulnerability which is putting an unprecedented number of companies at risk. In case you haven’t heard, here are a couple of quick links to get you up-to-date and advise you on how to mitigate it:

The big question, however, for those who are directly responsible for the security of your company, or indirectly responsible as an application owner or as operational support, is where the vulnerabilities are located. Which applications? Which servers? Which tools are susceptible to Log4Shell? More importantly, how confident are you that you found every instance of it?

Using the SIFF configuration monitoring platform, you can quickly discover the location of the Log4j vulnerability. A SIFF Service Definition (SD) discovers and identifies the Java processes that are using Apache Log4j, and a SIFF Policy Definition (PD) then validates whether these instances are compliant (i.e. an instance running Log4j version <= 2.14.1 is a violation). Violations are flagged, users are notified, and the platform can be configured to trigger automated remediation actions.
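To make the version check concrete, here is a minimal Python sketch of the kind of rule such a policy expresses. It is not the SIFF SD/PD format. The function names, the jar-name pattern, and the sample inventory shape are assumptions for illustration, and the only threshold used is the one stated above (2.x versions up to and including 2.14.1).

```python
import re

# Assumption: the version is read from the log4j-core jar file name.
# Threshold taken from the text: 2.x releases <= 2.14.1 are flagged.
VULNERABLE_MAX = (2, 14, 1)
JAR_PATTERN = re.compile(r"log4j-core-(\d+)\.(\d+)(?:\.(\d+))?")


def log4j_version(jar_path):
    """Return (major, minor, patch) parsed from a log4j-core jar name, or None."""
    match = JAR_PATTERN.search(jar_path)
    if match is None:
        return None
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def is_vulnerable(jar_path):
    """True when the jar is a Log4j 2 release at or below the threshold."""
    version = log4j_version(jar_path)
    return version is not None and version[0] == 2 and version <= VULNERABLE_MAX


def find_violations(inventory):
    """inventory maps a host name to the jar paths seen in its Java processes."""
    violations = {}
    for host, jars in inventory.items():
        flagged = [jar for jar in jars if is_vulnerable(jar)]
        if flagged:
            violations[host] = flagged
    return violations
```

A file-name check like this is only a first pass. Jars that are shaded or bundled inside other archives would need a deeper inspection than this sketch performs.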

To make this easy, we have created these SD/PD and included them in the built-in SIFF community library. You simply have to activate these definitions and they will automatically examine any SIFF-managed devices for the Log4Shell vulnerability and notify you.

If you are interested in learning more about using SIFF to ensure security and configuration compliance as well as how SIFF can help monitor configuration changes in your environment, learn more here.

Network Automation != Network Compliance

A recent EMA study, The State of Network Automation: Configuration Management Obstacles are Universal, found significant dissatisfaction with the current state of Configuration Management, especially at large network operators. It reported that 3 out of 4 IT organizations are worried that configuration changes are likely to lead to performance problems and security issues. These errors can impact any organization, even one with a leading reputation for network operations such as Facebook, which suffered a global outage in October 2021. Facebook publicly attributed the outage to a bad network configuration change.

The study goes on to prescribe that Network Automation is the key path towards improving Network Compliance and Audits. Although automation tools do help provide more consistency and reduce human configuration errors, this path ignores critical attributes of network operations in the real world. Specifically:

  • No networks are fully automated. Most have people making manual configuration changes to the infrastructure. 
  • A large volume of planned vs unplanned changes. Ideally, configuration changes follow the change management process. In most organizations, however, a large number of changes are made directly and bypass the process for various reasons.
  • Authorized vs unauthorized changes. This includes changes due to security intrusions/hackers as well as internal personnel making changes that are implicitly “allowed”.
  • Multiple automation tools. Most environments have multiple tools used by different functional groups that make configuration changes including vendor-specific management tools and Element Management Systems (EMS). 

The real issue is that Network Compliance and by extension, configuration monitoring, should not be conflated with Network Automation. They serve different purposes. Network Compliance and Audits need to ensure the correct configuration on actual devices and not just “golden configs” defined in automation tools. In other words,

The “configuration truths” are on the actual devices and not in a CMDB or an Inventory system.
Not in network management tools or network automation tools.

The Network Compliance policies and audits must validate what is on the actual devices and verify all changes made to those devices, regardless of whether they are manual, automated, or, in the worst case, the result of a hack.

At SIFF.IO, this is the methodology or approach we use to ensure Network Compliance.

  • SIFF collects and monitors any configuration changes, whether it is a manual change or a change initiated by Network Automation.
  • SIFF applies Compliance Policies to ensure any misconfiguration is immediately flagged and notified. This includes checking existing configs as well as newly detected configuration changes which allows new vulnerabilities to be identified on existing configs.
  • SIFF integrates with one or multiple change management systems used by different functional groups to identify planned vs unauthorized changes.
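The third point, separating planned from unplanned changes, can be sketched in a few lines of Python. This is an illustration of the idea only, not how SIFF implements it. The record fields (`cr_id`, `device`, the change window) and the matching rule are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ChangeRequest:
    """Hypothetical record exported from a change management system."""
    cr_id: str
    device: str
    window_start: datetime
    window_end: datetime


@dataclass(frozen=True)
class DetectedChange:
    """Hypothetical record for a configuration change seen on a device."""
    device: str
    detected_at: datetime


def matching_cr(change, change_requests):
    """Return the id of the CR that covers this change, or None if unplanned."""
    for cr in change_requests:
        same_device = cr.device == change.device
        in_window = cr.window_start <= change.detected_at <= cr.window_end
        if same_device and in_window:
            return cr.cr_id
    return None


def unplanned(changes, change_requests):
    """List the detected changes that no change request accounts for."""
    return [c for c in changes if matching_cr(c, change_requests) is None]
```

Matching on device and time window is deliberately crude. It shows that the comparison needs both the change records and the changes actually detected on the devices.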

With SIFF, you have visibility into all configuration changes across all sources (networks, servers, apps, cloud, VMs, and containers) to meet your security compliance and audit requirements. This change visibility is not limited to those that are planned or automated.

Configuration Monitoring and Compliance is different from Configuration Automation. There are certainly overlaps but what they do should not be confused. To learn more about how SIFF.IO can help monitor all infrastructure configuration changes and ensure policy compliance,

Visit SIFF.IO to find out “What the #%&$ changed?!”

Learning from Facebook Global Outage Caused by Mis-Configuration

In a recent blog post, Facebook revealed that the global outage that lasted many hours was caused by configuration change errors to its routers.

Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication. This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt.

Cloudflare also has an interesting blog post, Understanding How Facebook Disappeared from the Internet, that describes what they saw from their perspective.

What confounds me is how often errors from misconfiguration still bite us. Even large organizations with seemingly plentiful resources and processes are prone to incidents like the rest of us. 

  • How did the change process break down? 
  • Was it a planned change or an ad-hoc change? 
  • Was the change request not reviewed, or did it lack sufficient detail?
  • Was a post-implementation review carried out?
  • Were the configuration changes captured for post-implementation review?

More often than not, the change process in most organizations is simply a rubber stamp process: 

“Does this planned change make sense? Go ahead, and don’t mess it up.”

For me, what can really save a lot of headaches is to ensure the last 2 questions above are covered:

  • always have someone else peer-review the changes
  • capture, identify and highlight the changes made so it’s easy to provide real feedback and what-ifs

Configuration and change management has been something developers have been dealing with since the dawn of programming. As software development has matured, so have the processes and tooling required to support and minimize the human errors that occur. Specifically,

  • peer code review
  • software version control to automatically capture and highlight the changes made

Both of these go hand-in-hand. Without an easy and automated way to capture and identify configuration changes, peer code review is difficult to do well, and more importantly, becomes very time-consuming.
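As a small illustration of what "capture and highlight the changes" means for a configuration, the following Python sketch compares two snapshots of the same file using the standard library's difflib. The function name and the sample file name are placeholders, not part of any product.

```python
import difflib


def config_diff(before, after, name="device.cfg"):
    """Return a unified diff between two snapshots of a configuration."""
    lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"{name} (before)",
        tofile=f"{name} (after)",
    )
    return "".join(lines)


if __name__ == "__main__":
    old = "hostname r1\nmtu 1500\n"
    new = "hostname r1\nmtu 9000\n"
    print(config_diff(old, new))
```

The output is the same kind of diff a reviewer sees in a code review, which is what makes a peer review of an infrastructure change practical.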

In operations, your infrastructure configuration is your code. 

  • Do you know when your “code” is getting changed?  
  • Do you know what is getting changed? What was it before?
  • Do you know who is making the changes?

And when a planned change is made, do you automatically capture the actual changes across all the devices and systems in your environment so that you can adequately do a peer review and, hopefully, prevent human errors like this one?

This is what we do at SIFF.IO


Responsive Infrastructure Operations

Bridging the Two Worlds, Part 2

In part 1 of the series, we provided an overview of how organizations have a tough time tracking the ongoing configuration changes in their IT and network infrastructure. Many have implemented change management processes to provide some approval and accountability for changes. However, many config changes, whether planned or unplanned, still go undetected, which can lead to disastrous outages, customer impact, and threats to business continuity.

The ability of companies to innovate, improve service uptime, and implement rapid change is critical to remaining competitive. Knowing that 80% of all incidents are caused by configuration changes, many companies create an overabundance of overhead and a risk-averse culture that hobbles change. They fear making mistakes and slow down the process to ensure they dot all the i’s and cross all the t’s: death by a million CABs and approvals.

Innovative companies view their ability to be agile and change as a competitive differentiator. They strive for ways to improve their ability to make changes while minimizing risk. Abstraction through the use of cloud services and automation through the use of new DevOps release automation tools are a couple of good examples of how to minimize change risk. Cloud services reduce the number of moving parts and enable us to reliably spin up new services quickly and easily, while automation tools enable consistent, repeatable change.

However, abstraction and automation still do not prevent bad configuration changes from occurring; they simply shift where the configuration changes are performed. More importantly, when incidents or problems occur (and they still will), abstraction and automation do not eliminate the need for accountability or for determining root cause. Awareness of all configuration changes, planned and unplanned, is critical for effective analysis and remediation of infrastructure and security incidents. This visibility is key to mitigating risk while still providing operational efficiency.

The adoption of Change Monitoring technologies can significantly reduce the risk of infrastructure changes. Change Monitoring provides accountability for all changes, whether they are planned or unplanned, unauthorized, or the result of security intrusions. It can quickly isolate and help identify the root cause of complex incidents.


Level 1 – Unaware

“Level 1 – Unaware” was covered in the previous blog post (part 1). Most organizations fit into this category because they are simply unaware of the configuration changes made to their environment. Not only does this pose a significant security risk, it also reinforces bad behavior: users often skip the prescribed change management process because there is no visibility into, or accountability for, infrastructure changes.

Level 2 – Responsive

A “Responsive” operational support organization is different from a traditional support team. It uses configuration change events to quickly narrow down the scope of an incident and identify a potential root cause before spending precious time diving down “rabbit holes” that can easily consume time and expensive resources. Ideally, the support team should be able to work directly from Change Request records and analyze the resulting configuration changes made to the infrastructure, as well as the impacted dependent services.

“Level 2 – Responsive” of the Operational Change Maturity Model is the foundational layer from which the capabilities and objectives of the higher change assurance levels are derived:

  • Greatly improved ability to troubleshoot incidents, especially complex outages
  • Prevent and reduce incidents caused by human errors, or the lack of understanding the underlying service impact and dependencies
  • Identify and reduce unplanned changes to ensure a consistent process and review are followed
  • Detect unauthorized changes and security intrusions
  • Implement and ensure security policy and compliance

Although the primary focus of the Responsive operational support team is to monitor, diagnose and repair as quickly as possible, the team also plays an important role in providing feedback to the change and security teams, so that consistent processes are followed and risk and unnecessary incidents are minimized. One of the key requirements of “Level 2 – Responsive” is to provide visibility into, and accountability for, ALL infrastructure changes. The support team needs to be able to easily distinguish planned changes from unauthorized changes or security intrusions, and to feed that back to the change and security teams to reinforce the correct behavior.

What about existing tools and approaches?

Configuration Management Database (CMDB)

The CMDB is a foundational idea, but its implementation is often impractical and provides limited value. A good test of whether a CMDB provides any useful configuration information for troubleshooting complex incidents is whether the operational support team actually uses it.

The horror stories of CMDB endeavours are enough to scare off any ambitions of an actual configuration change repository, as opposed to what most CMDB implementations are today. CMDBs today are a shell of what they promised. Most simply provide an inventory of systems, devices and software that are used as references, or Configuration Items (CIs), by Incident, Change and Problem records. Some CMDBs may provide some service dependency information, but the actual configuration details of the CIs themselves are very limited.

For example, CMDBs would be challenged to answer questions like:

  • Which MySQL databases have the MAX_CONNECTIONS configuration setting set to greater than 500?
  • Which Cisco devices have the vulnerable Smart Install feature enabled?
  • What configurations were recently changed on this device?
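To show why the first question needs the actual configuration content rather than an inventory record, here is a hedged Python sketch that answers it from collected MySQL option files. It assumes the files are simple INI-style text with a [mysqld] section (no include directives), and the helper names and the shape of the collected data are hypothetical.

```python
import configparser


def max_connections(option_file_text):
    """Return max_connections from the [mysqld] section, or None if it is not set."""
    # allow_no_value accepts flag-style options such as skip-name-resolve.
    parser = configparser.ConfigParser(allow_no_value=True)
    parser.read_string(option_file_text)
    return parser.getint("mysqld", "max_connections", fallback=None)


def databases_over(collected, threshold=500):
    """collected maps a host name to the text of its MySQL option file."""
    hits = {}
    for host, text in collected.items():
        value = max_connections(text)
        if value is not None and value > threshold:
            hits[host] = value
    return hits
```

A host where the setting is absent is skipped here. Whether the server default should count is a policy decision this sketch leaves open.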

The difficulty of collecting all the appropriate data in the CMDB, given the large number of asset or CI categories, is best highlighted by Gartner’s recommendation that only 10-15% of assets be cataloged in the CMDB.

Domain-specific Configuration Management Tools

There are many existing Configuration Management tools that collect and deploy configurations to networks, servers, applications and cloud environments. The challenge with these tools is that they are limited to their specific domains e.g. only networks, specific applications, only cloud, etc.

The ability to see changes across ALL technologies is essential for troubleshooting complex incidents and understanding which services are impacted by configuration changes. For example, a simple firewall rule change by the network security team can directly impact the availability of an application managed by the application team. As services grow increasingly connected through microservices, containers, virtualization and software-defined storage and networks, the ability to correlate change events across domains is critical.
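A rough Python sketch of cross-domain correlation: given change events from any domain, list the ones that touched an affected service shortly before an incident. The event fields (`time`, `domain`, `device`, `services`) and the 24-hour lookback are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def recent_changes(events, incident_time, services, lookback=timedelta(hours=24)):
    """Return change events, newest first, that touched any affected service.

    events is a list of dicts with 'time', 'domain', 'device' and 'services' keys;
    services is the set of service names affected by the incident.
    """
    window_start = incident_time - lookback
    related = [
        event
        for event in events
        if window_start <= event["time"] <= incident_time
        and services & set(event["services"])
    ]
    return sorted(related, key=lambda event: event["time"], reverse=True)
```

Because the events are not filtered by domain, a firewall change and an application change that affect the same service end up in one list, which a domain-specific tool cannot produce.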

Being Responsive with Change Monitoring

A Change Monitoring tool enables organizations to quickly determine what has changed in the environment, whether the changes were planned or unplanned. Change Monitoring allows operational teams to quickly isolate and troubleshoot complex issues and become more responsive. The following are some of the key elements of Change Monitoring that should be considered:

  • discovery and inventory of devices to manage
  • discovery and classification of devices and services
  • retrieve the various forms of configuration information
  • search indexing and parsing
  • policy and action
  • service dependency
  • user workflow and interaction
  • API and reporting
  • process and workflow integration

Device Discovery and Inventory

Dynamically add new elements to be managed, including networks, servers, containers, applications, cloud services, etc. The tool needs to support rules that include as well as exclude elements. Integration with other inventory sources such as CMDBs is also helpful.
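Include and exclude rules can be as simple as name patterns. The Python sketch below uses shell-style wildcards from the standard library. The rule format and the device names are made up for the example.

```python
from fnmatch import fnmatchcase


def is_managed(name, include, exclude):
    """A device is managed if it matches an include rule and no exclude rule."""
    included = any(fnmatchcase(name, pattern) for pattern in include)
    excluded = any(fnmatchcase(name, pattern) for pattern in exclude)
    return included and not excluded


def select_devices(names, include, exclude=()):
    """Filter discovered device names down to the ones to manage."""
    return [name for name in names if is_managed(name, include, exclude)]
```

Letting exclude rules win over include rules keeps the behavior predictable when patterns overlap, for example managing every router except those in the lab.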

Device and Service Classification

Examine devices in detail to understand what services are relevant and what configuration to collect and monitor for changes.

Configuration Collection

Configuration information exists in many different forms, e.g. files, command line output, registry keys, database entries, APIs. The Change Monitoring system should be able to collect all of these types of configuration information and normalize the data.  Being able to collect the proper information based on the type of application, system or device is critical to monitor the changes in an environment. 

Search Index and Parsing

Once the configuration information is collected and consolidated, indexing and analyzing it in an IT-centric way ensures the right information reaches IT professionals. For example, IP address syntax, special symbols, etc. have special meaning that should not be lost in the indexing process, so that searching remains intuitive for infrastructure management.
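To illustrate what an IT-centric index has to preserve, here is a small Python tokenizer sketch that keeps IPv4 addresses (with an optional prefix length) and interface-style names as single search terms. The patterns are illustrative assumptions, not a complete grammar for any vendor's configuration syntax.

```python
import re

TOKEN = re.compile(
    r"\d{1,3}(?:\.\d{1,3}){3}(?:/\d{1,2})?"  # IPv4 address, optional /prefix
    r"|[A-Za-z][\w-]*\d+(?:/\d+)+"           # names such as GigabitEthernet0/1
    r"|[\w-]+"                               # ordinary words and numbers
)


def tokenize(line):
    """Split a configuration line into search terms without breaking addresses."""
    return TOKEN.findall(line)


def naive_tokenize(line):
    """General-purpose word splitting, shown only for comparison."""
    return re.findall(r"\w+", line)
```

With the naive splitter, an address such as 10.1.2.0/24 is broken into five unrelated numbers, so a search for the subnet can no longer match it as a unit.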

Policy and Action

Configuration and monitoring policies can be defined to notify on policy non-compliance. Why is this important?
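As a sketch of what such a policy might look like in code, the Python example below models a policy as a line pattern that is either required or forbidden. The Policy structure and the two sample rules are assumptions for illustration and do not represent any product's policy format.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    name: str
    pattern: str      # regular expression tested against each configuration line
    must_match: bool  # True: a matching line is required; False: it is forbidden


def complies(policy, config_text):
    """Return True when the configuration satisfies the policy."""
    found = any(re.search(policy.pattern, line) for line in config_text.splitlines())
    return found == policy.must_match


def violations(policies, configs):
    """Return (device, policy name) pairs for every failed check."""
    return [
        (device, policy.name)
        for device, text in configs.items()
        for policy in policies
        if not complies(policy, text)
    ]


# Sample rules, written for this example only.
SAMPLE_POLICIES = [
    Policy("no-telnet", r"^\s*transport input .*telnet", must_match=False),
    Policy("password-encryption", r"^service password-encryption$", must_match=True),
]
```

Running the same checks against both existing configurations and newly detected changes is what lets a new rule surface problems that were already present.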

Service Dependency

Analyze the configuration data to determine dependencies between service components. This is the foundation for impact analysis and service visualization, and it enables related systems and information to be automatically collected and presented in the context of the affected service.
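One simplified way to picture this: if a component's configuration mentions another component's address, treat that as a dependency, then walk the graph to find what a change could impact. The Python sketch below assumes exactly that heuristic. The function names and data shapes are hypothetical, and the plain substring match is a known simplification (10.0.0.2 would also match 10.0.0.20).

```python
def infer_dependencies(configs, addresses):
    """Map each component to the components whose address appears in its config.

    configs maps a component name to its configuration text;
    addresses maps a component name to its address or host name.
    """
    deps = {}
    for name, text in configs.items():
        deps[name] = sorted(
            other
            for other, address in addresses.items()
            if other != name and address in text
        )
    return deps


def impacted_by(target, deps):
    """Return every component that depends, directly or transitively, on target."""
    impacted = set()
    frontier = [target]
    while frontier:
        current = frontier.pop()
        for name, needs in deps.items():
            if current in needs and name not in impacted:
                impacted.add(name)
                frontier.append(name)
    return impacted
```

For impact analysis the interesting direction is upstream: given a changed database, which applications and front ends sit on top of it.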

User Workflow and Interaction

What the user is trying to accomplish should determine the workflow and interaction required to retrieve the configuration change information. Troubleshooting and searching for the root-cause of an incident is very different from searching for configuration information about a specific service.

API and Reporting

The searching and filtering capability to retrieve the configuration data should be easily accessible via the API for reporting and integration with external systems.

Process and Workflow Integration

The configuration change information should be integrated with the Change Request workflow to automatically audit the underlying infrastructure configuration changes related to the change request.

Conclusion

A Change Monitoring solution does not require much effort to deploy. It can quickly and easily provide additional insights to the operational support team, enabling them to be more efficient at troubleshooting complex incidents. The configuration data is also invaluable to the security team and provides the auditing needed for many compliance requirements.

Next in the series we will cover “Level 3 – Proactive”, in which we take strategic, proactive and preventative measures that reduce unnecessary incidents and minimize change risks.



SIFF and AccuOSS Team-Up to Integrate the Industry’s First Real-Time Configuration Monitoring

AccuOSS adds SIFF.IO to its OSS portfolio to give service providers and enterprises real-time visibility of infrastructure configuration across their environments with the simplicity of Software-as-a-Service.

Irvine, CA – March 18, 2020 – AccuOSS LLC (Accurate Operational Support Systems), a leading integrator in the Service Assurance and IT Operations Management space, has entered a formal partnership with SIFF.IO – a cross-silo, cloud-based configuration monitoring platform. SIFF.IO is the industry’s first cloud-based configuration monitoring platform that provides detailed configuration information across infrastructure and application silos. It provides IT operations with a simple-to-use platform to understand or rule out how change may have impacted their services. This is particularly valuable when troubleshooting complex outages and preventing incidents caused by planned and unplanned configuration changes.

Under this formal partnership agreement, AccuOSS can now offer SIFF.IO to AccuOSS clients through a resell agreement and provide technical services to design, implement, integrate and support SIFF.IO deployments. As per Gartner, “More than 80% of all incidents are caused by planned or unplanned changes.” Adding SIFF.IO to the AccuOSS mix of offerings addresses a major challenge for enterprises and service providers alike.

In the era of Software-Defined-Everything, DevOps and Hybrid-Cloud, the growing volume and velocity of change is dictating implementation of new capabilities and procedures so that Ops can keep up with Dev. Traditional CMDB and Change Management processes, now more than ever, cannot keep up with the demands of the business. Integrating a capability like SIFF.IO with traditional fault and performance management systems gives IT operations the agility needed to quickly identify the root cause of an issue.

“We’re very excited to partner with AccuOSS. Their deep IT operations experience and systems integration skills will be a great asset as we deliver the industry’s first cloud-based, configuration monitoring solution. More than 80% of incidents are caused by planned or unplanned configuration changes – yet most organizations do not have visibility of actual changes that occur in their environment. Effective change management needs to extend beyond just planning and approvals to include monitoring of configuration changes that are essential to root-cause analysis of complex outages and incident prevention.” – Duke Tantiprasut, CEO & Founder

“Our customers recognized long ago that change is the biggest impact to their IT and Network Operations. While we’ve implemented tools and procedures to reconcile with these challenges, like network configuration and server configuration management capabilities, often those tools have been too siloed to realize the full value potential that a configuration monitoring solution should provide across the IT landscape,” said Rodney Rindels, Founder and CEO at AccuOSS. “We’re excited to partner with SIFF.IO to deliver this capability for our clients and help them prepare for this inevitable paradigm shift.”

Change Management & Infrastructure Operations

Bridging the Two Worlds, Part 1

This is the first in a series of articles in which we will explore the opportunities created by connecting these two related, yet inherently disconnected worlds. The series will navigate the Operational Change Management Maturity Levels to provide a pathway towards reducing incidents and outages, and to avoid inflicting unnecessary pain on ourselves.

The series does not cover the change management process itself. There is plenty of content available that explores the change management discipline. We will explore what is limiting the benefits promised by the change management process from directly helping infrastructure operations; and how these challenges can be overcome.

 



Level 1 – Unaware

“More than 80% of all incidents are caused by planned and unplanned changes” – Gartner

 

Of all the wild claims we often hear from industry analysts, this particular one rings true. We could debate the percentage value, but the core essence of the statement is accurate. There’s a reason why most industries that are highly dependent on their IT infrastructure and services, such as financial services and retail, go into a “lock-down” mode during their seasonal peaks: making changes breaks things.

But aren’t we supposed to “fail fast” and “break things” now? 

Amazon, Google, and Netflix dominate because of their ability to innovate quickly. They adopted the “fail fast” mentality and created a culture that does not penalize mistakes made for the sake of innovation. This is often measured by the number of successful change requests. In some ways, the ability to promote changes and improvements directly correlates to their business agility.

But how do you do this at scale? How does an organization reduce the risk and make more frequent changes manageable? Many look towards infrastructure as code or software-defined X as the goal. However, as many developers can attest, just because it is “code” (i.e. releases are automated) does not mean the underlying issues are eliminated. Organizations need to revise their perspective on what managing changes means to them, specifically the discipline and process required to reduce the risk. There are lightweight strategies teams can adopt to help avoid causing pain.

The good news is that software developers have been dealing with quality assurance for a very long time. The opportunity is in applying the lessons to help infrastructure operations.  

What does “Level 1 – Unaware” mean?

Here’s the scenario:

  • A critical business application or service breaks
  • You would hear “What the #$&% changed ?!?”
  • The technician starts looking at the availability dashboards to see if there are any outages
  • Checks the alerts to see if there are any related events
  • Checks the performance charts to see any symptoms
  • Sets up a bridge across the functional teams to see if it’s a network, server, storage, application, cloud, or security problem 
  • Then the inevitable…start digging down into logs to find the needle in the haystack

But what’s missing? In this scenario, we rarely see the technician search Change Requests (CRs) to check whether recent changes may have caused the problem. The technician is busy looking at symptoms and trying to guess the cause instead of assessing relevant changes that recently occurred. Both approaches are needed; however, understanding changes can tremendously narrow the search space and reduce the time to repair.

Change Requests are often not used because they lack the actual detailed configurations that resulted from the work. CRs describe the work to be carried out and can provide detailed instructions on how to perform it, but they do not capture the resulting configurations that the infrastructure monitoring team needs to troubleshoot and repair problems. This is the disconnect between those implementing changes to the infrastructure and those monitoring it and keeping everything up and running.

Additionally, many organizations still struggle to adopt a consistent change management process. Frequently, changes are made directly to systems and devices without following the change request process. These unplanned changes are often the source of outages. Without visibility into and accountability for these ad-hoc or unplanned changes, it is very difficult to instill these process changes.

Lastly, unauthorized changes and security breaches, both internal and external, are becoming commonplace. It’s not a matter of if but when this will occur. How do you know if you’re compromised if you don’t even know what changes are going on?

These symptoms highlight a couple of key limitations in managing change:

  • Change Requests do not contain sufficient technical details to help narrow down the root-cause of complex incidents. They do not contain the actual configuration of systems or devices resulting from the execution of the change request.
  • Operations monitoring does not have visibility into the configuration changes that are taking place, including planned, unplanned and unauthorized changes.

What’s Changed? 

How do operations answer this question today? The answer is … complicated. 

First, it depends on what functional domain (aka silo) you’re referring to. Is it network configuration changes, servers, applications, storage, cloud, or security? In addition, each of these is likely to have its own management tool to configure its elements. Some may have several: one application team uses Ansible, while another prefers Puppet.

All of these disparate tools show only part of the picture, which makes it difficult to understand how independent changes in each functional group can impact each other. Hence conference bridges to isolate the root cause of complex outages are commonplace. No wonder we frequently hear of outrageous hourly outage costs.

According to Gartner, the average cost of IT downtime is $5,600 per minute. Because there are so many differences in how businesses operate, downtime, at the low end, can be as much as $140,000 per hour, $300,000 per hour on average, and as much as $540,000 per hour at the higher end. 

A critical requirement to improve change management for infrastructure operations is “change monitoring”. You need the ability to quickly review configuration changes across all IT infrastructure and easily search for relevant configurations and changes all in a single place.

The change monitoring tool must be flexible. Depending upon your environment, it may need to support a plethora of complex services and devices as well as the nuances of legacy systems. For a Communications Service Provider (CSP), this can range from fiber equipment and 5G Radio Access Network (RAN) equipment to new SDN virtual devices. For a new SaaS provider, it needs to be cloud-services aware (AWS, Azure, GCP) and support containers and orchestration. For enterprises, it needs to cover hypervisors, SANs and all of the above.

The above is simply just getting the data. Making it easy for operators to get the needed information at their fingertips is entirely another challenge.

Conclusion

Change management is more than just managing the approval process and execution of changes. It must also involve the monitoring of changes to bring visibility and accountability to planned, unplanned and unauthorized changes. Once you have this in place, you can then start to take actionable steps towards improving infrastructure operations. 

Next in the series we will cover “Level 2 – Responsive” and discuss in greater detail how configuration change monitoring can help accelerate root-cause analysis of complex incidents.

For more information on “What the #$&% changed ?!?”, please contact SIFF at:

[email protected]
https://siff.io
Ph: 949.409.1264

 

Why SIFF?

The Frustration

Everyone has experienced the frustration of troubleshooting a complex problem with no idea of what caused the issue. The only thing you have to work with are the symptoms, such as alerts and performance metrics, from which you try to deduce plausible causes. This leads to the initial reaction:

“What the $%&! changed!”

Troubleshooting could be easier if you were aware of the configuration changes that occurred so you can quickly narrow down what you need to consider.

Gartner has shared that “More than 80% of all incidents are caused by planned and unplanned changes.” As a result, large IT operations have policies that prevent configuration changes during critical or busy seasons of their business to minimize incidents. The problem is that none of the existing tools directly addresses what changed.

How about a Configuration Management Database (CMDB) or Network Inventory System – don’t they contain configuration information?

The configuration details in CMDBs or inventory systems are rudimentary and contain only basic inventory information, such as CPU, memory, or network relationships. They do not include the relevant, up-to-date, detailed configuration information that would be useful in troubleshooting incidents.

How about ITIL Change Management – that should track all changes, right?

Even assuming a configuration change properly initiates a Change Request and follows the change management process, the Change Request itself only describes what is intended to be carried out. It does not contain the details of the configuration changes that were actually made. It is helpful to know about recently completed work that may be related to the incident, but the actual configuration changes are essential for isolating and determining the root cause.

How about Network Config Management, Server Config Management, and Application Configuration Management systems – that should have the detailed configs, right?

These configuration management systems do have detailed configurations, and some offer versioning and can show you the changes between versions. The limitation is that they are often constrained to specific domains or silos, e.g. networks only, servers only, or specific applications. You need to see change events across all functions to be able to correlate them and determine the root cause.
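To make the cross-domain point concrete, here is a minimal sketch of merging change events from several silos into a single timeline and narrowing it to the period leading up to an incident. The ChangeEvent record, the changes_before function, the two-hour default window, and the sample events are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeEvent:
    timestamp: datetime
    domain: str   # e.g. "network", "server", "application"
    target: str   # the device, host, or service that changed
    summary: str


def changes_before(events, incident_time, window=timedelta(hours=2)):
    """Return events from all domains within `window` before the incident,
    most recent first."""
    start = incident_time - window
    recent = [e for e in events if start <= e.timestamp <= incident_time]
    return sorted(recent, key=lambda e: e.timestamp, reverse=True)


# Hypothetical events collected from three separate domains.
events = [
    ChangeEvent(datetime(2023, 5, 1, 9, 15), "network", "core-sw-1", "ACL updated"),
    ChangeEvent(datetime(2023, 5, 1, 3, 0), "server", "db-01", "kernel patched"),
    ChangeEvent(datetime(2023, 5, 1, 9, 50), "application", "billing", "pool size changed"),
]

incident = datetime(2023, 5, 1, 10, 0)
for event in changes_before(events, incident):
    print(event.timestamp, event.domain, event.target, event.summary)
```

Because the events are held in one list regardless of domain, a network change and an application change that happened minutes apart show up side by side, which is the correlation that siloed tools make difficult.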

Answering “What the $%&! Changed!” is the essence of SIFF.

SIFF = Search for dIFF

Why SIFF Today?

SIFF helps infrastructure operations in the following 3 areas:

  1. Troubleshooting & Repair
  2. Change Management
  3. Governance & Compliance

Troubleshooting & Repair

Our goal is to help infrastructure operations become more efficient and effective at troubleshooting incidents and complex problems by providing the necessary configuration change events to help identify the root cause.

Unlike existing configuration management tools, SIFF does not configure or provision systems or devices. We focus on providing features and capabilities that help with the analysis and troubleshooting of complex incidents.

Change Management

Change Management is a critical process that helps ensure changes made to the infrastructure do not adversely affect current operations. It helps reduce unnecessary incidents by providing approval controls and coordination of work or Change Requests to be performed. 

SIFF helps improve the change management process by associating actual configuration changes with their corresponding Change Request, making process improvements such as peer reviews viable. Currently, most infrastructure operations rarely perform peer review of configuration changes because it is time and labor intensive for an additional resource to verify the changes across all related systems and devices. As SIFF monitors configuration changes, it automatically tags the change events with the corresponding Change Request ID so that they can be easily reviewed. Unplanned or unauthorized changes become more visible and are candidates for investigation. Additionally, the change events can be easily searched during incident troubleshooting.
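One way such tagging could work is sketched below, under the assumption that a change event is matched to a Change Request when the changed target is listed on the request and the change falls inside the request's scheduled window. The source does not describe the actual matching logic, so the ChangeRequest record, the tag_change function, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ChangeRequest:
    cr_id: str
    targets: frozenset   # systems the request is approved to touch
    start: datetime      # scheduled window start
    end: datetime        # scheduled window end


def tag_change(target: str, when: datetime, requests) -> Optional[str]:
    """Return the ID of the first Change Request covering this change,
    or None if the change appears unplanned."""
    for cr in requests:
        if target in cr.targets and cr.start <= when <= cr.end:
            return cr.cr_id
    return None


# Hypothetical approved request with a two-hour maintenance window.
requests = [
    ChangeRequest(
        cr_id="CR-1001",
        targets=frozenset({"core-sw-1", "core-sw-2"}),
        start=datetime(2023, 5, 1, 9, 0),
        end=datetime(2023, 5, 1, 11, 0),
    )
]

planned = tag_change("core-sw-1", datetime(2023, 5, 1, 9, 15), requests)
unplanned = tag_change("db-01", datetime(2023, 5, 1, 9, 20), requests)
print(planned, unplanned)
```

A change that returns None has no matching request and is a candidate for investigation as an unplanned or unauthorized change.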

Governance & Compliance

The configuration of the systems, devices, and services that make up the infrastructure is the code or DNA of your infrastructure. A big benefit of solving the “What the $%&! Changed!” problem is that the resulting data becomes readily accessible for many governance and policy compliance activities, from detailed asset and inventory reporting and configuration compliance monitoring to security audits and forensic analysis. Our goal is to make it easy to access this data and support these governance and monitoring requirements.