Security Detection Language Proposal #1073
Replies: 4 comments 1 reply
-
While building a common detection language is an intriguing idea in principle, it does not fit the mission of OCSF. Therefore, I strongly vote against this proposal, as it would dilute the focus. OCSF is an event taxonomy that defines the representation and semantics of data. This mission is big enough already. The whole point is to make it agnostic of downstream analytics, not to couple it to one particular analytics approach. (There are numerous reasons why not SQL and why not JSON, but that's a separate topic.)
-
It seems like a case of scope creep. OCSF is a format for encoding data. The mission is to establish it as a layer independent of downstream analytics.
I'm not quite sure if I understand the point. If you'd include a detection language in the standard, it's fundamentally coupled, no? Otherwise there is no point in proposing a language within OCSF.
The TL;DR is that security people are not data people.
-
Could we just rely on Sigma and create the proper transformer to OCSF and to whatever format is needed based on how the data is stored or processed (e.g., SQL, jq)?
-
SQL vs GQL?
As somebody who started his programming career in the mid-90s when the RDBMS was king, I almost instinctively tend towards an SQL JOIN-style way of expressing relationships between things. So it almost pains me to say the following. There are good reasons why graph databases became popular and why languages like Cypher, GQL, etc. are increasingly used in contexts where SQL might previously have been the obvious choice. It's probably not a good use of anyone's time to start enumerating pros and cons here because I think we're all equally adept at using Google, ChatGPT, etc. As well as the above "well, I wouldn't start with SQL in the first place" argument, I'll also raise a few specific issues which don't appear to have been covered in the proposal as-is.
Do we need another standard?
Then there's the whole question as to whether we even need another standard way to share detection rules. As my colleague Leandro has pointed out above, we already have Sigma, i.e. a "standard" way to express detection rules. Sigma doesn't make any assumptions about how a rule is actually evaluated. The rules are abstract and compiled to whatever detection engine, SIEM, etc. one is using. This gives huge flexibility to vendors in terms of how and where rules are evaluated. In contrast, the SQL-based syntax proposed here appears to assume (or at best hints strongly at) an implementation in which events are modelled as collections of flattened key-value pairs which are then projected into Relational Land using something like SQLite's virtual table mechanism.
Is this what OCSF is about?
At the risk of being a bit blunt, I think this is an overreach and a distraction (albeit an interesting one). We should focus our efforts on getting to a comprehensive schema for the exchange of security information. We are a long way from there right now. If and when that happy day arrives, only then should we perhaps think about a standard way to describe the kind of things we actually do with this security information.
-
Contributors: David Magnotti, Rajas Panat, Tim Vidas, Aldrin DSouza, Mike Artz
Overview
I propose expanding the Open Cybersecurity Schema Framework (OCSF) to include a SQL-compliant security detection language stored in a JSON blob. The reason to include this as part of OCSF is to facilitate portable stateless and stateful threat detections.
Design
The language is proposed to be stored in JSON in order to support portability. JSON parsing is built into, or readily available for, languages such as Python, Java, PHP, and JavaScript.
The actual detection language logic is proposed to be implemented as SQL queries, further enabling portability. SQL provides language capabilities that support creation of both stateless (singular events) and stateful (multiple events) detection logic.
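As a rough sketch of what this pairing could look like in practice, the snippet below parses a rule from JSON and pulls out its SQL filter. The key names (`name`, `technique`, `sub_technique`, `filter`) and the table name `http_activity` are assumptions made for illustration; they are not taken from the proposal's schema.

```python
import json

# Hypothetical rule document: the keys and the table name are illustrative
# assumptions, not the schema defined by the proposal.
RULE_JSON = """
{
  "name": "example-stateless-rule",
  "technique": "T1190",
  "sub_technique": null,
  "filter": "SELECT * FROM http_activity WHERE http_status = 500"
}
"""

def load_rule(text: str) -> dict:
    """Parse a rule and check that the fields this sketch relies on exist."""
    rule = json.loads(text)
    for key in ("name", "technique", "filter"):
        if not isinstance(rule.get(key), str):
            raise ValueError(f"rule field {key!r} must be a string")
    return rule

rule = load_rule(RULE_JSON)
print(rule["technique"], "->", rule["filter"])
```

Because the filter is just a string inside the JSON document, any consumer that can parse JSON can extract it and hand it to whatever SQL engine it uses.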
Rule Schema
Rules are designed to run against OCSF-crafted data models and generate detections (or classifications) for events. Rules adhere to the template method design pattern.
The UML representation of the rule schema is as follows (note: [0..1] indicates that the field is optional, per the UML specification):
Rule
The technique and sub-technique fields are stored as strings. Their values are compliant with the MITRE ATT&CK Framework.
The filter section is a SQL string, compliant with ISO/IEC 9075:2023, that queries an OCSF-compliant schema. Each column that is selected or filtered maps to a field name from OCSF. Stateless rules should not use joins or aggregations (the keywords JOIN and GROUP BY), and should not use sub-queries.
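One way an implementation might enforce the stateless restrictions is a keyword check on the filter string. The sketch below is deliberately naive (a real implementation would more likely use a SQL parser), and the function name `is_stateless` is hypothetical.

```python
import re

# Patterns that this sketch treats as signs of stateful logic.
_STATEFUL_PATTERNS = [
    r"\bJOIN\b",
    r"\bGROUP\s+BY\b",
    r"\(\s*SELECT\b",  # a SELECT directly inside parentheses, i.e. a sub-query
]

def is_stateless(filter_sql: str) -> bool:
    """Return True if the filter uses no JOIN, GROUP BY, or sub-query.

    Naive by design: string literals are not parsed, so a keyword that
    appears inside a quoted value is flagged as well.
    """
    return not any(
        re.search(pattern, filter_sql, flags=re.IGNORECASE)
        for pattern in _STATEFUL_PATTERNS
    )

print(is_stateless("SELECT * FROM dns_activity WHERE hostname LIKE '%.example'"))  # True
print(is_stateless("SELECT src_ip, COUNT(*) FROM dns_activity GROUP BY src_ip"))   # False
```

Note that this check would not catch an aggregate function used without GROUP BY, which is one reason a parser-based check would be more robust.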
An example of a stateless rule designed to detect exploitation of CVE-2023-4966 in an HTTP record:
Another example of a stateless rule designed to detect CobaltStrike beacon via a DNS record:
Considerations
Why SQL?
To support stateful processing, rather than building custom matching logic, we could choose to leverage an existing language such as SQL. Many of the semantics we would need, such as aggregations and filtering, are natively supported in SQL. Additionally, SQL is widely supported across technology stacks and platforms, so parsing rules and converting them between languages is better supported than it would be with alternative languages.
A stateful query to detect use of a process chain sequence of nginx followed by wget executing bash, using a string array to break the search into multiple lines:
We did consider using SQL extensions that provide greater capability for manipulating data formats like JSON, such as Amazon’s SQL-compatible extension, PartiQL. PartiQL provides some advantages over traditional SQL for filtering on JSON data. The disadvantage of these extensions is that they harm portability across technology stacks. Therefore, to optimize for portability, we elected to use a SQL dialect that does not provide custom functions.
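To make the stateful case discussed in this section more concrete, here is a minimal sketch of a process-chain query stored as a string array and joined before execution. It is not the rule from the proposal: the table `process_activity` and its flattened columns (`pid`, `parent_pid`, `name`) are hypothetical stand-ins for OCSF fields, and SQLite (via Python's standard library) is used only as a convenient engine.

```python
import sqlite3

# Hypothetical rule fragment: the query is kept as a list of strings so it can
# span several lines inside JSON, then joined into one statement.
FILTER_LINES = [
    "SELECT child.pid AS bash_pid",
    "FROM process_activity AS root",
    "JOIN process_activity AS mid ON mid.parent_pid = root.pid",
    "JOIN process_activity AS child ON child.parent_pid = mid.pid",
    "WHERE root.name = 'nginx' AND mid.name = 'wget' AND child.name = 'bash'",
]
query = " ".join(FILTER_LINES)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE process_activity (pid INTEGER, parent_pid INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO process_activity VALUES (?, ?, ?)",
    [
        (100, 1, "nginx"),
        (101, 100, "wget"),
        (102, 101, "bash"),  # completes the nginx -> wget -> bash chain
        (200, 1, "sshd"),
        (201, 200, "bash"),  # bash with an unrelated parent; should not match
    ],
)
matches = conn.execute(query).fetchall()
print(matches)  # [(102,)]
```

The self-joins are what make this rule stateful: a match depends on three separate events rather than on any single record.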
Why JSON?
JavaScript Object Notation (JSON) is an efficient and portable notation for representing data. While other notations such as YAML Ain't Markup Language (YAML) provide advantages such as being simpler to write, because they do not require diligent use of double quotes for key-value pairs, YAML lacks standard-library support in languages such as Python. Other alternatives such as Extensible Markup Language (XML), while portable, tend to be highly verbose, resulting in rules that are difficult to read and write.
What about existing languages like Sigma, YARA-L, and Roota?
Existing languages such as Sigma, YARA-L, and Roota each provide capabilities for storing security detection rules. However, all of the rules are written to be stateless, only able to operate on singular events. Additionally, each language is written in YAML, which may hinder portability.
Another challenge with Sigma and YARA-L specifically is the lack of support for OCSF. While Roota intends to solve that challenge by combining Sigma with OCSF, it still doesn’t provide a mechanism for creating stateful rules, nor is it as portable as a JSON-based language.
Implementation
How do I use the rules?
There are a variety of approaches for consumption of the rules. For a Python-based implementation, a data science or security analyst user can quickly leverage the rules to evaluate criteria against a pandas DataFrame through the use of the third-party library pandasql.
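As a rough illustration of this consumption path, the sketch below evaluates a rule's filter against a handful of events. To stay dependency-free it uses Python's built-in `sqlite3` module rather than pandas and pandasql (pandasql itself uses SQLite syntax, so the filter string would look the same). The rule, the table name `http_activity`, the column names, and the event values are all made up for illustration and do not correspond to any real CVE signature.

```python
import sqlite3

# Hypothetical rule: only the "filter" key is used by this sketch.
rule = {
    "technique": "T1190",
    "filter": "SELECT id FROM http_activity "
              "WHERE url_path LIKE '/api/v1/events/%' AND http_status = 500",
}

# Made-up test events with flattened, illustrative column names.
events = [
    {"id": 1, "url_path": "/index.html", "http_status": 200},
    {"id": 2, "url_path": "/api/v1/events/subscriptions", "http_status": 500},
    {"id": 3, "url_path": "/api/v1/events/subscriptions", "http_status": 200},
]

def evaluate(rule: dict, table: str, rows: list[dict]) -> list[tuple]:
    """Load rows into an in-memory SQLite table and run the rule's filter."""
    columns = list(rows[0])
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
        placeholders = ", ".join(f":{c}" for c in columns)
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        return conn.execute(rule["filter"]).fetchall()
    finally:
        conn.close()

hits = evaluate(rule, "http_activity", events)
print(hits)  # [(2,)]
```

Only the event that matches both conditions is returned; the same filter string could be passed unchanged to another SQL engine that has a table with these columns.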
Using the provided example rule to identify exploitation attempts of CVE-2024-28255:
We can create a script that populates test data in an HttpActivity pandas DataFrame and evaluates it against the given rule, like so: