From f256f2f2e52fb432c13b9803c207354d291de9a9 Mon Sep 17 00:00:00 2001 From: ilumsden Date: Wed, 31 May 2023 10:38:51 -0700 Subject: [PATCH 1/9] Current progress on reworking QL docs --- docs/query_lang.rst | 309 +++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 308 insertions(+), 1 deletion(-) diff --git a/docs/query_lang.rst b/docs/query_lang.rst index 6e68529a..4d046e5d 100644 --- a/docs/query_lang.rst +++ b/docs/query_lang.rst @@ -7,7 +7,314 @@ Query Language ************** -As of version 1.2.0, Hatchet has a filtering query language that allows users to filter GraphFrames based on caller-callee relationships between nodes in the Graph. This query language contains two APIs: a high-level API that is expressed using built-in Python data types (e.g., lists, dictionaries, strings) and a low-level API that is expressed using Python callables. +.. versionadded:: 1.2.0 + +One of the most important steps in identifying performance phenomena (e.g., bottlenecks) is data subselection +and reduction. In previous versions of Hatchet, users could only reduce their data using filters on the +pandas :code:`DataFrame` component of the :code:`GraphFrame` object. However, this type of data reduction +ignores the relational data encoded in the Graph component of the :code:`GraphFrame`. + +Hatchet's Call Path Query Language (*query language* for short) provides a new way of reducing profiling +data that leverages this relational data. In the query language, users provide a *query*, or a +description of the properties of one or more paths in the Graph. Hatchet then uses this query to +select all nodes in the Graph that satisfy the query. These nodes are returned to the user as a new +:code:`GraphFrame`. + +The following sections will describe the structure of a query, the syntaxes for constructing +queries, and the APIs for dealing with queries. + +Query Structure +=============== + +The main input to the query language is a query. A *query* is defined as a sequence of *query nodes*. +A query node is a tuple of a *quantifier* and a *predicate*. A quantifier defines how many nodes in the +Graph should be matched to a single query node. A predicate defines what conditions must be satisfied +for a node in the Graph to match a query node. + +Query Syntaxes and Dialects +=========================== + +Hatchet provides three ways of creating queries to simplify the use of the query language under +diverse circumstances (e.g., creating queries in JavaScript that will be moved into Python code). +The first way to create queries is the *base syntax*: a Python API for iteratively building a query +using method calls. Hatchet also provides two dialects for the query language: + +- **Object-based Dialect**: uses Python built-in objects (i.e., :code:`list`, :code:`tuple`, :code:`dict`) + to build queries +- **String-based Dialect**: uses strings to build queries. The syntax of this dialect is derived from + the `Cypher `_ query language for graph databases. + +Besides syntaxes, these three ways of creating queries differ in their capabilities. In general, the base syntax +is the most capable and most verbose. The object-based dialect is the least capable and least verbose, and the +string-based dialect is in-between. For a more complete breakdown of the capabilities, see the +:ref:`Syntax and Dialect Capabilities` section. + +Base Syntax +----------- + +.. note:: + In version 2023.1.0, the query language was refactored to work better with a new tool + called `Thicket `_. This refactor is explained in + more detail in the :ref:`Query Language APIs` section. + This section assumes the use of "new-style" queries. If you are using "old-style" queries, you + can simply replace any use of :code:`Query` with :code:`QueryMatcher`. + +Hatchet's :code:`Query` class defines how to construct a query using the base syntax. Using this class, +queries are built using the :code:`match` and :code:`rel` methods. The :code:`match` method sets the first +node of the query. The :code:`rel` method is called iteratively, each time adding a new node to the end +of the query. Both methods take a quantifier and a predicate as input. + +A quantifier can have one of four possible values: + +- :code:`"."`: match one node +- :code:`"*"`: match zero or more nodes +- :code:`+`: match one or more nodes +- An integer: match exactly that number of nodes + +All queries can be constructed using just :code:`"."` and :code:`"*"`. The :code:`"+"` and integer quantifiers +are provided for convenience and are implemented as follows: + +- :code:`"+"`: implemented as a node with a :code:`"."` quantifier followed by a node with a :code:`"*"` quantifier +- Integer: implemented as a sequence of nodes with :code:`"."` quantifiers + +If no quantifier is provided to :code:`match` or :code:`rel`, the default :code:`"."` quantifier is used. + +A predicate is represented as a Python :code:`Callable` that takes the data for a node in a :code:`GraphFrame` +as input and returns a Boolean. The returned Boolean is used internally to determine whether a :code:`GraphFrame` +node satisfies the predicate. If a predicate is to provided to :code:`match` or :code:`rel`, the default +predicate is a function that always returns :code:`True`. + +The base syntax is illustrated by the query below. This query uses two query nodes to find all subgraphs in the +Graph rooted at MPI (or PMPI) function calls that have more than 5 L2 cache misses (as measured by PAPI). + +.. code-block:: python + + query = ( + Query() + .match( + ".", + lambda row: re.match( + "P?MPI_.*", + row["name"] + ) + is not None + and row["PAPI_L2_TCM"] > 5 + ) + .rel("*") + ) + +Object-based Dialect +-------------------- + +The object-based dialect allows users to construct queries using built-in Python objects. In this dialect, a +query is represented by a Python :code:`list` of query nodes. Each query node is represented by a Python +:code:`tuple` of a qunatifier and a predicate. Quantifiers are represented the same way as in the base syntax +(see :ref:`Base Syntax` for more information). Predicates are represented as key-value pairs where keys +are metric names and values are Boolean expressions generated using the following rules: + +- If the metric is numeric, the value can be a be a number (checks for equality) or a string consisting of a + comparison operator (one of :code:`<`, :code:`<=`, :code:`==`, :code:`>`, or :code:`>=`) followed by a number +- If the metric is a string, the value can be any regex string that is a valid input to `Python's re.match + function `_. + +Multiple predicates can be combined into a larger predicate by simply putting multiple key-value pairs into +the same Python :code:`dict`. When multiple predicates are in the same :code:`dict` in the object-base dialect, +they are all combined by conjunction (i.e., logical AND). + +When using a default quantifier (i.e., :code:`"."`) or predicate (i.e., a function that always returns :code:`True`), +query nodes do not have to be represented as a Python :code:`tuple`. In these situations, a query node is represented +by simply adding the non-default component to the Python :code:`list` representing the query. + +The object-based dialect is illustrated by the query below. This query is the same as the one introduced in the +:ref:`Base Syntax` section. It uses two query nodes to find all subgraphs in the Graph rooted at MPI (or PMPI) +function calls that have more than 5 L2 cache misses (as measured by PAPI). + +.. code-block:: python + + query = [ + ( + ".", + { + "name": "P?MPI_.*", + "PAPI_L2_TCM": "> 5" + } + ), + "*" + ] + +String-based Dialect +-------------------- + +The string-based dialect allows users to construct queries using strings. This allows the string-based dialect +to be the only way of creating queries that is not tied to Python. The syntax of the query strings in the +string-based dialect is derived from `Cypher `_. +A query in this dialect contains two main syntactic pieces: a :code:`MATCH` statement and a :code:`WHERE` +statement. + +The :code:`MATCH` statement starts with the :code:`MATCH` keyword and defines the quantifiers and variable +names used to refer to query nodes in the predicates. Each node in the :code:`MATCH` statement takes the form +of :code:`([,] )`. Quantifiers in the string-based dialect have the same representation +as the base syntax and object-based dialect. Variables can be any valid combination of letters, numbers, and underscores +that does not start with a number (i.e., normal variable name rules). Multiple query nodes can be added to the +:code:`MATCH` statement by chaining the nodes with :code:`->`. + +The :code:`WHERE` statement starts with the :code:`WHERE` keyword and defines one or more predicates. +Predicates in the string-based dialect are represented by expressions of the form :code:`."" `. +In these expressions, :code:`` should be replaced by the variable associated with the desired query node +in the :code:`MATCH` statement, and :code:`` should be replaced by the name of the metric being considered. +:code:`` should be replaced by one of the following: + +- :code:`= `: checks if the metric equals a value +- :code:`STARTS WITH `: checks if a string metric starts with a substring +- :code:`ENDS WITH `: checks if a string metric ends with a substring +- :code:`CONTAINS `: checks if a string metric contains a substring +- :code:`=~ `: checks if a string metric matches a regex +- :code:`< `: checks if a numeric metric is less than a value +- :code:`<= `: checks if a numeric metric is less than or equal to a value +- :code:`> `: checks if a numeric metric is greater than a value +- :code:`>= `: checks if a numeric metric is greater than or equal to a value +- :code:`IS NAN`: checks if a numeric metric is NaN +- :code:`IS NOT NAN`: checks if a numeric metric is not NaN +- :code:`IS INF`: checks if a numeric metric is infinity +- :code:`IS NOT INF`: checks if a numeric metric is not infinity +- :code:`IS NONE`: checks if a metric is Python's None value +- :code:`IS NOT NONE`: checks if a metric is not Python's None value + +.. versionadded:: 2022.2.1 + + Added two new comparison operations (i.e., :code:`IS LEAF` and :code:`IS NOT LEAF`) for determining + whether a node is a leaf node of the Graph. + +Multiple predicates can be combined using three Boolean operators: conjunction (i.e., :code:`AND` keyword), +disjunction (i.e., :code:`OR` keyword), and complement (i.e., :code:`NOT` keyword). + +The string-based dialect is illustrated by the query below. This query is the same as the one introduced in the +:ref:`Base Syntax` section. It uses two query nodes to find all subgraphs in the Graph rooted at MPI (or PMPI) +function calls that have more than 5 L2 cache misses (as measured by PAPI). + +.. code-block:: python + + query = """ + MATCH (".", p)->("*") + WHERE p."name" STARTS WITH "MPI_" OR p."name" STARTS WTICH "PMPI_" AND + p."PAPI_L2_TCM" > 5 + """ + +Query Language APIs +=================== + +In version 2023.1.0, the query language underwent a large refactor to enable support for :code:`GraphFrame` objects +containing a multi-indexed DataFrame. As a result, the query language now has two APIs: + +- New-Style Queries: APIs for the query language starting with version 2023.1.0 +- Old-Style Queries: APIs for the query language prior to version 2023.1.0 + +Old-style queries are discouraged for new users. However, these APIs are not deprecated at this time. For the time +being, old-style queries will be maintained as a thin wrapper around new-style queries. + +Applying Queries to GraphFrames +------------------------------- + +Whether using new-style queries or old-style queries, queries are applied to the data in a :code:`GraphFrame` +using the :code:`GraphFrame.filter()` method. This method takes a "filter object" as its first argument. A filter object can be +one of the following: + +- A Python :code:`Callable`: filters the data in the :code:`GraphFrame` using a filter on the DataFrame (i.e., does not use the query language) +- A string: assumes the argument is a string-dialect query, builds a new-style query object from the argument, + and applies that query to the :code:`GraphFrame` +- A Python :code:`list`: assumes the argument is an object-dialect query, builds a new-style query object from the argument, + and applies that query to the :code:`GraphFrame` +- A new-sytle or old-style query object: applies the query to the :code:`GraphFrame` + +As a result, the differences between new-style queries and old-style queries do not matter when using single +object- or string-dialect queries. Users only need to get into the differences between these APIs when using +base syntax queries or when combining queries. + +New-Style Queries +----------------- + +The new-style query API consists of 3 main classes: + +- :code:`Query`: represents the base syntax +- :code:`ObjectQuery`: represents the object-based dialect +- :code:`StringQuery`: represents the string-based dialect + +After creating objects of these classes, queries can be combined using the following "compound query" classes: + +- :code:`ConjunctionQuery`: combines the results of each sub-query using set conjunction (i.e., logical AND) +- :code:`DisjunctionQuery`: combines the results of each sub-query using set disjunction (i.e., logical OR) +- :code:`ExclusiveDisjunctionQuery`: combines the results of each sub-query using set exclusive disjunction (i.e., logical XOR) +- :code:`NegationQuery`: combines the results of a single sub-query using set negation (i.e., logical NOT) + +The rest of this section provides brief descriptions and examples of the usage of these classes. + +Query Class +^^^^^^^^^^^ + +The :code:`Query` class is used to represent base syntax queries. To use it, simply create a :code:`Query` +object and call :code:`match` and :code:`rel` as described in the :ref:`Base Syntax` section. + +An example of the use of this class can be found in the :ref:`Base Syntax` section. + +ObjectQuery Class +^^^^^^^^^^^^^^^^^ + +The :code:`ObjectQuery` class is used to represent object-based dialect queries. To use it, create an object-based dialect +query (as described in the :ref:`Object-based Dialect` section), and pass that query to the constructor of +:code:`ObjectQuery`. + +For example, the following code can be used to create an :code:`ObjectQuery` object from the query in the example from the :ref:`Object-based Dialect` +section: + +.. code-block:: python + + query = [ + ( + ".", + { + "name": "P?MPI_.*", + "PAPI_L2_TCM": "> 5" + } + ), + "*" + ] + query_obj = hatchet.query.ObjectQuery(query) + +StringQuery Class +^^^^^^^^^^^^^^^^^ + +The :code:`StringQuery` class is used to represent string-based dialect queries. To use it, first create +a string-based dialect query (as described in the :ref:`String-based Dialect` section). Then, a :code:`StringQuery` +object can be created from that string-based dialect query using either the :code:`StringQuery` constructor +or the :code:`parse_string_dialect` function. :code:`parse_string_dialect` is the recommended way of creating +a :code:`StringQuery` object because it allows users to write compound queries as strings (this functionality is not +yet documented). + +For example, the following code can be used to create a :code:`StringQuery` object from the query in the example from the :ref:`String-based Dialect` +section: + +.. code-block:: python + + query = """ + MATCH (".", p)->("*") + WHERE p."name" STARTS WITH "MPI_" OR p."name" STARTS WTICH "PMPI_" AND + p."PAPI_L2_TCM" > 5 + """ + query_obj = hatchet.query.parse_string_dialect(query) + +ConjunctionQuery, DisjunctionQuery, and ExclusiveDisjunctionQuery Classes +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +TBA + +NegationQuery Class +^^^^^^^^^^^^^^^^^^^ + +TBA + + + +Hatchet has a filtering query language that allows users to filter GraphFrames based on caller-callee relationships between nodes in the Graph. This query language contains two APIs: a high-level API that is expressed using built-in Python data types (e.g., lists, dictionaries, strings) and a low-level API that is expressed using Python callables. Regardless of API, queries in Hatchet represent abstract paths, or path patterns, within the Graph being filtered. When filtering on a query, Hatchet will identify all paths in the Graph that match the query. Then, it will return a new GraphFrame object containing only the nodes contained in the matched paths. A query is represented as a list of *abstract graph nodes*. Each *abstract graph node* is made of two parts: From 3ac1b7b4223cf49b0277527e2b39832beb52f41f Mon Sep 17 00:00:00 2001 From: ilumsden Date: Thu, 1 Jun 2023 22:01:37 -0700 Subject: [PATCH 2/9] Completes the initial versions of all but the last two sections --- docs/query_lang.rst | 564 ++++++++++++++++++++++++-------------------- docs/user_guide.rst | 1 + 2 files changed, 311 insertions(+), 254 deletions(-) diff --git a/docs/query_lang.rst b/docs/query_lang.rst index 4d046e5d..d0133e3d 100644 --- a/docs/query_lang.rst +++ b/docs/query_lang.rst @@ -11,17 +11,17 @@ Query Language One of the most important steps in identifying performance phenomena (e.g., bottlenecks) is data subselection and reduction. In previous versions of Hatchet, users could only reduce their data using filters on the -pandas :code:`DataFrame` component of the :code:`GraphFrame` object. However, this type of data reduction -ignores the relational data encoded in the Graph component of the :code:`GraphFrame`. +pandas :code:`DataFrame` component of the GraphFrame object. However, this type of data reduction +ignores the relational data encoded in the Graph component of the GraphFrame. Hatchet's Call Path Query Language (*query language* for short) provides a new way of reducing profiling data that leverages this relational data. In the query language, users provide a *query*, or a description of the properties of one or more paths in the Graph. Hatchet then uses this query to select all nodes in the Graph that satisfy the query. These nodes are returned to the user as a new -:code:`GraphFrame`. +GraphFrame. The following sections will describe the structure of a query, the syntaxes for constructing -queries, and the APIs for dealing with queries. +queries, extra functionality to provided by basic queries, and the APIs for dealing with queries. Query Structure =============== @@ -31,6 +31,7 @@ A query node is a tuple of a *quantifier* and a *predicate*. A quantifier define Graph should be matched to a single query node. A predicate defines what conditions must be satisfied for a node in the Graph to match a query node. +.. _query_syntaxes: Query Syntaxes and Dialects =========================== @@ -47,16 +48,17 @@ using method calls. Hatchet also provides two dialects for the query language: Besides syntaxes, these three ways of creating queries differ in their capabilities. In general, the base syntax is the most capable and most verbose. The object-based dialect is the least capable and least verbose, and the string-based dialect is in-between. For a more complete breakdown of the capabilities, see the -:ref:`Syntax and Dialect Capabilities` section. +:ref:`syntax_capabilities` section. +.. _base_syntax: Base Syntax ----------- .. note:: In version 2023.1.0, the query language was refactored to work better with a new tool called `Thicket `_. This refactor is explained in - more detail in the :ref:`Query Language APIs` section. - This section assumes the use of "new-style" queries. If you are using "old-style" queries, you + more detail in the :ref:`query_lang_apis` section. + This section assumes the use of "new-style" query API. If you are using the "old-style" API, you can simply replace any use of :code:`Query` with :code:`QueryMatcher`. Hatchet's :code:`Query` class defines how to construct a query using the base syntax. Using this class, @@ -68,7 +70,7 @@ A quantifier can have one of four possible values: - :code:`"."`: match one node - :code:`"*"`: match zero or more nodes -- :code:`+`: match one or more nodes +- :code:`"+"`: match one or more nodes - An integer: match exactly that number of nodes All queries can be constructed using just :code:`"."` and :code:`"*"`. The :code:`"+"` and integer quantifiers @@ -79,13 +81,13 @@ are provided for convenience and are implemented as follows: If no quantifier is provided to :code:`match` or :code:`rel`, the default :code:`"."` quantifier is used. -A predicate is represented as a Python :code:`Callable` that takes the data for a node in a :code:`GraphFrame` -as input and returns a Boolean. The returned Boolean is used internally to determine whether a :code:`GraphFrame` -node satisfies the predicate. If a predicate is to provided to :code:`match` or :code:`rel`, the default +A predicate is represented as a Python :code:`Callable` that takes the data for a node in a GraphFrame +as input and returns a Boolean. The returned Boolean is used internally to determine whether a GraphFrame +node satisfies the predicate. If a predicate is not provided to :code:`match` or :code:`rel`, the default predicate is a function that always returns :code:`True`. The base syntax is illustrated by the query below. This query uses two query nodes to find all subgraphs in the -Graph rooted at MPI (or PMPI) function calls that have more than 5 L2 cache misses (as measured by PAPI). +GraphFrame rooted at MPI (or PMPI) function calls that have more than 5 L2 cache misses (as measured by PAPI). .. code-block:: python @@ -103,13 +105,14 @@ Graph rooted at MPI (or PMPI) function calls that have more than 5 L2 cache miss .rel("*") ) +.. _obj_dialect: Object-based Dialect -------------------- The object-based dialect allows users to construct queries using built-in Python objects. In this dialect, a query is represented by a Python :code:`list` of query nodes. Each query node is represented by a Python :code:`tuple` of a qunatifier and a predicate. Quantifiers are represented the same way as in the base syntax -(see :ref:`Base Syntax` for more information). Predicates are represented as key-value pairs where keys +(see the :ref:`base_syntax` section for more information). Predicates are represented as key-value pairs where keys are metric names and values are Boolean expressions generated using the following rules: - If the metric is numeric, the value can be a be a number (checks for equality) or a string consisting of a @@ -118,7 +121,7 @@ are metric names and values are Boolean expressions generated using the followin function `_. Multiple predicates can be combined into a larger predicate by simply putting multiple key-value pairs into -the same Python :code:`dict`. When multiple predicates are in the same :code:`dict` in the object-base dialect, +the same Python :code:`dict`. When multiple predicates are in the same :code:`dict` in the object-based dialect, they are all combined by conjunction (i.e., logical AND). When using a default quantifier (i.e., :code:`"."`) or predicate (i.e., a function that always returns :code:`True`), @@ -126,7 +129,7 @@ query nodes do not have to be represented as a Python :code:`tuple`. In these si by simply adding the non-default component to the Python :code:`list` representing the query. The object-based dialect is illustrated by the query below. This query is the same as the one introduced in the -:ref:`Base Syntax` section. It uses two query nodes to find all subgraphs in the Graph rooted at MPI (or PMPI) +:ref:`base_syntax` section. It uses two query nodes to find all subgraphs in the GraphFrame rooted at MPI (or PMPI) function calls that have more than 5 L2 cache misses (as measured by PAPI). .. code-block:: python @@ -142,9 +145,12 @@ function calls that have more than 5 L2 cache misses (as measured by PAPI). "*" ] +.. _str_dialect: String-based Dialect -------------------- +.. versionadded:: 2022.1.0 + The string-based dialect allows users to construct queries using strings. This allows the string-based dialect to be the only way of creating queries that is not tied to Python. The syntax of the query strings in the string-based dialect is derived from `Cypher `_. @@ -153,7 +159,7 @@ statement. The :code:`MATCH` statement starts with the :code:`MATCH` keyword and defines the quantifiers and variable names used to refer to query nodes in the predicates. Each node in the :code:`MATCH` statement takes the form -of :code:`([,] )`. Quantifiers in the string-based dialect have the same representation +of :code:`(, )`. Quantifiers in the string-based dialect have the same representation as the base syntax and object-based dialect. Variables can be any valid combination of letters, numbers, and underscores that does not start with a number (i.e., normal variable name rules). Multiple query nodes can be added to the :code:`MATCH` statement by chaining the nodes with :code:`->`. @@ -161,7 +167,7 @@ that does not start with a number (i.e., normal variable name rules). Multiple q The :code:`WHERE` statement starts with the :code:`WHERE` keyword and defines one or more predicates. Predicates in the string-based dialect are represented by expressions of the form :code:`."" `. In these expressions, :code:`` should be replaced by the variable associated with the desired query node -in the :code:`MATCH` statement, and :code:`` should be replaced by the name of the metric being considered. +from the :code:`MATCH` statement, and :code:`` should be replaced by the name of the metric being considered. :code:`` should be replaced by one of the following: - :code:`= `: checks if the metric equals a value @@ -180,16 +186,16 @@ in the :code:`MATCH` statement, and :code:`` should be replaced by the n - :code:`IS NONE`: checks if a metric is Python's None value - :code:`IS NOT NONE`: checks if a metric is not Python's None value -.. versionadded:: 2022.2.1 - - Added two new comparison operations (i.e., :code:`IS LEAF` and :code:`IS NOT LEAF`) for determining - whether a node is a leaf node of the Graph. +.. note:: + .. versionadded:: 2022.2.1 + Added the comparison operations :code:`IS LEAF` and :code:`IS NOT LEAF`, which check + whether a node is a leaf node of the GraphFrame. Multiple predicates can be combined using three Boolean operators: conjunction (i.e., :code:`AND` keyword), disjunction (i.e., :code:`OR` keyword), and complement (i.e., :code:`NOT` keyword). The string-based dialect is illustrated by the query below. This query is the same as the one introduced in the -:ref:`Base Syntax` section. It uses two query nodes to find all subgraphs in the Graph rooted at MPI (or PMPI) +:ref:`base_syntax` section. It uses two query nodes to find all subgraphs in the GraphFrame rooted at MPI (or PMPI) function calls that have more than 5 L2 cache misses (as measured by PAPI). .. code-block:: python @@ -200,10 +206,234 @@ function calls that have more than 5 L2 cache misses (as measured by PAPI). p."PAPI_L2_TCM" > 5 """ +.. note:: + + The string-based dialect is **case-sensitive**. + +.. _applying_queries: +Applying Queries to GraphFrames +=============================== + +Queries are applied to the data in a GraphFrame using the :code:`GraphFrame.filter()` method. +This method takes a "filter object" as its first argument. A filter object can be one of the following: + +- A Python :code:`Callable`: filters the data in the GraphFrame using a filter on the DataFrame + (*does not use the query language*) +- A string: assumes the argument is a string-dialect query, builds a new-style query object from the argument, + and applies that query to the GraphFrame +- A Python :code:`list`: assumes the argument is an object-dialect query, builds a new-style query object from the argument, + and applies that query to the GraphFrame +- A new-sytle or old-style query object: applies the query to the GraphFrame + +The call to :code:`GraphFrame.filter()` will return a new GraphFrame containing the nodes from *all* paths +in the original GraphFrame that match the properties described by the query. + +Additional Query Language Functionality +======================================= + +This section covers several types of functionality that the query language provides beyond the core querying +covered by the :ref:`query_syntaxes` and :ref:`applying_queries` sections. + +.. _compound_queries: +Combining Query Results with Compound Queries +--------------------------------------------- + +.. versionadded:: 2022.1.0 + +.. note:: + + This section assumes the use of the "new-style" query APIs. If using the "old-style" API, simply replace + the query classes detailed in this section with their equivalents from the :ref:`old_style_queries` section. + For more information about the new-style and old-style APIs, see the :ref:`query_lang_apis` section. + +Sometimes, a user might want to combine the results of multiple queries together to get a more detailed +picture of their performance data. To enable this, the query language provides "compound queries". A compound +query is a type of query that modifies the results of one or more other queries using a set operation. Currently, +the query language provides the following Python classes for creating compound queries: + +- :code:`ConjunctionQuery`: combines the results of two or more sub-queries using + set conjunction (i.e., logical AND) +- :code:`DisjunctionQuery`: combines the results of two or more sub-queries using + set disjunction (i.e., logical OR) +- :code:`ExclusiveDisjunctionQuery`: combines the results of two or more sub-queries using + exclusive set disjunction (i.e., logical XOR) +- :code:`NegationQuery`: modifies the results of a single sub-query using set negation (i.e., logical NOT) + +Before using any of these classes, all sub-queries must be converted into a Python query class. These classes +are detailed in the :ref:`query_lang_apis` section. In general, though, "raw" queries (i.e., +object-based dialect and string-based dialect queries) are converted into query class objects by simply calling +the constructor for the corresponding class. + +Once all sub-queries are converted into query class objects, a compound query can be created in one of two ways. +First, all the sub-queries can be passed into the constructor of a compound query class. An example of this is shown +below. This example creates a :code:`DisjunctionQuery` object from two string-based dialect queries. +The first query looks for all subgraphs rooted at MPI nodes, and the second query looks for all subgraphs rooted at CUDA host functions +(i.e., functions starting with the :code:`cuda` or :code:`cu` prefixes). So, the :code:`DisjunctionQuery` +can be used to look at the host-side internals of a MPI+CUDA program. + +.. code-block:: python + + query_mpi = """ + MATCH (".", p)->("*") + WHERE p."name" STARTS WITH "MPI_" + """ + query_cuda_host = """ + MATCH (".", p)->("*") + WHERE p."name" STARTS WITH "cuda" or p."name" STARTS WITH "cu" + """ + disjunction_query = hatchet.query.DisjunctionQuery(query_mpi, query_cuda_host) + +The other way to create a compound query is to use Python's built-in binary operators. The following list +shows the operators supported for compound queries and how they map to compound query classes: + +- :code:`&` = :code:`ConjunctionQuery` +- :code:`|` = :code:`DisjunctionQuery` +- :code:`^` = :code:`ExclusiveDisjunctionQuery` +- :code:`~` = :code:`NegationQuery` + +The code block below shows the same :code:`DisjunctionQuery` query example as above using binary operators. + +.. code-block:: python + + query_mpi = """ + MATCH (".", p)->("*") + WHERE p."name" STARTS WITH "MPI_" + """ + query_cuda_host = """ + MATCH (".", p)->("*") + WHERE p."name" STARTS WITH "cuda" or p."name" STARTS WITH "cu" + """ + disjunction_query = query_mpi | query_cuda_host + +Supporting Compound Queries through the String-based Dialect +------------------------------------------------------------ + +.. versionadded:: 2022.1.0 + +When using the string-based dialect, compound queries do not need to be created using the compound query +classes described in the :ref:`compound_queries` section. Instead, compound queries can be created +directly within the string-based dialect using curly braces and the :code:`AND`, :code:`OR`, and :code:`XOR` +keywords. + +When creating compound queries from the string-dialect, curly braces should be placed around either +entire string-based dialect queries (i.e., both the :code:`MATCH` and :code:`WHERE` statements) or +around subsets of the predicate in the :code:`WHERE` statement. When wrapping entire string-based +dialect queries, each wrapped region is treated as a sub-query. When wrapping subsets of the predicate +in the :code:`WHERE` statement, sub-queries are created by combining the unwrapped :code:`MATCH` statement +with each wrapped subset in the :code:`WHERE` statement. + +Curly brace-delimited regions of a string-based query should then be separated using the :code:`AND`, +:code:`OR`, and :code:`XOR` keywords. When used to separate curly brace-delimited regions, these keywords +map to compound query classes as follows: + +- :code:`AND` = :code:`ConjunctionQuery` +- :code:`OR` = :code:`DisjunctionQuery` +- :code:`XOR` = :code:`ExclusiveDisjunctionQuery` + +To illustrate this functionality, consider the MPI+CUDA example from the :ref:`compound_queries` section. +When placing curly braces around entire string-based dialect subqueries, this example can be rewritten +as follows: + +.. code-block:: python + + query_mpi_and_cuda = """ + {MATCH (".", p)->("*") WHERE p."name" STARTS WITH "MPI_"} OR + {MATCH (".", p)->("*") WHERE p."name" STARTS WITH "cuda" or p."name" STARTS WITH "cu"} + """ + +Similarly, when placing curly braces around subsets of the predicate in the :code:`WHERE` statement, +this example can be rewritten as follows: + +.. code-block:: python + + query_mpi_and_cuda = """ + MATCH (".", p)->("*") + WHERE {p."name" STARTS WITH "MPI_"} OR {p."name" STARTS WITH "cuda" or p."name" STARTS WITH "cu"} + """ + +Compound queries in the string-based dialect cannot be wrapped in query language classes by simply +passing them to constructors. Instead, these types of compound queries can be wrapped in classes +using the :code:`parse_string_dialect` function. This function accepts a string-based dialect +query as its only required argument and returns either a :code:`StringQuery` object (when there are no curly +brace-delimited regions in the input query) or a compound query object (when there are curly +brace-delimited regions in the input query). If a query language class is not needed, compound +queries in the string-based dialect can simply be applied to a GraphFrame as usual with +:code:`GraphFrame.filter()`. + +Supporting Multi-Indexed GraphFrames in the Object- and String-based Dialects +----------------------------------------------------------------------------- + +.. versionadded:: 2023.1.0 + +As explained in the :ref:`user_guide`, the DataFrame component of the GraphFrame often uses a multiindex +to represent data for multiprocessed and/or multithreaded applications. However, this multiindexed data +is difficult to work with in query language predicates. For example, consider the following base syntax +predicate: + +.. code-block:: python + + predicate1 = lambda row: row["time"] > 5 + +This predicate simply checks if a node's "time" metric is greater than 5. +This predicate makes sense for non-multiindexed data because there is only one row of data in the +DataFrame for a given node and, thus, only one value for each metric for that node. In other words, +metrics are scalar for a given node when dealing with non-multiindexed data. However, when dealing +with multiindexed data, there are multiple rows for a given node, and, as a result, each metric is +effectively a vector. + +Since version 1.2.0, handling this type of multiindexed data has only been supported in the base syntax +because of the flexibility it provides by being a programmatic interface. For example, the predicate +above can be rewritten for multiindexed data as follows: + +.. code-block:: python + + predicate1 = lambda node_data: node_data["time"].apply(lambda x: x > 5).all() + +This predicate checks that *all* values for a node's "time" metric are greater than 5. +As the example above illustrates, one important consideration when dealing with multiindexed data +is how to reduce a vector of metric data into the scalar Boolean value required by the query language. +Because the base syntax requires users to write the Python code for their predicates, it allows users +to make that decision easily. Unfortunately, the object- and string-based dialects do not provide the +same flexibility because they intentionally require the user to not write Python code. For this reason, +the dialects previously have not supported multiindexed GraphFrames, and users were required to reduce +their data to a non-multiindexed GraphFrame (e.g., through :code:`GraphFrame.drop_index_levels`) before +applying a query in either dialect. + +However, with the introduction of the new-style query API in version 2023.1.0 (see the :ref:`query_lang_apis` +section for more information), it is now possible to use multiindexed GraphFrames with the object- and +string-based dialects. To do so, users must manually wrap their object-based dialect and string-based dialect +queries in the :code:`ObjectQuery` and :code:`StringQuery` classes respectively. Both of these classes' +constructors accept two arguments: + +1. The object- or string-based dialect query to be wrapped +2. The :code:`multi_index_mode` parameter, which can be set to :code:`"off"`, :code:`"all"`, or :code:`"any"` + +The new :code:`multi_index_mode` parameter controls how predicates generated from the dialects will treat +multiindexed data. When set to :code:`"off"` (which is the default), the generated predicates will assume +that the data for each node is **not** multiindexed. This behavior is the same as eariler versions of Hatchet. +When set to :code:`all`, the generated predicates will require that all rows of data for a given node +satisfy the predicate. This usually amounts to applying the predicate to a node's data +with pandas' :code:`Series.apply()` and reducing the resulting :code:`Series` with :code:`Series.all()`. +Finally, when :code:`multi_index_mode` is set to :code:`"any"`, the generated predicates will require +that one or more rows of data for a given node satisfy the predicate. This usually amounts to applying +the predicate to a node's data with pandas' :code:`Series.apply()` and reducing the resulting :code:`Series` +with :code:`Series.any()`. + +.. warning:: + + The old-style query API still does **not** support multiindexed GraphFrames for the object- + and string-based dialects. When using multiindexed GraphFrames, users must either use the new-style + query API or the base syntax support in the old-style query API's :code:`QueryMatcher` class. + +.. _query_lang_apis: Query Language APIs =================== -In version 2023.1.0, the query language underwent a large refactor to enable support for :code:`GraphFrame` objects +.. warning:: + + Section in-progress + +In version 2023.1.0, the query language underwent a large refactor to enable support for GraphFrame objects containing a multi-indexed DataFrame. As a result, the query language now has two APIs: - New-Style Queries: APIs for the query language starting with version 2023.1.0 @@ -212,27 +442,11 @@ containing a multi-indexed DataFrame. As a result, the query language now has tw Old-style queries are discouraged for new users. However, these APIs are not deprecated at this time. For the time being, old-style queries will be maintained as a thin wrapper around new-style queries. -Applying Queries to GraphFrames -------------------------------- - -Whether using new-style queries or old-style queries, queries are applied to the data in a :code:`GraphFrame` -using the :code:`GraphFrame.filter()` method. This method takes a "filter object" as its first argument. A filter object can be -one of the following: - -- A Python :code:`Callable`: filters the data in the :code:`GraphFrame` using a filter on the DataFrame (i.e., does not use the query language) -- A string: assumes the argument is a string-dialect query, builds a new-style query object from the argument, - and applies that query to the :code:`GraphFrame` -- A Python :code:`list`: assumes the argument is an object-dialect query, builds a new-style query object from the argument, - and applies that query to the :code:`GraphFrame` -- A new-sytle or old-style query object: applies the query to the :code:`GraphFrame` - -As a result, the differences between new-style queries and old-style queries do not matter when using single -object- or string-dialect queries. Users only need to get into the differences between these APIs when using -base syntax queries or when combining queries. - New-Style Queries ----------------- +.. versionadded:: 2023.1.0 + The new-style query API consists of 3 main classes: - :code:`Query`: represents the base syntax @@ -243,7 +457,7 @@ After creating objects of these classes, queries can be combined using the follo - :code:`ConjunctionQuery`: combines the results of each sub-query using set conjunction (i.e., logical AND) - :code:`DisjunctionQuery`: combines the results of each sub-query using set disjunction (i.e., logical OR) -- :code:`ExclusiveDisjunctionQuery`: combines the results of each sub-query using set exclusive disjunction (i.e., logical XOR) +- :code:`ExclusiveDisjunctionQuery`: combines the results of each sub-query using exclusive set disjunction (i.e., logical XOR) - :code:`NegationQuery`: combines the results of a single sub-query using set negation (i.e., logical NOT) The rest of this section provides brief descriptions and examples of the usage of these classes. @@ -252,18 +466,18 @@ Query Class ^^^^^^^^^^^ The :code:`Query` class is used to represent base syntax queries. To use it, simply create a :code:`Query` -object and call :code:`match` and :code:`rel` as described in the :ref:`Base Syntax` section. +object and call :code:`match` and :code:`rel` as described in the :ref:`base_syntax` section. -An example of the use of this class can be found in the :ref:`Base Syntax` section. +An example of the use of this class can be found in the :ref:`base_syntax` section. ObjectQuery Class ^^^^^^^^^^^^^^^^^ The :code:`ObjectQuery` class is used to represent object-based dialect queries. To use it, create an object-based dialect -query (as described in the :ref:`Object-based Dialect` section), and pass that query to the constructor of +query (as described in the :ref:`obj_dialect` section), and pass that query to the constructor of :code:`ObjectQuery`. -For example, the following code can be used to create an :code:`ObjectQuery` object from the query in the example from the :ref:`Object-based Dialect` +For example, the following code can be used to create an :code:`ObjectQuery` object from the query in the example from the :ref:`obj_dialect` section: .. code-block:: python @@ -284,13 +498,13 @@ StringQuery Class ^^^^^^^^^^^^^^^^^ The :code:`StringQuery` class is used to represent string-based dialect queries. To use it, first create -a string-based dialect query (as described in the :ref:`String-based Dialect` section). Then, a :code:`StringQuery` +a string-based dialect query (as described in the :ref:`str_dialect` section). Then, a :code:`StringQuery` object can be created from that string-based dialect query using either the :code:`StringQuery` constructor or the :code:`parse_string_dialect` function. :code:`parse_string_dialect` is the recommended way of creating a :code:`StringQuery` object because it allows users to write compound queries as strings (this functionality is not yet documented). -For example, the following code can be used to create a :code:`StringQuery` object from the query in the example from the :ref:`String-based Dialect` +For example, the following code can be used to create a :code:`StringQuery` object from the query in the example from the :ref:`str_dialect` section: .. code-block:: python @@ -302,231 +516,73 @@ section: """ query_obj = hatchet.query.parse_string_dialect(query) +.. _binary_compound_query: ConjunctionQuery, DisjunctionQuery, and ExclusiveDisjunctionQuery Classes ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -TBA - -NegationQuery Class -^^^^^^^^^^^^^^^^^^^ - -TBA - - +The :code:`ConjunctionQuery`, :code:`DisjunctionQuery`, and :code:`ExclusiveDisjunctionQuery` classes are used +to combine the results of two or more queries using set conjunction (i.e., logical AND), set disjunction (i.e., logical OR), +and exclusive set disjunction (i.e., logical XOR) respectively. All three of these classes work the same way. +To use these classes, simply pass the queries (either as new-sytle query objects or "raw" object- and string-base dialect queries) +whose results you want to combine into the class's constructor. -Hatchet has a filtering query language that allows users to filter GraphFrames based on caller-callee relationships between nodes in the Graph. This query language contains two APIs: a high-level API that is expressed using built-in Python data types (e.g., lists, dictionaries, strings) and a low-level API that is expressed using Python callables. - -Regardless of API, queries in Hatchet represent abstract paths, or path patterns, within the Graph being filtered. When filtering on a query, Hatchet will identify all paths in the Graph that match the query. Then, it will return a new GraphFrame object containing only the nodes contained in the matched paths. A query is represented as a list of *abstract graph nodes*. Each *abstract graph node* is made of two parts: - -- A wildcard that specifies the number of real nodes to match to the abstract node -- A filter that is used to determine whether a real node matches the abstract node - -The primary differences between the two APIs are the representation of filters, how wildcards and filters are combined into *abstract graph nodes*, and how *abstract graph nodes* are combined into a full query. - -The following sections will describe the specifications for queries in both APIs and provide examples of how to use the query language. - -High-Level API -============== - -The high-level API for Hatchet's query language is designed to allow users to quickly write simple queries. It has a simple syntax based on built-in Python data types (e.g., lists, dictionaries, strings). The following subsections will describe each component of high-level queries. After creating a query, it can be used to filter a GraphFrame by passing it to the :code:`GraphFrame.filter` function as follows: - -.. code-block:: python - - query = - filtered_gf = gf.filter(query) - -Wildcards -~~~~~~~~~ - -Wildcards in the high-level API are specified by one of four possible values: - -- The string :code:`"."`, which means "match 1 node" -- The string :code:`"*"`, which means "match 0 or more nodes" -- The string :code:`"+"`, which means "match 1 or more nodes" -- An integer, which means "match exactly that number of nodes" (integer 1 is equivalent to :code:`"."`) - -Filters -~~~~~~~ - -Filters in the high-level API are specified by Python dictionaries. These dictionaries are keyed on the names of *node attributes*. These attributes' names are the same as the column names from the DataFrame associated with the GraphFrame being filtered (which can be obtained with :code:`gf.dataframe`). There are also two special attribute names: - -- `depth`, which filters on the depth of the node in the Graph -- `node_id`, which filters on the node's unique identifier within the GraphFrame - -The values in a high-level API filter dictionary define the conditions that must be passed to pass the filter. Their data types depend on the data type of the corresponding attribute. The table below describes what value data types are valid for different attribute data types. - -+----------------------------+--------------------------+------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+ -| Attribute Data Type | Example Attributes | Valid Filter Value Types | Description of Condition | -+============================+==========================+================================================================================================+================================================================================================================+ -| Real (integer or float) | `time` | Real (integer or float) | Attribute value exactly equals filter value | -+ + +------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+ -| | `time (inc)` | String starting with comparison operator | Attribute value must pass comparison described in filter value | -+----------------------------+--------------------------+------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+ -| String | `name` | Regex String (see `Python re module `_ for details) | Attribute must match filter value (passed to `re.match `_) | -+----------------------------+--------------------------+------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+ - -The values in a high-level API filter dictionary can also be iterables (e.g., lists, tuples) of the valid values defined in the table above. - -In the high-level API, all conditions (key-value pairs, including conditions contained in a list value) in a filter must pass for the a real node to match the corresponding *abstract graph node*. - -Abstract Graph Nodes -~~~~~~~~~~~~~~~~~~~~ - -In the high-level API, *abstract graph nodes* are represented by Python tuples containing a single wildcard and a single filter. Alternatively, an *abstract graph node* can be represented by only a single . When only providing a wildcard or a filter (and not both), the default is used for the other component. The defaults are as follows: - -- Wildcard: :code:`"."` (match 1 node) -- Filter: an "always-true" filter (any node passes this filter) - -Full Queries -~~~~~~~~~~~~ - -In the high-level API, a query is represented as a Python list of *abstract graph nodes*. In general, the following code can be used as a template to build a low-level query. - -.. code-block:: python - - query = [ - (wildcard1, query1), - (wildcard2, query2), - (wildcard3, query3) - ] - filtered_gf = gf.filter(query) - -Low-Level API -============= - -The low-level API for Hatchet's query language is designed to allow users to perform more complex queries. It's syntax is based on Python callables (e.g., functions, lambdas). The following subsections will describe each component of low-level queries. Like high-level queries, low-level queries can be used to filter a GraphFrame by passing it to the :code:`GraphFrame.filter` function as follows: - -.. code-block:: python - - query = - filtered_gf = gf.filter(query) - -Wildcards -~~~~~~~~~ - -Wildcards in the low-level API are the exact same as wildcards in the high-level API. The following values are currently allowed for wildcards: - -- The string :code:`"."`, which means "match 1 node" -- The string :code:`"*"`, which means "match 0 or more nodes" -- The string :code:`"+"`, which means "match 1 or more nodes" -- An integer, which means "match exactly that number of nodes" (integer 1 is equivalent to :code:`"."`) - -Filters -~~~~~~~ - -The biggest difference between the high-level and low-level APIs are how filters are represented. In the low-level API, filters are represented by Python callables. These callables should take one argument representing a node in the graph and should return a boolean stating whether or not the node satisfies the filter. The type of the argument to the callable depends on whether the :code:`GraphFrame.drop_index_levels` function was previously called. If this function was called, the type of the argument will be a :code:`pandas.Series`. This :code:`Series` will be the row representing a node in the internal :code:`pandas.DataFrame`. If the :code:`GraphFrame.drop_index_levels` function was not called, the type of the argument will be a :code:`pandas.DataFrame`. This :code:`DataFrame` will contain the rows of the internal :code:`pandas.DataFrame` representing a node. Multiple rows are returned in this case because the internal :code:`DataFrame` will contain one row for every thread and function call. - -For example, if you want to match nodes with an exclusive time (represented by "time" column) greater than 2 and an inclusive time (represented by "time (inc)" column) greater than 5, you could use the following filter. This filter assumes you have already called the :code:`GraphFrame.drop_index_levels` function. +For example, the following code can be used to create a :code:`DisjunctionQuery` object from two string-based dialect queries. +The first query looks for all subgraphs rooted at MPI nodes, and the second query looks for all subgraphs rooted at CUDA host functions +(i.e., functions starting with the :code:`cuda` prefix). So, the :code:`DisjunctionQuery` can be used to look at the +host-side internals of a MPI+CUDA program. .. code-block:: python - filter = lambda row: row["time"] > 2 and row["time (inc)"] > 5 - -Abstract Graph Nodes -~~~~~~~~~~~~~~~~~~~~ - -To build *abstract graph nodes* in the low-level API, you will first need to import Hatchet's :code:`QueryMatcher` class. This can be done with the following import. - -.. code-block:: python - - from hatchet import QueryMatcher - -The :code:`QueryMatcher` class has two functions that can be used to build *abstract graph nodes*. The first function is :code:`QueryMatcher.match`, which resets the query and constructs a new *abstract graph node* as the root of the query. The second function is :code:`QueryMatcher.rel`, which constructs a new *abstract graph node* and appends it to the query. Both of these functions take two arguments: a wildcard and a low-level filter. If either the filter or wildcard are not provided, the default will be used. The defaults are as follows: - -- Wildcard: :code:`"."` (match 1 node) -- Filter: an "always-true" filter (any node passes this filter) - -Both of these functions also return a reference to the :code:`self` parameter of the :code:`QueryMatcher` object. This allows :code:`QueryMatcher.match` and :code:`QueryMatcher.rel` to be chained together. - -Full Queries -~~~~~~~~~~~~ - -Full queries in the low-level API are built by making sucessive calls to the :code:`QueryMatcher.match` and :code:`QueryMatcher.rel` functions. In general, the following code can be used as a template to build a low-level query. - -.. code-block:: python - - from hatchet import QueryMatcher - - query = QueryMatcher().match(wildcard1, filter1) - .rel(wildcard2, filter2) - .rel(wildcard3, filter3) - filtered_gf = gf.filter(query) - -Compound Queries -================ - -*Compound queries is currently a development feature.* - -Compound queries allow users to apply some operation on the results of one or more queries. Currently, the following compound queries are available directly from :code:`hatchet.query`: - -- :code:`AndQuery` and :code:`IntersectionQuery` -- :code:`OrQuery` and :code:`UnionQuery` -- :code:`XorQuery` and :code:`SymDifferenceQuery` - -Additionally, the compound query feature provides the following abstract base classes that can be used by users to implement their own compound queries: - -- :code:`AbstractQuery` -- :code:`NaryQuery` - -The following subsections will describe each of these compound query classes. - -AbstractQuery -~~~~~~~~~~~~~ - -:code:`AbstractQuery` is an interface (i.e., abstract base class with no implementation) that defines the basic requirements for a query in the Hatchet query language. All query types, including user-created compound queries, must inherit from this class. - -NaryQuery -~~~~~~~~~ - -:code:`NaryQuery` is an abstract base class that inherits from :code:`AbstractQuery`. It defines the basic functionality and requirements for compound queries that perform one or more subqueries, collect the results of the subqueries, and performs some subclass defined operation to merge the results into a single result. Queries that inherit from :code:`NaryQuery` must implment the :code:`_perform_nary_op` function, which takes a list of results and should perform some operation on it. - -AndQuery -~~~~~~~~ - -The :code:`AndQuery` class can be used to perform two or more subqueries and compute the intersection of all the returned lists of matched nodes. To create an :code:`AndQuery`, simply create your subqueries (which can be high-level, low-level, or compound), and pass them to the :code:`AndQuery` constructor. The following code can be used as a template for creating an :code:`AndQuery`. - -.. code-block:: python - - from hatchet.query import AndQuery + query_mpi = """ + MATCH (".", p)->("*") + WHERE p."name" STARTS WITH "MPI_" + """ + query_cuda_host = """ + MATCH (".", p)->("*") + WHERE p."name" STARTS WITH "cuda" + """ + disjunction_query = hatchet.query.DisjunctionQuery(query_mpi, query_cuda_host) - query1 = - query2 = - query3 = - and_query = AndQuery(query1, query2, query3) - filtered_gf = gf.filter(and_query) -:code:`IntersectionQuery` is also provided as an alias (i.e., renaming) of :code:`AndQuery`. The two can be used interchangably. +NegationQuery Class +^^^^^^^^^^^^^^^^^^^ -OrQuery -~~~~~~~~ +The :code:`NegationQuery` class is used to negate the results of another query. In other words, a :code:`NegationQuery` object +can be used to get all nodes in a Graph that were **not** matched by another query. To use this class, +simply pass the query you want to negate to the constructor of :code:`NegationQuery`. -The :code:`OrQuery` class can be used to perform two or more subqueries and compute the union of all the returned lists of matched nodes. To create an :code:`OrQuery`, simply create your subqueries (which can be high-level, low-level, or compound), and pass them to the :code:`OrQuery` constructor. The following code can be used as a template for creating an :code:`OrQuery`. +For example, the following code can be used to create a :code:`NegationQuery` object that removes the MPI layer +(i.e., subgraphs rooted at MPI functions) from a Graph. .. code-block:: python - from hatchet.query import OrQuery + mpi_query = """ + MATCH (".", p)->("*") + WHERE p."name" STARTS WITH "MPI_" + """ + no_mpi_query = hatchet.query.NegationQuery(mpi_query) - query1 = - query2 = - query3 = - or_query = OrQuery(query1, query2, query3) - filtered_gf = gf.filter(or_query) +.. _old_style_queries: +Old-Style Queries +----------------- -:code:`UnionQuery` is also provided as an alias (i.e., renaming) of :code:`OrQuery`. The two can be used interchangably. +The old-style query API consists of 2 main classes: -XorQuery -~~~~~~~~ +- :code:`QueryMatcher`: represents the base syntax and the object-based dialect +- :code:`CypherQuery`: represents the string-based dialect -The :code:`XorQuery` class can be used to perform two or more subqueries and compute the symmetric difference (set theory equivalent to XOR) of all the returned lists of matched nodes. To create an :code:`XorQuery`, simply create your subqueries (which can be high-level, low-level, or compound), and pass them to the :code:`XorQuery` constructor. The following code can be used as a template for creating an :code:`XorQuery`. +After creating objects of these classes, queries can be combined using the following old-style compound +query classes: -.. code-block:: python +- :code:`AndQuery` (and its alias :code:`IntersectionQuery`): combines the results of each sub-query using logical AND +- :code:`OrQuery` (and its alias :code:`UnionQuery`): combines the results of each sub-query using logical OR +- :code:`XorQuery` (and its alias :code:`SymDifferenceQuery`): combines the results of each sub-query using logical XOR +- :code:`NotQuery`: negates the results of a single sub-query (i.e., logical NOT) - from hatchet.query import XorQuery +.. _syntax_capabilities: +Syntax and Dialect Capabilities +=============================== - query1 = - query2 = - query3 = - xor_query = XorQuery(query1, query2, query3) - filtered_gf = gf.filter(xor_query) +.. warning:: -:code:`SymDifferenceQuery` is also provided as an alias (i.e., renaming) of :code:`XorQuery`. The two can be used interchangably. + Section in-progress diff --git a/docs/user_guide.rst b/docs/user_guide.rst index c93838f1..0335850c 100644 --- a/docs/user_guide.rst +++ b/docs/user_guide.rst @@ -3,6 +3,7 @@ SPDX-License-Identifier: MIT +.. _user_guide: ********** User Guide ********** From 15b840b0c6ffea3f8e767f44520e8325997e00dd Mon Sep 17 00:00:00 2001 From: ilumsden Date: Thu, 1 Jun 2023 22:06:50 -0700 Subject: [PATCH 3/9] Adds QL paper publication to Publications page --- docs/publications.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/publications.rst b/docs/publications.rst index fa66e993..7c69f1b7 100644 --- a/docs/publications.rst +++ b/docs/publications.rst @@ -10,6 +10,8 @@ Publications and Presentations Publications ============ +- Ian Lumsden, Jakob Luettgau, Vanessa Lama, Connor Scully-Allison, Stephanie Brink, Katherine E. Isaacs, Olga Pearce, Michela Taufer. `Enabling Call Path Querying in Hatchet to Identify Performance Bottlenecks in Scientific Applications `_. In Proceedings of the 2022 IEEE 18th International Conference on e-Science (e-Science), Salt Lake City, UT. + - Stephanie Brink, Ian Lumsden, Connor Scully-Allison, Katy Williams, Olga Pearce, Todd Gamblin, Michela Taufer, Katherine Isaacs, Abhinav Bhatele. `Usability and Performance Improvements in Hatchet `_. Presented at the `ProTools 2020 Workshop `_, held in conjunction with the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '20), held virtually. - Abhinav Bhatele, Stephanie Brink, and Todd Gamblin. `Hatchet: Pruning the Overgrowth in Parallel Profiles `_. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '19), Denver, CO. From 28e17604cc5f478e735a5626069370d93f0a22b1 Mon Sep 17 00:00:00 2001 From: ilumsden Date: Fri, 2 Jun 2023 07:04:39 -0700 Subject: [PATCH 4/9] Adds the Query Language APIs section --- docs/query_lang.rst | 243 ++++++++++++++------------------------------ 1 file changed, 78 insertions(+), 165 deletions(-) diff --git a/docs/query_lang.rst b/docs/query_lang.rst index d0133e3d..4d224767 100644 --- a/docs/query_lang.rst +++ b/docs/query_lang.rst @@ -225,8 +225,9 @@ This method takes a "filter object" as its first argument. A filter object can b and applies that query to the GraphFrame - A new-sytle or old-style query object: applies the query to the GraphFrame -The call to :code:`GraphFrame.filter()` will return a new GraphFrame containing the nodes from *all* paths -in the original GraphFrame that match the properties described by the query. +When providing a query, the call to :code:`GraphFrame.filter()` will return a new GraphFrame +containing the nodes from *all* paths in the original GraphFrame that match the properties +described by the query. Additional Query Language Functionality ======================================= @@ -243,7 +244,7 @@ Combining Query Results with Compound Queries .. note:: This section assumes the use of the "new-style" query APIs. If using the "old-style" API, simply replace - the query classes detailed in this section with their equivalents from the :ref:`old_style_queries` section. + the query classes detailed in this section with their equivalents from the old-style API. For more information about the new-style and old-style APIs, see the :ref:`query_lang_apis` section. Sometimes, a user might want to combine the results of multiple queries together to get a more detailed @@ -259,17 +260,12 @@ the query language provides the following Python classes for creating compound q exclusive set disjunction (i.e., logical XOR) - :code:`NegationQuery`: modifies the results of a single sub-query using set negation (i.e., logical NOT) -Before using any of these classes, all sub-queries must be converted into a Python query class. These classes -are detailed in the :ref:`query_lang_apis` section. In general, though, "raw" queries (i.e., -object-based dialect and string-based dialect queries) are converted into query class objects by simply calling -the constructor for the corresponding class. - -Once all sub-queries are converted into query class objects, a compound query can be created in one of two ways. -First, all the sub-queries can be passed into the constructor of a compound query class. An example of this is shown -below. This example creates a :code:`DisjunctionQuery` object from two string-based dialect queries. -The first query looks for all subgraphs rooted at MPI nodes, and the second query looks for all subgraphs rooted at CUDA host functions -(i.e., functions starting with the :code:`cuda` or :code:`cu` prefixes). So, the :code:`DisjunctionQuery` -can be used to look at the host-side internals of a MPI+CUDA program. +A compound query can be created in one of two ways. First, all the sub-queries can be passed into +the constructor of a compound query class. An example of this is shown below. This example creates +a :code:`DisjunctionQuery` object from two string-based dialect queries. The first query looks for +all subgraphs rooted at MPI nodes, and the second query looks for all subgraphs rooted at CUDA host +functions (i.e., functions starting with the :code:`cuda` or :code:`cu` prefixes). So, the +:code:`DisjunctionQuery` can be used to look at the host-side internals of a MPI+CUDA program. .. code-block:: python @@ -320,7 +316,8 @@ entire string-based dialect queries (i.e., both the :code:`MATCH` and :code:`WHE around subsets of the predicate in the :code:`WHERE` statement. When wrapping entire string-based dialect queries, each wrapped region is treated as a sub-query. When wrapping subsets of the predicate in the :code:`WHERE` statement, sub-queries are created by combining the unwrapped :code:`MATCH` statement -with each wrapped subset in the :code:`WHERE` statement. +with each wrapped subset in the :code:`WHERE` statement. This can be thought of as the :code:`MATCH` +statement being shared between the wrapped subsets in the :code:`WHERE` statement. Curly brace-delimited regions of a string-based query should then be separated using the :code:`AND`, :code:`OR`, and :code:`XOR` keywords. When used to separate curly brace-delimited regions, these keywords @@ -360,6 +357,7 @@ brace-delimited regions in the input query). If a query language class is not ne queries in the string-based dialect can simply be applied to a GraphFrame as usual with :code:`GraphFrame.filter()`. +.. _multi_index_gf: Supporting Multi-Indexed GraphFrames in the Object- and String-based Dialects ----------------------------------------------------------------------------- @@ -401,15 +399,16 @@ applying a query in either dialect. However, with the introduction of the new-style query API in version 2023.1.0 (see the :ref:`query_lang_apis` section for more information), it is now possible to use multiindexed GraphFrames with the object- and -string-based dialects. To do so, users must manually wrap their object-based dialect and string-based dialect -queries in the :code:`ObjectQuery` and :code:`StringQuery` classes respectively. Both of these classes' -constructors accept two arguments: +string-based dialects. To do so, users must provide the new :code:`multi_index_mode` parameter +to the :code:`GraphFrame.filter()` method, the :code:`ObjectQuery` class, or the :code:`StringQuery` class. +This parameter controls how predicates generated from the dialects will treat multiindexed data. +It can be set to one of three values: -1. The object- or string-based dialect query to be wrapped -2. The :code:`multi_index_mode` parameter, which can be set to :code:`"off"`, :code:`"all"`, or :code:`"any"` +- :code:`"off"` +- :code:`"all"` +- :code:`"any"` -The new :code:`multi_index_mode` parameter controls how predicates generated from the dialects will treat -multiindexed data. When set to :code:`"off"` (which is the default), the generated predicates will assume +When set to :code:`"off"` (which is the default), the generated predicates will assume that the data for each node is **not** multiindexed. This behavior is the same as eariler versions of Hatchet. When set to :code:`all`, the generated predicates will require that all rows of data for a given node satisfy the predicate. This usually amounts to applying the predicate to a node's data @@ -429,155 +428,69 @@ with :code:`Series.any()`. Query Language APIs =================== -.. warning:: - - Section in-progress +.. versionchanged:: 2023.1.0 In version 2023.1.0, the query language underwent a large refactor to enable support for GraphFrame objects -containing a multi-indexed DataFrame. As a result, the query language now has two APIs: - -- New-Style Queries: APIs for the query language starting with version 2023.1.0 -- Old-Style Queries: APIs for the query language prior to version 2023.1.0 - -Old-style queries are discouraged for new users. However, these APIs are not deprecated at this time. For the time -being, old-style queries will be maintained as a thin wrapper around new-style queries. - -New-Style Queries ------------------ - -.. versionadded:: 2023.1.0 - -The new-style query API consists of 3 main classes: - -- :code:`Query`: represents the base syntax -- :code:`ObjectQuery`: represents the object-based dialect -- :code:`StringQuery`: represents the string-based dialect - -After creating objects of these classes, queries can be combined using the following "compound query" classes: - -- :code:`ConjunctionQuery`: combines the results of each sub-query using set conjunction (i.e., logical AND) -- :code:`DisjunctionQuery`: combines the results of each sub-query using set disjunction (i.e., logical OR) -- :code:`ExclusiveDisjunctionQuery`: combines the results of each sub-query using exclusive set disjunction (i.e., logical XOR) -- :code:`NegationQuery`: combines the results of a single sub-query using set negation (i.e., logical NOT) - -The rest of this section provides brief descriptions and examples of the usage of these classes. - -Query Class -^^^^^^^^^^^ - -The :code:`Query` class is used to represent base syntax queries. To use it, simply create a :code:`Query` -object and call :code:`match` and :code:`rel` as described in the :ref:`base_syntax` section. - -An example of the use of this class can be found in the :ref:`base_syntax` section. - -ObjectQuery Class -^^^^^^^^^^^^^^^^^ - -The :code:`ObjectQuery` class is used to represent object-based dialect queries. To use it, create an object-based dialect -query (as described in the :ref:`obj_dialect` section), and pass that query to the constructor of -:code:`ObjectQuery`. - -For example, the following code can be used to create an :code:`ObjectQuery` object from the query in the example from the :ref:`obj_dialect` -section: +containing a multi-indexed DataFrame (see the :ref:`multi_index_gf` section for more information). +As a result, the query language now has two APIs: + +- New-Style Query API: for the query language starting with version 2023.1.0 +- Old-Style Query API: for the query language prior to version 2023.1.0 + +The old-style API is discouraged for new users. However, these APIs are not deprecated at this time. For the time +being, the old-style API will be maintained as a thin wrapper around the new-style API. + +The key changes in the new-style API that are exposed to users are: + +- The creation of a new dedicated :code:`ObjectQuery` class to represent object-based dialect queries +- The renaming of compound query classes and the elimination of confusing alias classes + +All other changes in the new-style API are either minor changes (e.g., renaming) or internal changes that +are not visible to end users. + +The table below shows the classes and functions of the new- and old-style APIs and how they map to one another. + ++-----------------------------------+----------------------------+------------------------------------------------------------------------+ +| New-Style API | Old-Style API | Description | ++===================================+============================+========================================================================+ +| :code:`Query` | :code:`QueryMatcher` | Implements the base syntax | ++-----------------------------------+ +------------------------------------------------------------------------+ +| :code:`ObjectQuery` | | Parses the object-based dialect and converts it into the base syntax | ++-----------------------------------+----------------------------+------------------------------------------------------------------------+ +| :code:`StringQuery` | :code:`CypherQuery` | Parses the string-based dialect and converts it into the base syntax | ++-----------------------------------+----------------------------+------------------------------------------------------------------------+ +| :code:`parse_string_dialect` | :code:`parse_cypher_query` | Parses either normal string-based dialect queries or compound | +| | | queries in the string-based dialect into classes | ++-----------------------------------+----------------------------+------------------------------------------------------------------------+ +| :code:`ConjunctionQuery` | :code:`AndQuery` | Combines sub-queries with set conjunction (i.e., logical AND) | +| | :code:`IntersectionQuery` | | ++-----------------------------------+----------------------------+------------------------------------------------------------------------+ +| :code:`DisjunctionQuery` | :code:`OrQuery` | Combines sub-queries with set disjunction (i.e., logical OR) | +| | :code:`UnionQuery` | | ++-----------------------------------+----------------------------+------------------------------------------------------------------------+ +| :code:`ExclusiveDisjunctionQuery` | :code:`XorQuery` | Combines sub-queries with exclusive set disjunction (i.e., logical XOR | +| | :code:`SymDifferenceQuery` | | ++-----------------------------------+----------------------------+------------------------------------------------------------------------+ +| :code:`NegationQuery` | :code:`NotQuery` | Modifies a single sub-query with set negation (i.e., logical NOT) | ++-----------------------------------+----------------------------+------------------------------------------------------------------------+ + +The only other changes that may impact users are changes to the base classes of the classes in the table above. +In the old-style API, all classes in the query language inherit from :code:`AbstractQuery`. As a result, +:code:`isinstance(obj, hatchet.query.AbstractQuery)` or :code:`issubclass(type(obj), hatchet.query.AbstractQuery)` +can be used to check if a Python object is an old-sytle API query object. In the new-style API, "normal" queries +(i.e., :code:`Query`, :code:`ObjectQuery`, and :code:`StringQuery`) and compound queries +(i.e., :code:`ConjunctionQuery`, :code:`DisjunctionQuery`, :code:`ExclusiveDisjunctionQuery`, and +:code:`NegationQuery`) inherit from the :code:`Query` and :code:`CompoundQuery` classes respectively. +As a result, to check if a Python object is a new-sytle API query object (either normal or compound), +the following piece of code can be used: .. code-block:: python - query = [ - ( - ".", - { - "name": "P?MPI_.*", - "PAPI_L2_TCM": "> 5" - } - ), - "*" - ] - query_obj = hatchet.query.ObjectQuery(query) - -StringQuery Class -^^^^^^^^^^^^^^^^^ - -The :code:`StringQuery` class is used to represent string-based dialect queries. To use it, first create -a string-based dialect query (as described in the :ref:`str_dialect` section). Then, a :code:`StringQuery` -object can be created from that string-based dialect query using either the :code:`StringQuery` constructor -or the :code:`parse_string_dialect` function. :code:`parse_string_dialect` is the recommended way of creating -a :code:`StringQuery` object because it allows users to write compound queries as strings (this functionality is not -yet documented). - -For example, the following code can be used to create a :code:`StringQuery` object from the query in the example from the :ref:`str_dialect` -section: - -.. code-block:: python - - query = """ - MATCH (".", p)->("*") - WHERE p."name" STARTS WITH "MPI_" OR p."name" STARTS WTICH "PMPI_" AND - p."PAPI_L2_TCM" > 5 - """ - query_obj = hatchet.query.parse_string_dialect(query) - -.. _binary_compound_query: -ConjunctionQuery, DisjunctionQuery, and ExclusiveDisjunctionQuery Classes -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -The :code:`ConjunctionQuery`, :code:`DisjunctionQuery`, and :code:`ExclusiveDisjunctionQuery` classes are used -to combine the results of two or more queries using set conjunction (i.e., logical AND), set disjunction (i.e., logical OR), -and exclusive set disjunction (i.e., logical XOR) respectively. All three of these classes work the same way. -To use these classes, simply pass the queries (either as new-sytle query objects or "raw" object- and string-base dialect queries) -whose results you want to combine into the class's constructor. - -For example, the following code can be used to create a :code:`DisjunctionQuery` object from two string-based dialect queries. -The first query looks for all subgraphs rooted at MPI nodes, and the second query looks for all subgraphs rooted at CUDA host functions -(i.e., functions starting with the :code:`cuda` prefix). So, the :code:`DisjunctionQuery` can be used to look at the -host-side internals of a MPI+CUDA program. - -.. code-block:: python - - query_mpi = """ - MATCH (".", p)->("*") - WHERE p."name" STARTS WITH "MPI_" - """ - query_cuda_host = """ - MATCH (".", p)->("*") - WHERE p."name" STARTS WITH "cuda" - """ - disjunction_query = hatchet.query.DisjunctionQuery(query_mpi, query_cuda_host) - - -NegationQuery Class -^^^^^^^^^^^^^^^^^^^ - -The :code:`NegationQuery` class is used to negate the results of another query. In other words, a :code:`NegationQuery` object -can be used to get all nodes in a Graph that were **not** matched by another query. To use this class, -simply pass the query you want to negate to the constructor of :code:`NegationQuery`. - -For example, the following code can be used to create a :code:`NegationQuery` object that removes the MPI layer -(i.e., subgraphs rooted at MPI functions) from a Graph. - -.. code-block:: python - - mpi_query = """ - MATCH (".", p)->("*") - WHERE p."name" STARTS WITH "MPI_" - """ - no_mpi_query = hatchet.query.NegationQuery(mpi_query) - -.. _old_style_queries: -Old-Style Queries ------------------ - -The old-style query API consists of 2 main classes: - -- :code:`QueryMatcher`: represents the base syntax and the object-based dialect -- :code:`CypherQuery`: represents the string-based dialect - -After creating objects of these classes, queries can be combined using the following old-style compound -query classes: + issubclass(type(obj), hatchet.query.Query) or issubclass(type(obj), hatchet.query.CompoundQuery) -- :code:`AndQuery` (and its alias :code:`IntersectionQuery`): combines the results of each sub-query using logical AND -- :code:`OrQuery` (and its alias :code:`UnionQuery`): combines the results of each sub-query using logical OR -- :code:`XorQuery` (and its alias :code:`SymDifferenceQuery`): combines the results of each sub-query using logical XOR -- :code:`NotQuery`: negates the results of a single sub-query (i.e., logical NOT) +Since the :code:`GraphFrame.filter()` method works with either API, the :code:`is_hatchet_query` function +is provided to conveniently check if a Python object is any type of query language object, regardless +of API. .. _syntax_capabilities: Syntax and Dialect Capabilities From 4e653e9d2611fa6de3ece800c1b0635b8afda69c Mon Sep 17 00:00:00 2001 From: ilumsden Date: Fri, 2 Jun 2023 07:37:18 -0700 Subject: [PATCH 5/9] Typo fix --- docs/query_lang.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/query_lang.rst b/docs/query_lang.rst index 4d224767..9aa49d51 100644 --- a/docs/query_lang.rst +++ b/docs/query_lang.rst @@ -111,7 +111,7 @@ Object-based Dialect The object-based dialect allows users to construct queries using built-in Python objects. In this dialect, a query is represented by a Python :code:`list` of query nodes. Each query node is represented by a Python -:code:`tuple` of a qunatifier and a predicate. Quantifiers are represented the same way as in the base syntax +:code:`tuple` of a quantifier and a predicate. Quantifiers are represented the same way as in the base syntax (see the :ref:`base_syntax` section for more information). Predicates are represented as key-value pairs where keys are metric names and values are Boolean expressions generated using the following rules: From 859a35055d490c91c02e318cabced172e7e857d4 Mon Sep 17 00:00:00 2001 From: ilumsden Date: Fri, 2 Jun 2023 08:59:20 -0700 Subject: [PATCH 6/9] Finishes the query lang docs --- docs/query_lang.rst | 58 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 58 insertions(+) diff --git a/docs/query_lang.rst b/docs/query_lang.rst index 9aa49d51..9eebcf23 100644 --- a/docs/query_lang.rst +++ b/docs/query_lang.rst @@ -499,3 +499,61 @@ Syntax and Dialect Capabilities .. warning:: Section in-progress + +Along with different syntaxes, the base syntax, object-based dialect, and string-based dialect also +have different capabilities. In other words, there are things that each way of writing queries can and +cannot do. To help users understand these capabilities, the ways of writing queries +have been classified in terms of their properties and logical operators. + +Properties are classified into five categories, one for quantifiers and four for predicates. These +categories are: + +- *Quantifier Capabilities*: ability to match different numbers of nodes (i.e., match one, zero or more + one or more, or an exact number of nodes) +- *String Equivalence and Regex Matching Predicates*: ability to check if the value of a specified string + metric is equal to a provided string or matches a provided regular expression +- *String Containment Predicates*: ability to check if the value of a specified string metric starts + with, ends with, or contains a provided string +- *Basic Numeric Comparison Predicates*: ability to check if the value of the specified numeric metric + satisfies the numeric comparison (e.g., equal to, greater than, greater than or equal to) +- *Special Value Identification Predicates*: ability to check if the value of the specified metric + is equivalent to the provided "special value" (i.e., NaN, infinity, None, or "leaf") + +The table below shows whether each way of creating queries supports each property category. + ++----------------------------------------+-------------+-----------------------+----------------------+ +| Property Category | Base Syntax | Object-based Dialect | String-based Dialect | ++========================================+=============+=======================+======================+ +| Quantifier Capabilities | ✅ | ✅ | ✅ | ++----------------------------------------+-------------+-----------------------+----------------------+ +| String Equivalence and Regex | ✅ | ✅ | ✅ | +| Matching Predicates | | | | ++----------------------------------------+-------------+-----------------------+----------------------+ +| String Containment Predicates | ✅ | | ✅ | ++----------------------------------------+-------------+-----------------------+----------------------+ +| Basic Numeric Comparison Predicates | ✅ | ✅ | ✅ | ++----------------------------------------+-------------+-----------------------+----------------------+ +| Special Value Identification | ✅ | | ✅ | +| Predicates | | | | ++----------------------------------------+-------------+-----------------------+----------------------+ + +Logical operators are classified into three categories. These categories are: + +- *Predicate Combination through Conjunction*: ability to combine predicates using conjuntion (i.e., logical AND) +- *Predicate Combination through Disjunction and Complement*: ability to combine predicates using + disjunction (i.e., logical OR) or find the complement (i.e., logical NOT) to a single predicate +- *Predicate Combination through Other Operations*: ability to combine predicates through other + means, such as exclusive disjunction (i.e., logical XOR) + ++----------------------------------------+-------------+-----------------------+----------------------+ +| Logical Operator Category | Base Syntax | Object-based Dialect | String-based Dialect | ++========================================+=============+=======================+======================+ +| Predicate Combination through | ✅ | ✅ | ✅ | +| Conjunction | | | | ++----------------------------------------+-------------+-----------------------+----------------------+ +| Predicate Combination through | ✅ | | ✅ | +| Disjunction and Complement | | | | ++----------------------------------------+-------------+-----------------------+----------------------+ +| Predicate Combination through | ✅ | | | +| Other Operations | | | | ++----------------------------------------+-------------+-----------------------+----------------------+ From 4cf65f5d9544eebde4dfafd353ad0f12123c39ec Mon Sep 17 00:00:00 2001 From: ilumsden Date: Fri, 2 Jun 2023 11:51:38 -0700 Subject: [PATCH 7/9] New stuff --- docs/query_lang.rst | 32 +++++++++++++++++++++++++++++++- 1 file changed, 31 insertions(+), 1 deletion(-) diff --git a/docs/query_lang.rst b/docs/query_lang.rst index 9eebcf23..eb60c0f4 100644 --- a/docs/query_lang.rst +++ b/docs/query_lang.rst @@ -9,6 +9,10 @@ Query Language .. versionadded:: 1.2.0 +.. note:: + + Point to QL Paper + One of the most important steps in identifying performance phenomena (e.g., bottlenecks) is data subselection and reduction. In previous versions of Hatchet, users could only reduce their data using filters on the pandas :code:`DataFrame` component of the GraphFrame object. However, this type of data reduction @@ -202,7 +206,7 @@ function calls that have more than 5 L2 cache misses (as measured by PAPI). query = """ MATCH (".", p)->("*") - WHERE p."name" STARTS WITH "MPI_" OR p."name" STARTS WTICH "PMPI_" AND + WHERE p."name" STARTS WITH "MPI_" OR p."name" STARTS WITH "PMPI_" AND p."PAPI_L2_TCM" > 5 """ @@ -557,3 +561,29 @@ Logical operators are classified into three categories. These categories are: | Predicate Combination through | ✅ | | | | Other Operations | | | | +----------------------------------------+-------------+-----------------------+----------------------+ + +Query Language Examples +======================= + +This section shows some examples of common queries that users may want to perform. +They can be used as a starting point for creating more complex queries. + +Find a Subgraph with a Specific Root +------------------------------------ + +TBA + +Find All Paths Ending with a Specific Node +------------------------------------------- + +TBA + +Find All Nodes for a Particular Software Library +----------------------------------------------- + +TBA + +Find All Paths through a Specific Node +--------------------------------------- + +TBA From 76a11836b337a77a8d6ba781f271300c053ec440 Mon Sep 17 00:00:00 2001 From: ilumsden Date: Wed, 28 Jun 2023 21:04:49 -0700 Subject: [PATCH 8/9] Current progress on QL docs --- docs/conf.py | 1 + docs/query_lang.rst | 67 +++++++++++++++++++++++++++++++++++++++++-- docs/requirements.txt | 3 ++ 3 files changed, 69 insertions(+), 2 deletions(-) diff --git a/docs/conf.py b/docs/conf.py index aa4e1f35..a6e6778b 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -50,6 +50,7 @@ "sphinx.ext.viewcode", "sphinx.ext.githubpages", "sphinx.ext.napoleon", + "sphinx_tabs.tabs", ] # Add any paths that contain templates here, relative to this directory. diff --git a/docs/query_lang.rst b/docs/query_lang.rst index eb60c0f4..320173df 100644 --- a/docs/query_lang.rst +++ b/docs/query_lang.rst @@ -11,7 +11,8 @@ Query Language .. note:: - Point to QL Paper + For more information about the query language, including some case studies leveraging it, + check out `our paper `_ from eScience 2022. One of the most important steps in identifying performance phenomena (e.g., bottlenecks) is data subselection and reduction. In previous versions of Hatchet, users could only reduce their data using filters on the @@ -566,11 +567,73 @@ Query Language Examples ======================= This section shows some examples of common queries that users may want to perform. -They can be used as a starting point for creating more complex queries. +They can be used as a starting point for creating more complex queries. All of these +examples are designed to be applied to the following data presented in the :ref:`user_guide` page. + +.. table:: + :align: center + :widths: auto + + +-------------------------------------+-------------------------------+ + | .. image:: images/vis-terminal.png | .. image:: images/vis-dot.png | + | :scale: 50% | :scale: 35% | + | :align: left | :align: right | + +-------------------------------------+-------------------------------+ Find a Subgraph with a Specific Root ------------------------------------ +This example shows how to find a subgraph of a GraphFrame starting with a specific root. +More specifically, the queries in this example can be used to find the subgraph rooted at +nodes representing MPI or PMPI function calls. + +.. tabs:: + + .. tab:: Base Syntax + + .. code-block:: Python + + query = ( + Query() + .match( + ".", + lambda row: re.match( + "P?MPI_.*", + row["name"] + ) + is not None + and row["PAPI_L2_TCM"] > 5 + ) + .rel("*") + ) + + .. tab:: Object-based Dialect + + .. code-block:: python + + query = [ + ( + ".", + { + "name": "P?MPI_.*", + "PAPI_L2_TCM": "> 5" + } + ), + "*" + ] + + .. tab:: String-based Dialect + + .. code-block:: python + + query = """ + MATCH (".", p)->("*") + WHERE p."name" STARTS WITH "MPI_" OR p."name" STARTS WITH "PMPI_" AND + p."PAPI_L2_TCM" > 5 + """ + +When applying one of these queries to the example data, the resulting GraphFrame looks like this: + TBA Find All Paths Ending with a Specific Node diff --git a/docs/requirements.txt b/docs/requirements.txt index e7a69101..d75da107 100644 --- a/docs/requirements.txt +++ b/docs/requirements.txt @@ -7,3 +7,6 @@ cython multiprocess textX caliper-reader +Sphinx +sphinx-rtd-theme +sphinx-tabs From 3cfa628fda62a1eefbb96acbf6f27f50af8d392b Mon Sep 17 00:00:00 2001 From: Ian Lumsden Date: Mon, 10 Jul 2023 10:48:16 -0700 Subject: [PATCH 9/9] Adds base syntax examples to QL docs --- docs/query_lang.rst | 127 ++++++++++++++++++++++++++++++++++++++------ 1 file changed, 110 insertions(+), 17 deletions(-) diff --git a/docs/query_lang.rst b/docs/query_lang.rst index 320173df..75822213 100644 --- a/docs/query_lang.rst +++ b/docs/query_lang.rst @@ -580,12 +580,13 @@ examples are designed to be applied to the following data presented in the :ref: | :align: left | :align: right | +-------------------------------------+-------------------------------+ +.. _subgraph_root_ex: Find a Subgraph with a Specific Root ------------------------------------ This example shows how to find a subgraph of a GraphFrame starting with a specific root. More specifically, the queries in this example can be used to find the subgraph rooted at -nodes representing MPI or PMPI function calls. +nodes with the name :code:`"corge"`. .. tabs:: @@ -595,15 +596,7 @@ nodes representing MPI or PMPI function calls. query = ( Query() - .match( - ".", - lambda row: re.match( - "P?MPI_.*", - row["name"] - ) - is not None - and row["PAPI_L2_TCM"] > 5 - ) + .match(".", row["name"] == "corge") .rel("*") ) @@ -615,8 +608,7 @@ nodes representing MPI or PMPI function calls. ( ".", { - "name": "P?MPI_.*", - "PAPI_L2_TCM": "> 5" + "name": "corge" } ), "*" @@ -628,8 +620,7 @@ nodes representing MPI or PMPI function calls. query = """ MATCH (".", p)->("*") - WHERE p."name" STARTS WITH "MPI_" OR p."name" STARTS WITH "PMPI_" AND - p."PAPI_L2_TCM" > 5 + WHERE p."name" = "corge" """ When applying one of these queries to the example data, the resulting GraphFrame looks like this: @@ -639,14 +630,116 @@ TBA Find All Paths Ending with a Specific Node ------------------------------------------- -TBA +This example shows how to find all paths of a GraphFrame ending with a specific node. +More specifically, the queries in this example can be used to find paths ending with +a node named :code:`"grault"`. + +.. tabs:: + + .. tab:: Base Syntax + + .. code-block:: python + + query = ( + Query() + .match("*") + .rel(".", lambda row: row["name"] == "grault") + ) + + .. tab:: Object-based Dialect + + TBA + + .. tab:: String-based Dialect + + TBA + +Find All Paths with Specific Starting and Ending Nodes +------------------------------------------------------ + +This example shows how to find all paths of a GraphFrame starting with and ending with specific nodes. +More specifically, the queries in this example can be used to find paths starting with a node named +:code:`"corge"` and ending with a node named :code:`"grault"`. + +.. tabs:: + + .. tab:: Base Syntax + + .. code-block:: python + + query = ( + Query() + .match(".", lambda row: row["name"] == "corge") + .rel("*") + .rel(".", lambda row: row["name"] == "grault") + ) + + .. tab:: Object-based Dialect + + TBA + + .. tab:: String-based Dialect + + TBA Find All Nodes for a Particular Software Library ----------------------------------------------- -TBA +This example shows how to find all paths of a GraphFrame representing a specific +software library. This example is simply a variant of finding a subgraph with a +given root (i.e., from :ref:`this section `). The example queries +below can be used to find the nodes for a subset of an MPI library. + +.. tabs:: + + .. tab:: Base Syntax + + .. code-block:: python + + api_entrypoints = [ + "MPI_Init", + "MPI_Finalize", + "MPI_Send", + "MPI_Recv", + ] + query = ( + Query() + .match(".", lambda row: row["name"] in api_entrypoints) + .rel("*") + ) + + .. tab:: Object-based Dialect + + TBA + + .. tab:: String-based Dialect + + TBA Find All Paths through a Specific Node --------------------------------------- -TBA +This example shows how to find all paths of a GraphFrame that pass through +a specific node. More specifically, the queries below can be used to find +all paths that pass through a node named :code:`"corge"`. + +.. tabs:: + + .. tab:: Base Syntax + + .. code-block:: python + + query = ( + Query() + .match("*") + .rel(".", lambda row: row["name"] == "corge") + .rel("*") + ) + + .. tab:: Object-based Dialect + + TBA + + .. tab:: String-based Dialect + + TBA \ No newline at end of file