-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add logger for efficient and convenient logging of objects #1343
Open
plypaul
wants to merge
3
commits into
p--query-resolution-perf--12_9
Choose a base branch
from
p--query-resolution-perf--13
base: p--query-resolution-perf--12_9
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
Empty file.
27 changes: 27 additions & 0 deletions
27
metricflow-semantics/metricflow_semantics/formatting/formatting_helpers.py
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
from __future__ import annotations | ||
|
||
import textwrap | ||
|
||
|
||
def mf_dedent(text: str) -> str: | ||
"""Remove leading newlines, dedents, and remove tailing newlines. | ||
|
||
This function simplifies the commonly used: | ||
|
||
text = textwrap.dedent( | ||
[triple quote][backslash] | ||
Line 0 | ||
Line 1 | ||
[triple quote] | ||
).rstrip() | ||
|
||
to: | ||
|
||
text = mf_dedent( | ||
[triple quote] | ||
Line 0 | ||
Line 1 | ||
[triple quote] | ||
) | ||
""" | ||
return textwrap.dedent(text.lstrip("\n")).rstrip("\n") |
65 changes: 65 additions & 0 deletions
65
metricflow-semantics/metricflow_semantics/mf_logging/logger_configuration.py
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
from __future__ import annotations | ||
|
||
import inspect | ||
import threading | ||
|
||
from metricflow_semantics.mf_logging.mf_logger import MetricFlowLogger, MetricFlowLoggerFactory | ||
from metricflow_semantics.mf_logging.pretty_printing_logger import PrettyPrintingLoggerFactory | ||
|
||
|
||
class MetricFlowLoggerConfiguration: | ||
"""Global object used to configure logger object creation in MF. | ||
|
||
The factory can be configured to use other implementations to realize different behaviors like passing JSON to the | ||
`extra` argument. By default, it pretty-formats the objects and appends them to the message. | ||
""" | ||
|
||
_logger: MetricFlowLoggerFactory = PrettyPrintingLoggerFactory() | ||
_state_lock = threading.Lock() | ||
|
||
@classmethod | ||
def set_logger_factory(cls, logger_factory: MetricFlowLoggerFactory) -> None: | ||
"""Set the factory to be used for creating logger objects in the `get_logger` method.""" | ||
with cls._state_lock: | ||
cls._logger = logger_factory | ||
|
||
@classmethod | ||
def get_logger(cls, module_name: str) -> MetricFlowLogger: | ||
"""Get the logger using the module where the call was made. | ||
|
||
This is a replacement for the standard `logging.getLogger` pattern. | ||
|
||
e.g. | ||
import logging | ||
logger = logging.getLogger(__name__) | ||
|
||
-> | ||
|
||
import ... | ||
logger = MetricFlowLoggerConfiguration.get_module_logger() | ||
""" | ||
with cls._state_lock: | ||
return cls._logger.get_logger(module_name) | ||
|
||
|
||
def mf_get_logger() -> MetricFlowLogger: | ||
"""Get the logger using the module where the call was made. | ||
|
||
This is a replacement for the standard `logging.getLogger` pattern. | ||
|
||
e.g. | ||
import logging | ||
logger = logging.getLogger(__name__) | ||
|
||
-> | ||
|
||
import ... | ||
logger = mf_get_logger() | ||
""" | ||
caller_frame = inspect.stack()[1] | ||
caller_module = inspect.getmodule(caller_frame[0]) | ||
|
||
if caller_module is None: | ||
raise RuntimeError(f"Unable to get module for {caller_frame[0]=}") | ||
|
||
return MetricFlowLoggerConfiguration.get_logger(caller_module.__name__) |
103 changes: 103 additions & 0 deletions
103
metricflow-semantics/metricflow_semantics/mf_logging/mf_logger.py
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,103 @@ | ||
from __future__ import annotations | ||
|
||
import logging | ||
from abc import ABC, abstractmethod | ||
|
||
|
||
class MetricFlowLogger(ABC): | ||
"""Logger backed by the Python standard-library logger to conveniently and efficiently log objects. | ||
|
||
In MF, there is a need to log complex objects like DAGs and paths for debugging. Previously, the pattern for logging | ||
those objects used a pretty-formatting method like: | ||
|
||
logger = logging.getLogger(__name__) | ||
... | ||
logger.debug(mf_pformat_many("Found matching path.", {"matching_path": matching_path, "source_nodes": source_nodes})) | ||
|
||
-> | ||
|
||
Found matching path. | ||
matching_path: [1, 2, 3] | ||
source_nodes: ["a", "b", "c"] | ||
|
||
However, there were a few issues: | ||
|
||
* The `mf_pformat_many()` call was evaluated regardless of whether the given logging level was enabled. The | ||
formatting of complex objects was found to be significant in performance profiling. | ||
* Since the message is the only object passed to the logger, it was not reasonable to convert the objects to a | ||
different form (e.g. JSON) for different backends. | ||
* Logging calls were verbose, and string keys like "matching_path" were not amenable to auto-complete in the IDE. | ||
|
||
The class is used in a new approach that uses a custom logging class that takes keyword arguments that are only | ||
converted to strings when the corresponding logging level is enabled. The logger is created via a configurable | ||
factory that can be used to implement different ways of logging those objects (e.g. JSON). | ||
|
||
The new pattern addresses the above issues and looks like: | ||
|
||
logger = mf_get_logger() | ||
... | ||
logger.debug("Found matching path.", matching_path=matching_path, source_nodes=source_nodes) | ||
|
||
A signature for the logging calls that allows non-named arguments and uses the variable names as keys in the output | ||
was considered for reduced verbosity: | ||
|
||
logger.debug("Found matching path.", matching_path, source_nodes) | ||
|
||
This approach was prototyped using the `varname` library to resolve the variable name, but it was found to be too | ||
slow (~0.7s for 1000 calls). | ||
""" | ||
|
||
@abstractmethod | ||
def log_implementation(self, level_int: int, message: str, stack_level: int, **kwargs) -> None: # type: ignore[no-untyped-def] | ||
"""Appropriately log the message and the appropriate representation of `kwargs`. | ||
|
||
An example of the representation of `kwargs` might be the string form of the objects on separate lines. | ||
The `stack_level` argument should be passed to the Python-standard-library log-call so that the correct file | ||
and line number are used for the source of the log line, not this library. | ||
""" | ||
raise NotImplementedError | ||
|
||
def _safely_log(self, level_int: int, message: str, **kwargs) -> None: # type: ignore[no-untyped-def] | ||
"""Calls `.log_implementation` with a try-except so that bugs do not cause a fatal exit.""" | ||
if not self.standard_library_logger.isEnabledFor(level_int): | ||
return | ||
try: | ||
self.log_implementation(level_int, message, stack_level=4, **kwargs) | ||
except Exception: | ||
logging.exception( | ||
f"Got an exception while using logger instance {self}. " | ||
f"Since there might be an issue with the logger instance, this message is logged using the root logger." | ||
) | ||
|
||
def debug(self, message: str, **kwargs) -> None: # type: ignore[no-untyped-def] # noqa: D102 | ||
self._safely_log(logging.DEBUG, message, **kwargs) | ||
|
||
def info(self, message: str, **kwargs) -> None: # type: ignore[no-untyped-def] # noqa: D102 | ||
self._safely_log(logging.INFO, message, **kwargs) | ||
|
||
def warning(self, message: str, **kwargs) -> None: # type: ignore[no-untyped-def] # noqa: D102 | ||
self._safely_log(logging.WARNING, message, **kwargs) | ||
|
||
def error(self, message: str, **kwargs) -> None: # type: ignore[no-untyped-def] # noqa: D102 | ||
self._safely_log(logging.ERROR, message, **kwargs) | ||
|
||
def critical(self, message: str, **kwargs) -> None: # type: ignore[no-untyped-def] # noqa: D102 | ||
self._safely_log(logging.CRITICAL, message, **kwargs) | ||
|
||
def fatal(self, message: str, **kwargs) -> None: # type: ignore[no-untyped-def] # noqa: D102 | ||
self._safely_log(logging.FATAL, message, **kwargs) | ||
|
||
@property | ||
@abstractmethod | ||
def standard_library_logger(self) -> logging.Logger: | ||
"""Return the Python-standard-library logger that this uses.""" | ||
raise NotImplementedError | ||
|
||
|
||
class MetricFlowLoggerFactory(ABC): | ||
"""Interface for creating logger instances.""" | ||
|
||
@abstractmethod | ||
def get_logger(self, module_name: str) -> MetricFlowLogger: | ||
"""Create a logger for the given module. Similar to the standard `logging.getLogger(__name__)`.""" | ||
raise NotImplementedError |
42 changes: 42 additions & 0 deletions
42
metricflow-semantics/metricflow_semantics/mf_logging/pretty_printing_logger.py
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
from __future__ import annotations | ||
|
||
import logging | ||
|
||
from typing_extensions import override | ||
|
||
from metricflow_semantics.mf_logging.mf_logger import MetricFlowLogger, MetricFlowLoggerFactory | ||
from metricflow_semantics.mf_logging.pretty_print import mf_pformat_many | ||
|
||
|
||
class PrettyPrintingLogger(MetricFlowLogger): | ||
"""Logger that pretty-prints the objects passed as keyword arguments. | ||
|
||
e.g. | ||
logger.debug("Found matching path.", matching_path=matching_path, source_nodes=source_nodes) | ||
|
||
-> | ||
|
||
Found matching path. | ||
matching_path: [1, 2, 3] | ||
source_nodes: ["a", "b", "c"] | ||
""" | ||
|
||
def __init__(self, python_logger: logging.Logger) -> None: # noqa: D107 | ||
self._standard_library_logger = python_logger | ||
|
||
@override | ||
def log_implementation(self, level_int: int, message: str, stack_level: int, **kwargs) -> None: # type: ignore[no-untyped-def] | ||
self._standard_library_logger.log(level_int, mf_pformat_many(message, kwargs), stacklevel=stack_level) | ||
|
||
@property | ||
@override | ||
def standard_library_logger(self) -> logging.Logger: | ||
return self._standard_library_logger | ||
|
||
|
||
class PrettyPrintingLoggerFactory(MetricFlowLoggerFactory): | ||
"""Factory for creating `PrettyPrintingLogger`.""" | ||
|
||
@override | ||
def get_logger(self, module_name: str) -> MetricFlowLogger: | ||
return PrettyPrintingLogger(logging.getLogger(module_name)) |
55 changes: 55 additions & 0 deletions
55
metricflow-semantics/metricflow_semantics/test_helpers/logging_helpers.py
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
from __future__ import annotations | ||
|
||
import logging | ||
from contextlib import contextmanager | ||
from typing import Iterator, List, Optional, Sequence, Tuple | ||
|
||
from typing_extensions import override | ||
|
||
from metricflow_semantics.mf_logging.logger_configuration import mf_get_logger | ||
from metricflow_semantics.mf_logging.mf_logger import MetricFlowLogger | ||
|
||
|
||
class RecordingLogHandler(logging.Handler): | ||
"""A log-record handler that stores them so that they can be checked in tests.""" | ||
|
||
def __init__(self) -> None: # noqa: D107 | ||
super().__init__() | ||
self._log_records: List[logging.LogRecord] = [] | ||
|
||
@override | ||
def emit(self, record: logging.LogRecord) -> None: | ||
self._log_records.append(record) | ||
|
||
@property | ||
def log_records(self) -> Sequence[logging.LogRecord]: | ||
"""Return the log records seen by the handler.""" | ||
return self._log_records | ||
|
||
def get_last_message(self) -> Optional[str]: | ||
"""Return the message in the last log record, or None if this hasn't seen any.""" | ||
if len(self._log_records) == 0: | ||
return None | ||
|
||
return self._log_records[-1].message | ||
|
||
|
||
@contextmanager | ||
def recorded_logging_context(logging_level_int: int) -> Iterator[Tuple[MetricFlowLogger, RecordingLogHandler]]: | ||
"""Context with a logger (with the given log level) and associated handler to check what was logged. | ||
|
||
The handler records all log records emitted that is appropriate for the given level during this context. | ||
Log propagation could be disabled in this context to clean test log output, but some issues need to be resolved. | ||
""" | ||
mf_logger = mf_get_logger() | ||
standard_logger = mf_logger.standard_library_logger | ||
|
||
previous_logging_level = standard_logger.level | ||
handler = RecordingLogHandler() | ||
standard_logger.addHandler(handler) | ||
try: | ||
standard_logger.setLevel(logging_level_int) | ||
yield mf_logger, handler | ||
finally: | ||
standard_logger.removeHandler(handler) | ||
standard_logger.setLevel(previous_logging_level) |
Empty file.
76 changes: 76 additions & 0 deletions
76
metricflow-semantics/tests_metricflow_semantics/mf_logging/test_mf_logger.py
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,76 @@ | ||
from __future__ import annotations | ||
|
||
import logging | ||
from pathlib import Path | ||
|
||
import pytest | ||
from metricflow_semantics.formatting.formatting_helpers import mf_dedent | ||
from metricflow_semantics.mf_logging.mf_logger import MetricFlowLogger | ||
from metricflow_semantics.test_helpers.logging_helpers import RecordingLogHandler, recorded_logging_context | ||
|
||
|
||
def test_log_file_name_and_line_number() -> None: | ||
"""Tests that the file and the line number associated with the log call points to the source. | ||
|
||
The source is the file / line where the `.info(...)` and similar call was made, not within the MF logging library. | ||
""" | ||
with recorded_logging_context(logging.INFO) as (logger, handler): | ||
log_line_number = 19 | ||
logger.info(f"Example message from line {log_line_number} in {__file__}") | ||
assert len(handler.log_records) == 1, f"There should have been 1 record but got: {handler.log_records}" | ||
|
||
record = handler.log_records[0] | ||
assert record.filename == Path(__file__).name | ||
assert ( | ||
record.lineno == log_line_number | ||
), f"The log line should have been {log_line_number} but got {record.lineno}" | ||
|
||
|
||
@pytest.mark.skip(reason="Running this prints error messages in the test results.") | ||
def test_log_levels() -> None: | ||
"""Test that the logger works for different logging levels.""" | ||
for level_int in (logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL, logging.FATAL): | ||
with recorded_logging_context(level_int) as (logger, handler): | ||
# Log a message at the appropriate level | ||
message = f"Example message at level {level_int} in {__file__}" | ||
if level_int == logging.DEBUG: | ||
logger.debug(message) | ||
elif level_int == logging.INFO: | ||
logger.info(message) | ||
elif level_int == logging.WARNING: | ||
logger.warning(message) | ||
elif level_int == logging.ERROR: | ||
logger.error(message) | ||
elif level_int == logging.CRITICAL: | ||
logger.critical(message) | ||
elif level_int == logging.FATAL: | ||
logger.fatal(message) | ||
else: | ||
raise ValueError(f"Logging level not handled {level_int=}") | ||
|
||
# Check that the log record is correct. | ||
assert ( | ||
len(handler.log_records) == 1 | ||
), f"There should have been 1 record for {level_int=} but got: {handler.log_records}" | ||
|
||
record = handler.log_records[0] | ||
assert record.levelno == level_int, f"Expected a record with {level_int=} but got {record.levelno}" | ||
assert record.message == message | ||
assert record.levelno == level_int | ||
|
||
|
||
def test_log_kwargs() -> None: | ||
"""Test that objects included via keyword args are formatted.""" | ||
logger: MetricFlowLogger | ||
handler: RecordingLogHandler | ||
with recorded_logging_context(logging.DEBUG) as (logger, handler): | ||
logger.debug("Found candidates.", matches=[1, 2, 3], parameters={"field_0": "value_0", "field_1": "value_1"}) | ||
assert handler.get_last_message() == mf_dedent( | ||
""" | ||
Found candidates. | ||
matches: | ||
[1, 2, 3] | ||
parameters: | ||
{'field_0': 'value_0', 'field_1': 'value_1'} | ||
""" | ||
) |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is my concern. This whole file.
What we're doing is formatting a log message based on some input parameters. That's literally it. But to do it, we have a state lock, and a factory class, and a support class shim, and a custom accessor function, and a stack inspector.
State management, stack-based accessors, and lazy evaluation of messages and message arguments are all things the built in logger class handles - a LogRecord Formatter can access the message input and the format args and operate on them as needed, etc.
Is there a problem with using a separate LogRecord Formatter to solve the issue you have with portioning things into extras? If you need different formatting from what MetricFlow is doing internally here you'll need a different formatter configuration regardless, and all of these wrappers don't make a lick of difference in terms of functionality there. Make our LazyPrinter or whatever have an API that a custom Formatter object can use to do what needs doing.
Also, the
extra
parameter is just a kwarg dict that gets merged into the LogRecord attributes before formatting or routing. You can accomplish the same thing in formatting by passing in a single dict-style arg and then using it in a custom Formatter.Basically you have a config for the MetricFlow logger objects and you set up a custom Formatter configuration that does the necessary format operations. If we're passing in objects and args instead of a pre-formatted string we can do what we like under the hood.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it was just about lazy evaluation, this is the standard approach. If you take a look at the server implementation, the log calls use the
extra
field and use a separate formatter. In the PR description, I noted as one of the issues was:I got tired of having to write these dictionaries / format strings, especially when auto-complete does not help. As per PR title, one of the 3 goals of this PR was to make these calls more convenient.
Each of those can be removed to make things either less convenient or less configurable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.g. Remove the stack accessor and change the call style from:
to