-
Notifications
You must be signed in to change notification settings - Fork 309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: adds Schema class and modifies schema handling #2081
base: pangea-v1alpha
Are you sure you want to change the base?
Changes from 7 commits
47474a9
f22246a
7670971
c37be67
aba6c10
83aacdf
7c08ca8
af02937
ef2b95d
cfa609c
1f00e76
39d4c1d
c4f5fd5
7006a31
561f05f
e3f57a6
3280e79
1837f81
94e9e6e
db6ef9e
0b0b85a
ba2795b
67c0182
3fa949c
7c77bfb
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3680,7 +3680,7 @@ def insert_rows( | |
if selected_fields is not None: | ||
schema = selected_fields | ||
|
||
if len(schema) == 0: | ||
if not schema: | ||
raise ValueError( | ||
( | ||
"Could not determine schema for table '{}'. Call client.get_table() " | ||
|
@@ -4029,7 +4029,7 @@ def list_rows( | |
|
||
# No schema, but no selected_fields. Assume the developer wants all | ||
# columns, so get the table resource for them rather than failing. | ||
elif len(schema) == 0: | ||
elif not schema: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same here. Let's be careful about breaking changes regarding There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I also switched this back to the old check. |
||
table = self.get_table(table.reference, retry=retry, timeout=timeout) | ||
schema = table.schema | ||
|
||
|
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -549,33 +549,29 @@ def _build_schema_resource(fields): | |||||||||||||||||||||||
|
||||||||||||||||||||||||
|
||||||||||||||||||||||||
def _to_schema_fields(schema): | ||||||||||||||||||||||||
"""Coerce `schema` to a list of schema field instances. | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
Args: | ||||||||||||||||||||||||
schema(Sequence[Union[ \ | ||||||||||||||||||||||||
:class:`~google.cloud.bigquery.schema.SchemaField`, \ | ||||||||||||||||||||||||
Mapping[str, Any] \ | ||||||||||||||||||||||||
]]): | ||||||||||||||||||||||||
Table schema to convert. If some items are passed as mappings, | ||||||||||||||||||||||||
their content must be compatible with | ||||||||||||||||||||||||
:meth:`~google.cloud.bigquery.schema.SchemaField.from_api_repr`. | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
Returns: | ||||||||||||||||||||||||
Sequence[:class:`~google.cloud.bigquery.schema.SchemaField`] | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
Raises: | ||||||||||||||||||||||||
Exception: If ``schema`` is not a sequence, or if any item in the | ||||||||||||||||||||||||
sequence is not a :class:`~google.cloud.bigquery.schema.SchemaField` | ||||||||||||||||||||||||
instance or a compatible mapping representation of the field. | ||||||||||||||||||||||||
"""TODO docstring | ||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Flagging with a comment just so we don't lose track of this TODO. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||||||||||||||||||||||||
CAST a list of elements to either: | ||||||||||||||||||||||||
* a Schema object with SchemaFields and an attribute | ||||||||||||||||||||||||
* a list of SchemaFields but no attribute | ||||||||||||||||||||||||
""" | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
for field in schema: | ||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We don't need this check if we get a
Suggested change
Note: I just There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Alternatively, we could call |
||||||||||||||||||||||||
if not isinstance(field, (SchemaField, collections.abc.Mapping)): | ||||||||||||||||||||||||
raise ValueError( | ||||||||||||||||||||||||
"Schema items must either be fields or compatible " | ||||||||||||||||||||||||
"mapping representations." | ||||||||||||||||||||||||
) | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
if isinstance(schema, Schema): | ||||||||||||||||||||||||
schema = Schema( | ||||||||||||||||||||||||
[ | ||||||||||||||||||||||||
field | ||||||||||||||||||||||||
if isinstance(field, SchemaField) | ||||||||||||||||||||||||
else SchemaField.from_api_repr(field) | ||||||||||||||||||||||||
for field in schema | ||||||||||||||||||||||||
], | ||||||||||||||||||||||||
foreign_type_info=schema.foreign_type_info, | ||||||||||||||||||||||||
) | ||||||||||||||||||||||||
return schema | ||||||||||||||||||||||||
Comment on lines
+553
to
+562
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we get a Returning the original object should be sufficient.
Suggested change
This also future-proofs us a bit. In case any additional properties are added to |
||||||||||||||||||||||||
return [ | ||||||||||||||||||||||||
field if isinstance(field, SchemaField) else SchemaField.from_api_repr(field) | ||||||||||||||||||||||||
for field in schema | ||||||||||||||||||||||||
|
@@ -796,8 +792,6 @@ def serde_info(self) -> Any: | |||||||||||||||||||||||
prop = _get_sub_prop(self._properties, ["serDeInfo"]) | ||||||||||||||||||||||||
if prop is not None: | ||||||||||||||||||||||||
prop = StorageDescriptor().from_api_repr(prop) | ||||||||||||||||||||||||
print(f"DINOSAUR prop: {prop}") | ||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Lol, oops! 😆 Could be worth a separate cleanup PR to have something small and easy to approve. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Removed. |
||||||||||||||||||||||||
|
||||||||||||||||||||||||
return prop | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
@serde_info.setter | ||||||||||||||||||||||||
|
@@ -921,3 +915,83 @@ def from_api_repr(cls, resource: dict) -> SerDeInfo: | |||||||||||||||||||||||
config = cls("") | ||||||||||||||||||||||||
config._properties = copy.deepcopy(resource) | ||||||||||||||||||||||||
return config | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
|
||||||||||||||||||||||||
class Schema: | ||||||||||||||||||||||||
# TODO docstrings and type hints | ||||||||||||||||||||||||
def __init__(self, fields=None, foreign_type_info=None): | ||||||||||||||||||||||||
tswast marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||||||||||
self._properties = {} | ||||||||||||||||||||||||
self._fields = [] if fields is None else list(fields) # Internal List | ||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we need to convert the items of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Resolving. I missed that _fields was a property with a setter. I think your code does actually convert Iterable of SchemaField or dict to the correct API representation in _properties There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Unresolving. I don't think
Suggested change
And remove the |
||||||||||||||||||||||||
self.foreign_type_info = foreign_type_info | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
@property | ||||||||||||||||||||||||
def foreign_type_info(self) -> Any: | ||||||||||||||||||||||||
"""TODO: docstring""" | ||||||||||||||||||||||||
return self._properties.get("foreignTypeInfo") | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
@foreign_type_info.setter | ||||||||||||||||||||||||
def foreign_type_info(self, value: str) -> None: | ||||||||||||||||||||||||
value = _isinstance_or_raise(value, str, none_allowed=True) | ||||||||||||||||||||||||
self._properties["foreignTypeInfo"] = value | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
@property | ||||||||||||||||||||||||
def _fields(self) -> Any: | ||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why private? I guess because we're trying to emulate a list and want to support mutation that way? Would be worth an explanation. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You are correct, the idea behind having it be private was to try and emulate a list. |
||||||||||||||||||||||||
"""TODO: docstring""" | ||||||||||||||||||||||||
return self._properties.get("_fields") | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
@_fields.setter | ||||||||||||||||||||||||
def _fields(self, value: list) -> None: | ||||||||||||||||||||||||
value = _isinstance_or_raise(value, list, none_allowed=True) | ||||||||||||||||||||||||
self._properties["_fields"] = value | ||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Changed |
||||||||||||||||||||||||
|
||||||||||||||||||||||||
def __len__(self): | ||||||||||||||||||||||||
chalmerlowe marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||||||||||
return len(self._fields) | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
def __getitem__(self, index): | ||||||||||||||||||||||||
return self._fields[index] | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
def __setitem__(self, index, value): | ||||||||||||||||||||||||
self._fields[index] = value | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
def __delitem__(self, index): | ||||||||||||||||||||||||
del self._fields[index] | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
def __iter__(self): | ||||||||||||||||||||||||
return iter(self._fields) | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
def __str__(self): | ||||||||||||||||||||||||
return f"Schema({self._fields}, {self.foreign_type_info})" | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
def __repr__(self): | ||||||||||||||||||||||||
return f"Schema({self._fields!r}, {self.foreign_type_info!r})" | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
def append(self, item): | ||||||||||||||||||||||||
self._fields.append(item) | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
def extend(self, iterable): | ||||||||||||||||||||||||
self._fields.extend(iterable) | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
def to_api_repr(self) -> dict: | ||||||||||||||||||||||||
"""Build an API representation of this object. | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
Returns: | ||||||||||||||||||||||||
Dict[str, Any]: | ||||||||||||||||||||||||
A dictionary in the format used by the BigQuery API. | ||||||||||||||||||||||||
""" | ||||||||||||||||||||||||
return copy.deepcopy(self._properties) | ||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||||||||||||||||||||||||
|
||||||||||||||||||||||||
@classmethod | ||||||||||||||||||||||||
def from_api_repr(cls, resource: dict) -> Schema: | ||||||||||||||||||||||||
"""Factory: constructs an instance of the class (cls) | ||||||||||||||||||||||||
given its API representation. | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
Args: | ||||||||||||||||||||||||
resource (Dict[str, Any]): | ||||||||||||||||||||||||
API representation of the object to be instantiated. | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
Returns: | ||||||||||||||||||||||||
An instance of the class initialized with data from 'resource'. | ||||||||||||||||||||||||
""" | ||||||||||||||||||||||||
config = cls("") | ||||||||||||||||||||||||
config._properties = copy.deepcopy(resource) | ||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same here. May be better to omit the deepcopy. |
||||||||||||||||||||||||
return config |
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -70,6 +70,7 @@ | |||||
from google.cloud.bigquery.schema import _build_schema_resource | ||||||
from google.cloud.bigquery.schema import _parse_schema_resource | ||||||
from google.cloud.bigquery.schema import _to_schema_fields | ||||||
from google.cloud.bigquery.schema import Schema | ||||||
from google.cloud.bigquery.external_config import ExternalCatalogTableOptions | ||||||
|
||||||
if typing.TYPE_CHECKING: # pragma: NO COVER | ||||||
|
@@ -452,17 +453,20 @@ def schema(self): | |||||
instance or a compatible mapping representation of the field. | ||||||
""" | ||||||
prop = self._properties.get(self._PROPERTY_TO_API_FIELD["schema"]) | ||||||
if not prop: | ||||||
if not prop: # if empty Schema, empty list, None | ||||||
return [] | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could we return an "empty" Schema here for consistency?
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does the following influence your thoughts on this comment? Historically, we have returned an empty My general principal, focused on backwards compatibility is:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I could go either way on this one. One one hand, using On further consideration, I agree with your assessment that avoiding breaking existing users is preferable. +1 on empty list in this case. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is a related concept. That complicates getting a What are your thoughts on the following approach? (I seem to recall you mention keeping objects in a separate place and ensuring that _properties is a view into the object)
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My worry with Also, there are other cases to consider besides "someone gave us a Alternatively, we introduce separate properties for the pangea fields to |
||||||
else: | ||||||
return _parse_schema_resource(prop) | ||||||
elif isinstance(prop, Schema): | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This would be concerning if this were ever true. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @tswast Question for you. This response relates to setting the In general, I see the reasoning behind maintaining the If the user provides a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
If so, I wouldn't put it in Ideally, the objects we provide would just be a "view" on top of the REST resource ( In some cases, we have to store state outside of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This comment still stands. We shouldn't have |
||||||
return prop | ||||||
return _parse_schema_resource(prop) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Presumably we want to return a
Suggested change
Aside: Is There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Where possible I -tried- to stay consist about what the user provides versus gets back: if someone handed the library a |
||||||
|
||||||
@schema.setter | ||||||
def schema(self, value): | ||||||
api_field = self._PROPERTY_TO_API_FIELD["schema"] | ||||||
|
||||||
if value is None: | ||||||
self._properties[api_field] = None | ||||||
elif isinstance(value, Schema): | ||||||
self._properties[api_field] = value | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
else: | ||||||
value = _to_schema_fields(value) | ||||||
self._properties[api_field] = {"fields": _build_schema_resource(value)} | ||||||
|
@@ -1359,7 +1363,8 @@ def _row_from_mapping(mapping, schema): | |||||
Raises: | ||||||
ValueError: If schema is empty. | ||||||
""" | ||||||
if len(schema) == 0: | ||||||
|
||||||
if not schema: | ||||||
raise ValueError(_TABLE_HAS_NO_SCHEMA) | ||||||
|
||||||
row = [] | ||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious, could you make
len(schema_object)
work? I guess this is to allowNone
asschema
now, too? Iftable.schema
could indeed returnNone
now, that might be considered a breaking change, depending on how strict we want to be about types.Either way, folks might have to change their code to deal with missing schema, so something to consider. Maybe we return an object of
len(...) == 0
for now and have a warning that it will be changed toNone
in future?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dislike the idiom of schema-less tables, but it is indeed something we've seen people do intentionally in the wild. My recollection is that it was used historically as a signal for some workflow state (e.g. some kind of is-table-present check).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was under the impression that
Table.schema
can currently be set toNone
. It appears that we allow for that in the existing code:https://github.com/googleapis/python-bigquery/blob/d4d39acb8574f0d06d4e490b859e5fe6b57d0d9e/google/cloud/bigquery/table.py#L458C1-L466C84
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I switched it back to the old check.