[ETL] Approaches First Draft #3378

43 changes: 43 additions & 0 deletions exercises/practice/etl/.approaches/config.json
@@ -0,0 +1,43 @@
{
  "introduction": {
    "authors": ["BethanyG"],
    "contributors": []
  },
  "approaches": [
    {
      "uuid": "0a408f3f-d1ea-4739-a900-dbb65ab34520",
      "slug": "dict-keys-and-generator",
      "title": "Iterate Over Dictionary Keys and Use a Generator to Lowercase Strings",
      "blurb": "Use dict.keys() to iterate, and a generator to lowercase values.",
      "authors": ["BethanyG"]
    },
    {
      "uuid": "a35a1496-b092-4634-a514-2c02d7c899c5",
      "slug": "dict-keys-and-dict-methods",
      "title": "Iterate Over Dictionary Keys and Use Dictionary Methods to Update",
      "blurb": "Use dict.keys() for iteration and dict.get() with dict.setdefault() to update.",
      "authors": ["BethanyG"]
    },
    {
      "uuid": "c8de8d53-154c-4f05-ba44-44d8fcd739aa",
      "slug": "dict-items",
      "title": "Iterate over Dictionary Items",
      "blurb": "Use dict.items() for iteration.",
      "authors": ["BethanyG"]
    },
    {
      "uuid": "5105b287-5062-4404-81df-0afe865315da",
      "slug": "dict-constructor-and-generator",
      "title": "Dictionary Constructor with a Passed Generator Expression",
      "blurb": "Pass a generator expression to a dictionary constructor to create a new dictionary.",
      "authors": ["BethanyG"]
    },
    {
      "uuid": "59fad251-66ca-4f7d-a466-3bbd19260849",
      "slug": "dictionary-comprehension",
      "title": "Dictionary Comprehension",
      "blurb": "Use a dictionary comprehension to process and transform data into a new dictionary.",
      "authors": ["BethanyG"]
    }
  ]
}
@@ -0,0 +1,10 @@
# Dictionary Constructor with Generator Expression

```python
def transform(legacy_data):
    new_data = dict((letter.lower(), score)
                    for score, tiles in
                    legacy_data.items()
                    for letter in tiles)
    return new_data
```
@@ -0,0 +1,6 @@
def transform(legacy_data):
    new_data = dict((letter.lower(), score)
                    for score, tiles in
                    legacy_data.items()
                    for letter in tiles)
    return new_data
11 changes: 11 additions & 0 deletions exercises/practice/etl/.approaches/dict-items/content.md
@@ -0,0 +1,11 @@
# Iterate over Dictionary Items

```python
def transform(input_dict):
    new_data = {}

    for key, value in input_dict.items():
        for item in value:
            new_data[item.lower()] = key
    return new_data
```
7 changes: 7 additions & 0 deletions exercises/practice/etl/.approaches/dict-items/snippet.txt
@@ -0,0 +1,7 @@
def transform(input_dict):
    new_data = {}

    for key, value in input_dict.items():
        for item in value:
            new_data[item.lower()] = key
    return new_data
@@ -0,0 +1,12 @@
# Iterate Over Dictionary Keys and Use Dictionary Methods


```python
def transform(input_data):
    transformed = {}

    for key in input_data:
        for value in input_data.get(key):
            transformed.setdefault(value.lower(), key)
    return transformed
```
@@ -0,0 +1,7 @@
def transform(input_data):
    transformed = {}

    for key in input_data:
        for value in input_data.get(key):
            transformed.setdefault(value.lower(), key)
    return transformed
@@ -0,0 +1,12 @@
# Iterate over Dictionary Keys and Process Values in a Generator

```python
def transform(input_dict):
    result = {}

    for key in input_dict:
        values = (item.lower() for item in input_dict[key])
        for value in values:
            result[value] = key
    return result
```
@@ -0,0 +1,8 @@
def transform(input_dict):
    result = {}

    for key in input_dict:
        values = (item.lower() for item in input_dict[key])
        for value in values:
            result[value] = key
    return result
@@ -0,0 +1,8 @@
# Dictionary Comprehension

```python
def transform(input_dict):
    return {value.lower():key for key in
            input_dict for
            value in input_dict[key]}
```
@@ -0,0 +1,4 @@
def transform(input_dict):
    return {value.lower():key for key in
            input_dict for
            value in input_dict[key]}
167 changes: 167 additions & 0 deletions exercises/practice/etl/.approaches/introduction.md
@@ -0,0 +1,167 @@
# Introduction

There are multiple Pythonic ways to solve the ETL exercise.
Among them are:

- Iterate over `dict.keys()` & lowercase all values in a `generator expression` before inserting them into the new dict.
- Iterate over `dict.keys()` & use the `dict` methods `dict.get()` and `dict.setdefault()` to retrieve values and insert keys into the new dict.
- Iterate over `dict.items()` and lowercase the values `list` in a nested loop.
- Use the `dict()` constructor with a `generator expression` to unpack and lowercase values in a nested loop.
- Use a dictionary comprehension.



## General guidance

The goal of the ETL exercise is to:

* **E**xtract the data from the 'legacy' dictionary given as input. It has numeric **keys** with a `list` of uppercased strings as **values**.
* **T**ransform the data, by turning the `list` of **values** into individual lowercased **keys**, with the former **keys** used as **values**.
* **L**oad the data into a new dictionary and return it.
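
As a concrete illustration of the three steps above, the sketch below shows a small legacy dictionary and the shape of the result the exercise expects. The sample data and the name `expected` are made up for illustration and are not taken from the test suite.

```python
# Hypothetical sample data in the 'legacy' format: score -> list of uppercase letters.
legacy_data = {
    1: ["A", "E"],
    2: ["D", "G"],
}

# The transformed format: lowercase letter -> score.
expected = {"a": 1, "e": 1, "d": 2, "g": 2}


def transform(legacy_data):
    """One possible implementation, used here only to check the sample."""
    new_data = {}
    for score, letters in legacy_data.items():
        for letter in letters:
            new_data[letter.lower()] = score
    return new_data


print(transform(legacy_data) == expected)  # prints True
```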


The challenge here is to deal efficiently with lowercasing the **values**, which are `lists` containing strings.
Each string in each `list` has to be visited once to lowercase it, so an inner loop (explicit, or inside a generator or comprehension) cannot be avoided, and all current approaches to this exercise have roughly equivalent performance.

Other considerations can still tip the choice of approach, such as readability, or how duplicate data in **values** should be handled (_and whether handling it is necessary at all_).
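
To make the duplicate-data consideration concrete, here is a small sketch in which the same letter appears under two scores. The input and the function names are hypothetical. Bracket assignment keeps the last score seen, while `dict.setdefault()` keeps the first.

```python
# Hypothetical input where "A" appears under two different scores.
duplicated = {1: ["A", "E"], 5: ["A"]}


def transform_last_wins(input_dict):
    result = {}
    for key, values in input_dict.items():
        for value in values:
            result[value.lower()] = key  # overwrites any earlier entry
    return result


def transform_first_wins(input_dict):
    result = {}
    for key, values in input_dict.items():
        for value in values:
            result.setdefault(value.lower(), key)  # keeps the existing entry
    return result


print(transform_last_wins(duplicated))   # {'a': 5, 'e': 1}
print(transform_first_wins(duplicated))  # {'a': 1, 'e': 1}
```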

Additionally, the test data for this exercise does not contain any [unhashable][unhashable] values. If this code were used in a situation where the legacy values were of an unknown datatype, the values would need to be checked before attempting to create keys with them.
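
One way to guard against unhashable values is sketched below. It assumes that silently skipping bad values is acceptable; the helper name `is_hashable`, the function name `transform_safely`, and the skip-on-failure policy are illustrative choices, not part of the exercise.

```python
def is_hashable(value):
    """Return True if `value` can be used as a dictionary key."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


def transform_safely(input_dict):
    result = {}
    for key, values in input_dict.items():
        for value in values:
            # Strings are lowercased; other values are kept as-is
            # when hashable and skipped otherwise.
            if isinstance(value, str):
                result[value.lower()] = key
            elif is_hashable(value):
                result[value] = key
    return result


print(transform_safely({1: ["A", ["nested"], 7]}))  # {'a': 1, 7: 1}
```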


## Approach: Iterate over `dict.keys()` & Lowercase values in a generator or list comprehension.

```python
def transform(input_dict):
    result = {}

    for key in input_dict:
        values = (item.lower() for item in input_dict[key])
        for value in values:
            result[value] = key
    return result

##OR##

def transform(input_dict):
    result = {}

    for key in input_dict:
        values = [item.lower() for item in input_dict[key]]
        for value in values:
            result[value] = key
    return result
```


This approach iterates over `dict.keys()`, converting all the strings in each values `list` to lowercase via a `generator expression` or `list comprehension`.
The lowercased values are then iterated through in an inner loop.
Each value is then inserted into the new dictionary as a key, with the 'old' key (_from the outer loop_) used as the value.
For more details, see the [dictionary keys and generator][dict-keys-and-generator] approach.
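
The practical difference between the two variants is when the lowercasing happens. A minimal sketch, using a made-up list of tiles:

```python
tiles = ["A", "E", "I"]

lazy = (tile.lower() for tile in tiles)   # generator expression: nothing computed yet
eager = [tile.lower() for tile in tiles]  # list comprehension: all values computed now

print(type(lazy).__name__)  # generator
print(eager)                # ['a', 'e', 'i']

# A generator can be consumed only once; a list can be iterated repeatedly.
first_pass = list(lazy)
second_pass = list(lazy)
print(first_pass)   # ['a', 'e', 'i']
print(second_pass)  # []
```

For the short lists in this exercise either form works; the generator simply avoids building an intermediate `list`.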


## Approach: Use Dictionary Methods `dict.get()` and `dict.setdefault()`

```python
def transform(input_data):
    transformed = {}

    for key in input_data:
        for value in input_data.get(key):
            transformed.setdefault(value.lower(), key)
    return transformed
```


As with the approach described above, this iterates through the keys of `input_data`.
Each value `list` is looked up via `input_data.get(key)`, and the new dictionary (`transformed`) is updated via `transformed.setdefault(value.lower(), key)`, which inserts the lowercased string as a key only if it is not already present.
For details, read the [dictionary keys and dictionary methods][dict-keys-and-dict-methods] approach.


## Approach: Iterate over `dict.items()`

```python
def transform(input_dict):
    new_data = {}

    for key, value in input_dict.items():
        for item in value:
            new_data[item.lower()] = key
    return new_data
```


This approach iterates over both keys and values via `dict.items()`.
The inner loop then iterates over the values `list`, transforming each string and inserting it into the `new_data` dictionary using _bracket notation_, with the lowercased string as key and the former key as the new value.
For more details, see the [dictionary items][dict-items] approach.


## Approach: Use a generator with the Dictionary Constructor

```python
def transform(legacy_data):
    new_data = dict((letter.lower(), score)
                    for score, tiles in
                    legacy_data.items()
                    for letter in tiles)
    return new_data
```


This approach encapsulates the loops described in prior approaches within a `generator expression`.
The generator includes a nested loop to iterate over the strings within the value `list`, lowercasing them and pairing each with its score in a `tuple`.
The generator is then passed to the `dict()` constructor, which consumes the `(key, value)` pairs and creates a new dictionary.
For more information, see the [dictionary constructor with generator][dict-constructor-and-generator] approach.
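
The `dict()` constructor accepts any iterable of two-item pairs, which is what makes this approach work. A small sketch with made-up data:

```python
legacy_data = {1: ["A", "E"], 2: ["D"]}

# Each item the generator yields is a (new_key, new_value) tuple.
pairs = ((letter.lower(), score)
         for score, tiles in legacy_data.items()
         for letter in tiles)

first_pair = next(pairs)
print(first_pair)  # ('a', 1)

# dict() consumes whatever pairs remain in the generator.
remaining = dict(pairs)
print(remaining)   # {'e': 1, 'd': 2}
```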


## Approach: Use a Dictionary Comprehension

```python
def transform(input_dict):
    return {value.lower():key for key in
            input_dict for
            value in input_dict[key]}
```



This approach is very similar to the one above, but uses a `dictionary comprehension` format instead of a generator fed to a constructor.
For more details, see the [dictionary comprehension][dictionary-comprehension] approach.
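
Read left to right, the `for` clauses of a comprehension nest in the same order as ordinary loops. The expanded form below is an equivalent sketch; the name `transform_expanded` and the sample data are illustrative.

```python
def transform(input_dict):
    return {value.lower(): key
            for key in input_dict
            for value in input_dict[key]}


def transform_expanded(input_dict):
    result = {}
    for key in input_dict:               # first `for` clause: outer loop
        for value in input_dict[key]:    # second `for` clause: inner loop
            result[value.lower()] = key  # the leading `value.lower(): key` expression
    return result


sample = {1: ["A", "E"], 2: ["D", "G"]}
print(transform(sample) == transform_expanded(sample))  # True
```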



## Other approaches

Besides these five idiomatic approaches, there are a multitude of possible variations using different string or dictionary methods or strategies for extracting and lowercasing the input dictionary values.

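The strategy below employs `zip_longest` with `dict.items()` to re-package keys and values, and `chain.from_iterable` to flatten the resulting pairs for the `dict()` constructor.

```python
from itertools import chain, zip_longest

def transform(input_dict):
    # Pair every lowercased element with its key: the second iterable is
    # empty, so zip_longest completes each pair with fillvalue=key.
    lowercased = (zip_longest([element.lower() for element in item],
                              [], fillvalue=key) for
                  key, item in input_dict.items())

    return dict(chain.from_iterable(lowercased))
```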


But note that it still has the nested loop all of these solutions share, as the values returned by `dict.items()` still need to be unpacked and lowercased before anything can be added to the new dictionary.
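
For reference, `itertools.zip_longest` pads the shorter iterable with `fillvalue`, so pairing a list of letters with an empty iterable attaches the same score to every letter. A minimal sketch with made-up values:

```python
from itertools import zip_longest

letters = ["a", "e", "i"]
score = 1

# The second iterable is empty, so every pair is completed with fillvalue.
pairs = list(zip_longest(letters, [], fillvalue=score))
print(pairs)  # [('a', 1), ('e', 1), ('i', 1)]
```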



## Which approach to use?

All of these approaches are roughly equivalent, given that the values in the input dictionary are lists of strings that must be lowercased.
This demands that those values be looped through, making all strategies loop-within-loop.
Using generators or comprehensions might still give a slight performance boost, but they may also be harder for others to read or understand.
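
If performance matters for a particular input, measuring is more reliable than guessing. The sketch below uses `timeit` from the standard library on made-up data; the function names, input size, and repeat count are arbitrary choices, and results will vary by machine and Python version.

```python
from timeit import timeit


def transform_loop(input_dict):
    result = {}
    for key, values in input_dict.items():
        for value in values:
            result[value.lower()] = key
    return result


def transform_comprehension(input_dict):
    return {value.lower(): key
            for key, values in input_dict.items()
            for value in values}


# Hypothetical input: 26 scores, each mapped to five uppercase strings.
sample = {score: [chr(65 + score) * n for n in range(1, 6)] for score in range(26)}

for func in (transform_loop, transform_comprehension):
    seconds = timeit(lambda: func(sample), number=1_000)
    print(f"{func.__name__}: {seconds:.3f}s")
```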


[dict-constructor-and-generator]: https://exercism.org/tracks/python/exercises/etl/approaches/dict-constructor-and-generator
[dict-items]: https://exercism.org/tracks/python/exercises/etl/approaches/dict-items
[dict-keys-and-dict-methods]: https://exercism.org/tracks/python/exercises/etl/approaches/dict-keys-and-dict-methods
[dict-keys-and-generator]: https://exercism.org/tracks/python/exercises/etl/approaches/dict-keys-and-generator
[dictionary-comprehension]: https://exercism.org/tracks/python/exercises/etl/approaches/dictionary-comprehension
[unhashable]: https://docs.python.org/3/glossary.html#term-hashable