go-cardinality is a Go library that calculates the cardinality (the number of distinct values) and the value distribution of fields in a dataset, providing efficient and accurate estimations.
- Retrieve all unique values of a specific field in a dataset, which is useful for creating enums or generating dimension tables.
- Analyze the distribution of values within a particular field in a dataset to gain insights into the most frequently occurring values.
fields := naive.DistinctCount(Movie{}, movies, "Year", "Genres")
genres, err := fields.GetField("Genres")
if err != nil {
	log.Fatal(err)
}
genres.PrettyPrint()
Comedy = 350
Drama = 338
Thriller = 194
Horror = 162
Action = 162
Romance = 117
...
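The PrettyPrint output above lists each genre with its number of occurrences, most frequent first. The sketch below shows one way to compute such a distribution with a map and a sort. `distribution` and `ValueCount` are hypothetical names, not go-cardinality's API, and ties are broken alphabetically here, which may differ from the library's own ordering.

```go
package main

import (
	"fmt"
	"sort"
)

// ValueCount pairs a value with its number of occurrences (hypothetical type).
type ValueCount struct {
	Value string
	Count int
}

// distribution counts how often each value occurs and returns the results
// sorted by descending count, breaking ties alphabetically by value.
func distribution(values []string) []ValueCount {
	counts := make(map[string]int)
	for _, v := range values {
		counts[v]++
	}
	out := make([]ValueCount, 0, len(counts))
	for v, n := range counts {
		out = append(out, ValueCount{Value: v, Count: n})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Value < out[j].Value
	})
	return out
}

func main() {
	genres := []string{"Drama", "Comedy", "Horror", "Comedy", "Drama", "Comedy", "Action"}
	for _, vc := range distribution(genres) {
		fmt.Printf("%s = %d\n", vc.Value, vc.Count)
	}
}
```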
More usage examples are available here: Examples
- Naive approach using a map data structure
- Naive approach with concurrent processing for improved performance
- Naive approach with RxGo
- API for calculating cardinality in a list of objects
- Use the HyperLogLog algorithm for memory-efficient, approximate estimation of distinct values
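The first two approaches in the list above can be sketched with the standard library alone: a map that tallies occurrences (its length is the cardinality), and a concurrent variant that counts chunks of the input in separate goroutines and then merges the partial maps. `countValues` and `countValuesConcurrent` are hypothetical names for illustration, not the library's actual implementation; the RxGo variant is not shown.

```go
package main

import (
	"fmt"
	"sync"
)

// countValues is a naive, single-goroutine count: it tallies how often each
// value occurs using a map. len(result) is the cardinality.
func countValues(values []string) map[string]int {
	counts := make(map[string]int)
	for _, v := range values {
		counts[v]++
	}
	return counts
}

// countValuesConcurrent splits values into `workers` chunks and counts each
// chunk in its own goroutine with a private map (so no locking is needed),
// then merges the partial maps into one result.
func countValuesConcurrent(values []string, workers int) map[string]int {
	if workers < 1 {
		workers = 1
	}
	chunk := (len(values) + workers - 1) / workers
	partials := make([]map[string]int, workers)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		start := w * chunk
		end := start + chunk
		if end > len(values) {
			end = len(values)
		}
		if start >= end {
			continue // nothing left for this worker
		}
		wg.Add(1)
		go func(w, start, end int) {
			defer wg.Done()
			partials[w] = countValues(values[start:end])
		}(w, start, end)
	}
	wg.Wait()

	// Merge step: runs after all workers have finished, so it needs no synchronization.
	merged := make(map[string]int)
	for _, p := range partials {
		for v, n := range p {
			merged[v] += n
		}
	}
	return merged
}

func main() {
	genres := []string{"Comedy", "Drama", "Comedy", "Horror", "Drama", "Comedy"}
	counts := countValuesConcurrent(genres, 3)
	fmt.Println(len(counts), counts["Comedy"]) // prints: 3 3
}
```

Giving each goroutine its own map keeps the hot loop lock-free; the cost moves to the merge, which is proportional to the number of distinct values per chunk rather than to the number of rows.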
Supported field types:
- int
- string
- []string
- []int
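Multi-valued fields such as Genres are counted element by element, as the Comedy/Drama/Thriller output earlier in this README shows. A minimal sketch of normalizing each of the types listed above into string keys, assuming a hypothetical `keysFor` helper that is not part of go-cardinality's API:

```go
package main

import (
	"fmt"
	"strconv"
)

// keysFor converts a field value of one of the supported types (int, string,
// []string, []int) into the string keys to count. Slice values contribute one
// key per element, so a movie with Genres ["Comedy", "Drama"] is counted once
// under each genre. Any other type is rejected with an error.
func keysFor(v any) ([]string, error) {
	switch x := v.(type) {
	case int:
		return []string{strconv.Itoa(x)}, nil
	case string:
		return []string{x}, nil
	case []string:
		return x, nil
	case []int:
		keys := make([]string, len(x))
		for i, n := range x {
			keys[i] = strconv.Itoa(n)
		}
		return keys, nil
	default:
		return nil, fmt.Errorf("unsupported type %T", v)
	}
}

func main() {
	for _, v := range []any{1999, "Drama", []string{"Comedy", "Drama"}, []int{1, 2}, 3.5} {
		keys, err := keysFor(v)
		fmt.Println(keys, err)
	}
}
```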
- Run unit tests
make test
- Run benchmark tests
make bench
- Run coverage
make coverage
- Use of the reflect package: although its use is generally discouraged, we rely on it, sparingly, to build generic methods that operate on struct fields and their types (the schema).
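A minimal sketch of the kind of reflection involved: reading a named field from every struct in a slice. `Movie` and `fieldValues` are hypothetical stand-ins, and this version takes the schema from a type parameter rather than from a value like `Movie{}`; the library's own schema handling may differ.

```go
package main

import (
	"fmt"
	"reflect"
)

// Movie is a hypothetical record type with fields like those in the usage example.
type Movie struct {
	Title  string
	Year   int
	Genres []string
}

// fieldValues returns the value of the field called name for every element of
// items. It validates the schema once (T must be a struct with an exported
// field of that name) and then reads the field from each item by index.
func fieldValues[T any](items []T, name string) ([]any, error) {
	t := reflect.TypeOf((*T)(nil)).Elem()
	if t.Kind() != reflect.Struct {
		return nil, fmt.Errorf("%v is not a struct type", t)
	}
	f, ok := t.FieldByName(name)
	if !ok || !f.IsExported() {
		return nil, fmt.Errorf("%v has no exported field %q", t, name)
	}
	out := make([]any, 0, len(items))
	for _, item := range items {
		// Reuse the index path found above instead of a name lookup per item.
		out = append(out, reflect.ValueOf(item).FieldByIndex(f.Index).Interface())
	}
	return out, nil
}

func main() {
	movies := []Movie{
		{Title: "A", Year: 1999, Genres: []string{"Comedy"}},
		{Title: "B", Year: 2001, Genres: []string{"Drama", "Comedy"}},
	}
	years, err := fieldValues(movies, "Year")
	fmt.Println(years, err)
	_, err = fieldValues(movies, "Rating")
	fmt.Println(err)
}
```

Validating the schema once up front keeps the reflective work per item to a single field read, which is one way to use reflection "moderately".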
- Cardinality is a mathematical term for the number of elements in a set. In data modeling, cardinality describes the relationship between two database tables: how many instances of one entity are related to instances of another (one-to-one, one-to-many, many-to-many). In this library, the cardinality of a field is the number of distinct values it takes in the dataset.
- Distribution refers to the way data values are spread or organized within a dataset. It describes the frequency or occurrence of different values or groups of values in a specific attribute or field.
- HyperLogLog: a probabilistic algorithm (and the data structure it maintains) that approximates the number of distinct elements in a dataset using a small, fixed amount of memory, at the cost of a small, tunable error.
- Cardinality aggregation: an Elasticsearch metrics aggregation that calculates an approximate count of distinct values, based on the HyperLogLog++ algorithm.
- https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html
- https://towardsdatascience.com/count-distinct-metrics-at-scale-95a394c03f1
- https://pkg.go.dev/github.com/datadog/hyperloglog
- https://vertabelo.com/blog/cardinality-in-data-modeling/
- https://go.dev/blog/laws-of-reflection