Why yet another database? #91

Open
dreua opened this issue Feb 21, 2024 · 6 comments

Comments

@dreua

dreua commented Feb 21, 2024

Couldn't this data be added to an already existing database (OpenStreetMap, Wikidata) and be scraped and aggregated from there if the use-case needs it?
On the other hand, your approach is probably easier if you want to get started fast, and it seems to work well elsewhere, for instance at endoflife.date, so whatever. Maybe someone will write tools to synchronize data between those databases, maybe someone won't, we'll see 🤔

Some examples of data that is already present:

  • 37% of the world's route=train relations in OSM already have a "colour" tag, according to taginfo
  • Example in Wikidata with icon and "sRGB color hex triplet": https://www.wikidata.org/wiki/Q96388302
  • Unfortunately, the same line has a different color set on OSM: https://www.openstreetmap.org/relation/2552686

@vainamov
Member

I think you've captured most of the rationale behind creating a custom database in your question already.

Most data sources list a single color value per line. That is enough to tell lines apart, but it does not closely match the signage used in the real world, so the result is less recognizable. This database stores three colors and the shape, which enables apps to display the line icons more accurately. (Most of the values in https://taginfo.openstreetmap.org/keys/colour#values, for example, are simply names like "brown" or "orange", whereas we would like the values to be more precise.)
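To illustrate what "three colors and the shape" could look like to a consuming app, here is a minimal sketch. The column names (`line_id`, `background`, `text`, `border`, `shape`), the sample rows, and the reading of the three colors as background, text, and border are assumptions for illustration only, not the actual header or contents of the CSV in this repository.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical sample data: column names and values are made up for this sketch.
SAMPLE_CSV = """line_id,background,text,border,shape
example-s1,#0f8a3c,#ffffff,#0f8a3c,pill
example-u2,#d8232a,#ffffff,#ffffff,rectangle
"""


@dataclass(frozen=True)
class LineStyle:
    """Three colors plus a shape, as opposed to a single colour value."""
    background: str
    text: str
    border: str
    shape: str


def load_styles(csv_text: str) -> dict[str, LineStyle]:
    """Parse CSV text into a lookup table keyed by the (hypothetical) line id."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["line_id"]: LineStyle(
            row["background"], row["text"], row["border"], row["shape"]
        )
        for row in reader
    }


styles = load_styles(SAMPLE_CSV)
```

An app would fetch the file once, build this table, and then look up a style per line without any further network requests.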

Another issue is the discrepancy between sources. It shows up even in official materials (sometimes the colors differ vastly because, for example, the operator of a line and the network it belongs to publish separate maps). We can try to collect the colors for all lines within a network from the same source to achieve a coherent look, but when scraping from different public sources there is simply no guarantee of that.

And lastly, a custom mapping allows us to easily prepare the data to be used in conjunction with the trip data from Traewelling.

@jheubuch
Collaborator

Closing because of missing response

@jheubuch closed this as not planned Mar 23, 2024
@dreua
Author

dreua commented Mar 23, 2024

Do you have all the data you need? With your DB format and its identifiers, I am unable to see whether data is missing for my area that I could provide. Furthermore, we could consider adding links to the OpenStreetMap wiki page so that people looking for this data can find your database.

@MrKrisKrisu
Member

MrKrisKrisu commented Sep 2, 2024

I would like to reopen the topic because, in my opinion, some problems have arisen that the current CSV cannot represent well, such as the sometimes very complex shapes.

Wikidata might actually be a good source. Colours are already stored there for many transport lines. Corresponding icons are also available as SVG, which would make generating your own obsolete and solve the problem with the complex shapes.

To link the data with e.g. HAFAS trips, a new property might have to be proposed at Wikidata.

I have created an automatic list of transport lines in Germany, which shows the current status. Maybe you can already start with the data maintenance there:
https://www.wikidata.org/wiki/User:Mkkagain/Verkehrslinien_in_Deutschland

What is your opinion on this?

(Especially @jheubuch, @vainamov, @marhei as you are using the colours in your apps)

[Image: Screenshot 2024-09-02 at 12 26 12]

@MrKrisKrisu reopened this Sep 2, 2024
@MrKrisKrisu pinned this issue Sep 2, 2024
@jheubuch
Collaborator

jheubuch commented Sep 4, 2024

I would generally be fine with that, but (from a Träwelldroid perspective) I wouldn't want to query the Wikidata dataset for every check-in or for every connection on the departure board. The most convenient way would be to implement that in the Träwelling API, so that:

  • all clients can display the same icon
  • implementing is easy for all clients

@vainamov
Member

vainamov commented Sep 7, 2024

I agree with @MrKrisKrisu that SVG icons would improve rendering across all platforms and are probably the only reasonable solution for complex shapes or gradients, but I imagine the hurdle to collect these is much higher than for what we currently have. I also agree with @jheubuch: as long as the data is well formatted and readily available, I don't mind where it comes from. In the current state, I prefer the CSV because it's easy to fetch and cache.

I think it comes down to this:

  1. We are lacking a solution to represent complex shapes in a CSV-compatible way.
    > Adding SVGs for these cases to our current database seems like the easiest solution and could be an incentive for apps to start supporting SVGs.
  2. There is currently no mapping between HAFAS line data and the corresponding Wikidata entities.
    > I can't estimate the amount of work it takes to store the data we need at Wikidata, so maybe we could prepare a transition to Wikidata by adding a new column to the CSV and mapping the data we have to their entity URIs (Q...).
  3. Wikidata can store SVGs but does not nearly have enough data for the lines we currently cover.
    > Once possible, moving our data to Wikidata should be easy thanks to our mapping.
  4. Until there are enough SVGs, applications would still have to fall back to the other system.
    > Individual SVGs are probably overkill for most of the lines. A service that would generate these on the fly based on the basic color/shape values we already have would solve this. (Templating this should be easy work).
  5. We have a general issue identifying some conflicting lines.
    > This is where I still see the value in this repository. We could e.g. add the station IDs of a line's start and end to further reduce the number of conflicting lines. Maybe in the end, this database would only be consumed by Träwelling, where in turn the API could use it as a mapping to provide the Wikidata entity ID with each line it returns to the consumers.
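The on-the-fly generation mentioned in point 4 could be templated roughly as follows. This is only a sketch under stated assumptions: the function name `render_line_icon`, the shape names, the fixed 48x24 size, and the reading of the three colors as background, text, and border are all hypothetical and not taken from this repository.

```python
import xml.etree.ElementTree as ET

# Hypothetical shape names mapped to a corner radius; unknown shapes fall back to 0.
CORNER_RADIUS = {"rectangle": 0, "rectangle-rounded-corner": 4, "pill": 12}


def render_line_icon(label: str, background: str, text: str, border: str,
                     shape: str = "rectangle") -> str:
    """Build a simple 48x24 SVG badge from the basic color/shape values."""
    svg = ET.Element("svg", {
        "xmlns": "http://www.w3.org/2000/svg",
        "width": "48", "height": "24", "viewBox": "0 0 48 24",
    })
    ET.SubElement(svg, "rect", {
        "x": "1", "y": "1", "width": "46", "height": "22",
        "rx": str(CORNER_RADIUS.get(shape, 0)),
        "fill": background, "stroke": border, "stroke-width": "2",
    })
    label_el = ET.SubElement(svg, "text", {
        "x": "24", "y": "12", "fill": text,
        "font-family": "sans-serif", "font-size": "12",
        "text-anchor": "middle", "dominant-baseline": "central",
    })
    label_el.text = label  # ElementTree escapes special characters on output
    return ET.tostring(svg, encoding="unicode")


icon = render_line_icon("S1", "#0f8a3c", "#ffffff", "#0f8a3c", "pill")
```

Lines with genuinely complex shapes or gradients would still need a hand-made SVG; a template like this only covers the simple cases that make up most of the data.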


4 participants