Replies: 3 comments 3 replies
-
Some thoughts I had on traits of useful sample data:
-
This is kind of the reason I didn't tread into suggesting a JSON file 😏; I wasn't sure what was required. Anyhow, I think it'd be good to have crafted files with each data type in them (int, str, date, etc.; bool?). JSON would be a good example here, since it can hold ints, strs, bools, and null, so the dataset should include rows that contain each of them.
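A rough sketch of what such a crafted JSON file might contain. The field names and values are made up for illustration and are not taken from VisiData's sample_data. Note that JSON has no date type, so the sketch assumes dates are stored as ISO 8601 strings.

```python
import json

# Hypothetical rows: every JSON scalar type appears at least once, and the
# "score" column deliberately mixes int, float, and null.
ROWS = [
    {"id": 1, "name": "alpha", "active": True, "score": 10, "joined": "2020-01-31"},
    {"id": 2, "name": "beta", "active": False, "score": None, "joined": "2021-06-15"},
    {"id": 3, "name": "", "active": True, "score": 3.5, "joined": None},
]


def roundtrip(rows):
    """Serialize the rows to JSON text and parse them back."""
    return json.loads(json.dumps(rows))


def types_seen(rows):
    """Return the names of the Python types found across all cell values."""
    return {type(value).__name__ for row in rows for value in row.values()}


if __name__ == "__main__":
    # Print the dataset as it would appear in a sample .json file.
    print(json.dumps(ROWS, indent=2))
    # List which value types survive a JSON round trip.
    print(sorted(types_seen(roundtrip(ROWS))))
```

A check like `types_seen` could run when the sample file is regenerated, so that a later edit cannot silently drop one of the types the file is meant to exercise.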
-
Sorry, I'm getting OT here. I wonder if people use vd for cleaning data. I've tried to create simple databases or make small changes to existing data, and many times I end up upset. That's mostly my own fault, from typing the wrong key and losing my work, but vd could also have been more helpful. I only learned about

If I had cleaning needs rather than exploring/reshaping needs, I would think more seriously about trying https://github.com/OpenRefine/OpenRefine. I don't really need to clean data in that way. OR seems to have many tools built specifically for cleaning data, and it seems like it would be useful to add some of these features to VisiData, or to offer them as a cleaning plugin. OR also seems to have a more robust history mechanism, with the history automatically saved to a file.

I'm partly responding here to see if #703 could be revived as a discussion thread. I think that information would also be useful for the main question of this discussion. I don't feel like I know how people regularly share their data with vd. It would be nice to get input from more folks, and it was great reading about how both Saul and AJ share data outside of vd. It looks like the wonderful #595 was converted to a discussion.
-
The current sample_data directory is a total pile. It's a collection of small datasets in various formats, each of which was whatever I found lying around when I was developing the particular loader, under the notion that any test data is better than no data. They've been working okay, I guess, but lately I've been thinking about reexamining our whole sample data strategy, which would likely lead to an overhaul of sample_data.

So the aim of this discussion is to explore desirable characteristics of test datasets, and strategies for how to accumulate/generate/curate them. In particular, I would love to hear your direct experiences with sample data, positive and negative (VisiData's or otherwise).
Some opening questions: