forked from nothingmuch/kiokudb
-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathTODO
270 lines (236 loc) · 11.4 KB
/
TODO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
=== Small ===
* move live object leak reporting from Catalyst::Model into mainline
* scope directory handles KiokuDB::API
* scoped_txn method (txn_do( scope => 1, body => ... )
* test for data migration
* two directories, make sure whole graphs and subgraphs can be moved between
* especially KiokuDB::Set and other directory aware objects
* Attribute traits
* KiokuDB::ID (kiokudb_object_id like, but not stored in 'data')
* KiokuDB::Backend/KiokuDB::Directory (for directory aware objects)
* Doc work
* Doc audit
* make a doc todo list
* especially for backend authors and KiokuDB::Role::*
* More command line tool docs
* backup, recovery, etc
* FAQ and index
* where to find information, for instance to do proper backup
* procedures for the BDB backend you need to read BerkeleyDB::Manager
* Start writing a cookbook
* more roles for KiokuX::User
* non ID based
* allow username to != object ID, so that users can be renamed
* email address handling
* role based access control
* "real" runtime roles, or a list of RBAC role names
* integrate with Catalyst::Plugin::Authorization::Roles
* high level password change/reset role using auth token objects
* inserts an object pointing to the user, that can reset the password for the user
* the ID of the token is secret, and can be emailed to the user
* an action that then loads the object, resets the password, and
then deletes the auth token object
* symbolic aliases
* KiokuDB::Entry::Alias or somesuch
* used to set additional high level IDs for objects, after they are
already in the database
* Linker api to handle these on load
* on insert they are just special objects
* simple api in KiokuDB.pm to manage aliases (create, delete, etc).
* configuration file support
* need to be able to specify:
* regular options
* typemaps
* extra libs to load
* shouldn't be hard with MooseX::YAML
* KiokuDB->connect("/path/to/my_db.yml");
* KiokuDB->connect("/path/to/dir"); # reads /path/to/dir/kiokudb.yml
* subdirectories for standard directory layout:
* data/ - actual storage
* lib/ - possibly used for typemap defs, etc
* RPC backend
* simple proxying of backend methods and a KiokuDB::Backend::Remote client
* try for a nonblocking api, be able to plug in to a standalone blocking
daemon, a larger setup, or into anyevent/POE without horrible performance
* we'll need a preforking daemon for the backend performance
(Concurrency::POSIX can make this automatic), but ideally we can have a
nonblocking api role for most backends (see event based api below)
* see also RPC server below
* Cache::Bounded for immutable objects
* simple hack to keep them live longer than normal
* simple profiling hooks
* under some debug mode collect timing info for linker & collapser
* display this info in Catalyst::Model::KiokuDB's debug output
=== Medium ===
* root set membership delegated to typemap
* will allow Root and NonRoot roles in MOP typemap entry
* top level values for ->store and ->insert still set default
* ->set_root always authoritative
* Schema versioning
* store version in entries
* add an upgrade hook to typemap entries
* if $Class::VERSION ne $entry->class_version then the upgrade hook is fired
* the upgrade hook is responsible for creating a new entry
* convenience api:
* hash of callbacks:
{
$version => $per_version_hook
$version => $aliased_to_version, # treat objects
# with version $version like they are objects
# of version $aliased_to_version
},
* if a version is missing that is an error
* add a schema upgrade command line tool
* scans the DB for entries with versions that != current, and loads and
then updates those objects
* $dir->refresh($object), $dir->deep_refresh($object)
* partially done
* allow typemaps to disable or completely take over refreshing (i.e. DBIC)
* provide a check that compares, if entry data != live entry data, skip
* add deep refresh
* entry todos:
* MOP entry needs a MOP API to clear objects, not yet in CMOP
* Set
* if shallow downgrade to Deferred and reload IDs assign new member IDs if shallow, downgrade to Deferred
* if deep upgrade to Loaded and refresh members
* KiokuDB::Server
* server side scope tracking
* the client informs does scope open/close
* the server automatically follows all references for the client, to
reduce latency
* (no need to actually inflate)
* server side filtering
* server side transactions
* fix the set_root work on immutable/CAS objects bug (the 'root' flag is not
written because the object is skipped)
* caching support
* authoritative cache support
* entry caching for slower backends
* non authoritative cache
* various degrees of correctness (simpledb like "gurantees" ;-)
* data sharding
* map classes of UIDs to different backends or different tables in a
backend
* string based i guess, though we could also decorate the entry object with
metadata about where it should go in addition
* uses:
* transient data storage (e.g. web sessions)
* grouping of data for scanning purposes
* grouping of data for configuration purposes (e.g. different search
columns in DBI, content id objects in CAS storage)
* actual backend work is more involved but doing the high level is pretty easy
* XS acceleration
* LiveObjects
* implement custom uvar magic hash instead of Scope::Guard all over
* should provide significant speed & memory consumption
improvements
* Linker
* inflate_data is actually really ugly in Perl, could be
smaller/faster/cleaner in XS due to lack of of code duplication
* Data::Visitor
* generic acceleration for Data::Visitor, ask nothingmuch for details
* affects:
* Collapser
* $entry->references
* jspon? not anymore but could convert back
* Set::Object and hash key sharding
* each hash entry or set member is an entry/row in a table (BDB and DBI)
hash).
* This allows finer grained commits (e.g. insertion to a set from two
competing transactions does not cause a failed transaction) under
MULTIVERSION for BDB.
* this also allows to run queries that test for set membership (but we
still can't write those queries)
* sharding thsould be implemented as a base role that backends can
implement
* CodeRefs serialization
* if the subname of the CV is valid and it has a ->FILE then maybe store as
a symbolic ref instead (requiring the file to load it)?
* Attribute meta traits
* each of these is relatively small and self contained
* lazy build attributes (for cached values)
* build on store
* keep when storing but dont build on store
* make sure update() on immutable objects works for this, too
* do not serialize
* ID attribute
* don't store ID twice, once as a field and once in the entry (skip the
field)
* better than KiokuDB::Role::ID
* Digest Part
* resistent to subclassing/role composition
* however, order must be stable, sort get_all_attributes
* order by sort index is provided
* rest of attributes sorted:
* required attributes before non required ones
* sub sort by name
* required attributes first, ordered by sort index or name if no sort index is provided
* Build on store meta trait
=== Large ===
* RDF backend
* generate triples
* predicates as FQ attr names names
* predicates as short attr names
* predicates as UUIDs?
* disable simple ref collapsing by default?
* SPARQL matching for simple search
* event based api
* linker is almost ready to integrate event based linking
* if backend returns a cond var for get() then we can return a cond var for
the whole graph. start with an api for it, and slowly implement actual
async behavior using a backend role
* AnyEvent::BDB, Files and CouchDB backends could benefit
* skeptical about performance of DBI with forking
* does the live object scope still make sense? probably, but it's much
easier to leak it. the event oriented wrapper should keep live object
scopes for the user at least for the duration of a callback, in
additional to the user tracking to minimize confusion. $lookup->recv
could return the scope into which the objects were loaded, along with the
results
* threading
* what happens with a shared KiokuDB directory? i don't think that's a good
idea... better that each thread has its own copy? how can we guarantee
recursive thread sharing of passthrough/callback objects?
* investigate by writing tests and then fixing as appropriate
* persistent metaclass
* store Moose class definition in code by creating metaclasses which when
loaded redefine the class in memory
* might need to subclass Moose::Meta::Class to take care of stored code
* this allows us to create a smalltalk inspired environment
* Garbage collection
* Online garbage collection schemes
* entries can already enumerate their out links
* several possibilities:
* refcounting
* on store diff $entry->references with
$entry->prev->references and update counts
* rel index table is another
* all references are cross referenced in a table that can be also
used to list backrefs. In SQL this table can have delete triggers
i guess (ask mugwump), in BDB this would be manual.
* incremental scheme
* parrot's tricolor garbage collection alrogithm is interesting
* the data could be partly maintained during store/delete
operations, with a partial sweep performed every time some
statistic is tipped
* generational GC could make sense here, due to the persistence
of the data
* Offline schemes
* currently we do a full sweep in a single transaction
* this is potentially like a global lock, except with the
possibility of deadlocking, it would be nice to have an
incremental scheme with small transactions
* mark & sweep
* tri-color using some auxillary table
* smaller transactions to update the aux table
* online updating of the aux table interleaved
* final txn reads the aux table, and deletes in a single txn
* allows the sweep to be performed incrementally without locking
everything
* collection of clusters of data: http://www-sor.inria.fr/publi/GC-PERS-DSM_POS94.html
* http://pauillac.inria.fr/~lefessan/dgc/
* http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.32.663
* transactional ref counting: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.33.2363
=== Misc ===
this is just a link dump really
http://www.ietf.org/rfc/rfc1960.txt