
2023 04 05

wwerkk edited this page Apr 14, 2023 · 1 revision

Plenty got done that day, as the project's direction became a little clearer. What I arrived at is that I should keep the project minimal and as well documented as possible, and not extend its scope to include accessibility issues of AI-powered audio or other projects (such as my work with RAVE, some of it available in my max_latent repo here). There will certainly be plenty of critical thinking to do about the project once it is usable for other people, but for now it's still all about development and tackling GitHub issues, which is how I currently plan out new features and refactors.

I had set several objectives for that week, which funnily enough were all dealt with within a day or so:

  • implementing the basic case of a Python script and a Max patch talking to each other through OSC
  • refactoring the frame dictionary so that instead of storing frames sample by sample, it only keeps track of their respective start/end indices in the original file
  • a script independent from the notebook capable of loading the model, dictionaries and running predictions
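As a rough illustration of the second objective, a frame dictionary that only stores start/end indices could look like the sketch below. The function names, the frame length and the hop size are assumptions made for the example, not taken from the project; the point is only that each entry holds two integers and the samples are sliced out of the original file on demand.

```python
def build_frame_index(n_samples, frame_length, hop_length):
    """Map frame number -> (start, end) sample indices in the original file.

    Hypothetical helper: no audio is copied, only the two indices are kept.
    """
    frames = {}
    last_start = n_samples - frame_length
    for i, start in enumerate(range(0, last_start + 1, hop_length)):
        frames[i] = (start, start + frame_length)
    return frames


def read_frame(audio, frames, key):
    """Slice the samples of one frame out of the full audio sequence."""
    start, end = frames[key]
    return audio[start:end]


# Toy "audio" of 10 samples, frames of 4 samples with a hop of 2.
audio = list(range(10))
frames = build_frame_index(len(audio), frame_length=4, hop_length=2)
second_frame = read_frame(audio, frames, 1)
```

Storing indices instead of samples keeps the dictionary small and makes it cheap to serialise, at the cost of needing the original file at playback time.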

Afterwards, I carried on putting all the parts together into a script+patch combo, in which the python-osc server keeps listening for messages coming in from Max. On receiving a /g message, the Dispatcher object calls a function which generates a sequence of tokens and sends them back to the Max patch. This was pretty smooth to implement, apart from two issues:

  • the sequence returned by the generate function always includes the prompt, which is picked at random from the tokenized sequence dictionary. The prompt has the size of the sequence_length hyperparameter set when training the model. That does sound logical, but does not feel so when a sequence of length 4 is asked for and one of length 132 is returned. It should probably be improved in the future, so that the prompt does not have to match sequence_length in size, but can be just a single token. Uhh...
  • The Max dict object is fine for reading .json dictionaries, as long as they don't have any funny values under any of the keys. As it turns out, funny values include 2D Python arrays, which it somehow treats as a dictionary. I spent a large chunk of time trying to figure out whether it was something about my array being a numpy one that made the dict object return what is called an <atomarray object>, while dict.view would show my 2D array of frame indices just fine. I had to diverge into the Cycling '74 forum and JavaScript parsing, which in the end didn't work anyway. Desperate, I stored my sample indices as an interleaved 1D list, where each odd index is the start of a frame, and the even index following it is the end of a frame. This worked.
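The /g round trip described above can be sketched roughly as follows. This is not the project's script: the generate stand-in, the /tokens reply address, the ports and the prompt list are all assumptions for illustration. Only Dispatcher, BlockingOSCUDPServer and SimpleUDPClient are actual python-osc classes. The handler also shows one possible way around the first issue, by slicing the prompt off before replying.

```python
import random


def generate(length, prompt):
    """Stand-in for the model (hypothetical): prompt followed by new tokens."""
    return list(prompt) + [random.randrange(100) for _ in range(length)]


def make_g_handler(send, prompts):
    """Build a handler for /g; send(address, values) delivers the reply."""
    def handle_g(address, *args):
        # First OSC argument is taken as the requested length (assumption).
        length = int(args[0]) if args else 4
        prompt = random.choice(prompts)
        tokens = generate(length, prompt)
        # Drop the prompt so only the newly generated tokens go back to Max.
        send("/tokens", tokens[len(prompt):])
    return handle_g


def serve(ip="127.0.0.1", in_port=5005, out_port=5006):
    """Listen for /g from Max and reply on out_port (ports are assumptions)."""
    # Imported here so the handler logic above works without python-osc.
    from pythonosc.dispatcher import Dispatcher
    from pythonosc.osc_server import BlockingOSCUDPServer
    from pythonosc.udp_client import SimpleUDPClient

    client = SimpleUDPClient(ip, out_port)
    dispatcher = Dispatcher()
    dispatcher.map("/g", make_g_handler(client.send_message, [[1, 2, 3]]))
    BlockingOSCUDPServer((ip, in_port), dispatcher).serve_forever()
```

Calling serve() starts the blocking server. On the Max side, something like udpsend and udpreceive objects pointed at the matching ports would complete the loop.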

To sum up, here are the issues which have been dealt with:


Update 11.04.2022: What I had missed completely is that my array is zero-indexed. In consequence, every start sample sits at an even index, while every end sample sits at an odd one. This meant the get_frame function in streamer.py required fixing. Funnily enough, even though the function was described by the comment in line 43, and categorised in my head, as generating a random odd value, it was in fact generating even numbers. Seems like two wrongs make a right.
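To make the parity concrete, here is a minimal sketch of the interleaved layout and of picking a random frame from it. The interleave helper, this get_frame and the "frames" key are hypothetical and are not the code in streamer.py; they only assume the layout described above, a flat zero-indexed list of start, end, start, end.

```python
import json
import random


def interleave(frames):
    """Flatten [[start, end], ...] into [start, end, start, end, ...]."""
    flat = []
    for start, end in frames:
        # int() also converts numpy integers, which json cannot serialise.
        flat.extend([int(start), int(end)])
    return flat


def get_frame(flat):
    """Pick a random frame from the flat list.

    With zero-based indexing, starts live at even indices (0, 2, 4, ...)
    and the matching end is the odd index right after.
    """
    i = 2 * random.randrange(len(flat) // 2)  # random even index
    return flat[i], flat[i + 1]


frames = [[0, 512], [512, 1024], [1024, 1536]]
flat = interleave(frames)
# A flat list of numbers is plain JSON with no nested arrays.
payload = json.dumps({"frames": flat})
start, end = get_frame(flat)
```

Multiplying a random frame number by two guarantees an even index, so no separate parity check is needed.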
