# Built-in Filters

While most applications will use user-defined filters, vidformer ships with a handful of built-in filters to get you started:

## DrawText

`DrawText` does exactly what it sounds like: draw text on a frame.

For example:

```python
DrawText(frame, text="Hello, world!", x=100, y=100, size=48, color="white")
```
## BoundingBox

`BoundingBox` draws bounding boxes on a frame.

For example:

```python
BoundingBox(frame, bounds=obj)
```

Where `obj` is JSON with this schema:

```json
[
  {
    "class": "person",
    "confidence": 0.916827917098999,
    "x1": 683.0721842447916,
    "y1": 100.92174338626751,
    "x2": 1006.863525390625,
    "y2": 720
  },
  {
    "class": "dog",
    "confidence": 0.902531921863556,
    "x1": 360.8750813802083,
    "y1": 47.983140622720974,
    "x2": 606.76171875,
    "y2": 717.9591837897462
  }
]
```
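In vidformer-py, built-in filters are loaded by name with `vf.Filter`, as shown in the walkthrough later in these docs. A minimal sketch, assuming a registered source `tos` and the sample detections file from that walkthrough:

```python
import vidformer as vf
import urllib.request, json

# Per-frame object detections matching the schema above
with urllib.request.urlopen("https://f.dominik.win/data/dve2/tos_720p-objects.json") as r:
    detections_per_frame = json.load(r)

bbox = vf.Filter("BoundingBox")  # load the built-in filter by name

def render(t, i):
    # Overlay the i-th frame's detections on the source frame at time t
    return bbox(tos[t], bounds=detections_per_frame[i])
```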
## Scale

The `Scale` filter transforms one frame type to another.
It changes both resolution and pixel format.
This is the most important filter and is essential for building with vidformer.

Arguments:

```python
Scale(
    frame: Frame,
    width: int = None,
    height: int = None,
    pix_fmt: str = None)
```

By default, missing `width`, `height`, and `pix_fmt` values are set to match `frame`.
`pix_fmt` must match FFmpeg's name for a pixel format.

For example:

```python
frame = Scale(frame, width=1280, height=720, pix_fmt="rgb24")
```
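`Scale` is typically applied inside a render function to normalize frames before other filters. A minimal sketch in the vidformer-py style used later in these docs, assuming a registered source `tos` and that `Scale` loads by name the way `BoundingBox` does:

```python
import vidformer as vf

scale = vf.Filter("Scale")  # assumption: built-ins load by name, like BoundingBox

def render(t, i):
    # Normalize every source frame to 1280x720 rgb24 before further processing
    return scale(tos[t], width=1280, height=720, pix_fmt="rgb24")
```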
## IPC

IPC allows for calling User-Defined Filters (UDFs) running on the same system.
It is an infrastructure-level filter and is used to implement other filters.
It is configured with a `socket` and a `func`, the filter's name, both strings.

The `IPC` filter cannot be invoked directly; rather, IPC filters are constructed by a server upon request.
This can be difficult, but vidformer-py handles it for you.
As of right now, `IPC` only supports `rgb24` frames.
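To give a feel for how vidformer-py wraps this, here is a rough sketch of a UDF. The `vf.UDF` base class, `UDFFrame`, and `into_filter()` names follow vidformer-py's UDF interface, but treat the exact signatures as an assumption and check the vidformer-py documentation:

```python
import vidformer as vf
import numpy as np

# Sketch of a user-defined filter; vidformer-py manages the IPC
# socket and registration behind the scenes (assumed API).
class Grayscale(vf.UDF):
    def filter(self, frame: vf.UDFFrame):
        """Compute the result frame (rgb24 in, rgb24 out)."""
        img = frame.data().copy()
        gray = img.mean(axis=2, keepdims=True).astype(np.uint8)
        return vf.UDFFrame(np.repeat(gray, 3, axis=2), frame.frame_type())

    def filter_type(self, frame: vf.UDFFrameType) -> vf.UDFFrameType:
        """Declare the output frame type; here it matches the input."""
        return frame

grayscale = Grayscale("Grayscale").into_filter()  # usable like a built-in filter
```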
## HStack & VStack

`HStack` and `VStack` allow for composing multiple frames together, stacking them either horizontally or vertically.
They try to automatically find a reasonable layout.

Arguments:

```python
HStack(
    *frames: list[Frame],
    width: int,
    height: int,
    format: str)
```

At least one frame is required, along with a `width`, `height`, and `format`.

For example:

```python
compilation = HStack(left_frame, right_frame, width=1280, height=720, format="rgb24")
```
+
+ vidformer builds on the data model introduced in the V2V paper.
+Frames are a single image.
+Frames are represented as their resolution and pixel format (the type and layout of pixels in memory, such as rgb24
, gray8
, or yuv420p
).
Videos are sequences of frames represented as an array. +We index these arrays by rational numbers corresponding to their timestamp.
+Filters are functions which construct a frame.
+Filters can take inputs, such as frames or data.
+For example, DrawText
may draw some text on a frame.
Specs declarativly represent a video synthesis task. +They represent the construction of a result videos, which is itself modeled as an array.
+domain
and render
functions.
+Data Arrays allow using data in specs symbolically, as opposed to inserting constants directly into the spec. +These allow for deduplication and loading large data blobs efficiently.
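Putting these pieces together with vidformer-py's API (the same calls used in the walkthrough later in these docs), a spec pairs a domain of output timestamps with a render function that constructs each frame. A minimal sketch, assuming a registered source `tos`:

```python
import vidformer as vf
from fractions import Fraction

# domain: one output frame every 1/24 s for 10 seconds
domain = [Fraction(i, 24) for i in range(24 * 10)]

# render: construct each output frame; here we just pass through
# frames from the registered source `tos` (see the walkthrough below)
def render(t: Fraction, i: int):
    return tos[t]

spec = vf.Spec(domain, render, tos.fmt())
```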
**What formats can vidformer read?**

In short, essentially everything.
vidformer uses the FFmpeg/libav* libraries internally, so any media FFmpeg works with should work in vidformer as well.
We support many container formats (e.g., mp4, mov) and codecs (e.g., H.264, VP8).

A full list of the codecs enabled in a vidformer build can be found by running:

```
vidformer-cli codecs
```
**Can vidformer access remote data?**

Yes, vidformer uses Apache OpenDAL for I/O, so most common data/storage access protocols are supported.
However, not all storage services are enabled in distributed binaries.
We guarantee that HTTP, S3, and the local filesystem always work.
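For instance, a source can point at remote media the same way it points at a local file. A sketch: the HTTP source below is from the walkthrough later in these docs, while the S3 line is hypothetical and assumes S3 support and credentials are configured for your build:

```python
import vidformer as vf

server = vf.YrdenServer(domain="localhost", port=8000)

# HTTP-backed source, as used in the walkthrough below
tos = vf.Source(server, "tos_720p",
                "https://f.dominik.win/data/dve2/tos_720p.mp4", stream=0)

# Hypothetical S3-backed source (assumes S3 is enabled and configured):
# clip = vf.Source(server, "clip", "s3://my-bucket/clip.mp4", stream=0)
```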
**How does vidformer compare to FFmpeg?**

vidformer is far more expressive than the FFmpeg filter interface.
Mainly, vidformer is designed for working with data, so edits are created programmatically and edits can reference data.
Also, vidformer enables serving result videos on demand.
vidformer uses the FFmpeg/libav* libraries internally, so any media FFmpeg works with should also work in vidformer.

**How does vidformer compare to OpenCV?**

vidformer orchestrates data movement in video synthesis tasks, but does not implement image processing directly.
Most use cases will still use OpenCV for this.
This is a walkthrough of getting started with vidformer's OpenCV `cv2` compatibility layer.

> ⚠️ Adding `cv2` functions is a work in progress. See the cv2 filters page for which functions have been implemented.

> ⚠️ Due to how Colab networking works, vidformer can't stream/play results in Colab, only save them to disk. `cv2.vidplay()` will not work!

Copy in your video, or use ours:

```
curl -O https://f.dominik.win/data/dve2/tos_720p.mp4
```

Then just replace `import cv2` with `import vidformer.cv2 as cv2`.

Here's our example script:
```python
import vidformer.cv2 as cv2

cap = cv2.VideoCapture("tos_720p.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

out = cv2.VideoWriter("output.mp4", cv2.VideoWriter_fourcc(*"mp4v"),
                      fps, (width, height))
while True:
    ret, frame = cap.read()
    if not ret:
        break

    cv2.putText(frame, "Hello, World!", (100, 100), cv2.FONT_HERSHEY_SIMPLEX,
                1, (255, 0, 0), 1)
    out.write(frame)

cap.release()
out.release()
```
Saving videos to disk works, but we can also display them in the notebook.
Since we stream the results and only render them on demand, this can start practically instantly!

First, replace `"output.mp4"` with `None` to skip writing the video to disk.
Then you can use `cv2.vidplay()` to play the video!
```python
import vidformer.cv2 as cv2

cap = cv2.VideoCapture("tos_720p.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

out = cv2.VideoWriter(None, cv2.VideoWriter_fourcc(*"mp4v"),
                      fps, (width, height))
while True:
    ret, frame = cap.read()
    if not ret:
        break

    cv2.putText(frame, "Hello, World!", (100, 100), cv2.FONT_HERSHEY_SIMPLEX,
                1, (255, 0, 0), 1)
    out.write(frame)

cap.release()
out.release()

cv2.vidplay(out)
```

> ⚠️ By default `cv2.vidplay()` will return a video which plays in a Jupyter Notebook. If running outside a Jupyter notebook, you can pass `method="link"` to return a link instead.
This is a walkthrough of getting started with the `vidformer-py` core DSL.

> ⚠️ We assume this is in a Jupyter notebook. If not, then `.play()` won't work and you have to use `.save()` instead.

We start by connecting to a server and registering a source:
```python
import vidformer as vf
from fractions import Fraction

server = vf.YrdenServer(domain='localhost', port=8000)

tos = vf.Source(
    server,
    "tos_720p", # name (for pretty printing)
    "https://f.dominik.win/data/dve2/tos_720p.mp4",
    stream=0, # index of the video stream we want to use
)

print(tos.ts())
print(tos.fmt())
```
This will print the timestamps of all the frames in the video, and then the format information.
This may take a few seconds the first time, but frame times are cached afterwards.

```
> [Fraction(0, 1), Fraction(1, 24), Fraction(1, 12), Fraction(1, 8), ...]
> {'width': 1280, 'height': 720, 'pix_fmt': 'yuv420p'}
```
Now let's create a 30-second clip starting at the 5-minute mark.
The source video is at a constant 24 FPS, so let's create a 24 FPS output as well:

```python
domain = [Fraction(i, 24) for i in range(24 * 30)]
```
Now we need to render each of these frames, so we define a render function:

```python
def render(t: Fraction, i: int):
    clip_start_point = Fraction(5 * 60, 1) # start at 5 * 60 seconds
    return tos[t + clip_start_point]
```
We used timestamp-based indexing here, but you can also use integer indexing (`tos.iloc[i + 5 * 60 * 24]`).
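For example, the same render function written with integer indexing (equivalent here because the source is a constant 24 FPS):

```python
def render(t: Fraction, i: int):
    # Frame 5 * 60 * 24 is the frame at the 5-minute mark of a 24 FPS video
    return tos.iloc[i + 5 * 60 * 24]
```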
Now we can create a spec and play it in the browser.
We create a spec from the resulting video's frame timestamps (`domain`), a function to construct each output frame (`render`), and the output video's format (matching `tos.fmt()`):

```python
spec = vf.Spec(domain, render, tos.fmt())
spec.play(server)
```
This plays this result.

> Some Jupyter environments are weird (i.e., VS Code), so `.play()` might not work. Using `.play(..., method="iframe")` may help.
It's worth noting that we are playing frames in order here and outputting video at the same framerate we received, but that doesn't need to be the case.
Here are some other things you can now try (a reverse-playback sketch follows below):

- Use a variable frame rate; note that `.play()` will not work with VFR, but `.save()` will.
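As one example of the ordering being up to you, here is a sketch that plays the same 30-second clip in reverse, reusing the imports and the `tos` source from above:

```python
domain = [Fraction(i, 24) for i in range(24 * 30)]

def render(t: Fraction, i: int):
    clip_start_point = Fraction(5 * 60, 1)
    # Last frame of the 30-second clip, then walk backwards
    clip_last_frame = clip_start_point + Fraction(30, 1) - Fraction(1, 24)
    return tos[clip_last_frame - t]

spec = vf.Spec(domain, render, tos.fmt())
spec.play(server)
```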
Now let's overlay some bounding boxes over the entire clip:

```python
# Load some data
import urllib.request, json
with urllib.request.urlopen("https://f.dominik.win/data/dve2/tos_720p-objects.json") as r:
    detections_per_frame = json.load(r)

bbox = vf.Filter("BoundingBox") # load the built-in BoundingBox filter

domain = tos.ts() # output should have same frame timestamps as our example clip

def render(t, i):
    return bbox(
        tos[t],
        bounds=detections_per_frame[i])

spec = vf.Spec(domain, render, tos.fmt())
spec.play(server)
```
This plays this result (video is just a sample clip).
We can place frames next to each other with the `HStack` and `VStack` filters.
For example, `HStack(left_frame, middle_frame, right_frame, width=1280, height=720, format="yuv420p")` will place three frames side-by-side.

As a larger example, we can view a window function over frames as a 5x5 grid:
```python
hstack = vf.Filter("HStack")
vstack = vf.Filter("VStack")

w, h = 1920, 1080

def create_grid(tos, i, N, width, height, fmt="yuv420p"):
    grid = []
    for row in range(N):
        columns = []
        for col in range(N):
            index = row * N + col
            columns.append(tos.iloc[i + index])
        grid.append(hstack(*columns, width=width, height=height//N, format=fmt))
    final_grid = vstack(*grid, width=width, height=height, format=fmt)
    return final_grid

domain = [Fraction(i, 24) for i in range(0, 5000)]

def render(t, i):
    return create_grid(tos, i, 5, w, h)

fmt = {'width': w, 'height': h, 'pix_fmt': 'yuv420p'}

spec = vf.Spec(domain, render, fmt)
spec.play(server)
```
This plays this result (video is just a sample clip).

This notebook shows how to build custom filters to overlay data.
This plays this result (video is just a sample clip).
A research project providing infrastructure for video interfaces and pipelines.
Developed by the OSU Interactive Data Systems Lab.

Vidformer efficiently transforms video data, enabling faster annotation, editing, and processing of video data, without having to focus on performance.

It uses a declarative specification format to represent transformations. This enables:

- ⚡ **Transparent Optimization**: Vidformer optimizes the execution of declarative specifications just like a relational database optimizes relational queries.
- ⏳ **Lazy/Deferred Execution**: Video results can be retrieved on demand, allowing for practically instantaneous playback of video results.
- 🔄 **Transpilation**: Vidformer specifications can be created from existing code (like `cv2`).
The easiest way to get started is using vidformer's `cv2` frontend, which allows most Python OpenCV visualization scripts to replace `import cv2` with `import vidformer.cv2 as cv2`:
```python
import vidformer.cv2 as cv2

cap = cv2.VideoCapture("my_input.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

out = cv2.VideoWriter("my_output.mp4", cv2.VideoWriter_fourcc(*"mp4v"),
                      fps, (width, height))
while True:
    ret, frame = cap.read()
    if not ret:
        break

    cv2.putText(frame, "Hello, World!", (100, 100), cv2.FONT_HERSHEY_SIMPLEX,
                1, (255, 0, 0), 1)
    out.write(frame)

cap.release()
out.release()
```
You can find details on this in our Getting Started Guide.

Vidformer is a highly modular suite of tools that work together; these are detailed here.

❌ vidformer is NOT:

However, vidformer is highly complementary to each of these.
If you're working on any of the latter four, vidformer may be for you.

File Layout:

License: Vidformer is open source under Apache-2.0.
Contributions welcome.