# Vision
- [Infrastructure](#wiki-infrastructure)
  - [Fovea](#wiki-fovea)
  - [Colour Histograms](#wiki-colour-histograms)
- [Modules](#wiki-modules)
  - [Field Edge Detection](#wiki-field-edge-detection)
  - [Goal Detection](#wiki-goal-detection)
  - [Robot Detection](#wiki-robot-detection)
  - [Ball Detection](#wiki-ball-detection)
  - [Field Feature Detection](#wiki-field-feature-detection)
  - [Foot Detection and Avoidance](#wiki-foot-detection-and-avoidance)
## Infrastructure

The Vision module begins by retrieving both the top and bottom camera images. If one of the images is not ready, the thread will wait until they are both ready. Two primary foveas are constructed from the raw images, one per camera, to form the basis of the vision pipeline. The top camera image is scaled down to 160x120 whilst the bottom camera image is scaled down to 80x60.
A detailed description of the vision infrastructure can be found here.
### Fovea

A fovea is used to represent a section of an image. It is a flexible class that provides a generic interface between images (or sub images) and the vision pipeline. Many of the vision algorithms are designed to be run on any fovea, allowing for maximal code reuse and cleanliness. A fovea can have colour and edge saliency data available. The colour saliency is generated from the colour calibration table and the edge saliency is generated from the grey scale version of the raw image.
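The actual class layout is not reproduced here, but a minimal sketch of the kind of interface a fovea might provide could look like the following (names and fields are illustrative, not the actual class):

```cpp
#include <cstdint>
#include <vector>

// Illustrative sketch only: a fovea covering a rectangular window of the
// raw image at a reduced density, with optional colour and edge saliency.
enum Colour : uint8_t { cBACKGROUND, cGREEN, cWHITE, cORANGE, cTEAM_HOME, cTEAM_AWAY };

struct Fovea {
    int x, y;          // top-left corner of the window, in raw-image pixels
    int width, height; // size of the fovea, in fovea pixels
    int density;       // raw pixels per fovea pixel (subsampling factor)
    bool top;          // true if taken from the top camera

    std::vector<Colour> colour;        // colour saliency (from the calibration table)
    std::vector<int>    edgeMagnitude; // edge saliency (from the grey-scale image)

    Colour colourAt(int cx, int cy) const { return colour[cy * width + cx]; }
    int edgeAt(int cx, int cy) const { return edgeMagnitude[cy * width + cx]; }

    // Map a fovea coordinate back to a raw image coordinate.
    int rawX(int cx) const { return x + cx * density; }
    int rawY(int cy) const { return y + cy * density; }
};
```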
### Colour Histograms

The colour histograms contain a count of the number of pixels of each colour for each row and column in a fovea. They are calculated once per fovea, reducing the amount of repetition across the vision pipeline.
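A sketch of how such histograms could be accumulated in a single pass over a fovea (building on the hypothetical Fovea sketch above):

```cpp
#include <vector>

// Per-row and per-column pixel counts for each colour class, computed once per fovea.
struct ColourHistogram {
    // rowCounts[colour][row] and colCounts[colour][col] respectively
    std::vector<std::vector<int>> rowCounts, colCounts;
};

ColourHistogram buildHistogram(const Fovea &f, int numColours) {
    ColourHistogram h;
    h.rowCounts.assign(numColours, std::vector<int>(f.height, 0));
    h.colCounts.assign(numColours, std::vector<int>(f.width, 0));
    for (int y = 0; y < f.height; ++y) {
        for (int x = 0; x < f.width; ++x) {
            Colour c = f.colourAt(x, y);
            ++h.rowCounts[c][y];
            ++h.colCounts[c][x];
        }
    }
    return h;
}
```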
## Modules

### Field Edge Detection

The field edge plays a vital role in the vision pipeline. The field edge is used to determine where the field starts so that we can save time when scanning the image for other features like the ball or field lines. The module sets an index for each column to indicate the pixel at the edge of the field for all other algorithms to use as a start when scanning.
The algorithm for detecting the field edge starts at the top of the image and scans down until it finds a significant patch of green. This scan is run on every column and results in a set of points across the image. RANSAC is then applied to the set of points to attempt to extract straight lines out of it. We attempt to detect up to two lines in any one image, since there can be two field edges in view at any point in time. This algorithm is run in both the top and bottom cameras independently since the field edge may run across both images.
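A sketch of the per-column scan that produces the candidate field-edge points, building on the hypothetical Fovea sketch above (the green-run threshold is illustrative, not the tuned value):

```cpp
#include <utility>
#include <vector>

// For each column, scan from the top of the fovea downwards and record the
// first row where a significant run of green begins. Returns (column, row) pairs.
std::vector<std::pair<int, int>> findFieldEdgePoints(const Fovea &f,
                                                     int minGreenRun = 3) {
    std::vector<std::pair<int, int>> points;
    for (int x = 0; x < f.width; ++x) {
        int greenRun = 0;
        for (int y = 0; y < f.height; ++y) {
            greenRun = (f.colourAt(x, y) == cGREEN) ? greenRun + 1 : 0;
            if (greenRun >= minGreenRun) {
                // The field starts where the green run began.
                points.emplace_back(x, y - minGreenRun + 1);
                break;
            }
        }
    }
    return points; // RANSAC is then used to fit up to two lines to these points
}
```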
If no field edge is detected, then we guess if the field edge is above or below the current camera view. This guess is based on the amount of green present in the image. If the image contains a large portion of green, but no distinct field edge, the field edge is assumed to be above the camera view and the entire image is treated as being "on the field". If not enough green is present, then the robot is deemed to be looking off the field or into the sky.
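The fallback guess then amounts to thresholding the fraction of green in the frame; a minimal sketch, with an illustrative threshold:

```cpp
// Decide whether the whole image should be treated as "on the field" when no
// field edge line was found. greenFraction is the share of green pixels in the fovea.
bool assumeFieldAboveView(double greenFraction, double minGreenFraction = 0.3) {
    // Plenty of green but no distinct edge: the edge is probably above the view.
    return greenFraction >= minGreenFraction;
}
```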
More details can be found here.
### Goal Detection

Due to the rule change for 2015, Goal Detection has been updated to deal with white goalposts. The algorithm uses only the top camera and scans along the field edge (as provided by Field Edge Detection) from left to right. Any white pixels within a set radius of the field edge mark the possibility of a goalpost intersecting the field edge and are stored as candidate left vertical edges of a goalpost. Each potential starting region is expanded downwards and towards the right to generate a bounded region that corresponds to a potential base of a goalpost (the region under the field edge).
Using the width of the goalpost base, a top edge is then estimated from the known width-to-height ratio of a post. Once a full goalpost candidate region is created, quality checks are performed to remove false positives based on the following criteria (see the sketch after this list):
- Strong edges along the base of the goalpost and along its left and right edges
- Percentage of white within the candidate region
- Ratio of the goalpost region above and below the field edge
- Amount of field below the candidate region
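A condensed sketch of how such checks might be combined; the struct fields and thresholds are illustrative, not the tuned values:

```cpp
// Hypothetical summary of a goalpost candidate and the false-positive checks.
struct GoalCandidate {
    double baseEdgeStrength;   // edge response along the base of the post
    double leftEdgeStrength;   // edge response along the left side
    double rightEdgeStrength;  // edge response along the right side
    double whiteFraction;      // fraction of white pixels inside the region
    double aboveBelowRatio;    // post height above vs. below the field edge
    double greenBelowFraction; // fraction of field colour below the region
};

bool passesQualityChecks(const GoalCandidate &c) {
    const double MIN_EDGE  = 20.0; // illustrative thresholds only
    const double MIN_WHITE = 0.6;
    const double MIN_RATIO = 2.0;
    const double MIN_GREEN = 0.3;
    return c.baseEdgeStrength   > MIN_EDGE  &&
           c.leftEdgeStrength   > MIN_EDGE  &&
           c.rightEdgeStrength  > MIN_EDGE  &&
           c.whiteFraction      > MIN_WHITE &&
           c.aboveBelowRatio    > MIN_RATIO &&
           c.greenBelowFraction > MIN_GREEN;
}
```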
In many cases the back support posts of the goal are also generated as candidate regions. These are culled by checking the height difference between the larger (front) and smaller (support) goalposts and the horizontal distance between them; if there is too large a difference in both of these parameters, the smaller goalpost is culled from the candidate set.
The next step is to tune the bounding box and calculate a distance measurement to the goalpost. To calculate the distance to the goal we have two measurements: one using the kinematics-based distance to the pixel at the base of the post, and one using the width of the post. The kinematics distance is generally more reliable than the width-based distance, but it is heavily influenced by the rock of the robot as it walks, so kinematics distance measurements lose significant accuracy while walking. During this process, higher resolution foveas are used to detect the base and width of the post more accurately, and the resulting measurements are used to fine tune the bounding box around the goal.
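The width-based measurement is essentially the pinhole-camera relation: distance ≈ focal length (in pixels) × real post width / pixel width. A small sketch of the two estimates and a possible way to blend them (the weights and constants are illustrative, not the actual fusion used; SPL posts are roughly 0.1 m wide):

```cpp
// Width-based distance from the pinhole camera model.
double distanceFromWidth(double pixelWidth, double focalLengthPixels,
                         double realPostWidthMetres = 0.1) {
    return focalLengthPixels * realPostWidthMetres / pixelWidth;
}

// Prefer the kinematics distance when standing still, since it is more reliable
// then; lean towards the width estimate while walking. Weights are illustrative.
double fuseGoalDistance(double kinematicsDist, double widthDist, bool walking) {
    double wKinematics = walking ? 0.3 : 0.8;
    return wKinematics * kinematicsDist + (1.0 - wKinematics) * widthDist;
}
```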
The posts are then labelled as left / right and the data from the two cameras is merged and saved in the vision frame.
### Robot Detection

Robot detection uses breaks in the field edge as the basis for finding potential robots. A break in the field edge is when there aren't any green pixels along one section of the field edge. The assumption is that if there aren't green pixels there, some form of obstruction must be there.
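A sketch of how the breaks might be extracted, assuming a per-column field-edge index from Field Edge Detection and the hypothetical Fovea sketch above (the minimum break width is illustrative):

```cpp
#include <utility>
#include <vector>

// Find runs of consecutive columns whose field-edge pixel is not green.
// fieldEdgeRow[x] is the row of the field edge in column x.
// Returns (startColumn, endColumn) pairs marking potential robot obstructions.
std::vector<std::pair<int, int>> findFieldEdgeBreaks(const Fovea &f,
                                                     const std::vector<int> &fieldEdgeRow,
                                                     int minBreakWidth = 4) {
    std::vector<std::pair<int, int>> breaks;
    int start = -1;
    for (int x = 0; x < f.width; ++x) {
        bool green = f.colourAt(x, fieldEdgeRow[x]) == cGREEN;
        if (!green && start < 0) start = x;
        if ((green || x == f.width - 1) && start >= 0) {
            int end = green ? x - 1 : x;
            if (end - start + 1 >= minBreakWidth) breaks.emplace_back(start, end);
            start = -1;
        }
    }
    return breaks;
}
```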
Each candidate region is examined more closely to determine where the top and the bottom of the obstruction are. Once the area is finalised, a Bayesian machine learner is used to determine whether a region contains part of a robot or not.
The next step is to attempt to find the jersey and identify which team the robot is from. If the team cannot be identified, the robot can still be detected, but is labelled as unknown team. Finally the robot arms and legs are merged into the primary robot region, since they often form independent regions either side of the main robot.
More details can be found here.
### Ball Detection

Ball Detection runs in the primary bottom camera fovea first and, if it does not detect a ball there, runs on the primary top camera fovea. If the ball cannot be detected without prior knowledge, we then attempt to search areas where we expect the ball to be using outside information, such as team mate ball locations or previous ball locations.
Ball Detection utilises colour histograms to determine points of interest that may contain a ball. It matches columns and rows that both contain orange and then examines those areas in a higher resolution fovea. The actual ball detection algorithm has two steps, finding the ball edges and then fitting a circle to those edges.
The process for finding the ball edges involves scanning around the fovea and keeping track of the strongest edges found. The scan starts at the centre of the fovea and scans outwards, radially, a number of times. If a strong edge point is detected during a scan, it is added to the total list of edge points.
Once all the edge points are found, a RANSAC algorithm is applied to fit a circle to them. The RANSAC algorithm takes 3 points, generates a circle and tests how many other edge points lie on the circle. This process is repeated a set number of times and if a good enough match is found then the centre is calculated and stored as a detected ball.
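The heart of the fit is building the circle through three sampled points (their circumcircle) and counting how many other edge points lie near it. A self-contained sketch of that idea (the parameters are illustrative, not the tuned values):

```cpp
#include <cmath>
#include <cstdlib>
#include <vector>

struct Point { double x, y; };
struct Circle { double cx, cy, r; bool valid; };

// Circumcircle through three points; invalid if they are (nearly) collinear.
Circle circleFrom3(const Point &a, const Point &b, const Point &c) {
    double d = 2.0 * (a.x * (b.y - c.y) + b.x * (c.y - a.y) + c.x * (a.y - b.y));
    if (std::fabs(d) < 1e-9) return {0, 0, 0, false};
    double a2 = a.x * a.x + a.y * a.y;
    double b2 = b.x * b.x + b.y * b.y;
    double c2 = c.x * c.x + c.y * c.y;
    double cx = (a2 * (b.y - c.y) + b2 * (c.y - a.y) + c2 * (a.y - b.y)) / d;
    double cy = (a2 * (c.x - b.x) + b2 * (a.x - c.x) + c2 * (b.x - a.x)) / d;
    return {cx, cy, std::hypot(a.x - cx, a.y - cy), true};
}

// Repeatedly pick three edge points, build the circumcircle, and keep the
// circle supported by the most points within `tolerance` of its radius.
Circle ransacCircle(const std::vector<Point> &pts, int iterations, double tolerance) {
    Circle best{0, 0, 0, false};
    int bestInliers = 0;
    if (pts.size() < 3) return best;
    for (int i = 0; i < iterations; ++i) {
        const Point &a = pts[std::rand() % pts.size()];
        const Point &b = pts[std::rand() % pts.size()];
        const Point &c = pts[std::rand() % pts.size()];
        Circle cand = circleFrom3(a, b, c);
        if (!cand.valid) continue;
        int inliers = 0;
        for (const Point &p : pts)
            if (std::fabs(std::hypot(p.x - cand.cx, p.y - cand.cy) - cand.r) < tolerance)
                ++inliers;
        if (inliers > bestInliers) { bestInliers = inliers; best = cand; }
    }
    return best;
}
```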
Ball Detection tends to over-detect balls rather than under-detect them. As a result the algorithm works best by under-classifying orange pixels, so that the ball contains some orange but other non-ball items (such as jerseys) have little to no orange on them.
More details can be found here.
### Field Feature Detection

Field Feature Detection runs in both cameras, but is expensive to run so we attempt to minimise usage where possible. It will always run in the bottom camera, then it will run inside small windows in the top camera. If we still don't find enough features, it will run in the entire top camera frame, which is the most expensive, but also the most likely to detect good features.
The first attempt to run in the top camera uses a searchForFeatures function which guesses where interesting field line data exists based on the current localisation estimate. If the robot is well localised, this works quite well and the robot is able to detect features from 4-5m away and remain well localised. If the robot isn't well localised, sometimes the windows are lucky and still detect a good feature, but often we rely on searching the entire top camera when localisation is uncertain.
Field feature detection has three stages: finding points that might lie on a field line, fitting lines and circles to those points, and finally generating more complex shapes such as corners and T-intersections.
Finding field line points involves scanning both vertically and horizontally whilst examining edge data. The algorithm searches for matching pairs of strong edges with opposing directions and uses their midpoint as the output. The pair of strong edges represents the green-to-white and white-to-green transitions on either side of the line, so the midpoint should lie on the centre of the line. A variety of checks are applied to the points, including the distance between the pair and the colour of the midpoint, to ensure quality points.
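A simplified sketch of one horizontal scan, pairing a strong green-to-white edge with a later opposing white-to-green edge and emitting the midpoint (the sign convention, thresholds and maximum line width are illustrative):

```cpp
#include <vector>

// Scan one row of edge responses and emit midpoints of opposing strong-edge pairs.
// edgeGradient[x] is the signed horizontal gradient at column x of the row;
// a green-to-white transition gives a strong positive value, white-to-green a negative one.
std::vector<int> fieldLinePointsInRow(const std::vector<int> &edgeGradient,
                                      int minStrength = 50, int maxLineWidth = 10) {
    std::vector<int> midpoints;
    for (int x = 0; x < (int)edgeGradient.size(); ++x) {
        if (edgeGradient[x] < minStrength) continue;              // need a green-to-white edge
        for (int w = 1; w <= maxLineWidth && x + w < (int)edgeGradient.size(); ++w) {
            if (edgeGradient[x + w] < -minStrength) {             // matching white-to-green edge
                midpoints.push_back(x + w / 2);                   // midpoint lies on the line centre
                x += w;                                           // continue past this line
                break;
            }
        }
    }
    return midpoints;
}
```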
Fitting lines and circles uses a RANSAC based approach. Each RANSAC cycle picks two points at random and attempts to fit both a line and a circle through them. It then calculates how many other points also fit each shape to determine which is the better fit, with "neither" also being an acceptable outcome. This is repeated a set number of times, with the best overall match tracked along the way. Once a cycle is complete, all the points matching the best fitting line or circle are removed and the process is started again. All the points are projected into the ground plane, using the kinematics of the robot, to make matching shapes easier.
Matching higher order shapes involves combining primitive lines and circles into more complicated and identifiable features, including corners, T-intersections and parallel lines. The key metrics for matching shapes are the angles of lines relative to each other and the distances between each line's endpoints and the endpoints of neighbouring lines.
More details can be found here.
### Foot Detection and Avoidance

Foot Detection is used on the robot to detect opponent robots' feet and avoid them when manoeuvring around a robot, thus reducing the total number of falls during a game. Foot detection and avoidance mostly occurs during a ball steal action. Foot detection works by scanning the bottom camera for edges and generating a set of points that occur at the edges. Given this set, a series of algorithms is used to create a set of bounding boxes around a robot's feet.
Initially, straight edges are removed to avoid detecting field lines as feet. Following this, points near a tall section of white in the image are kept and the rest are removed, on the assumption that a foot will sit underneath the white body of a robot. Then only the lowest point at each x coordinate is kept, as the feet should be at the bottom of the image. Given these final clusters of points around the robot's feet, the sets of points are averaged into single points through bucketing. At each bucket a region is grown using a BFS through the non-green and non-ball-coloured pixels in the image. These regions are restricted in height and width, and regions cannot grow into each other because a 2D array of traversed points is kept globally.
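A sketch of the region growing step, building on the hypothetical Fovea sketch above (the ball colour and size limits are illustrative):

```cpp
#include <algorithm>
#include <queue>
#include <utility>
#include <vector>

struct FootRegion { int minX, maxX, minY, maxY; };

// Grow a region from a seed point with BFS through pixels that are neither green
// nor ball-coloured. `visited` is shared by all regions so they cannot grow into
// each other; the region is also capped in width and height.
FootRegion growFootRegion(const Fovea &f, int seedX, int seedY,
                          std::vector<char> &visited,
                          int maxWidth = 40, int maxHeight = 40) {
    FootRegion r{seedX, seedX, seedY, seedY};
    std::queue<std::pair<int, int>> q;
    q.push({seedX, seedY});
    visited[seedY * f.width + seedX] = 1;
    const int dx[4] = {1, -1, 0, 0}, dy[4] = {0, 0, 1, -1};
    while (!q.empty()) {
        auto [x, y] = q.front();
        q.pop();
        for (int i = 0; i < 4; ++i) {
            int nx = x + dx[i], ny = y + dy[i];
            if (nx < 0 || ny < 0 || nx >= f.width || ny >= f.height) continue;
            if (visited[ny * f.width + nx]) continue;
            Colour c = f.colourAt(nx, ny);
            if (c == cGREEN || c == cORANGE) continue;   // skip field and ball colours
            // Respect the width/height caps on the growing region.
            int newW = std::max(r.maxX, nx) - std::min(r.minX, nx) + 1;
            int newH = std::max(r.maxY, ny) - std::min(r.minY, ny) + 1;
            if (newW > maxWidth || newH > maxHeight) continue;
            visited[ny * f.width + nx] = 1;
            r.minX = std::min(r.minX, nx); r.maxX = std::max(r.maxX, nx);
            r.minY = std::min(r.minY, ny); r.maxY = std::max(r.maxY, ny);
            q.push({nx, ny});
        }
    }
    return r;
}
```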
When a region has finished growing, its maximum and minimum x and y values are used to define the bounding box for the foot. These bounding boxes are then retrieved from the blackboard by the walk generator. A set of equations is used to determine whether a given set of forward, left and turn parameters will cause the robot to intersect with a bounding box. If so, the parameters are scaled until they do not fall within a box. These updated parameters are then used by the walk.
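A rough illustration of the avoidance step: predict where a candidate step would land and scale the walk parameters back until the prediction no longer falls inside a foot bounding box. The step prediction and scale factor here are illustrative, not the actual equations used by the walk generator:

```cpp
#include <cmath>
#include <vector>

struct Box { double minX, maxX, minY, maxY; };   // foot bounding box, robot-relative metres

// Very rough prediction of where a step with the given forward/left/turn
// parameters would put the robot's leading foot, in robot-relative coordinates.
void predictStep(double forward, double left, double turn, double &outX, double &outY) {
    outX = forward * std::cos(turn) - left * std::sin(turn);
    outY = forward * std::sin(turn) + left * std::cos(turn);
}

bool inside(const Box &b, double x, double y) {
    return x >= b.minX && x <= b.maxX && y >= b.minY && y <= b.maxY;
}

// Scale the requested walk parameters down until the predicted step no longer
// falls inside any detected foot bounding box.
void avoidFeet(double &forward, double &left, double &turn, const std::vector<Box> &feet) {
    for (int i = 0; i < 10; ++i) {               // bounded number of scaling passes
        double x, y;
        predictStep(forward, left, turn, x, y);
        bool blocked = false;
        for (const Box &b : feet)
            if (inside(b, x, y)) { blocked = true; break; }
        if (!blocked) return;
        forward *= 0.8; left *= 0.8; turn *= 0.8;   // illustrative scale factor
    }
}
```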