Sketch Recognition - CSE 624: 十月 2010

2010年10月17日星期日

Reading #14. Using Entropy to Distinguish Shape Versus Text in Hand-Drawn Diagrams (Bhat)

Comment
Jonathan Hall
Summary

The paper aims to differentiate shapes and texts in hand-drawn diagrams by using entropy. The algorithm in the paper is a context-free method, which only employs the feature of both structures.

The only feature used for classification is the angle of each point. Then all angles are encoded into 7 different symbols according to the value of the angle. The information entropy of all 7 symbols is calculated, and averaged by the diagonal of the bounding box. A threshold is set to determine which are shapes, which are texts and which are unclassified strokes. Also, a measure of confidence is designed to make the algorithm easily be integrated into other systems.

The algorithm is tested on both seen datasets and unseen datasets. The result of seen datasets is much higher than Patel's, and that of unseen datasets is also higher. Also, the author employs GZIP entropy instead of zero-order entropy, and find it useful when dealing with repeated patterns.

Discussion

It is a great paper published in IJCAI 09. The idea is very creative and excellent to obtain high accuracy. The feature is more intuitive than those gesture-based ones, like Rubines. The idea works well in diagram recognition, because shapes in a diagram are always more regular than in other domains. It cannot work well in music notes, just because music notes are often not as regular as shapes in a diagram. But it is enough for us to implement in our Project 2.

Also, the author leaves some strokes as unclassified strokes, which can reduce the error rate and put these strokes into higher-level recognition. It is a interface for us to integrate other algorithms into it. As the paper says, when unclassified strokes are more, then the accuracy is higher. But we cannot leave such strokes too much.

Reading #13. Ink Features for Diagram Recognition (Plimmer)

Comment
Jonathan Hall
Summary

The paper aims to differetiate the significance of ink features for Diagram Recognition. 46 features from kinds of sources are tested in the paper to judge their importance when distinguishing shape and text. The core technique of differentiating is based on a decision tree. To realize a decision tree, the author uses rpart function in R Statistical Package. The best partitioning feature used in rpart is chosen by minimizing a measure of purity using the Gini index. The decision tree is the figure below:

The decision tree for differentiating features

Through the experiment, 8 features are found to be the most important ones. They are Time till next stroke,Speed till next stroke, Distance from last stroke, Distance to next stroke, Bounding box width, Perimeter to area, Amount of ink inside, Total angle.

Discussion

The paper provides some important information when recognizing diagrams. Those 8 features mentioned in the paper seem useful in differentiating shapes and texts. It may be useful in Project2, helping me to distinguish shapes and texts.

However, I think the paper is limited in several ways. First, it is just a "beginner" paper, which just gives some hints in the diagram recognition. And the decision tree can only be used in binary classification. If there is another kind of elements in the diagram, how does the decision tree work? Second, the result seems not very good. There are still 42% misclassified shapes and 21% misclassified texts. The algorithm needs more improvement.

2010年10月16日星期六

Reading #12. Constellation Models for Sketch Recognition. (Sharon)

Comment
Jonathan
Summary

The paper aims to recognize sketches by introducing constellation models from computer vision. It uses constellation models to develop probabilistic models with a multi-pass branch-and-bound search method for object sketches based on multiple example drawings. Constellation models are as the figure below.

constellation model for face sketch recognition

The core of the system is to match all parts of a new sketch with those of training examples. All parts of a sketch are divided into two categories, mandatory parts and optional parts. The mandatory parts are more important than optional parts when recognizing.

An object class model is represented by the distribution of features in object constellation models. The system adopts multivariable gaussian distribution to learn the model. The quality function of a match is defined by

Then it adopts maximum likelihood search to find the most plausible labelling for all strokes that appear in the image. Also it adopts two search phase to reduce the time cost. The first one is to label strokes that correspond only to the mandatory object parts, and the second one is to use hard constraints.

The system is tested on 5 classes of objects. No recognition rate is shown in the paper.

Discussion

It is a good idea to introduce constellation model into sketch recognition. The model is good representation of the relative information between any two parts. It will help us to recognize symbols in Project2, I guess.

But there are several problems in this article that make me skeptical of it. First, there is no recognition rate shown in the paper. Whether the algorithm is feasible can not be proved. Five classes are not a large number for sketch recognition, so the test on those classes seem unpersuasive. Second, features used in the paper seem too simple for sketch recognition. Only the centeroid, the diagonal and angle of the bounding box are not enough to recognize sketch.

Reading #11. LADDER, a sketching language for user interface developers. (Hammond)

Comment
Martin Field
Summary

LADDER, is a language to describe how sketched diagrams in a domain are drawn, displayed, and edited. It aims to build a multi-domain system that can be customized for each domain. LADDER is primarily based on shapes, which means the minimum unit is one shape. A shape includes both geometric information and other useful information, like stroke orders and directions.

The system includes recognition of primitive shapes, recognition of domain shapes, editing recognition, and constraint solver.

Recognition of primitive shapes: To recognize a stroke as an ELLIPSE, LINE, CURVE, ARC, POINT, POLYLINE or some combination using techniques.

Recognition of domain shapes: It is based on Jess rule, which searches for all possible combinations of shapes that can satisfy it. When a new primitive shape is recognized, it will be added to the system to find whether it can be combined with others. It is bottom-up.

Editing recognition: To recognize the editing gesture, in order to allow users edit shapes.

Constraint solver: To display a shape's ideal strokes by using optimazation functions on constraints.

Discussion

A great work to design a language for sketch recognition. It is an excellent idea to design a language, rather than design an algorithm. LADDER, is a more generalized system than others. It can be used in multi-domain sketch recognition. LADDER is also a framework, so it can combine with others easily.

However, as the paper says, there are still several work to improve LADDER. To some extent, the core of LADDER is the recognition of the primitive shape. It is just bottom-up, can not back-trace. So if there is an error in this stage, then this error will not be removed. Though LADDER can deal with multiple interpretation from primitive shapes, I think it is better to design a probability based recognition algorithm to solve the problem of multiple interpretation. The limitation in Section 2.1 is also to be improved.

For complicated shapes, I think it is impossible for LADDER to recognize. LADDER can only recognize what users can exactly describe. Exactly means there is only one shape according to the description.

2010年10月10日星期日

Reading #10. Graphical Input Through Machine Recognition of Sketches (Herot)

Comments
Chris Aikens
Summary

The paper is motivated by the desire to involve computers into the design process. It involves the user to make decision. The contribution of this paper is to involve the user in the system and make the system interactive. To involve the user, the system adopts some extra functions and context. These functions aim to capture users' intention, and context makes the system closed to the way of human thinking.
The paper firstly gives the introduction of their previous work-HUNCH system, introducing the low-level inference in it, like latching, overtracing and so on. The key assumption in the HUNCH is that speed is the intention of the user, however, which always causes errors in the system.

The author develops the system by introducing the high-level of inference-context and improving the low-level inference.The structure of context in HUNCH is context-free, as the figure below. The structure of context in the new system is context-based, as the figure below.

context-free

context based

The main program will see a tablet which produces not only X, Y and Z but also speed, bentness, corners, and curves. They will be used in inference-making. The scale that the user worked will be determined by other cluse, such as busyness. The main program will run in the background, without interrupting users. The low-level inference starts when the user begins to draw, while the high-level inference starts when the user stops.

Discussion

A very old paper, but some excellent ideas. Context, in my opinion, is very important in high-level recognizing. I guess, it may be the first time to introduce context in the sketch recognition. The core of it is to involve the user to make decision, which actually is now very often used in our recognition. The feedback, or backtrace seems important in the recognition, which is better than the traditional method, like bottom-up.

But there is very few details in the paper about how the context worked. Is there some methods to obtain the result from the context, like Markov? Also, I cannot really understand what the paper want to express, but I can find some excellent ideas in it.

Reading 9# PaleoSketch: Accurate Primitive Sketch Recognition and Beautification (Paulson)

Comments

Martin Field

Summary

PaleoSketch is a new low-level recognition and beautification system that can recognize eight primitive shapes, as well as combinations of these primitives, with recognition rates at 98.56%. The contribution of PaleoSketch is few constraints on how users draw, returning multiple interpretations and recognizing more primitives than before. PaleoSketch is geometric based. The most important assumption in PaleoSketch is that all primitive shapes are completed by a single stroke.

Implementation:

(1) Pre-recognition: Remove consecutive points and calculate kinds of graphs and values (including direction graph, speed graph, curvature graph, and corners). Two new features are added. The first is normalized distance between direction extremes (NDDE), which of polyline is lower than that of curve. The second is direction change ratio (DCR), which of polyline is larger than that of curve.
(2) The recognizer conducted kinds of tests for each stroke, like line test, polyline test, ellipse test, circle test, arc test, curve test, spiral test, helix test and complex test, based on some common geometric rules.
(3) Hierarchy: Set priority for each primitive shape, and sort all interpretation according to the priority.

Result:
The accuracy is 98.56% when calculating the top interpretation as the correct one. NDDE and DCR are demonstrated to have significant effect on recognizing.

Discussion
PaleoSketch is an excellent tool in recognizing primitive shapes. Everyone should have the experience when doing Truss Recognition Project. It will help us recognizing objects not only in sketch recognition, but also in kinds of other fields of recognition. It also returns multiple interpretations, which will help us correct the result by context when the top interpretation is not right. I think it would be better if PaleoSketch can return each interpretation with its probability. So a sketch can be recognized with multiple results with probability.

订阅：博文 (Atom)