Sketch Recognition - CSE 624: 2010

Sunday, December 12, 2010

Reading #30: Tahuti: A Geometrical Sketch Recognition System for UML Class Diagrams (Hammond)

Comments:
Wenzhe Li
Summary:

The paper presents Tahuti, a dual-view, multi-stroke sketch recognition environment for UML class diagrams. It uses a geometry-based method that gives users more freedom to draw and edit.

The multi-layer framework consists of four stages: preprocessing, selection, recognition, and identification. The main idea is to find every possible collection of strokes, then recognize and identify each one. To reduce the burden of grouping, the framework rules out many impossible collections by capping the number of strokes in a collection and placing restrictions on each stroke.
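
To get a feel for how much a size cap prunes the search, here is a minimal sketch. The cap of three strokes and the `candidate_collections` helper are my own assumptions for illustration; the paper's actual limits and per-stroke restrictions are not reproduced here.

```python
from itertools import combinations

def candidate_collections(stroke_ids, max_strokes=3):
    """Yield every group of 1..max_strokes strokes (hypothetical helper)."""
    for size in range(1, max_strokes + 1):
        for group in combinations(stroke_ids, size):
            yield group

strokes = list(range(10))             # ten strokes on the canvas
capped = sum(1 for _ in candidate_collections(strokes, max_strokes=3))
uncapped = 2 ** len(strokes) - 1      # every non-empty subset of strokes
print(capped, uncapped)
```

With ten strokes, the cap leaves 175 collections to test instead of 1023, and the gap widens quickly as more strokes are drawn.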

The author then introduces recognition methods for basic shapes such as rectangles and ellipses.

The experiment shows that users preferred Tahuti over the other systems it was compared with.

Discussion:
The grouping idea is really good: it does not require users to draw an object in a specific manner. The author also finds ways to reduce the number of collections, which allows the program to run in real time. I see the same idea in the CivilSketch code.

However, the paper reports no recognition rate, so we do not know whether the system really works well. The geometric method itself should be tested.

Reading #29: Scratch Input Creating Large, Inexpensive, Unpowered and Mobile Finger Input Surfaces (Harrison)

Comments:
Chris
Summary:
The paper presents Scratch Input, an acoustic-based finger input system that can be used to create large, inexpensive, and mobile finger input surfaces. The system is easy to carry and can be used on desks, walls, mobile phones, and other surfaces. Only one microphone is used to record sounds.

The recognizer identifies gestures by their sounds. Peak count and amplitude are extracted from the sound of each gesture, and a shallow decision tree makes the decision. The system is tested on six gestures, and the accuracy is close to 89%.
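
A minimal sketch of this kind of recognizer is below. The thresholds, the gesture labels, and the `count_peaks` and `classify` helpers are all hypothetical; the paper's real features and tree are not reproduced here.

```python
def count_peaks(envelope, threshold):
    """Count how many times the amplitude envelope rises above the threshold."""
    peaks, above = 0, False
    for value in envelope:
        if value >= threshold and not above:
            peaks += 1
        above = value >= threshold
    return peaks

def classify(envelope, threshold=0.5, loud=0.9):
    """Toy two-level decision tree over peak count and maximum amplitude."""
    peaks = count_peaks(envelope, threshold)
    if peaks == 1:
        return "tap" if max(envelope) >= loud else "single swipe"
    if peaks == 2:
        return "double swipe"
    return "unknown"

print(classify([0.0, 0.7, 0.1, 0.8, 0.2]))
```

The sketch also makes the limitation discussed below visible: two gestures with the same number of peaks and similar loudness cannot be told apart.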
Discussion:
Recognizing gestures by sound is an interesting topic, and the authors give several example applications of their system. However, I find the system too trivial. First, the gestures are very different from one another: six gestures may be easily differentiated by peak count alone because they have different numbers of strokes. Second, the vocabulary is limited, because too many gestures have the same sound. Third, the order of strokes is very important to the system, and only the specified order can be recognized. Finally, the system needs a quiet environment. Though easy and convenient, the system has many limitations.

Reading #28: iCanDraw? – Using Sketch Recognition and Corrective Feedback to Assist a User in Drawing Human Faces (Dixon)

Comments:
Wenzhe Li
Summary:
The paper presents iCanDraw, a system that teaches novice drawers how to draw a person's face. It provides directions and feedback to help users draw a face as accurately as possible. This is important because users cannot always find an instructor to teach and correct them.

The system starts with a reference image. This image is processed with a face recognition technique and then refined by the authors to make the template as good as possible. The users look at the image, create reference lines, and then draw a face. Users can check at any time whether their drawing is consistent with the template, and get corrective feedback.

User studies show that the system performs well.
Discussion:
It is great work to teach novice drawers how to draw a face when an instructor is not available. What I appreciate most is the corrective feedback section. In my opinion, feedback is the most important part of human-computer interaction. As the user study indicates, the system gives good feedback that helps drawers produce a beautiful image.

However, the system is limited to drawing faces because it relies on mature face recognition techniques, and it does not seem easy to extend to other pictures. One idea: could the system allow users to draw reference lines themselves, drawing them on the template and displaying them in both the template and the drawing area?

Reading #27: K-sketch: A 'Kinetic' Sketch Pad for Novice Animators (Davis)

Comments:
Sam
Summary:
The paper proposes a general-purpose, informal, 2D animation sketching system, called K-Sketch, to help novices create a wide range of animations quickly.

The work began with extensive user research, including interviews with animators and non-animators, which demonstrated the importance of an informal animation tool that takes little time to learn or use. Those users proposed 18 animation operations. Because the system aims to be both fast and powerful, it selects 9 operations that meet most users' requirements. The system is implemented in C#.

The system is evaluated by three small user studies. All studies indicated that K-Sketch is stronger than PowerPoint in many aspects, except comfort sharing.
Discussion:
It is a good paper for novice researchers like me who want to conduct research on sketching systems in human-computer interaction. First, conduct a study to learn people's requirements. Second, design a system and make a trade-off between functionality and computational time. Finally, conduct user studies to compare its performance with other tools.

However, I treat this paper more as a technical report than a conference paper. I think a conference paper should contain some new ideas, but this paper is more about implementation than ideas.

Saturday, December 11, 2010

Reading #26: Picturephone: A Game for Sketch Data Capture (Johnson)

Comments:
Francisco
Summary:
The paper proposes a sketch-based game, Picturephone, for collecting data on how people make and describe sketches. It is inspired by the children's game Telephone. There are three modes: draw, describe, and rate. Each user is randomly assigned a mode. In draw mode, users are asked to draw a sketch based on a description. In describe mode, users are asked to describe a sketch. In rate mode, users are asked to score each pair of sketches.

Picturephone is a web-based application that uses the standard HTTP protocol. Its main advantage is that it does not require all users to play the game synchronously.
Discussion:
A good application of hand-drawn sketches. The idea of the game is really good. Compared to Stellasketch, I think the asynchronous game is better. It is really hard to get a lot of users to play a game at the same time unless the game is as popular as chess or poker. To be honest, that is impossible for such a game.

What concerns me more, as I wrote in the discussion of Reading #24, is how to use these data. I think there is still a long way to go before a recognizer can use these sketches as training examples. How to filter out dirty data and how to remove ambiguity and conflicts remain the main open questions for these games.

Reading #25: A Descriptor for Large Scale Image Retrieval Based on Sketched Feature Lines (Eitz)

Comments:
Chris
Summary:
The paper presents a tensor-based descriptor for large scale image retrieval based on sketched feature lines. The descriptor is used to search an image in the database, which is similar to the input sketch. It solves the problem of asymmetry between the binary sketch input and the full color image.

The proposed tensor descriptor captures the main orientation of the gradients in a cell. The descriptor is tested on a set of 1.5 million pictures of outdoor scenes. It performs comparably to or slightly better than a variant of the MPEG-7 edge histogram descriptor. It is also easy to implement and efficient to evaluate.

Discussion:
Searching a database for an image from an input sketch is a good idea, and sketch-based image retrieval is another direction in the field of sketching. The descriptor proposed in the paper is simple to implement and better than the one descriptor it is compared with. However, there is no comparison between the tensor descriptor and any others, so I cannot judge its overall performance. In the experiments, an input sketch always retrieves many candidate pictures, some of which seem unrelated to the input, so other descriptors should be added to make retrieval more precise. The descriptor is also limited in how it handles transformations, which needs improvement in the future.

Reading #24: Games for Sketch Data Collection (Johnson)

Comments:
Kim
Summary:
The paper presents two games for sketch data collection: an asynchronous game called Picturephone and a synchronous game called Stellasketch. Both games are web-based and need many users to participate.

Picturephone collects long sentences that describe sketches. Each user is randomly assigned to one of three modes (draw, describe, and rate). In draw mode, users draw a sketch based on a description. In describe mode, users describe a sketch. In rate mode, users give a score to each pair of sketches.

Stellasketch gathers short noun phrases that label sketches as they are made. In each round, one user draws a sketch based on the given nouns, and the other users describe the sketch with noun phrases. Players do not know what the others were asked to do.

Discussion:
These two games are used only to collect users' sketches. The ideas behind them are really good, because they capture data while people are being entertained. They are a good application of human-computer interaction.

However, as the author says, it is still difficult to use these sketches. Some data are very noisy, some contain extra strokes, and some even consist of unrelated sketches, so the quality of the data cannot be guaranteed. Using these data to train recognizers is also very hard.

I appreciate the idea of collecting data through games, but I don't think it has many advantages over the usual methods, such as a user study.

Reading #23: InkSeine: In Situ Search for Active Note Taking (Hinckley)

Comments:
Francisco
Summary:
The paper presents InkSeine, a pen-based active note-taking system. It offers rapid, minimally distracting interactions for users to seek, gather, and manipulate the "task detritus" of electronic work. It has several important design properties: it leverages preexisting ink to initiate a search, promotes queries to first-class objects that are commingled with ink notes, interleaves inking, searching, and gathering, and tightly couples queries with application content.

The design goals of the system are as follows:
In situ search experience: no switching, shifting, or forced transcription
Optimum workflow or maximum flexibility: allow interruption at any point, and resume from where the user left off
Enable rich trade-offs: lower computational time
Gather content: allow users to gather beneficial task detritus
Minimize search screen real estate: search results should take up less than the full screen
Span application boundaries: enable access to information from a variety of sources
Tailored to pen input: easy gestures

The paper then introduces all components of the system in detail. User studies were also conducted to help improve the system.

Discussion:
Excellent work on note taking in sketch recognition, and a promising direction for the field. The system provides a rich interface that supports almost every kind of operation users need. The design starts from the users' point of view and is verified by user studies. However, what concerns me is whether there is a handwriting recognizer robust enough to support such a system. The paper gives no information about the accuracy of its recognizer. Maybe the paper only aims to present the excellent idea of a human-computer interaction interface.

Reading #22: Plushie: An Interactive Design System for Plush Toys (Mori)

Comments:
Kim
Summary:
The paper presents an interactive design system to help users design 3D plush toys. The user creates a 3D plush toy model from scratch by simply drawing its desired silhouette. The 3D model is associated with a 2D pattern, and it is the result of a physical simulation that mimics the inflation effect caused by stuffing. The system uses a simple iterative adjustment method to implement the physical simulation.

The user interface provides many functions, such as creating a new model, cutting, creating a part, pulling, and inserting and deleting seam lines.

The author conducted several user studies. The system was first supported by professional balloon designers, and novice users also found it easy to learn.

Discussion:
Another good paper for 3D design by 2D sketching. After reading these two papers, I find it is really a good direction for sketch recognition, maybe sketch rendering. In computer vision, rendering is a good direction, so I think sketch rendering may be also good for research.

In this paper, the system uses a very simple algorithm to convert 2D sketches into 3D toys. Time cost is also very important for rendering. In computer vision, 3D reconstruction always needs a large amount of time. But both of these papers avoid some uncommon situations and thereby reduce the computational burden. Although limited in some situations, both systems work well in user studies.

Reading #21: Teddy: A Sketching Interface for 3D Freeform Design (Igarashi)

Comments:
Kim
Summary:

The paper proposes Teddy, a sketching interface for 3D freeform design. Users draw 2D strokes to construct the silhouette of the object, and the system automatically converts them into a 3D polygonal surface. The whole process is very easy for the user. According to the user study in the paper, first-time users can create their own models fluently within 10 minutes.

How Teddy works

The system supports several kinds of operations: creating a new object, painting and erasing on the surface, extrusion, cutting, smoothing, and transformation. The core algorithm uses a standard polygonal mesh to represent a 3D model. The system is robust and efficient enough for experimental use, but it can fail or generate unintuitive results when users draw unexpected strokes.

Discussion:

An interesting topic in 3D rendering and reconstruction. This is my first contact with the topic, and it seems very interesting to me. As far as I know, 3D rendering is always complex, but in this paper the author proposes an easy and quick algorithm for it. In my opinion, the key to keeping it easy is to create some reference strokes while drawing. Those reference strokes spare the system from evaluating a large number of possibilities, since one 2D picture always corresponds to several 3D shapes.

Wednesday, December 8, 2010

Reading #20: MathPad2: A System for the Creation and Exploration of Mathematical Sketches (LaViola)

Comments:
Sam
Summary:
The paper proposes a novel pen-based, modeless gestural interaction paradigm for mathematics problem solving. The system recognizes mathematical expressions and the associations between expressions and diagrams, and converts them into MATLAB code. MATLAB is the background computational tool for MathPad.

Writing Expressions
Because recognizing mathematical expressions is difficult, the paper proposes a gesture-based method to assist the recognition. The system cannot recognize the whole sketch at once, so a lasso and a tap are used to show the recognizer which area to recognize. When users spot a recognition error, they can erase the offending symbols and rewrite them.

Making Diagrams
There are two operations about making a diagram, Nailing Diagram Components and Grouping Diagram Components.

Associations
There are two kinds of associations, explicit and implicit. Implicit associations are based on familiar variable names and constant labels. Explicit associations are made by drawing a line and then tapping on a drawing element.

At the end of paper, the author provides an example 2D projectile motion scenario to help me understand how MathPad works.

Discussion:
Good work on free-hand mathematical sketches. It works like a hand-drawn MATLAB, the best computational tool in the world. The system is welcomed by many users, as the user study shows.

What concerns me is how it recognizes the mathematical symbols. There is very little content about this, and no recognition rate is provided in the paper. Also, the recognizer in the paper is still user-dependent, which means the system must provide a recognizer for each user. This should be improved. My suggestion is for the system to provide a procedure that helps a new user train their own recognizer.

Reading #19: Diagram Structure Recognition by Bayesian Conditional Random Fields (Qi)

Comments:
Francisco
Summary:

The paper proposes a recognition method based on Bayesian conditional random fields (BCRFs), which jointly analyzes all drawing elements. The first half of the paper introduces how BCRFs work and how ARD (Automatic Relevance Determination) is incorporated into BCRFs. Honestly speaking, the mathematics is too hard for me to understand. Then it introduces the application to ink classification. First, pen strokes are subdivided into fragments, just like in PaleoRecognizer. Second, a conditional random field is constructed on the fragments. Third, training and inference are run on the network using BCRFs. The experiments show the effectiveness of BCRF-ARD, though it takes longer to run.

Discussion:

The algorithm proposed in the paper seems good, because it combines two good algorithms into a better one. The experiments show that it is more effective than other algorithms.

However, generally speaking, I cannot really understand how to apply BCRFs to sketch recognition, so this discussion may be biased. After reading, I would say the author wrote the paper for the sake of writing a paper. Such a complex method is used to solve a binary classification problem (container or connector). Could it be solved with PaleoRecognizer, just by finding the rectangles?

Reading #18: Spatial Recognition and Grouping of Text and Graphics (Shilman)

Comments:
Francisco
Summary:
The paper proposes a spatial recognition and grouping algorithm for graphics and symbol recognition. The problem is treated as an optimization over a large space of possible groupings.

The neighborhood graph
A neighborhood graph is constructed from the relationship between each pair of strokes. Each node represents a stroke, and each edge indicates that two strokes are in close proximity. The graph therefore splits into a few connected subsets, and each connected subset can produce many candidate groups.
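
The graph construction can be sketched as follows. This is only an illustration under my own assumptions: each stroke is reduced to a center point, and "close proximity" is a fixed distance threshold `radius`, which may not match the paper's proximity test.

```python
from math import hypot

def neighborhood_graph(centers, radius):
    """Link strokes whose (assumed) center points lie within `radius`."""
    edges = {i: set() for i in range(len(centers))}
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            (x1, y1), (x2, y2) = centers[i], centers[j]
            if hypot(x1 - x2, y1 - y2) <= radius:
                edges[i].add(j)
                edges[j].add(i)
    return edges

def connected_subsets(edges):
    """Return the connected components of the graph as sorted lists."""
    seen, components = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in edges[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components

centers = [(0, 0), (1, 0), (10, 10), (11, 10), (50, 50)]
subsets = connected_subsets(neighborhood_graph(centers, radius=2))
print(subsets)  # [[0, 1], [2, 3], [4]]
```

Only strokes inside the same connected subset need to be considered together, which is what keeps the space of groupings manageable.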

A* based Optimization and Adaboost Recognition
The goal is to find the best grouping among all possible groupings of strokes. A* search is employed to find the best combination according to the combination cost. AdaBoost is used to recognize each combination.

The algorithm is evaluated on the HHReco sketched shape database, where it performs very well, reaching 97% accuracy. It is also evaluated on a more complex set of randomly synthesized flowcharts and performs well there too. But the time cost is too high when recognizing complex symbols.

Discussion:
The grouping technique is very common in sketch recognition. The main problem is how to deal with the huge time cost, especially in some real-time applications. The author also encountered the same problem when recognizing complex charts.

I think there is a situation in which the algorithm cannot work: for example, when symbol A consists of symbol B and symbol C. This is actually very common in sketch recognition, such as in COA diagrams.

Tuesday, December 7, 2010

Reading #17: Distinguishing Text from Graphics in On-line Handwritten Ink (Bishop)

Comments
Francisco
Summary
The paper presents a system that separates text from graphics strokes. Unlike the previous paper on entropy, this paper proposes an HMM-based method to distinguish text from graphics.

Independent Stroke Model
Nine features are extracted to represent a stroke, and an MLP is trained for classification. The objective function is the cross-entropy error.
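
For reference, the standard two-class cross-entropy error has the form below. This is the textbook definition, given here as an assumption; the paper's exact variant may differ. The symbols are illustrative: $t_n \in \{0, 1\}$ is the target label of stroke $n$ (text or graphics) and $y_n$ is the network output for that stroke.

```latex
E = -\sum_{n=1}^{N} \left[ t_n \ln y_n + (1 - t_n) \ln (1 - y_n) \right]
```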

Hidden Markov Model

The paper first proposes a uni-partite HMM, which consists of a transition matrix and an emission probability distribution over stroke features. This HMM is based only on temporal context.

uni-partite HMM

The author then proposes a bi-partite HMM, which also models the gaps between strokes. The Viterbi algorithm is used to find the optimal solution.
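
A minimal sketch of Viterbi decoding for a two-state text/graphics HMM is below. It is a simplification under my own assumptions: observations are discrete symbols ("short", "long") rather than the paper's continuous stroke features, and all probabilities are made up for illustration.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for a discrete HMM."""
    # best[t][s] = (probability of the best path ending in s at step t, predecessor)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
        best.append(column)
    # Trace back from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]

states = ("text", "graphics")
start_p = {"text": 0.5, "graphics": 0.5}
trans_p = {
    "text": {"text": 0.8, "graphics": 0.2},
    "graphics": {"text": 0.2, "graphics": 0.8},
}
emit_p = {
    "text": {"short": 0.9, "long": 0.1},
    "graphics": {"short": 0.3, "long": 0.7},
}
labels = viterbi(["short", "short", "long", "long"], states, start_p, trans_p, emit_p)
print(labels)  # ['text', 'text', 'graphics', 'graphics']
```

The high self-transition probabilities encode the temporal context: a stroke tends to have the same label as the stroke drawn just before it.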

bi-partite HMM

Experiment and Result

The author tests his algorithm on Cambridge test set and Redmond test set. Both these two HMM models are better than the independent model. In Redmond test set, the bi-partite HMM model is worse than the uni-partite model.

Discussion
The paper gives us an idea of how to distinguish text from graphics. The decision should depend not only on the features of a stroke but also on its context. Context can actually increase recognition accuracy, and it seems very useful in recognition problems in general.

In the experimental results, we can see that almost half of the graphics strokes are still recognized as text. So although text is rarely recognized as graphics, the cost is that many graphics strokes are misclassified. I don't think it is easy to distinguish text from graphics. Sometimes it is hard even for people.

Reading #16: An Efficient Graph-Based Symbol Recognizer (Lee)

Comments
Francisco
Summary
The paper introduces a graph-based (ARG graph) symbol recognizer.

Each symbol is represented by an attributed relational graph (ARG), which describes its geometry and topology. A template, or definition, of a symbol is created by constructing an average ARG from many training samples.

A rectangle and its ARG

Error Metrics and their corresponding weights

Six error metrics are used to measure similarity. And each error metric is assigned a weight to represent its contribution when matching. Recognition is implemented by graph matching. Stochastic, error-driven, greedy and sort matching are all used to recognize symbols.

A user study is conducted to test the performance of the four matching algorithms. The first three have similar performance: about 92% top-one accuracy and about 97% top-three accuracy. Sort matching, though less accurate, is still a good choice for PDAs and other devices with constrained computational resources.

Discussion
A graph-based symbol recognizer is very useful for multi-stroke symbols, as the paper shows. It is also robust to rotation, translation, and uniform or non-uniform scaling. But the problem is how to construct a correct graph for each symbol. In the paper, the author requires users to draw a symbol in the same order as, or in an order similar to, the one used in the training period, so that there is very little error when constructing the average ARG. I think this is difficult for users to do. In Project #1 (truss recognition), I also used a graph-based method to recognize a truss, and I ran into a big problem: users always draw a truss with some ambiguous strokes whose relationships cannot be judged easily. A graph-based method requires users to draw a symbol very carefully, so that strokes meant to be connected really are connected.

Reading #15: An Image-Based, Trainable Symbol Recognizer for Hand-drawn Sketches (Kara)

Comments
Jonathan Hall
Summary
The paper discusses an image-based symbol recognition algorithm for hand-drawn sketches.

The key idea is very similar to "one dollar": it collects training samples from users and adds them directly to the template set. However, the paper offers some better ideas in the recognition process.

Processing and Representation
The symbol is first downsampled to a 48*48 grid by quantization. Polar coordinates are used instead of x-y coordinates to deal with rotation, because a rotation in x-y coordinates becomes a translation in polar coordinates.

Pre-Recognizer
The polar transform can also be used as a pre-recognizer that filters out impossible candidates. This saves a lot of computation time.
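
The rotation-to-translation property is easy to check numerically. The sketch below is my own illustration, not the paper's code; the `to_polar` and `rotate` helpers and the sample points are assumptions.

```python
from math import atan2, cos, hypot, pi, sin

def to_polar(points, center):
    """Convert (x, y) points to (radius, angle) pairs about `center`."""
    cx, cy = center
    return [(hypot(x - cx, y - cy), atan2(y - cy, x - cx)) for x, y in points]

def rotate(points, center, theta):
    """Rotate points by `theta` radians about `center`."""
    cx, cy = center
    return [
        (cx + (x - cx) * cos(theta) - (y - cy) * sin(theta),
         cy + (x - cx) * sin(theta) + (y - cy) * cos(theta))
        for x, y in points
    ]

points = [(2.0, 0.0), (1.0, 1.0), (0.0, 3.0)]
theta = pi / 6
before = to_polar(points, (0.0, 0.0))
after = to_polar(rotate(points, (0.0, 0.0), theta), (0.0, 0.0))
for (r0, a0), (r1, a1) in zip(before, after):
    # The radius is unchanged; every angle shifts by the same theta.
    print(abs(r1 - r0) < 1e-9, abs((a1 - a0) - theta) < 1e-9)
```

Every point keeps its radius and shifts by the same angle, so finding the rotation reduces to finding the best shift along the angle axis. Near the ±π boundary the shift wraps around, which a real implementation has to handle.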

Multiple classifiers
Four classifiers are used, each based on a different measure: Hausdorff distance, modified Hausdorff distance, Tanimoto coefficient, and Yule coefficient. Each classifier makes its own decision, and the four decisions are combined into a final decision. This reduces the errors that a single classifier would make.

User Study
The paper conducts two user studies to verify its algorithm, one on graphic symbols and one on digits. Both studies support the algorithm very well.
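
The four measures can be sketched on sets of "ink" pixel coordinates as below. These are the textbook definitions, written under my own assumptions; the paper's exact variants, and the way it combines the four decisions, may differ.

```python
from math import hypot

def directed(a, b, reduce):
    """Reduce the distances from each point of `a` to its nearest point in `b`."""
    return reduce([min(hypot(ax - bx, ay - by) for bx, by in b) for ax, ay in a])

def mean(values):
    return sum(values) / len(values)

def hausdorff(a, b):
    """Largest nearest-neighbor distance, taken in both directions."""
    return max(directed(a, b, max), directed(b, a, max))

def modified_hausdorff(a, b):
    """Like Hausdorff, but averaging nearest-neighbor distances per direction."""
    return max(directed(a, b, mean), directed(b, a, mean))

def tanimoto(a, b):
    """Shared ink pixels divided by pixels inked in either image."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def yule(a, b, all_pixels):
    """Yule coefficient from the 2x2 agreement counts over the whole grid."""
    a, b = set(a), set(b)
    n11 = len(a & b)
    n10 = len(a - b)
    n01 = len(b - a)
    n00 = len(set(all_pixels) - a - b)
    return (n11 * n00 - n10 * n01) / (n11 * n00 + n10 * n01)

grid = [(x, y) for x in range(3) for y in range(3)]
template = [(0, 0), (1, 1), (2, 2)]
drawn = [(0, 0), (1, 1), (2, 1)]
print(hausdorff(template, drawn), modified_hausdorff(template, drawn))
print(tanimoto(template, drawn), yule(template, drawn, grid))
```

The two Hausdorff variants are distances (smaller is better), while Tanimoto and Yule are similarities (larger is better), so the scores must be put on a common scale before they can be combined.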

However, this algorithm is also limited in several aspects. It is sensitive to non-uniform scaling, and may wash out small image details during quantization.

Discussion
Employing four classifiers and combining their results into a final result is an excellent idea. Sometimes we don't know which classifier fits the current problem, or which classifier fits which samples. Using four classifiers can reduce the errors that a single classifier would make. I think it is a very valuable idea, and it gives me some inspiration for my final project.

However, the disadvantage is also obvious, and it is the same as for one dollar. The number of templates grows very fast, because the system must collect as many samples as it can. This is the main disadvantage of all "one dollar" style systems.

Sunday, October 17, 2010

Reading #14. Using Entropy to Distinguish Shape Versus Text in Hand-Drawn Diagrams (Bhat)

Comment
Jonathan Hall
Summary

The paper aims to differentiate shapes from text in hand-drawn diagrams by using entropy. The algorithm is context-free: it uses only the features of the strokes themselves.

The only feature used for classification is the angle at each point. Each angle is encoded as one of 7 symbols according to its value. The information entropy of the resulting symbol sequence is calculated and normalized by the diagonal of the bounding box. Thresholds determine which strokes are shapes, which are text, and which are left unclassified. A confidence measure is also provided so that the algorithm can easily be integrated into other systems.

The algorithm is tested on both seen and unseen datasets. The result on the seen datasets is much higher than Patel's, and the result on the unseen datasets is also higher. The author also tries GZIP entropy instead of zero-order entropy and finds it useful when dealing with repeated patterns.
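
A rough sketch of the entropy computation is below. It rests on my own assumptions: the angle at a point is taken as the turning angle between consecutive segments, the 7 symbols are equal-width bins over [0, π], and the bounding-box normalization is omitted. The paper's alphabet and binning are not reproduced here.

```python
from collections import Counter
from math import atan2, log2, pi

def turning_angles(points):
    """Absolute turning angle (radians, in [0, pi]) at each interior point."""
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        turn = atan2(y2 - y1, x2 - x1) - atan2(y1 - y0, x1 - x0)
        turn = (turn + pi) % (2 * pi) - pi  # wrap into [-pi, pi)
        angles.append(abs(turn))
    return angles

def symbol_entropy(points, bins=7):
    """Zero-order Shannon entropy (bits) of the binned turning angles."""
    symbols = [min(int(a / pi * bins), bins - 1) for a in turning_angles(points)]
    total = len(symbols)
    return -sum(c / total * log2(c / total) for c in Counter(symbols).values())

straight = [(i, 0) for i in range(10)]
zigzag = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 2), (1, 3), (3, 3), (3, 2), (4, 4), (5, 4)]
print(symbol_entropy(straight), symbol_entropy(zigzag))
```

A straight line uses a single symbol and has zero entropy, while the irregular stroke uses several symbols and scores higher, which is the intuition behind text having higher entropy than shapes.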

Discussion

It is a great paper published in IJCAI 09. The idea is very creative and achieves high accuracy. The feature is more intuitive than gesture-based ones, like Rubine's. The idea works well in diagram recognition, because shapes in a diagram are always more regular than in other domains. It would not work well on music notes, because music notes are often not as regular as the shapes in a diagram. But it is enough for us to implement in our Project 2.

The author also leaves some strokes unclassified, which reduces the error rate and passes those strokes on to higher-level recognition. This gives us an interface for integrating other algorithms. As the paper says, the more strokes are left unclassified, the higher the accuracy, but we cannot leave too many strokes unclassified.

Reading #13. Ink Features for Diagram Recognition (Plimmer)

Comment
Jonathan Hall
Summary

The paper aims to determine the significance of ink features for diagram recognition. It tests 46 features from various sources to judge their importance in distinguishing shapes from text. The core technique is a decision tree, which the author builds with the rpart function in the R statistical package. At each split, rpart chooses the partitioning feature that minimizes an impurity measure, the Gini index. The decision tree is the figure below:

The decision tree for differentiating features
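
To make the split criterion concrete, here is a minimal sketch of choosing a threshold by Gini impurity. It is not rpart itself; the helper names and the bounding box widths are made up for illustration.

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))

def split_impurity(values, labels, threshold):
    """Size-weighted Gini impurity after splitting on value <= threshold."""
    left = [label for v, label in zip(values, labels) if v <= threshold]
    right = [label for v, label in zip(values, labels) if v > threshold]
    total = len(labels)
    return sum(len(part) / total * gini(part) for part in (left, right) if part)

def best_threshold(values, labels):
    """Pick the observed value whose split gives the lowest impurity."""
    candidates = sorted(set(values))[:-1]  # the largest value cannot split anything
    return min(candidates, key=lambda t: split_impurity(values, labels, t))

widths = [5, 8, 12, 40, 55, 70]  # made-up bounding box widths
kinds = ["text", "text", "text", "shape", "shape", "shape"]
print(best_threshold(widths, kinds))  # 12
```

A tree builder repeats this search over every feature at every node and keeps the feature and threshold with the lowest impurity, which is how the most discriminating features end up near the root.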

The experiment finds 8 features to be the most important: Time till next stroke, Speed till next stroke, Distance from last stroke, Distance to next stroke, Bounding box width, Perimeter to area, Amount of ink inside, and Total angle.

Discussion

The paper provides some important information for recognizing diagrams. The 8 features mentioned in the paper seem useful for differentiating shapes from text. They may be useful in Project 2, helping me to distinguish shapes from text.

However, I think the paper is limited in several ways. First, it is just a "beginner" paper that gives some hints for diagram recognition. The decision tree is used only for binary classification; if the diagram contains another kind of element, how would the decision tree work? Second, the result does not seem very good: 42% of shapes and 21% of text are still misclassified. The algorithm needs more improvement.

Saturday, October 16, 2010

Reading #12. Constellation Models for Sketch Recognition. (Sharon)

Comment
Jonathan
Summary

The paper aims to recognize sketches by introducing constellation models from computer vision. It uses constellation models to build probabilistic models of object sketches from multiple example drawings, and searches them with a multi-pass branch-and-bound method. A constellation model is illustrated in the figure below.

constellation model for face sketch recognition

The core of the system is to match all parts of a new sketch with those of training examples. All parts of a sketch are divided into two categories, mandatory parts and optional parts. The mandatory parts are more important than optional parts when recognizing.

An object class model is represented by the distribution of features in object constellation models. The system uses a multivariate Gaussian distribution to learn the model, and the paper defines a quality function to score a match.

Then it uses maximum likelihood search to find the most plausible labelling for all strokes that appear in the image. It also uses two search phases to reduce the time cost. The first labels only the strokes that correspond to mandatory object parts, and the second uses hard constraints.

The system is tested on 5 classes of objects. No recognition rate is shown in the paper.

Discussion

Introducing the constellation model into sketch recognition is a good idea. The model is a good representation of the relative information between any two parts. I guess it will help us recognize symbols in Project 2.

But several problems in this article make me skeptical of it. First, no recognition rate is shown in the paper, so the feasibility of the algorithm is not demonstrated. Five classes are not many for sketch recognition, so the test on those classes seems unpersuasive. Second, the features used in the paper seem too simple for sketch recognition. The centroid, diagonal, and angle of the bounding box alone are not enough to recognize a sketch.

Reading #11. LADDER, a sketching language for user interface developers. (Hammond)

Comment
Martin Field
Summary

LADDER is a language for describing how sketched diagrams in a domain are drawn, displayed, and edited. It aims to build a multi-domain system that can be customized for each domain. LADDER is primarily based on shapes, which means the minimum unit is one shape. A shape includes both geometric information and other useful information, like stroke order and direction.

The system includes recognition of primitive shapes, recognition of domain shapes, editing recognition, and a constraint solver.

Recognition of primitive shapes: Recognizes a stroke as an ELLIPSE, LINE, CURVE, ARC, POINT, POLYLINE, or some combination of these.

Recognition of domain shapes: This is based on Jess rules, which search for all possible combinations of shapes that satisfy a rule. When a new primitive shape is recognized, it is added to the system to see whether it can be combined with others. The process is bottom-up.

Editing recognition: Recognizes editing gestures so that users can edit shapes.

Constraint solver: Displays a shape's ideal strokes by applying optimization functions to the constraints.

Discussion

It is great work to design a language for sketch recognition. It is an excellent idea to design a language rather than an algorithm. LADDER is a more general system than others. It can be used for multi-domain sketch recognition. LADDER is also a framework, so it can be combined with other systems easily.

However, as the paper says, there is still a lot of work to do to improve LADDER. To some extent, the core of LADDER is the recognition of primitive shapes. It is purely bottom-up and cannot backtrack, so an error made in this stage will never be removed. Though LADDER can deal with multiple interpretations of primitive shapes, I think it would be better to design a probability-based recognition algorithm to solve the problem of multiple interpretations. The limitations in Section 2.1 also need to be addressed.

For complicated shapes, I think it is impossible for LADDER to recognize them. LADDER can only recognize what users can describe exactly. "Exactly" means that only one shape matches the description.

Sunday, October 10, 2010

Reading #10. Graphical Input Through Machine Recognition of Sketches (Herot)

Comments
Chris Aikens
Summary

The paper is motivated by the desire to involve computers in the design process. Its contribution is to involve the user in decision making and to make the system interactive. To do so, the system adds some extra functions and context. The functions aim to capture users' intentions, and the context brings the system closer to the way humans think.
The paper first introduces the authors' previous work, the HUNCH system, and its low-level inference, such as latching and overtracing. The key assumption in HUNCH is that drawing speed reflects the user's intention, an assumption that often causes errors in the system.

The author develops the system by introducing high-level inference (context) and improving the low-level inference. The structure in HUNCH is context-free, while the structure in the new system is context-based.

[Figure: context-free structure in HUNCH]

[Figure: context-based structure in the new system]

The main program will see a tablet that produces not only X, Y and Z but also speed, bentness, corners, and curves, all of which are used in inference-making. The scale at which the user is working is determined from other clues, such as busyness. The main program runs in the background without interrupting users. The low-level inference starts when the user begins to draw, while the high-level inference starts when the user stops.

Discussion

A very old paper, but with some excellent ideas. Context, in my opinion, is very important in high-level recognition. I guess this may be the first time context was introduced into sketch recognition. The core of the paper is to involve the user in decision making, which is now used very often in our recognition systems. Feedback, or backtracking, seems important in recognition and is better than a traditional purely bottom-up method.

But there are very few details in the paper about how the context works. Is there some method to obtain the result from the context, like a Markov model? Also, I cannot fully understand what the paper wants to express, but I can find some excellent ideas in it.

Reading #9: PaleoSketch: Accurate Primitive Sketch Recognition and Beautification (Paulson)

Comments

Martin Field

Summary

PaleoSketch is a new low-level recognition and beautification system that can recognize eight primitive shapes, as well as combinations of these primitives, with a recognition rate of 98.56%. Its contributions are placing few constraints on how users draw, returning multiple interpretations, and recognizing more primitives than previous systems. PaleoSketch is geometry-based. Its most important assumption is that every primitive shape is drawn with a single stroke.

Implementation:

(1) Pre-recognition: Remove consecutive duplicate points and calculate a set of graphs and values (including the direction graph, speed graph, curvature graph, and corners). Two new features are added. The first is the normalized distance between direction extremes (NDDE), which is lower for polylines than for curves. The second is the direction change ratio (DCR), which is larger for polylines than for curves.
(2) Recognition: Run a series of tests on each stroke (line, polyline, ellipse, circle, arc, curve, spiral, helix and complex tests), based on common geometric rules.
(3) Hierarchy: Assign a priority to each primitive shape, and sort all interpretations according to that priority.
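
To make the two new features concrete, here is a minimal Python sketch of how NDDE and DCR could be computed. It is my own reading of the definitions, not the PaleoSketch code: I assume NDDE is the path length between the points with the highest and lowest direction values divided by the total stroke length, and DCR is the largest direction change divided by the average direction change. The function names are hypothetical, and any trimming of stroke ends is left out.

```python
import math

def directions(points):
    """Direction (angle) of each segment, unwrapped so it never jumps by 2*pi."""
    dirs = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        d = math.atan2(y1 - y0, x1 - x0)
        if dirs:
            while d - dirs[-1] > math.pi:
                d -= 2 * math.pi
            while d - dirs[-1] < -math.pi:
                d += 2 * math.pi
        dirs.append(d)
    return dirs

def ndde(points):
    """Assumed definition: path length between the direction extremes / total length."""
    dirs = directions(points)
    seg = [math.dist(p, q) for p, q in zip(points, points[1:])]
    lo, hi = sorted((dirs.index(min(dirs)), dirs.index(max(dirs))))
    return sum(seg[lo:hi]) / sum(seg)

def dcr(points):
    """Assumed definition: maximum direction change / average direction change."""
    dirs = directions(points)
    changes = [abs(b - a) for a, b in zip(dirs, dirs[1:])]
    mean = sum(changes) / len(changes)
    return max(changes) / mean if mean > 0 else 0.0

# A quarter-circle arc versus an L-shaped polyline (toy data).
arc = [(math.cos(i * math.pi / 20), math.sin(i * math.pi / 20)) for i in range(11)]
ell = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
print(ndde(arc), ndde(ell))  # the arc should score higher
print(dcr(arc), dcr(ell))    # the polyline should score higher
```

On this toy data the arc has the higher NDDE and the polyline has the higher DCR, which matches the behaviour described in step (1).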

Result:
The accuracy is 98.56% when only the top interpretation is counted as the answer. NDDE and DCR are shown to have a significant effect on recognition.

Discussion
PaleoSketch is an excellent tool for recognizing primitive shapes. Everyone who did the Truss Recognition Project should have experienced this. It will help us recognize objects not only in sketch recognition, but also in other fields of recognition. It also returns multiple interpretations, which lets us use context to correct the result when the top interpretation is wrong. I think it would be better if PaleoSketch returned each interpretation together with its probability, so that a sketch could be recognized as multiple results, each with a probability.

Thursday, September 16, 2010

Reading #8: A Lightweight Multistroke Recognizer for User Interface Prototypes

Comments:
Jonathan Hall
Summary:
$N, a good extension of $1, aims to recognize multistroke sketches. The key idea of $N is to treat a multistroke as a set of unistrokes by connecting the end points of its strokes together. Besides, $N can recognize a mixture of multistroke, unistroke and 1D gestures. $N is also able to recognize both orientation-dependent and orientation-independent sketches.

$N treated an n-component multistroke sketch as 2^n*n! unistroke sketches, one for each combination of stroke order and stroke direction, which produced a large number of candidates. To limit the cost, $N adopted two optimizations based on the start angle and the number of strokes. The start angle reduced unistroke comparisons by 79% and increased accuracy by 1.3%. The number of strokes removed an additional 10.4% of comparisons and increased accuracy by a further 1.7%.
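
The 2^n*n! count is easy to check with a small Python sketch. This is only my illustration of the enumeration idea (every stroke order combined with every choice of stroke direction), not the published $N code, and the function name is hypothetical.

```python
from itertools import permutations, product

def unistroke_permutations(strokes):
    """Yield one connected point path per (stroke order, stroke directions) choice."""
    n = len(strokes)
    for order in permutations(range(n)):              # n! stroke orders
        for flips in product((False, True), repeat=n):  # 2^n direction choices
            path = []
            for idx, flip in zip(order, flips):
                stroke = strokes[idx]
                path.extend(reversed(stroke) if flip else stroke)
            yield path

# Two strokes forming an "X" (toy data).
strokes = [[(0, 0), (1, 1)], [(1, 0), (0, 1)]]
print(len(list(unistroke_permutations(strokes))))  # 2^2 * 2! = 8
```

The count grows very quickly with the number of strokes, which is why the pruning by start angle and stroke count matters.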

$N handled 1D sketches by using the ratio of the sides of the bounding box. If the ratio was less than a threshold, the sketch was scaled so as to preserve its aspect ratio. In order to recognize orientation-dependent and orientation-independent sketches, the system required users to flag sketches during training. Orientation-dependent sketches were rotated starting from their original angles, and the others starting from 0.

$N still had some limitations, such as lacking provisions for scale or position dependence and not recognizing sketches whose gestalt is their appeal.

$N was tested with students from middle and high school classrooms, and achieved 96.6% accuracy on algebra symbols.

Discussion:
$N is great work, and a great extension of $1. $N can recognize multistrokes with only 200 lines of code, high accuracy and high speed. It should be a milestone in multistroke sketch recognition. Besides, the tricks for dealing with orientation and 1D gestures are also very impressive.

It is a pity that $N is still writer-dependent, as the "future work" section says. An important feature of a recognizer is the ability to generalize and recognize other users' sketches. $N still needs further study in the future.

I am still skeptical of the "indicative angle". The indicative angle is determined entirely by the centroid and the start point. If the start point is noisy, or users begin with a wrong start angle, the recognition will be affected. Maybe the indicative angle should be made more robust.

Reading #7: Sketch Based Interfaces: Early Processing for Sketch Understanding

Comments:
Jonathan Hall
Summary:
The paper introduces a freehand sketch recognition system that allows users to draw in an unrestricted fashion. The system is designed for unistroke sketches and consists of three parts: approximation, beautification, and basic recognition.

Stroke Approximation: Vertices are detected at points of minimum speed or maximum absolute curvature. By combining both sources, the system can generate hybrid fits. Then, Bezier curves are used to approximate the curved parts of the sketch.
Beautification: Adjust the slopes of line segments so that segments meant to have the same slope end up parallel.
Basic recognition: It is implemented by template matching.

Overall, the system achieves 96% accuracy.
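
The speed-based half of the vertex detection can be sketched in Python as follows. This is a simplified illustration under my own assumptions, not the paper's implementation: samples are (x, y, t) tuples with strictly increasing timestamps, a point is a candidate when its speed is below a fraction of the mean speed (the `ratio` parameter and its default are assumed), and one vertex is kept per slow region. The curvature-based half and the hybrid fit are left out.

```python
import math

def speed_vertices(samples, ratio=0.9):
    """Return indices of speed minima that fall below ratio * mean speed."""
    speeds = {}
    for i in range(1, len(samples) - 1):
        x0, y0, t0 = samples[i - 1]
        x1, y1, t1 = samples[i + 1]
        speeds[i] = math.hypot(x1 - x0, y1 - y0) / (t1 - t0)
    threshold = ratio * sum(speeds.values()) / len(speeds)

    vertices, run = [], []
    for i in range(1, len(samples) - 1):
        if speeds[i] < threshold:
            run.append(i)                  # still inside a slow region
        elif run:
            vertices.append(min(run, key=speeds.get))  # slowest point of the region
            run = []
    if run:
        vertices.append(min(run, key=speeds.get))
    return vertices

# An "L" drawn at constant speed except for a slowdown around the corner (toy data).
samples = [(0, 0, 0), (1, 0, 1), (2, 0, 2), (3, 0, 3), (3.2, 0, 4),
           (3.2, 0.2, 5), (3.2, 1.2, 6), (3.2, 2.2, 7), (3.2, 3.2, 8)]
print(speed_vertices(samples))  # the corner sample at index 4
```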

Discussion:
It is a good idea to represent sketches by vertices. Vertices are important features for recognition because they indicate that some kind of change has happened, and changes are what we care about. For example, we always pay more attention to what teachers say more loudly.

Is it robust to noise? The paper says very little about how to deal with noise. By noise I do not mean the noise in vertex detection, but the noise in drawing. Users don't always draw a completely correct sketch. Recognition techniques should be improved to be more robust to different kinds of noise.

Tuesday, September 14, 2010

Reading #6: Protractor: A Fast and Accurate Gesture Recognizer

Comment:
Yue Li
Summary:
For personalized, gesture-based interaction, it is hard to foresee what gestures end-users will specify, and end-users usually want to provide only a few samples. That is the reason why Protractor, a template-based recognizer, was designed.

Preprocessing removes irrelevant factors. Resampling and translation remove the effects of drawing speed and location, and rotation reduces noise in orientation.

Classification calculates optimal angular distances, which measure the angle between two samples in the vector space. To be robust, Protractor always rotates the template by an extra angle, and uses a closed-form solution to find the minimum angular distance.

Due to the closed-form solution, Protractor is faster than $1, which searches for the minimum distance step by step.
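
The closed-form step can be sketched in Python as below. This is a minimal sketch under my own assumptions rather than the published pseudocode: both gestures are assumed to be already resampled to the same number of points, the helper names are hypothetical, and I use `atan2` instead of a plain arctangent of b/a so that the case a = 0 is handled.

```python
import math

def to_unit_vector(points):
    """Move the centroid to the origin, flatten to [x1, y1, x2, y2, ...], normalise."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    flat = []
    for x, y in points:
        flat += [x - cx, y - cy]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat]

def optimal_angular_distance(template, gesture):
    """Smallest angle between the two vectors over all rotations of the template."""
    a = b = 0.0
    for i in range(0, len(template), 2):
        a += template[i] * gesture[i] + template[i + 1] * gesture[i + 1]
        b += template[i] * gesture[i + 1] - template[i + 1] * gesture[i]
    angle = math.atan2(b, a)  # best rotation, found directly instead of by search
    cosine = a * math.cos(angle) + b * math.sin(angle)
    return math.acos(max(-1.0, min(1.0, cosine)))

# A square compared with a rotated copy of itself and with a straight line (toy data).
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
theta = math.radians(30)
turned = [(x * math.cos(theta) - y * math.sin(theta),
           x * math.sin(theta) + y * math.cos(theta)) for x, y in square]
line = [(0, 0), (1, 0), (2, 0), (3, 0)]
d_same = optimal_angular_distance(to_unit_vector(square), to_unit_vector(turned))
d_diff = optimal_angular_distance(to_unit_vector(square), to_unit_vector(line))
print(d_same, d_diff)  # the rotated copy should be much closer than the line
```

No iteration over candidate angles is needed, which is where the speed advantage over a step-by-step search comes from.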

Discussion:
Protractor, though similar to $1, is faster. It employs a closed-form solution instead of a step-by-step search. However, its error rates are about the same as those of $1. Protractor has an advantage in orientation-sensitive applications, so its dataset can include orientation-sensitive samples, which extends the range of gestures that $1 can handle. Protractor is actually a good idea.

The distance metric is very important in classification. A good metric can give high accuracy and save much time, as in Protractor. However, it seems that there is no systematic method for finding the best metric, other than trying them one by one.

Template-based methods are usually fast, but cannot deal with unknown gestures.
Parametric methods, in contrast, are usually slow, but can deal with some unknown gestures, due to their ability to model the distribution of samples.

Saturday, September 11, 2010

Reading #5: Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes

Comments:
Wenzhe Li
Summary:
The paper introduces a fast, easy and accurate recognizer for unistroke gestures, which novice programmers can easily incorporate into a UI.

Several advantages of $1 are described, such as
1. resilience to variation in sampling
2. rotation, scale and position invariance
3. new gestures can be taught with only one example
4. more accurate than DTW and Rubine
while the disadvantages are
1. it cannot distinguish gestures whose identities depend on a specific orientation
2. it cannot distinguish gestures whose identities depend on speed

$1 consists of 4 steps:
1. resample the point path to a fixed number of equidistantly spaced points
2. rotate based on the indicative angle, which is the angle formed between the centroid and the first point
3. scale to a square and translate the centroid to the origin
4. find the optimal angle for the best score
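
The first three steps can be sketched in Python as below. This is my own minimal sketch, not the authors' reference code: the function names and the `size` parameter are hypothetical, and step 4 (the search for the optimal angle against each template) is omitted.

```python
import math

def centroid(points):
    return (sum(x for x, _ in points) / len(points),
            sum(y for _, y in points) / len(points))

def resample(points, n):
    """Step 1: resample a path into n roughly equidistant points (n >= 2)."""
    total = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    interval = total / (n - 1)
    pts, out, acc, i = list(points), [points[0]], 0.0, 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if d > 0 and acc + d >= interval:
            t = (interval - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)   # q becomes the start of the next segment
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:        # rounding can leave the path one point short
        out.append(pts[-1])
    return out[:n]

def rotate_to_zero(points):
    """Step 2: rotate about the centroid so the indicative angle becomes 0."""
    cx, cy = centroid(points)
    angle = math.atan2(cy - points[0][1], cx - points[0][0])
    c, s = math.cos(-angle), math.sin(-angle)
    return [((x - cx) * c - (y - cy) * s + cx,
             (x - cx) * s + (y - cy) * c + cy) for x, y in points]

def scale_and_translate(points, size=1.0):
    """Step 3: scale the bounding box to a square, move the centroid to the origin."""
    xs, ys = [x for x, _ in points], [y for _, y in points]
    w = (max(xs) - min(xs)) or 1.0   # avoid dividing by zero for flat gestures
    h = (max(ys) - min(ys)) or 1.0
    scaled = [(x * size / w, y * size / h) for x, y in points]
    cx, cy = centroid(scaled)
    return [(x - cx, y - cy) for x, y in scaled]

raw = [(0, 0), (4, 0), (4, 2), (0, 2)]
normalized = scale_and_translate(rotate_to_zero(resample(raw, 16)))
print(len(normalized), centroid(normalized))
```

The non-uniform scaling in step 3 is also why $1 cannot tell apart gestures that differ only in aspect ratio, and why 1D gestures need the special handling that $N adds.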

Finally, the authors compare $1, DTW and Rubine.
1. $1 and DTW are significantly more accurate than Rubine.
2. $1 and DTW improve only slightly as the number of training examples increases, while Rubine improves a lot.
3. All three are affected similarly by the articulation speed.
4. $1 has the best falloff, DTW second, and Rubine third.
5. $1 is the fastest, Rubine second, and DTW third.

Discussion:
Is $1 really better than the others? Probably not. $1 is only shown to be better on the dataset described in the paper. Different recognizers suit different datasets, and any algorithm can be the best one if you choose the data for it. The authors spend 2 pages demonstrating the advantage of $1, which seems of little use.

However, I still appreciate $1 for its simplicity and speed. It is suitable for novice programmers, especially those who are not familiar with sketch recognition. From this point of view, $1 is great work.

Thursday, September 9, 2010

Reading #4: Sketchpad: A Man-Machine Graphical Communication System

Comments:
Wenzhe Li
Summary:
The paper is a detailed introduction to Sketchpad.

The design of Sketchpad is based on a ring structure, which makes it easy to find the next and the previous member of a ring. The paper then explains how the system works, including how it displays lines, circles, digits, text and so on. Some general recursive functions are very important in Sketchpad, such as expanding, deleting and merging. All of these functions operate recursively on all related elements.

An important idea in Sketchpad is the "copy" function, which makes drawing easier. However, the disadvantage of the copy function is that it can only copy an instance as a whole. Another core idea of Sketchpad is constraint satisfaction, which makes it very different from pencil and paper. Users can specify constraints for their drawing and have Sketchpad satisfy them, which helps them draw very easily.

At that time, Sketchpad could be used for linkages, bridge design and artistic drawing, in addition to electrical circuit diagrams.

Discussion:
Generally speaking, Sketchpad was a milestone at that time. I can hardly imagine that such a machine could be designed back then. Sketchpad was as important as the first computer, in my opinion. It seems to be an ancestor of several kinds of design software, like CAD and PSpice. It is also the ancestor of sketch recognition.

I am very impressed by its constraint satisfaction function. I think it is the most important feature of Sketchpad. Even someone who is not good at drawing can draw a beautiful picture by setting up some constraints and having Sketchpad satisfy them. Drawing is not as difficult as before.

Monday, September 6, 2010

Reading #3: "Those Look Similar!" Issues in Automating Gesture Design Advice

Comments:
Yue Li
Summary:
A user interface design tool, quill, was designed to give unsolicited advice (active feedback) to help designers create and improve gestures. The advice was based on similarity metrics and Rubine's recognizer.

The interface challenges, implementation challenges and similarity metric challenges that the authors encountered in quill were all discussed in the paper. Long discussed the timing of advice, the length and frequency of advice, the content of advice, background analysis, advice for hierarchies and similarity metrics. All of his design choices were made from the users' point of view and aimed to make users more comfortable. For example, quill always gave a concise message with a hyperlink to more details.

Long also described future work for quill, such as collecting more data to improve the similarity model.

Discussion:
The paper mainly discussed the design of an automated advice system. Many challenges that the authors encountered in the design will also occur in our own designs. Advice on how to deal with these challenges seems beneficial to designers, especially novice designers. But I expected more details on some core problems, such as the similarity metrics and an analysis of the performance of Rubine's recognizer.

Besides, the system might be better if it offered other recognizers in addition to Rubine's, so that users could find the best recognizer for their own gesture datasets. That would make quill more popular.

Sunday, September 5, 2010

Reading #2: Specifying Gestures by Example (Rubine)

Comments:
Chris Aikens
Summary:
The paper introduced a gesture recognition system called GRANDMA (Gesture Recognizers Automated in a Novel Direct Manipulation Architecture).

Firstly, it introduced the design of GRANDMA, such as how to create new gestures, how to delete gestures, and how to edit gesture semantics. This showed us how GRANDMA worked.

Secondly, it described the principle of the recognizer. Gestures were recognized by a linear classifier with 13 features, including angles and the bounding rectangle. The decision was made by finding the class with the maximal score. "Rejection" was introduced to prevent ambiguous results. The recognizer can run in real time with high accuracy.

Finally, extensions were discussed, such as eager recognition and multi-finger recognition.
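
A linear classifier of this kind can be sketched in a few lines of Python. This is only an illustration: the weights and the two toy features are made up rather than trained, the function name is hypothetical, and the rejection rule (an estimated probability of the winning class, with a 0.95 cut-off) is my recollection of the paper and should be treated as an assumption.

```python
import math

def classify(features, weights, min_probability=0.95):
    """Pick the class with the largest linear score, or None if it is ambiguous.

    weights maps a class name to (w0, [w1, ..., wF]); score = w0 + sum(wi * fi).
    """
    scores = {name: w0 + sum(w * f for w, f in zip(ws, features))
              for name, (w0, ws) in weights.items()}
    best = max(scores, key=scores.get)
    # Assumed rejection rule: estimate how dominant the winning class is.
    probability = 1.0 / sum(math.exp(v - scores[best]) for v in scores.values())
    return best if probability >= min_probability else None

# Made-up weights over two toy features (not the 13 features from the paper).
weights = {"line": (0.0, [1.0, 0.0]), "circle": (0.0, [0.0, 1.0])}
print(classify([5.0, 0.0], weights))  # clear winner: line
print(classify([1.0, 0.9], weights))  # too close to call: None
```

Classification costs one dot product per class, which is consistent with the recognizer running in real time.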

Discussion:
In general, GRANDMA should have been a milestone in Rubine's time. An automated system with high accuracy and little runtime was great work in the field of gesture recognition, and it was able to achieve high accuracy on several datasets.

Recognizer: The accuracy decreases as the number of gesture classes increases. Overlearning can occur with the linear classifier. A linear classifier is fast but limited in the number of classes it can handle. The classifier takes less than 20 ms to recognize a gesture. Maybe the classifier should be more complex in order to handle more classes.

Eager Recognition: Eager recognition makes GRANDMA more intelligent, and makes people more satisfied with GRANDMA. However, there is no free lunch. Eager recognition limits the range of gesture classes to some extent.
