Sketch Recognition - CSE 624: December 2010

Sunday, December 12, 2010

Reading #30: Tahuti: A Geometrical Sketch Recognition System for UML Class Diagrams (Hammond)

Comments:
Wenzhe Li
Summary:

The paper presents Tahuti, a dual-view, multi-stroke sketch recognition environment for UML class diagrams. It uses a geometry-based method to give users more freedom in how they draw and edit.

The multi-layer framework consists of four stages: preprocessing, selection, recognition, and identification. The main idea is to consider every possible collection of strokes and then recognize and identify each one. To reduce the cost of grouping, the framework rules out many impossible collections by capping the number of strokes in a collection and imposing restrictions on each stroke.
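To make the pruning idea concrete, here is a minimal Python sketch of enumerating candidate stroke collections. It assumes each stroke is summarized by a bounding box, a hypothetical cap `MAX_STROKES` on collection size, and a hypothetical proximity rule (every pair of strokes in a collection must lie within `GAP` of each other). These names and thresholds are illustrative assumptions, not Tahuti's actual rules.

```python
from itertools import combinations

# Hypothetical limits for illustration; the paper's actual values may differ.
MAX_STROKES = 4
GAP = 20.0

def box_gap(a, b):
    """Distance between two bounding boxes (x0, y0, x1, y1); 0 if they overlap."""
    dx = max(0.0, max(a[0], b[0]) - min(a[2], b[2]))
    dy = max(0.0, max(a[1], b[1]) - min(a[3], b[3]))
    return (dx * dx + dy * dy) ** 0.5

def candidate_collections(boxes, max_strokes=MAX_STROKES, gap=GAP):
    """Yield index tuples of stroke collections that pass both pruning rules."""
    n = len(boxes)
    for size in range(1, min(max_strokes, n) + 1):
        for combo in combinations(range(n), size):
            # Prune any collection containing a pair of strokes that are too far apart.
            if all(box_gap(boxes[i], boxes[j]) <= gap
                   for i, j in combinations(combo, 2)):
                yield combo
```

Without the caps, n strokes would give 2^n - 1 collections; the size limit and the proximity rule cut this down to a set small enough to recognize interactively.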

The author then introduces recognition methods for basic shapes such as rectangles and ellipses.

The experiment shows that users preferred Tahuti over the other systems it was compared with.

Discussion:
The grouping idea is really good, because it does not require users to draw an object in a specific manner. The author also finds ways to reduce the number of candidate collections, which lets the program run in real time. I have seen the same idea in the CivilSketch code.

However, the paper reports no recognition accuracy, so we cannot tell whether the system really works well. The geometric method should be tested directly.

Reading #29: Scratch Input: Creating Large, Inexpensive, Unpowered and Mobile Finger Input Surfaces (Harrison)

Comments:
Chris
Summary:
The paper presents Scratch Input, an acoustic-based finger input system that can be used to create large, inexpensive, and mobile finger input surfaces. The system is easy to carry and can be used on a desk, a wall, a mobile phone, and so on. It records sound with only a single microphone.

The recognizer identifies gestures by their sounds. Peak count and amplitude are extracted from the sound of each gesture, and a shallow decision tree makes the classification. The system was tested on six gestures, with accuracy near 89%.
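As a minimal sketch of the feature-plus-tree idea, the code below assumes the audio has already been reduced to an amplitude envelope (a list of floats). The threshold, the loudness split, and the gesture labels are hypothetical placeholders, not the paper's trained tree.

```python
def count_peaks(envelope, threshold):
    """Count rising crossings of `threshold` in an amplitude envelope."""
    peaks = 0
    above = False
    for value in envelope:
        if value >= threshold and not above:
            peaks += 1
            above = True
        elif value < threshold:
            above = False
    return peaks

def classify_scratch(envelope, threshold=0.5):
    """A hypothetical two-level decision tree over peak count and max amplitude."""
    peaks = count_peaks(envelope, threshold)
    loudest = max(envelope, default=0.0)
    if peaks <= 1:
        # One stroke: split further on amplitude.
        return "one_stroke_loud" if loudest > 0.9 else "one_stroke_soft"
    if peaks == 2:
        return "two_strokes"
    return "three_plus_strokes"
```

Because the tree splits mainly on peak count, gestures with the same number of strokes end up in the same branch, which is the vocabulary limit discussed below.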
Discussion:
Recognizing gestures by their sounds is an interesting topic, and the authors give several examples of applications for their system. However, the system seems too simple to me. First, the gestures are very different from one another: the six gestures have different numbers of strokes, so peak count alone can easily tell them apart. Second, the vocabulary is limited, because many gestures produce the same sound. Third, stroke order matters a great deal, and only gestures drawn in the specified order can be recognized. Finally, the system needs a quiet environment. Though easy and convenient, the system has many limitations.

Reading #28: iCanDraw? – Using Sketch Recognition and Corrective Feedback to Assist a User in Drawing Human Faces (Dixon)

Comments:
Wenzhe Li
Summary:
The paper presents iCanDraw, a system that teaches novice drawers how to draw a person's face. It provides directions and feedback to help users draw a face as accurately as possible. This matters because users cannot always find an instructor to teach and correct them.

The system starts with a reference image. A face recognition technique processes this image, and the authors refine the result to make the template as good as possible. Users look at the image, create reference lines, and then draw a face. At any time, users can check whether their drawing is consistent with the template and receive corrective feedback.

User studies show that the system performs well.
Discussion:
It is great work, teaching novice drawers how to draw a face when no instructor is available. What I appreciate most is the corrective feedback. In my opinion, feedback is the most important part of human-computer interaction, and according to the user study, the system gives good feedback that helps drawers produce a beautiful image.

However, the system is limited to drawing faces, because it depends on mature face recognition techniques. It does not seem easy to extend to other kinds of pictures. One idea: could users draw reference lines themselves, drawing them on the template and displaying them in both the template and the drawing area?

Reading #27: K-sketch: A 'Kinetic' Sketch Pad for Novice Animators (Davis)

Comments:
Sam
Summary:
The paper proposes a general-purpose, informal, 2D animation sketching system, called K-Sketch, to help novices create a wide range of animations quickly.

The design began with extensive user studies, including interviews with animators and non-animators, which showed the importance of an informal animation tool that takes little time to learn or use. These users proposed 18 animation operations. Since the system aims to be both fast and powerful, it supports 9 of them, which cover most users' needs. The system is implemented in C#.

The system was evaluated in three small user studies, all of which indicated that K-Sketch outperforms PowerPoint in many respects, except comfort with sharing.
Discussion:
This paper is a good guide for novice researchers like me on how to conduct research on a human-computer interaction based sketching system. First, conduct a study to learn people's requirements. Second, design a system that makes a trade-off between functionality and computational time. Finally, conduct user studies to compare its performance with other tools.

However, I see this paper more as a technical report than a conference paper. I think a conference paper should present new ideas, but this paper describes implementation more than ideas.

Saturday, December 11, 2010

Reading #26: Picturephone: A Game for Sketch Data Capture (Johnson)

Comments:
Francisco
Summary:
The paper proposes Picturephone, a sketch-based game for collecting data on how people make and describe sketches. It is inspired by the children's game Telephone. There are three modes: draw, describe, and rate. Each user is randomly assigned a mode. In draw mode, users draw a sketch based on a description. In describe mode, users write a description of a sketch. In rate mode, users give a score to each pair of sketches.

Picturephone is a web-based application using the standard HTTP protocol. Its main advantage is that it does not require all users to play the game at the same time.
Discussion:
A good application of hand-drawn sketching, and the idea of the game is really good. Compared to Stellasketch, I think the asynchronous game is better. It is really hard to get many users to play a game at the same time unless the game is as popular as chess or poker, which is unrealistic for a game like this.

What concerns me more, as noted in my discussion of Reading #24, is how to use this data. I think there is still a long way to go before a recognizer can use these sketches as training examples. How to filter out dirty data and how to resolve ambiguity and conflicts remain the main open questions for these games.

Reading #25: A Descriptor for Large Scale Image Retrieval Based on Sketched Feature Lines (Eitz)

Comments:
Chris
Summary:
The paper presents a tensor-based descriptor for large-scale image retrieval based on sketched feature lines. The descriptor is used to search a database for images similar to an input sketch. It addresses the asymmetry between a binary sketch input and full-color images.

The proposed tensor descriptor captures the main orientation of the gradients in each cell. It was tested on a set of 1.5 million pictures of outdoor scenery, where it performs comparably to or slightly better than a variant of the MPEG-7 edge histogram descriptor. It is also easy to implement and efficient to evaluate.
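A minimal sketch of a per-cell orientation tensor, assuming each cell is summarized by the structure tensor of its gradients (the sum of the outer products g g^T, normalized by its Frobenius norm) and that gradients are given as (gx, gy) pairs. The function names are hypothetical, and the paper's exact normalization and cell layout may differ.

```python
import math

def cell_tensor(gradients):
    """Sum the outer products g g^T over all (gx, gy) gradients in one cell."""
    jxx = sum(gx * gx for gx, _ in gradients)
    jxy = sum(gx * gy for gx, gy in gradients)
    jyy = sum(gy * gy for _, gy in gradients)
    norm = math.sqrt(jxx * jxx + 2 * jxy * jxy + jyy * jyy)  # Frobenius norm
    if norm == 0:
        return (0.0, 0.0, 0.0)
    return (jxx / norm, jxy / norm, jyy / norm)

def dominant_orientation(tensor):
    """Angle (radians) of the main gradient direction encoded by the tensor."""
    jxx, jxy, jyy = tensor
    return 0.5 * math.atan2(2 * jxy, jxx - jyy)

def tensor_distance(a, b):
    """Frobenius distance between two cells' normalized tensors."""
    return math.sqrt((a[0] - b[0]) ** 2 + 2 * (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2)
```

Because g and -g contribute the same outer product, this summary ignores gradient sign, which is one plausible reason a tensor suits comparing dark-on-light sketch lines with photo edges of either polarity.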

Discussion:
Searching a database for images using an input sketch is a good idea, and sketch-based image retrieval is another direction within the sketch field. The descriptor proposed in the paper is simple to implement and outperforms the descriptor it is compared against. However, it is not compared against any other descriptors, so I cannot judge its overall performance. In the experiments, an input sketch always returns many candidate pictures, some of which seem unrelated to the input, so other descriptors should be added to make retrieval more effective. The descriptor also has some limitations regarding transformations, which need future work.

Reading #24: Games for Sketch Data Collection (Johnson)

Comments:
Kim
Summary:
The paper presents two games for sketch data collection. One is an asynchronous game called Picturephone; the other is a synchronous game called Stellasketch. Both games are web-based and need many users to participate.

Picturephone collects long sentences that describe sketches. Each user is randomly assigned to one of three modes (draw, describe and rate). In draw mode, users draw a sketch based on the description. In describe mode, users describe the sketch. In rate mode, users give a score to each pair of sketches.

Stellasketch gathers short noun phrases that label sketches as they are made. In each round, one user draws a sketch based on a noun, and the other users label the sketch with noun phrases. No player knows what the others are doing.

Discussion:
These two games are used only to collect users' sketches. The ideas behind them are really good, because they capture data while people are being entertained. They are a good application of human-computer interaction.

However, as the author says, it is still difficult to use these sketches. Some data is very noisy, some contains extra strokes, and some even consists of unrelated sketches. The quality of the data cannot be guaranteed, and using it to train recognizers is also very hard.

I appreciate the idea of collecting data through games, but I do not think it has many advantages over conventional methods such as user studies.

Reading #23: InkSeine: In Situ Search for Active Note Taking (Hinckley)

Comments:
Francisco
Summary:
The paper presents InkSeine, a pen-based active note-taking system. It offers rapid, minimally distracting interactions for users to seek, gather, and manipulate the "task detritus" of electronic work. It has several important design properties: it leverages preexisting ink to initiate searches, promotes queries to first-class objects commingled with ink notes, interleaves inking, searching, and gathering, and tightly couples queries with application content.

The design goals of the system are as follows:
In situ search experience: no switching, shifting, or forced transcription
Optimum workflow or maximum flexibility: allow interruption at any point and resume from where the user left off
Enable rich trade-offs: lower computational time
Gather content: allow users to gather beneficial task detritus
Minimize search screen real estate: search results should occupy only a small part of the screen
Span application boundaries: enable access to information from a variety of sources
Tailored to pen input: easy gestures

The paper then describes each component of the system in detail, and user studies were conducted to help improve the system.

Discussion:
Excellent work on note taking in sketch recognition, and a promising direction for the field. The system provides a rich interface that supports almost every kind of operation users need. Its design comes from the users' point of view and is verified by user studies. However, my concern is whether a robust handwriting recognizer exists to support such a system. The paper gives no information about the recognizer's accuracy. Perhaps the paper's goal is simply to present an excellent idea for a human-computer interaction interface.

Reading #22: Plushie: An Interactive Design System for Plush Toys (Mori)

Comments:
Kim
Summary:
The paper presents an interactive design system that helps users design 3D plush toys. The user creates a 3D plush toy model from scratch simply by drawing its desired silhouette. The 3D model is associated with a 2D pattern and is the result of a physical simulation that mimics the inflation effect caused by stuffing. The system uses a simple iterative adjustment method to implement this simulation.

The user interface provides many functions, such as creating a new model, cutting, creating a part, pulling, and inserting and deleting seam lines.

The author conducted several user studies. Professional balloon designers first supported the system, and novice users also found it easy to learn.

Discussion:
Another good paper on 3D design through 2D sketching. After reading these two papers, I find this is a really good direction for sketch recognition, perhaps for sketch rendering in particular. Rendering is a good research direction in computer vision, so I think sketch rendering may also be good for research.

In this paper, the system uses a very simple algorithm to convert 2D sketches into 3D toys. Time cost is also very important for rendering. In computer vision, 3D reconstruction usually takes a large amount of time, but both papers avoid some uncommon situations to reduce the computational burden. Although limited in some situations, both systems work well in user studies.

Reading #21: Teddy: A Sketching Interface for 3D Freeform Design (Igarashi)

Comments:
Kim
Summary:

The paper proposes Teddy, a sketching interface for 3D freeform design. Users draw 2D strokes to construct the silhouette of an object, and the system automatically converts them into a 3D polygonal surface. The whole process is very easy for the user. According to the user study in the paper, first-time users can create their own models fluently within 10 minutes.

How Teddy works

The system supports several kinds of operations: creating a new object, painting and erasing on the surface, extrusion, cutting, smoothing, and transformation. The core representation is a standard polygonal mesh for the 3D model. The system is robust and efficient enough for experimental use, but it can fail or generate unintuitive results when users draw unexpected strokes.

Discussion:

An interesting topic in 3D rendering and reconstruction. This is my first exposure to the topic, and it seems very interesting. As far as I know, 3D rendering is usually complex, but the author proposes an easy and quick algorithm to achieve it. In my opinion, the key to making it easy is creating reference strokes while drawing. Those reference strokes spare the system from computing a large number of possibilities, since a 2D picture can always correspond to several 3D shapes.

Wednesday, December 8, 2010

Reading #20: MathPad2: A System for the Creation and Exploration of Mathematical Sketches (LaViola)

Comments:
Sam
Summary:
The paper proposes a novel pen-based, modeless, gestural interaction paradigm for mathematical problem solving. The system recognizes mathematical expressions and the associations between expressions and diagrams, and converts them into MATLAB code. MATLAB is the back-end computational engine for MathPad.

Writing Expressions
Because mathematical expressions are difficult to recognize, the paper proposes a gesture-based method to assist recognition. The system cannot recognize the whole sketch at once; instead, a lasso and a tap tell the recognizer which area to recognize. When users spot a recognition error, they can erase the offending symbols and rewrite them.
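The paper does not spell out how the lasso test is done, so this is a minimal sketch assuming one common approach: treat the lasso as a closed polygon and select a stroke when most of its points fall inside it, using the even-odd ray-casting rule. The `min_ratio` parameter and the function names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting: count polygon edges crossed by a ray going right."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        if (y0 > y) != (y1 > y):  # edge straddles the horizontal line through y
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside

def lasso_select(strokes, lasso, min_ratio=0.8):
    """Return indices of strokes with at least `min_ratio` of their points inside."""
    selected = []
    for idx, stroke in enumerate(strokes):
        hits = sum(point_in_polygon(x, y, lasso) for x, y in stroke)
        if stroke and hits / len(stroke) >= min_ratio:
            selected.append(idx)
    return selected
```

Using a ratio rather than requiring every point inside tolerates a sloppy lasso that clips the tip of a symbol.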

Making Diagrams
There are two operations for making a diagram: Nailing Diagram Components and Grouping Diagram Components.

Associations
There are two kinds of associations, explicit and implicit. Implicit associations are based on matching variable names and constant labels. Explicit associations are made by drawing a line and then tapping on a drawing element.

At the end of the paper, the author walks through an example 2D projectile motion scenario, which helped me understand how MathPad works.

Discussion:
Good work on free-hand mathematical sketches. It works like a hand-drawn MATLAB, which I consider the best computational tool in the world. As the user study shows, many users welcomed the system.

My concern is how the system recognizes those mathematical symbols, yet the paper says very little about it, and no recognition rate is given. Also, the recognizer is user-dependent, which means the system needs a separately trained recognizer for each user. This should be improved. My suggestion is that the system provide a procedure that helps a new user train their own recognizer.

Reading #19: Diagram Structure Recognition by Bayesian Conditional Random Fields (Qi)

Comments:
Francisco
Summary:

The paper proposes a recognition method based on Bayesian conditional random fields (BCRFs), which jointly analyzes all drawing elements. The first half of the paper introduces how BCRFs work and how ARD (Automatic Relevance Determination) is incorporated into BCRFs. Honestly, the mathematics is too hard for me to understand. The paper then applies the method to ink classification in three steps. First, pen strokes are subdivided into fragments, just as in PaleoRecognizer. Second, a conditional random field is constructed on the fragments. Third, training and inference on the network are done with BCRFs. The experiments show the effectiveness of BCRF-ARD, though it takes longer to run.

Discussion:

The algorithm proposed in the paper seems good, because it combines two good algorithms into a better one. The experiments show that it is more effective than other algorithms.

However, I do not fully understand how BCRFs are applied to sketch recognition, so this discussion may be biased. After reading it, I feel the author wrote the paper for the sake of writing a paper: such a complex method is used to solve a binary classification problem (container or connector). Could PaleoRecognizer do the same job simply by finding the rectangles?

Reading #18: Spatial Recognition and Grouping of Text and Graphics (Shilman)

Comments:
Francisco
Summary:
The paper proposes a spatial recognition and grouping algorithm for graphics and symbol recognition, which can be treated as an optimization over the large space of possible groupings.

The neighborhood graph
A neighborhood graph is constructed from the spatial relationship between each pair of strokes. Each node represents a stroke, and an edge connects two strokes that are close to each other. The graph therefore splits into a few connected subsets, and each connected subset can produce many candidate groups.
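The paper defines proximity in its own way; as a minimal sketch, assume strokes are lists of points and two strokes are neighbors when their closest points are within a hypothetical `radius`. The code builds the neighborhood graph and splits it into connected subsets with a breadth-first search.

```python
from collections import deque
from itertools import combinations
import math

def min_point_distance(a, b):
    """Smallest distance between any point of stroke a and any point of stroke b."""
    return min(math.dist(p, q) for p in a for q in b)

def neighborhood_graph(strokes, radius):
    """Adjacency sets: an edge joins two strokes whose closest points are within radius."""
    adj = {i: set() for i in range(len(strokes))}
    for i, j in combinations(range(len(strokes)), 2):
        if min_point_distance(strokes[i], strokes[j]) <= radius:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def connected_subsets(adj):
    """Split the graph into connected components using breadth-first search."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        queue, component = deque([start]), []
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(sorted(component))
    return components
```

Candidate groups are then formed only within a single component, so strokes that are far apart are never tried together.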

A* based Optimization and Adaboost Recognition
The goal is to find the best grouping among all possible groupings of strokes. A* search finds the best combination according to the combination cost, and Adaboost performs recognition on each candidate group.
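A full A* implementation needs the paper's cost model and heuristic. This sketch assumes a hypothetical `group_cost` table that stands in for the recognizer's cost for each candidate group, and it uses a zero heuristic. With h = 0, A* reduces to uniform-cost search, which still returns the cheapest way to partition the strokes into groups.

```python
import heapq

def best_grouping(n_strokes, group_cost):
    """Find the cheapest partition of strokes 0..n-1 into candidate groups.

    group_cost maps frozenset(stroke indices) -> cost of recognizing that group.
    A search state is the set of strokes covered so far; the heuristic is zero.
    """
    all_strokes = frozenset(range(n_strokes))
    frontier = [(0.0, 0, frozenset(), ())]  # (cost, tie-breaker, covered, groups)
    best = {frozenset(): 0.0}
    counter = 1
    while frontier:
        cost, _, covered, groups = heapq.heappop(frontier)
        if covered == all_strokes:
            return cost, groups
        if cost > best.get(covered, float("inf")):
            continue  # stale queue entry
        # Always extend with a group containing the lowest uncovered stroke,
        # so each partition is generated only once.
        first = min(all_strokes - covered)
        for group, g_cost in group_cost.items():
            if first in group and not (group & covered):
                new_covered = covered | group
                new_cost = cost + g_cost
                if new_cost < best.get(new_covered, float("inf")):
                    best[new_covered] = new_cost
                    heapq.heappush(frontier, (new_cost, counter, new_covered,
                                              groups + (tuple(sorted(group)),)))
                    counter += 1
    return None  # no partition is possible with the given candidate groups
```

A real system would plug in an admissible lower bound on the remaining cost as the heuristic to expand fewer states; the zero heuristic keeps the sketch simple and correct.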

The algorithm was evaluated on the HHReco sketched shape database, where it performs very well, reaching up to 97% accuracy. It was also evaluated on a more complex set of randomly synthesized flowcharts and performs well there too, but the time cost becomes too high when recognizing complex symbols.

Discussion:
The grouping technique is very common in sketch recognition. The main problem is how to deal with the huge time cost, especially in some real-time applications. The author also encountered the same problem when recognizing complex charts.

I think there is a situation in which the algorithm cannot work: when symbol A is composed of symbols B and C. This is actually very common in sketch recognition, for example in COA diagrams.

Tuesday, December 7, 2010

Reading #17: Distinguishing Text from Graphics in On-line Handwritten Ink (Bishop)

Comments
Francisco
Summary
The paper presents a system that separates text strokes from graphics strokes. Unlike the previous paper on entropy, this paper proposes an HMM-based method to distinguish text from graphics.

Independent Stroke Model
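Nine features are extracted to represent each stroke, and an MLP is trained to classify strokes. The objective function is the cross-entropy error, defined as
$$E = -\sum_{n} \left[ t_n \ln y_n + (1 - t_n) \ln (1 - y_n) \right]$$
where y_n is the MLP output for stroke n and t_n \in \{0, 1\} is its text/graphics label.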

Hidden Markov Model

The paper first proposes a uni-partite HMM consisting of a transition matrix and an emission probability distribution over stroke features. This HMM uses only temporal context.

[Figure: uni-partite HMM]

The author then proposes a bi-partite HMM, which also models the gaps between strokes. The Viterbi algorithm is used to find the optimal labeling.

[Figure: bi-partite HMM]
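The bi-partite model adds gap features, so as a simpler sketch this shows Viterbi decoding for the uni-partite case only: two hidden states (text, graphics), a transition matrix, and a per-stroke emission likelihood. All numbers in the test are made up; in the paper the emissions would come from the stroke-feature model.

```python
import math

STATES = ("text", "graphics")

def viterbi(emissions, transition, prior):
    """Most likely state sequence for a sequence of strokes.

    emissions[t][s]: likelihood of stroke t's features under state s
    transition[s][s2]: probability of moving from state s to state s2
    prior[s]: probability that the first stroke is in state s
    Log probabilities avoid underflow on long stroke sequences.
    """
    n = len(STATES)
    score = [math.log(prior[s]) + math.log(emissions[0][s]) for s in range(n)]
    back = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for s2 in range(n):
            best_s = max(range(n), key=lambda s: score[s] + math.log(transition[s][s2]))
            new_score.append(score[best_s] + math.log(transition[best_s][s2])
                             + math.log(emissions[t][s2]))
            pointers.append(best_s)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return [STATES[s] for s in reversed(path)]
```

With "sticky" transitions (0.9 to stay in the same state), a middle stroke whose own features slightly favor graphics (0.4 text vs. 0.6 graphics) is relabeled as text because its neighbors are clearly text. This is the context effect the discussion below points to.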

Experiment and Result

The author tests the algorithm on the Cambridge and Redmond test sets. Both HMMs outperform the independent model. On the Redmond test set, however, the bi-partite HMM performs worse than the uni-partite one.

Discussion
The paper shows how to distinguish text from graphics: the decision should depend not only on stroke features but also on context. Context can actually increase recognition accuracy and seems very useful in recognition problems in general.

The experimental results show that almost half of the graphics strokes are still recognized as text. So although text is rarely misrecognized as graphics, the cost is that many graphics strokes are misclassified. I do not think it is easy to distinguish text from graphics; sometimes it is very hard even for people.

Reading #16: An Efficient Graph-Based Symbol Recognizer (Lee)

Comments
Francisco
Summary
The paper introduces a graph-based symbol recognizer built on attributed relational graphs.

Each symbol is represented by an attributed relational graph (ARG), which describes its geometry and topology. A template, or definition, of a symbol is created by constructing an average ARG from many training samples.

[Figure: A rectangle and its ARG]

[Figure: Error metrics and their corresponding weights]

Six error metrics measure similarity, and each is assigned a weight reflecting its contribution to the match. Recognition is performed by graph matching, using four matching algorithms: stochastic, error-driven, greedy, and sort matching.
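The paper's six metrics and their weights are listed in its own table. This sketch assumes a much smaller, hypothetical setup: each ARG node carries one normalized attribute (such as relative length), each edge carries one normalized, symmetric attribute (such as relative angle), and only two weighted error terms are combined. The greedy matcher pairs each template node with the most similar unused sample node, assuming the sample has at least as many nodes as the template.

```python
# Hypothetical weights for illustration; the paper uses six metrics with its own weights.
W_NODE = 0.6
W_EDGE = 0.4

def match_error(template, sample, mapping):
    """Weighted error of mapping template node i -> sample node mapping[i].

    Each graph is (node_attrs, edge_attrs): node_attrs[i] is a normalized number,
    and edge_attrs[(i, j)] is a normalized number assumed to be symmetric.
    """
    t_nodes, t_edges = template
    s_nodes, s_edges = sample
    node_err = sum(abs(t_nodes[i] - s_nodes[mapping[i]]) for i in range(len(t_nodes)))
    edge_err = 0.0
    for (i, j), value in t_edges.items():
        a, b = mapping[i], mapping[j]
        other = s_edges.get((a, b), s_edges.get((b, a)))
        # A template edge with no counterpart in the sample gets a fixed penalty.
        edge_err += abs(value - other) if other is not None else 1.0
    return W_NODE * node_err + W_EDGE * edge_err

def greedy_mapping(template, sample):
    """Pair each template node with the closest still-unused sample node."""
    t_nodes, s_nodes = template[0], sample[0]
    unused = set(range(len(s_nodes)))
    mapping = {}
    for i, attr in enumerate(t_nodes):
        best = min(unused, key=lambda k: abs(s_nodes[k] - attr))
        mapping[i] = best
        unused.remove(best)
    return mapping
```

Recognition would compute `match_error` against every template's average ARG and return the lowest-error symbol; greedy matching is fast but can lock in a poor early pairing, which is why the paper also tries other matchers.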

A user study was conducted to test the four matching algorithms. The first three perform similarly, at about 92% top-one accuracy and about 97% top-three accuracy. Sort matching, though less accurate, is still a good choice for PDAs and other devices with constrained computational resources.

Discussion
Graph-based symbol recognizers are very useful for multi-stroke symbols, as the paper shows. They are also robust to rotation, translation, and uniform or non-uniform scaling. The problem is how to construct a correct graph for each symbol. In the paper, the author requires users to draw a symbol in the same order as, or an order similar to, the one used during training, so that there is very little error when constructing the average ARG. I think this is difficult for every user to do. In project #1 (truss recognition), I also used a graph-based method and encountered a big problem: users often draw trusses with ambiguous strokes whose relationships cannot be judged easily. A graph-based method requires users to draw symbols very carefully, so that parts meant to be connected actually are connected.

Reading #15: An Image-Based, Trainable Symbol Recognizer for Hand-drawn Sketches (Kara)

Comments
Jonathan Hall
Summary
The paper presents an image-based symbol recognition algorithm for hand-drawn sketches.

The key idea is very similar to "one dollar" (the $1 recognizer): training samples collected from users are added directly to the template set. However, the paper offers some better ideas for the recognition process.

Processing and Representation
Each symbol is first downsampled into a 48×48 grid by quantization. Polar coordinates are used instead of x-y coordinates to handle rotation: a rotation in x-y coordinates becomes a translation along the angle axis in polar coordinates.
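A minimal sketch of the two representation steps, under stated assumptions: points are quantized onto a 48×48 grid after uniformly scaling their bounding box to fit, and the polar transform is taken about the centroid. The test checks the property the summary relies on: rotating the input shifts every angle by the same amount while leaving radii unchanged.

```python
import math

GRID = 48

def quantize(points, grid=GRID):
    """Map points into a grid x grid binary image by scaling the bounding box."""
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    x0, y0 = min(xs), min(ys)
    span = max(max(xs) - x0, max(ys) - y0) or 1.0  # uniform scale keeps aspect ratio
    image = [[0] * grid for _ in range(grid)]
    for x, y in points:
        col = min(grid - 1, int((x - x0) / span * grid))
        row = min(grid - 1, int((y - y0) / span * grid))
        image[row][col] = 1
    return image

def to_polar(points):
    """Convert points to (radius, angle) about their centroid."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    return [(math.hypot(x - cx, y - cy), math.atan2(y - cy, x - cx)) for x, y in points]

def rotate(points, theta):
    """Rotate points by theta radians about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]
```

Since the centroid rotates along with the points, a polar image of a rotated symbol is a circular shift of the original along the angle axis, so matching can try shifts instead of re-rotating images.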

Pre-Recognizer
The polar transform can also serve as a pre-recognizer that filters out unlikely candidates, which greatly reduces computation time.

Multiple classifiers
Four distance metrics are used: Hausdorff distance, modified Hausdorff distance, Tanimoto coefficient, and Yule coefficient. Each classifier makes a decision, and the four decisions are combined into a final one. This reduces the inaccuracy that comes from relying on a single classifier.
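A minimal sketch of the four measures on binary images stored as sets of (row, col) foreground pixels. The Hausdorff and modified Hausdorff (Dubuisson-Jain) distances use their standard definitions, and the Tanimoto and Yule coefficients are computed from pixel overlap counts. Combining the four by majority vote over the nearest templates is an assumption for illustration; the paper's actual combination scheme may differ.

```python
import math
from collections import Counter

def directed(a, b):
    """Distance from each pixel in a to its nearest pixel in b."""
    return [min(math.dist(p, q) for q in b) for p in a]

def hausdorff(a, b):
    return max(max(directed(a, b)), max(directed(b, a)))

def modified_hausdorff(a, b):
    # Average instead of worst-case nearest distances, so outlier pixels matter less.
    da, db = directed(a, b), directed(b, a)
    return max(sum(da) / len(da), sum(db) / len(db))

def overlap_counts(a, b, size):
    """n11: on in both, n10 / n01: on in only one, n00: off in both (size x size image)."""
    n11 = len(a & b)
    n10, n01 = len(a - b), len(b - a)
    n00 = size * size - n11 - n10 - n01
    return n11, n10, n01, n00

def tanimoto(a, b, size):
    n11, n10, n01, _ = overlap_counts(a, b, size)
    return n11 / (n11 + n10 + n01) if (n11 + n10 + n01) else 1.0

def yule(a, b, size):
    n11, n10, n01, n00 = overlap_counts(a, b, size)
    denom = n11 * n00 + n10 * n01
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0

def classify_symbol(sample, templates, size):
    """templates: list of (label, pixel_set). Each measure votes; the majority wins."""
    votes = Counter()
    # Distances: smaller is better. Coefficients: larger is better.
    votes[min(templates, key=lambda t: hausdorff(sample, t[1]))[0]] += 1
    votes[min(templates, key=lambda t: modified_hausdorff(sample, t[1]))[0]] += 1
    votes[max(templates, key=lambda t: tanimoto(sample, t[1], size))[0]] += 1
    votes[max(templates, key=lambda t: yule(sample, t[1], size))[0]] += 1
    return votes.most_common(1)[0][0]
```

Each added template means four more comparisons per query, which is the template-growth cost mentioned in the discussion below.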

User Study
The paper conducts two user studies, one on graphic symbols and one on digits, to verify the algorithm. Both strongly support it.

However, this algorithm is also limited in several aspects. It is sensitive to non-uniform scaling, and may wash out small image details during quantization.

Discussion
Employing four classifiers and combining their results into a final decision is an excellent idea. Sometimes we do not know which classifier fits the current problem, or which classifier fits which samples, so using four classifiers can reduce the inaccuracy that often occurs with a single classifier. I think it is a very valuable idea, and it gives me some inspiration for my final project.

However, it shares the obvious disadvantage of "one dollar": the number of templates grows very fast, because the system must collect as many samples as it can. This is the main disadvantage of all "one dollar"-style systems.
