Sketch Recognition - CSE 624: September 2010

Thursday, September 16, 2010

Reading 8#: A Lightweight Multistroke Recognizer for User Interface Prototypes

Comments:
Jonathan Hall
Summary:
$N, an extension of $1, aims to recognize multistroke sketches. Its key idea is to treat a multistroke as a set of unistrokes formed by connecting the strokes' end points. $N can also recognize a mixture of multistrokes, unistrokes, and 1D gestures, and it handles both orientation-dependent and orientation-independent sketches.

$N treats an n-stroke multistroke sketch as 2^n * n! unistroke sketches, one for each combination of stroke order and stroke direction. This produces a large number of candidates, so $N prunes comparisons using the start angle and the number of strokes. The start-angle test reduced unistroke comparisons by 79% and increased accuracy by 1.3%. The stroke-count test reduced comparisons by an additional 10.4% and increased accuracy by a further 1.7%.
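The 2^n * n! count can be checked with a short sketch. The Python below is only an illustration, not the $N reference code: the function name unistroke_permutations and the representation of a multistroke as a list of point lists are assumptions made for this example.

```python
from itertools import permutations, product

def unistroke_permutations(strokes):
    """Yield every unistroke obtained by choosing a stroke order and a
    direction for each stroke, then joining the strokes end to end."""
    n = len(strokes)
    for order in permutations(range(n)):              # n! stroke orders
        for flips in product((False, True), repeat=n):  # 2^n directions
            path = []
            for idx, flip in zip(order, flips):
                stroke = strokes[idx]
                path.extend(reversed(stroke) if flip else stroke)
            yield path

# A two-stroke "X": 2! orders * 2^2 directions = 8 unistrokes.
x_gesture = [[(0, 0), (1, 1)], [(0, 1), (1, 0)]]
print(len(list(unistroke_permutations(x_gesture))))  # 8
```

The count grows quickly (three strokes already give 48 unistrokes), which is why pruning by start angle and stroke count matters.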

$N handles 1D sketches by checking the ratio of the sides of the bounding box. If the ratio is below a threshold, the sketch is scaled uniformly to preserve its aspect ratio. To recognize both orientation-dependent and orientation-independent sketches, the system requires users to flag sketches during training. Orientation-dependent sketches are rotated from their original angles; all others are rotated from 0.
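A minimal sketch of the bounding-box test might look like the following. The function names, the 0.30 threshold, and the 250-unit square are assumptions chosen for illustration, not values taken from this summary.

```python
def bounding_box(points):
    """Return (min_x, min_y, width, height) of a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)

def is_one_dimensional(points, threshold=0.30):
    """True when the shorter bounding-box side is small relative to the longer."""
    _, _, width, height = bounding_box(points)
    longer = max(width, height)
    if longer == 0:
        return True  # a single dot is treated as degenerate (assumption)
    return min(width, height) / longer < threshold

def scale_to_square(points, size=250.0, threshold=0.30):
    """Scale 1D gestures uniformly (aspect kept), others non-uniformly."""
    min_x, min_y, width, height = bounding_box(points)
    if is_one_dimensional(points, threshold):
        longer = max(width, height)
        sx = sy = size / longer if longer > 0 else 1.0
    else:
        sx, sy = size / width, size / height
    return [((x - min_x) * sx, (y - min_y) * sy) for x, y in points]
```

For example, a nearly horizontal line from (0, 0) to (100, 5) keeps its thin shape after scaling, whereas a 100 x 50 box is stretched to fill the whole square.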

$N still has some limitations, such as lacking provisions for scale or position dependence and not recognizing sketches whose gestalt is their appeal.

$N was tested with students from middle and high school classrooms and achieved 96.6% accuracy on algebra symbols.

Discussion:
$N is great work and a great extension of $1. It recognizes multistrokes in only 200 lines of code, with high accuracy and high speed. It should be a milestone in multistroke sketch recognition. The tricks for dealing with orientation and 1D gestures are also very impressive.

It is a pity that $N is still writer-dependent, as the paper's "future work" section notes. An important feature of a recognizer is the ability to generalize and recognize other users' sketches. $N needs further study in this respect.

I am still skeptical about the "indicative angle". It is determined entirely by the centroid and the start point. If the start point is noisy, or the user begins at the wrong start angle, recognition will likely be affected. Perhaps the indicative angle should be made more robust.

Reading 7#: Sketch Based Interfaces: Early Processing for Sketch Understanding

Comments:
Jonathan Hall
Summary:
The paper introduces a freehand sketch recognition system that allows users to draw in an unrestricted fashion. The system is suited to unistroke sketches and consists of three parts: approximation, beautification, and basic recognition.

Stroke approximation: Vertices are detected at points of minimum speed or maximum absolute curvature. Combining the two sources, the system generates hybrid fits. Bezier curves are then used to approximate the curved parts of a sketch.
Beautification: The slopes of line segments are adjusted so that segments meant to have the same slope end up parallel.
Basic recognition: This is implemented by template matching.
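The vertex detection step can be illustrated with a rough sketch. It assumes a stroke is a list of (x, y, t) samples with at least three points, uses turning angle as a stand-in for curvature, and keeps only local extrema beyond the mean. These choices and all function names are assumptions for this example; the paper's hybrid fit is more elaborate than simply comparing the two candidate lists.

```python
import math

def point_speeds(points):
    """Speed at each interior sample, measured across its two neighbours."""
    out = []
    for (x0, y0, t0), (x2, y2, t2) in zip(points, points[2:]):
        dt = t2 - t0
        out.append(math.hypot(x2 - x0, y2 - y0) / dt if dt > 0 else 0.0)
    return out

def turning_angles(points):
    """Absolute change of direction (radians) at each interior sample."""
    out = []
    for (x0, y0, _), (x1, y1, _), (x2, y2, _) in zip(points, points[1:], points[2:]):
        before = math.atan2(y1 - y0, x1 - x0)
        after = math.atan2(y2 - y1, x2 - x1)
        diff = abs(after - before)
        out.append(min(diff, 2 * math.pi - diff))
    return out

def local_extrema(values, beats):
    """Indices whose value beats both neighbours according to beats(a, b)."""
    return [i for i in range(1, len(values) - 1)
            if beats(values[i], values[i - 1]) and beats(values[i], values[i + 1])]

def candidate_vertices(points):
    """Return (slow, sharp): point indices of speed minima and turning maxima."""
    speed = point_speeds(points)
    turn = turning_angles(points)
    mean_speed = sum(speed) / len(speed)
    mean_turn = sum(turn) / len(turn)
    # Index i in the interior lists corresponds to point i + 1 of the stroke.
    slow = [i + 1 for i in local_extrema(speed, lambda a, b: a < b)
            if speed[i] < mean_speed]
    sharp = [i + 1 for i in local_extrema(turn, lambda a, b: a > b)
             if turn[i] > mean_turn]
    return slow, sharp

# An "L" drawn left to right, then upward, slowing down at the corner.
l_stroke = [(0, 0, 0), (10, 0, 1), (20, 0, 2), (28, 0, 3), (30, 0, 4),
            (30, 2, 5), (30, 10, 6), (30, 20, 7), (30, 30, 8)]
print(candidate_vertices(l_stroke))  # ([4], [4])
```

Here both sources agree on the corner at point 4. On real strokes they often disagree, which is what motivates combining them.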

Overall, the system achieves 96% accuracy.

Discussion:
Representing sketches by their vertices is a good idea. Vertices are important features for recognition because they mark places where something changes, and changes are what we care about. For example, we always pay more attention to what teachers say more loudly.

Is it robust to noise? The paper says very little about how to deal with noise. By noise I mean not the noise in vertex detection but the noise in the drawing itself. Users don't always draw a sketch entirely correctly. Recognition techniques should be made more robust to these kinds of noise.

Tuesday, September 14, 2010

Reading 6#: Protractor: A Fast and Accurate Gesture Recognizer

Comment:
Yue Li
Summary:
For personalized, gesture-based interaction, it is hard to foresee which gestures end users will specify, and end users usually want to provide only a few samples. That is why Protractor, a template-based recognizer, was designed.

Preprocessing removes irrelevant factors. Resampling and translation remove the effects of drawing speed and location, and rotation reduces noise in orientation.

Classification computes the optimal angular distance, which measures the angle between two samples in the vector space. To be robust, Protractor always rotates the template by an extra angle, and it uses a closed-form solution to find the minimum angular distance.

Thanks to the closed-form solution, Protractor is faster than $1, which searches for the minimum distance step by step.
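The closed-form step can be sketched as follows. This is a simplified illustration that assumes both gestures were already resampled to the same number of points. The helper names are made up for this example, and atan2 is used in place of a plain arctangent so that a zero denominator needs no special handling.

```python
import math

def to_unit_vector(points):
    """Translate the centroid to the origin, flatten to [x1, y1, x2, y2, ...]
    and normalize to unit length (assumes the points are not all identical)."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    flat = []
    for x, y in points:
        flat += [x - cx, y - cy]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat]

def optimal_cosine_distance(v1, v2):
    """Smallest angle between v1 and v2 over all rotations of one gesture."""
    a = b = 0.0
    for i in range(0, len(v1), 2):
        a += v1[i] * v2[i] + v1[i + 1] * v2[i + 1]
        b += v1[i] * v2[i + 1] - v1[i + 1] * v2[i]
    angle = math.atan2(b, a)  # rotation that maximizes the similarity
    cosine = a * math.cos(angle) + b * math.sin(angle)
    return math.acos(max(-1.0, min(1.0, cosine)))  # clamp rounding error
```

Because the best rotation comes from a single arctangent, there is no iterative search over angles. A gesture compared with a rotated copy of itself gives a distance of approximately zero.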

Discussion:
Protractor, though similar to $1, is faster because it uses a closed-form solution instead of a step-by-step search. However, its error rates are about the same as those of $1. Protractor also has an advantage in orientation-sensitive applications, so its dataset can include orientation-sensitive samples, which extends the dataset used for $1. Overall, Protractor is a good idea.

The distance metric is very important in classification. A good metric can give high accuracy and save a lot of time, as Protractor shows. However, there seems to be no systematic method for finding the best metric other than trying them one by one.

Template-based methods are generally fast but cannot deal with some unknown gestures.
Parametric methods, in contrast, are generally slow but can deal with some unknown gestures because they model the distribution of samples.

Saturday, September 11, 2010

Reading 5#: Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes

Comments:
Wenzhe Li
Summary:
The paper introduces a fast, easy, and accurate recognizer for unistroke gestures that novice programmers can incorporate into their UIs.

Several advantages of $1 are described:
1. resilience to variation in sampling
2. rotation, scale, and position invariance
3. new gestures can be taught with only one example
4. more accurate than DTW and Rubine
while its disadvantages are:
1. it cannot distinguish gestures whose identities depend on a specific orientation
2. it cannot distinguish gestures whose identities depend on speed

$1 consists of 4 steps:
1. Resample the point path into a fixed number of equidistantly spaced points.
2. Rotate based on the indicative angle, which is the angle formed between the centroid and the first point.
3. Scale to a square and translate the centroid to the origin.
4. Find the optimal angle for the best score.
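The first three steps above can be sketched in Python as follows. This is an illustrative sketch rather than the authors' reference code: the function names, the 64-point resampling, and the 250-unit square are assumptions, and the angle search of step 4 is left out.

```python
import math

def centroid(pts):
    return (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))

def resample(pts, n=64):
    """Step 1: n points spaced equally along the path."""
    interval = sum(math.dist(a, b) for a, b in zip(pts, pts[1:])) / (n - 1)
    pts = list(pts)
    out, acc, i = [pts[0]], 0.0, 1
    while i < len(pts):
        (x0, y0), (x1, y1) = pts[i - 1], pts[i]
        d = math.dist(pts[i - 1], pts[i])
        if d > 0 and acc + d >= interval:
            t = (interval - acc) / d
            q = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
            out.append(q)
            pts.insert(i, q)  # q becomes the start of the next segment
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:  # rounding can drop the final point
        out.append(pts[-1])
    return out[:n]

def rotate_to_zero(pts):
    """Step 2: rotate about the centroid so the indicative angle becomes 0."""
    cx, cy = centroid(pts)
    angle = math.atan2(cy - pts[0][1], cx - pts[0][0])
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    return [((x - cx) * cos_a - (y - cy) * sin_a + cx,
             (x - cx) * sin_a + (y - cy) * cos_a + cy) for x, y in pts]

def scale_and_translate(pts, size=250.0):
    """Step 3: non-uniform scale to a square, then move the centroid to the origin."""
    xs, ys = [x for x, _ in pts], [y for _, y in pts]
    w = (max(xs) - min(xs)) or 1.0  # guard against zero extent (assumption)
    h = (max(ys) - min(ys)) or 1.0
    scaled = [(x * size / w, y * size / h) for x, y in pts]
    cx, cy = centroid(scaled)
    return [(x - cx, y - cy) for x, y in scaled]

def normalize(pts, n=64):
    return scale_and_translate(rotate_to_zero(resample(pts, n)))
```

After normalize, the first point lies on the negative x-axis and the centroid sits at the origin, so two gestures can be compared point by point. The non-uniform scaling in step 3 is also what distorts nearly straight gestures.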

Finally, the authors compare $1, DTW, and Rubine.
1. $1 and DTW are significantly more accurate than Rubine.
2. $1 and DTW improve only slightly as the number of training examples increases, while Rubine improves a lot.
3. All three are affected similarly by articulation speed.
4. $1 has the best falloff, DTW second, and Rubine third.
5. $1 is the fastest, Rubine second, and DTW third.

Discussion:
Is $1 really better than the others? Probably not. $1 is only shown to be better on the dataset described in the paper. Different recognizers suit different datasets, and any algorithm can be made to look like the best one if you want. The authors spend 2 pages demonstrating the advantages of $1, which seems of little use.

However, I still appreciate $1 for its simplicity and speed. It is well suited to novice programmers, especially those unfamiliar with sketch recognition. From this point of view, $1 is great work.

Thursday, September 9, 2010

Reading 4#: Sketchpad: A Man-Machine Graphical Communication System

Comments:
Wenzhe Li
Summary:
The paper is a detailed introduction to Sketchpad.

The design of Sketchpad is based on a ring structure, which makes it easy to find the next and previous members. The paper then explains how the system works, including how it displays lines, circles, digits, text, and so on. Some general recursive functions, such as expanding, deleting, and merging, are very important in Sketchpad. These functions operate recursively on all related elements.

An important idea in Sketchpad is the "copy" function, which makes drawing easier. Its disadvantage is that it can only copy an instance as a whole. Another core feature of Sketchpad is constraint satisfaction, which sets it far apart from pencil and paper. Users can specify constraints for their drawing and have Sketchpad satisfy them, which makes drawing much easier.

At that time, Sketchpad could be used for linkages, bridge design, and drawing, except for electrical circuit diagrams.

Discussion:
Generally speaking, Sketchpad was a milestone for its time. I can hardly imagine that such a machine could be designed back then. In my opinion, Sketchpad was as important as the first computer. It seems to be an ancestor of several kinds of design software, such as CAD and PSpice, and also of sketch recognition.

I am very impressed by its constraint satisfaction, which I think is the most important feature of Sketchpad. Even someone who is not good at drawing can draw a beautiful picture by setting up some constraints and letting Sketchpad satisfy them. Drawing is no longer as difficult as before.

Monday, September 6, 2010

Reading 3# : "Those Look Similar!" Issues in Automating Gesture Design Advice

Comments:
Yue Li
Summary:
A user interface design tool, QUILL, was designed to give unsolicited advice (active feedback) to help designers create and improve gestures. The advice was given based on similarity metrics of Rubine.

The paper discusses the interface, implementation, and similarity-metric challenges that the authors encountered in QUILL. Long discusses the timing of advice, its length and frequency, its content, background analysis, advice for hierarchies, and similarity metrics. All of his design decisions take the users' point of view and aim to make users more comfortable. For example, Quill always gives a concise message with a hyperlink to more details.

Long also outlines future work for quill, such as collecting more data to improve the similarity model.

Discussion:
The paper mainly discusses the design of an automated advice system. Many of the challenges the authors encountered will also arise in our own designs. Their advice on how to deal with these challenges seems beneficial to designers, especially novice designers. But I would have liked more detail on some core problems, such as the similarity metrics and an analysis of the performance of Rubine's recognizer.

Besides, the system might be better if it offered recognizers other than Rubine's, so that users could find the best recognizer for their own gesture datasets. That would make quill more popular.

Sunday, September 5, 2010

Reading 2# Specifying Gestures by Example (Rubine)

Comments:
chris aikens
Summary:
The paper introduces a gesture recognition system called GRANDMA (Gesture Recognizers Automated in a Novel Direct Manipulation Architecture).

First, it describes the design of GRANDMA, including how to create new gestures, delete gestures, and edit gestures' semantics. This shows how GRANDMA works.

Second, it describes the principle of the recognizer. Gestures are recognized by a linear classifier with 13 features, including angles, the bounding rectangle, etc. The decision is made by finding the class with the maximal score. "Rejection" is introduced to prevent ambiguous results. The recognizer runs in real time with high accuracy.
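As a rough illustration of the classification and rejection steps, the sketch below scores each class with a weighted sum of features and rejects results whose estimated confidence is low. The two-feature toy weights, the function name, and the 0.95 cutoff are assumptions for this example; a real recognizer would use all 13 features with weights learned from training examples.

```python
import math

def classify(features, weights, reject_below=0.95):
    """Pick the class with the highest linear score.

    weights maps a class name to [w0, w1, ..., wk], where w0 is the bias.
    Returns (label, confidence); label is None when the result is rejected.
    """
    scores = {
        name: w[0] + sum(wi * fi for wi, fi in zip(w[1:], features))
        for name, w in weights.items()
    }
    best = max(scores, key=scores.get)
    # Confidence estimate: 1 / sum_j exp(v_j - v_best), in (0, 1].
    confidence = 1.0 / sum(math.exp(s - scores[best]) for s in scores.values())
    label = best if confidence >= reject_below else None
    return label, confidence

# Hypothetical weights for two classes over two made-up features.
toy_weights = {"line": [0.0, 5.0, 0.0], "circle": [0.0, 0.0, 5.0]}
```

With these toy weights, classify([2.0, 0.0], toy_weights) selects "line" with high confidence, while classify([1.0, 1.0], toy_weights) scores both classes equally and is rejected.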

Finally, extensions are discussed, such as eager recognition and multi-finger recognition.

Discussion:
In general, GRANDMA was a milestone in Rubine's time. An automated system with high accuracy and a short run time was a great piece of work in the field of gesture recognition, and it achieved high accuracy on several datasets.

Recognizer: The accuracy rate decreases as the number of gesture classes increases. Overlearning can occur with the linear classifier. A linear classifier is fast but limited in the number of classes it can handle. The classifier takes less than 20 ms to recognize a gesture. Perhaps the classifier should be more complex in order to handle more classes.

Eager recognition: Eager recognition makes GRANDMA more intelligent and makes people more satisfied with it. However, there is no free lunch: eager recognition limits the range of gesture classes to some extent.

Thursday, September 2, 2010

Reading 1# : Gesture Recognition

Comments:
Marty Field
Summary:
Gesture recognition relates to sketch recognition only when sketches are drawn in the same manner every time. The paper surveys three common gesture recognition techniques: Rubine, Long, and Wobbrock's $1. For each it presents the definition of the features, why those features are used, figures showing how to calculate them, how to build the classifier, and the accuracy rate. It also compares the three methods. Long introduced 11 new features but did not get better results. Wobbrock got better results with his $1 recognizer, but the recognition process takes more time. In addition, $1 needs more templates if sketches can be drawn in different ways.

Discussion:
In general, it is a very good survey that helps beginners understand why and how gesture recognition can be used in sketch recognition.

About features: For orientation-independent sketch recognition, orientation-independent features such as Hu invariant moments, which are robust to scale, rotation, and translation, should be very useful. I don't know whether they are applicable to gesture recognition, but I think they should be.

About time stamps: I think time stamps are very important in sketch recognition. For example, they are very useful for determining stroke order. All three methods in the paper ignore time stamps to some extent (though not totally). They reduce a 3-D problem to a 2-D problem, which introduces some inaccuracy in recognition. I think we could try to solve the problem in 3-D coordinates. Maybe that would be too time-consuming, or maybe the problem can be solved well enough in 2-D coordinates.

Questions
1.
The advantage of deleting the first or the second point is that the rotation angle can be computed exactly, without a "divide by zero" error.

The disadvantage is that it affects the calculation of the maximum speed. If the first point is deleted, the speed from the point before it to the first point is replaced by the speed from that earlier point to the second point. The time interval is longer, so the speed becomes smaller. The same applies when the second point is deleted. However, if the first and second points have the same time stamp, there is no effect.
2.
The advantage of removing the first point or the second one is that we can still get an approximate maximum speed, while the disadvantage is that it affects features related to the rotation angle (such as the sum of rotation angles).

The advantage of altering the time stamp is that we get an approximate speed, while the disadvantage is that it produces a huge speed from the first point to the second.
I think the best method may be to set a threshold and let Dtp = t(p) - t(p-1) + dtp, where dtp is a very small time interval. If the maximum speed exceeds the threshold, it was probably caused by identical time stamps.
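A small sketch of this idea is given below. It assumes points are (x, y, t) tuples, and the values chosen for dtp and the threshold are arbitrary placeholders rather than recommended settings.

```python
import math

def max_speed(points, dtp=1e-3, threshold=100.0):
    """Maximum segment speed with every time interval padded by dtp.

    Segments whose padded speed still exceeds the threshold are treated as
    artifacts of identical time stamps and skipped.
    """
    best = 0.0
    for (x0, y0, t0), (x1, y1, t1) in zip(points, points[1:]):
        dt = (t1 - t0) + dtp  # Dtp = t(p) - t(p-1) + dtp, never zero
        speed = math.hypot(x1 - x0, y1 - y0) / dt
        if speed > threshold:
            continue  # implausibly fast: most likely a duplicated time stamp
        best = max(best, speed)
    return best
```

For a stroke such as [(0, 0, 0), (10, 0, 10), (10, 5, 10), (20, 5, 20)], the middle segment has a zero time difference. Padding avoids the division by zero, and the threshold keeps that segment from dominating the result.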

3 7-f 8-c 6-a 5-b 2-d 1-g 3-h 4-e

4 I think Long used the idea of "density" to calculate several kinds of density. Density here is meant in a broad sense. It describes how the sketch is distributed over its area, in order to get a more exact description of the sketch.
