Interactive Whiteboards with Sketch-Interpreting Software
February 22, 2010
Science writers know as well as anyone how much information a diagram can
contain. We often labor to express in words what a researcher was able
to convey in a single image.
But while a drawing can be rich in information, it's information that's
usually inaccessible to computers. If you draw a diagram on the screen
of a tablet computer, like the new Apple iPad, the computer can of
course store the drawing as an image. But it can't tell what the image
means.
MIT researchers intend to change that, with a new system that can
interpret sketches. If a chemist, for example, uses a stylus — an
inkless plastic pen — to draw a molecule on a tablet computer, the
software can identify different types of chemical bonds and element
symbols and determine the structure of the molecule. Similarly, if an
electrical engineer draws a circuit diagram, the software will identify
the circuit's separate components — like resistors, capacitors,
batteries, and simple wires — and display them in different colors.
Other applications of the system include programs that can interpret
mechanical drawings, family trees, and diagrams of computer programs.
Once a sketch has been interpreted by a computer, it becomes much more
useful. A chemical sketch, for instance, could be the basis for a
literature search, to see whether there's any prior research on the same
molecule; analysis software could determine whether the circuit depicted
in a sketch will perform as intended. Or design software could simply
clean up and standardize a sketch for display in a journal or PowerPoint
presentation.
The writing's on the wall
The application of sketch recognition to chemistry grew out of a
collaboration with Pfizer, says Tom Ouyang, a PhD student in MIT's
Computer Science and Artificial Intelligence Laboratory (CSAIL), who
developed the new system together with CSAIL professor Randall Davis.
"We once visited their labs, and we noticed that on all their
whiteboards and even on some of their windows they had all these
chemical structures drawn using dry-erase markers, and when we talked to
them they mentioned that they used these graphical diagrams all the
time." Currently, Ouyang explains, the only way to translate such
diagrams into a format that a computer can understand is to use software
that requires the researcher to select an element — like a bond or a
chemical symbol — from an on-screen palette, click it, drag it across
the screen, drop it into place, and then repeat the process for each
successive element. "That's not as intuitive or as fast as just being
able to jot it down on paper," Ouyang says.
Most of today's tablet computers and even some smartphones come with
software that can recognize handwriting. But interpreting a diagram is
"completely different from handwriting recognition," says Tom Stahovich,
an associate professor of mechanical engineering at the University of
California, Riverside, who researches sketch recognition. "When you do
handwriting recognition, there's a natural temporal and spatial order to
it. In English, you write left to right, top to bottom. And so figuring
out what comes next is much easier." In a circuit diagram, on the other
hand, a resistor might be oriented horizontally or vertically, and it
might appear above, below, or next to the preceding circuit element.
"With handwriting recognition," Stahovich says, "you keep looking to
your right, and you see the next letter." In addition, Stahovich explains,
handwriting recognition systems exploit regularities that are unique to
language. "They have a lexicon, just a giant word list, and they find
the word most similar to what the recognizer produces," he says. "So if
the recognizer recognizes the word as 'tbe,' that's not in the lexicon,
but the most similar word from the lexicon is 'the,' so that will get
replaced."
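
The lexicon lookup Stahovich describes can be illustrated with a few lines of Python. This is only a minimal sketch, not the method of any particular handwriting recognizer: it assumes that "most similar" means smallest edit distance, and the function names and the toy word list are hypothetical.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def correct_with_lexicon(raw: str, lexicon: Sequence[str]) -> str:
    """Keep raw if it is a known word, else return the closest lexicon entry."""
    if raw in lexicon:
        return raw
    return min(lexicon, key=lambda word: edit_distance(raw, word))


# Toy word list, assumed for illustration; a real lexicon is far larger.
LEXICON = ["the", "tab", "ten", "them", "then"]

print(correct_with_lexicon("tbe", LEXICON))  # prints "the"
```

Here "tbe" is one substitution away from "the" and two edits away from every other entry, so it is replaced by "the." Nothing comparable exists for a circuit diagram, which has no word list to fall back on.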
Anatomy of a sketch
To meet the particular demands of sketch recognition, the MIT
researchers combine information about the physical appearance of the
final sketch with information about how it was drawn: the system can
recall the direction in which the stylus was moving when a particular
stroke was made. That gives it a better sense of whether a stroke was
intended to be horizontal, vertical, or diagonal. The system then
decomposes a symbol into its constituent parts: its horizontal elements,
its vertical elements, its diagonal elements from both upper left to
lower right and upper right to lower left, and the endpoints of the
strokes. Algorithms automatically refine the components to eliminate
stray marks and enhance intentional ones. Finally, the system searches
through a database of similarly decomposed sample symbols, looking for
matches. Davis and Ouyang say that samples from only 10 or 12 subjects
were enough to make both the molecular-sketch and circuit-diagram
systems highly reliable, even for first-time users.
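
The general idea of breaking a stroke into orientation components and matching it against stored samples can be sketched in Python. This is a simplified illustration under stated assumptions, not the researchers' algorithm: it assumes a stroke is an ordered list of (x, y) screen points with y increasing downward, it skips the endpoint features and the refinement step, and every name in it (orientation_profile, classify, TEMPLATES) is hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]

# Four orientation classes, in 45-degree steps starting from horizontal.
# With y increasing downward, 45 degrees runs upper left to lower right.
ORIENTATIONS = ("horizontal", "diag_ul_lr", "vertical", "diag_ur_ll")


def orientation_profile(stroke: Sequence[Point]) -> Dict[str, float]:
    """Fraction of the stroke's length that falls in each orientation class."""
    lengths = dict.fromkeys(ORIENTATIONS, 0.0)
    for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        if length == 0:
            continue  # repeated point, no direction to measure
        # Fold the pen's direction of travel into the range [0, 180).
        angle = math.degrees(math.atan2(dy, dx)) % 180.0
        index = int((angle + 22.5) // 45) % 4  # nearest 45-degree class
        lengths[ORIENTATIONS[index]] += length
    total = sum(lengths.values())
    return {k: (v / total if total else 0.0) for k, v in lengths.items()}


def classify(stroke: Sequence[Point],
             templates: Dict[str, Dict[str, float]]) -> str:
    """Name of the stored sample whose profile is closest (L1 distance)."""
    profile = orientation_profile(stroke)

    def distance(name: str) -> float:
        return sum(abs(profile[k] - templates[name][k]) for k in ORIENTATIONS)

    return min(templates, key=distance)


# Toy sample database, assumed for illustration.
TEMPLATES = {
    "dash": orientation_profile([(0, 0), (10, 0)]),
    "bar": orientation_profile([(0, 0), (0, 10)]),
    "corner": orientation_profile([(0, 0), (0, 10), (10, 10)]),
}

wobbly_line = [(0, 0), (5, 1), (10, 0)]  # shaky, but meant to be horizontal
print(classify(wobbly_line, TEMPLATES))  # prints "dash"
```

Both segments of the shaky line lie within about 11 degrees of horizontal, so the whole stroke is counted as horizontal and matches the "dash" sample. A real system would compare far richer features than these four numbers.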
"Traditionally, there's been some distinct flavors of shape or symbol
recognizers. Some looked at how the shape was drawn — how many pen
strokes and in what direction — and some looked at the final image,"
Stahovich says. "What Tom has managed to do is come up with a technique
that combines strengths of both approaches in a unique way."
The researchers have already developed an additional program that
translates hand-drawn chemical sketches into a format recognizable by
chemical-design software, but they haven't yet done the same for
electrical-engineering sketches. And while their system recognizes
standard symbols for chemical elements — H for hydrogen, C for carbon —
it hasn't yet been trained on the large number of abbreviations that
chemists use for more common molecular structures — "like AC for acetyl
groups, or ME for methyl groups," Ouyang explains.
Ultimately, however, the researchers see the software as part of a larger project to
make interactions with computers as natural as interactions with human
beings. "We want to interconnect this with some of the other things
we've done with speech and web-based lookup so that one could walk up to
the whiteboard and sketch a molecule and say, 'Has anybody published
anything like this?'" Davis says. "And then there's the multimodal
aspect of that, which is, I draw it, ask if it's ever appeared, and the
system says, 'I can't find anything like it.' And I point at the corner
of the molecule and I say, 'What if I put a methyl group there?' Not
draw it, but just gesture at it." Davis says that other members of his
research group are working on the disparate technologies that would help
enable such a flexible system.
In the short term, however, if the iPad helps bring tablet computers to
a broader audience, sketch recognition could also come into its own.
"Previously, the technology was looking for places to be used," says
Stahovich. "Now, there's hardware everywhere in need of this
technology."