Learning from Descriptive Text

Tamara Berg - University of North Carolina at Chapel Hill

People communicate using language, whether spoken, written, or typed. A significant amount of this language describes the world around us, especially the visual world, whether experienced directly in an environment or depicted in images or video. In addition, billions of photographs with associated text are available on the web; examples include web pages, captioned or tagged photographs, and video with scripts or speech. Such visually descriptive language is potentially a rich source of

1) information about the world, especially the visual world,

2) training data for how people construct natural language to describe imagery, and

3) guidance for where computational visual recognition algorithms should focus efforts.

In this talk I will describe several projects related to images and descriptive text, including our recent approaches to automatically generating natural language descriptions, naming objects, and creating referring expressions for objects in images. In addition, I will introduce our new work on collecting descriptions for fill-in-the-blank image description and question-answering.

All papers, created datasets, and demos are available on my webpage at: http://tamaraberg.com/

If you would like to meet with the speaker, please contact Thomas Kleinbauer.


Tamara Berg

Tamara Berg received her B.S. in Mathematics and Computer Science from the University of Wisconsin, Madison in 2001. She then completed a PhD at the University of California, Berkeley in 2007 and spent one year as a research scientist at Yahoo! Research. From 2008 to 2013 she was an Assistant Professor at Stony Brook University and a core member of the consortium for Digital Art, Culture, and Technology (cDACT). Since 2013 Tamara has been an Assistant Professor at UNC Chapel Hill. She is a recipient of the NSF CAREER Award, a Google Faculty Research Award, and the 2013 Marr Prize. Tamara’s research focuses on human-centric computer vision, including topics at the boundary between Computer Vision and Natural Language Processing, as well as recent forays into clothing and style recognition and the development of human-computer collaborative recognition systems.