Are today’s computer vision technologies robust enough to generate text descriptions for photos across a range of domains? That’s the question scientists at LinkedIn have been investigating over the past several years, and they detailed their answers in a blog post this afternoon. One of their more promising efforts is a tool that automatically suggests alternative text descriptions for images uploaded to LinkedIn. It is built on Microsoft’s Cognitive Services platform and a unique LinkedIn-derived data set.
“Currently, LinkedIn allows members to manually add alternative text description when uploading images via web interface, but not all members choose to take advantage of this feature,” wrote contributing authors Vipin Gupta, Ananth Sankar, and Jyotsna Thapliyal. “To uphold our vision, we must make rich media accessible for all of our members … [That’s why] we are exploring to help us improve content accessibility at LinkedIn.”
There are myriad challenges in automatic caption creation, Gupta and colleagues point out, the most obvious being that captions are subjective. The best captions require a breadth of subject expertise and knowledge of various objects and their attributes, along with time-based information that helps identify depicted activities more accurately.
To address these barriers, the team tapped Cognitive Services’ Analyze API to develop a feature that generates alternative text descriptions for photos, ranked by confidence score. They then recruited human evaluators to assess its performance by reconciling those scores, which were informed by the generated alternative text descriptions, categories, and tags, with labels the evaluators wrote themselves.
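The ranking step can be sketched as picking the highest-confidence caption and suggesting it only if it clears a cutoff. This is a minimal illustration, not LinkedIn's code: it assumes a response shaped like the Computer Vision Analyze Image JSON, where `description.captions` holds entries with `text` and `confidence` fields, and the function name `best_caption` and the 0.5 threshold are hypothetical.

```python
from typing import Optional

# Hypothetical cutoff: captions below this confidence are not suggested.
MIN_CONFIDENCE = 0.5


def best_caption(analyze_response: dict, min_confidence: float = MIN_CONFIDENCE) -> Optional[str]:
    """Return the highest-confidence caption, or None if none clears the cutoff.

    Assumes a response shaped like
    {"description": {"captions": [{"text": ..., "confidence": ...}, ...]}}.
    """
    captions = analyze_response.get("description", {}).get("captions", [])
    # Rank candidate captions by confidence score, highest first.
    ranked = sorted(captions, key=lambda c: c.get("confidence", 0.0), reverse=True)
    if ranked and ranked[0].get("confidence", 0.0) >= min_confidence:
        return ranked[0]["text"]
    # No caption is confident enough to suggest as alternative text.
    return None


if __name__ == "__main__":
    response = {
        "description": {
            "captions": [
                {"text": "a crowd of people", "confidence": 0.41},
                {"text": "a group of people standing in front of a building", "confidence": 0.82},
            ]
        }
    }
    print(best_caption(response))
```

Returning `None` rather than a weak caption mirrors the idea of only suggesting descriptions the system is reasonably sure about; the member can still write alternative text by hand.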
While Microsoft’s API recognized groups of people, objects like newspapers, and places like a subway fairly reliably, it initially struggled with LinkedIn images containing professional content such as slides, projectors, exhibitions, conferences, seminars, posters, certificates, and charts. The development team addressed this by evaluating the correctness of the alternative text descriptions on LinkedIn data, which exposed exploitable patterns in caption quality.
Having isolated the patterns, the team developed a meta classifier that filters out text descriptions that “could harm [LinkedIn] member[’s] experiences,” along with an image description correction module that identifies and fixes incorrect descriptions containing words like “screenshot.” These steps improved automatic caption generation, they say, and set the stage for meta classifier models, created in partnership with Microsoft, that take into account a tags taxonomy, an associated dictionary, and additional text associated with LinkedIn feed posts.
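The post does not say how the meta classifier or the correction module works internally, so the following is only a rough sketch of the filter-then-correct idea. It swaps in fixed hand-written rules where LinkedIn uses a trained classifier; the function `postprocess_caption`, the `SUSPECT_WORDS` set, the rewrite rule, and the 0.5 threshold are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical words that tend to signal a wrong caption on professional content,
# e.g. a slide or chart misdescribed as a screenshot.
SUSPECT_WORDS = {"screenshot"}

# Hypothetical rewrite rules for the correction step: (pattern, replacement).
CORRECTIONS = [
    (re.compile(r"\ba screenshot of\b", re.IGNORECASE), "an image of"),
]


def postprocess_caption(caption: str, confidence: float, min_confidence: float = 0.5) -> Optional[str]:
    """Correct a generated caption, then decide whether to keep it.

    A fixed rule stands in for the trained meta classifier: drop low-confidence
    captions and any caption that still contains a suspect word after correction.
    """
    if confidence < min_confidence:
        return None
    # Correction step: apply each rewrite rule in order.
    corrected = caption
    for pattern, replacement in CORRECTIONS:
        corrected = pattern.sub(replacement, corrected)
    # Filter step: reject captions whose suspect words could not be fixed.
    words = set(re.findall(r"[a-z]+", corrected.lower()))
    if words & SUSPECT_WORDS:
        return None
    return corrected


if __name__ == "__main__":
    print(postprocess_caption("a screenshot of a cell phone", 0.9))
    print(postprocess_caption("screenshot", 0.9))
```

A learned classifier could weigh many more signals than these rules, such as the tags taxonomy and feed-post text the team mentions; the rules here only show where such a model would sit in the pipeline.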
“[The] addition of rich media within the LinkedIn feed raises a question: is the feed fully inclusive for all LinkedIn members? For instance, can a member who has a vision disability still enjoy rich media on the feed? Can a member in an area with limited bandwidth, which could stop an image from fully loading, still have the complete feed experience?” wrote Gupta and colleagues. “LinkedIn’s AI teams [continue to build] image description models for rich media content specific to the LinkedIn platform to help improve overall image description accuracy.”
LinkedIn is no stranger to AI, of course. Its Recommended Candidates feature learns the hiring criteria for a given role and automatically surfaces relevant candidates in a dedicated tab. Its AI-driven search engine draws on data such as what people post on their profiles and the searches candidates perform to predict best-fit jobs and job seekers. And in 2016, LinkedIn switched its feed from reverse-chronological order to a more personalized ranking, using machine learning to predict what users would like and share.