Heartex Raises $25 Million for Its AI-Focused Open Source Data Labeling Platform

Heartex, a startup that bills itself as an open source platform for data labeling, today announced that it has raised $25 million in a Series A funding round led by Redpoint Ventures. Unusual Ventures, Bow Capital and Swift Ventures also participated, bringing Heartex’s total capital raised to $30 million.

Co-founder and CEO Michael Malyuk said the new money will be put toward improving the Heartex product and growing the company’s headcount from 28 to 68 by the end of the year.

“Coming from engineering and machine learning backgrounds, [Heartex’s founding team] knew the value that machine learning and AI can bring to organizations,” Malyuk told TechCrunch via email. “At the time, we all worked at different companies and in different industries, yet we shared the same struggle with model accuracy due to poor-quality training data. We agreed that the only viable solution was for internal teams with domain expertise to be responsible for annotating and curating training data. Who can provide better results than your own experts?”

Software developers Malyuk, Maxim Tkachenko and Nikolai Lyubimov co-founded Heartex in 2019. Lyubimov was a senior engineer at Huawei before moving to Yandex, where he worked as a backend developer on speech technologies and dialogue systems.

Heartex Dashboard.

Ties to Yandex, sometimes called the “Google of Russia,” may give some pause, particularly in light of charges by the European Union that Yandex’s news division played a significant role in spreading Kremlin propaganda. Heartex has an office in San Francisco, California, but several of the company’s engineers are based in the former Soviet republic of Georgia.

When asked, Heartex said that it doesn’t collect any customer data and makes the source code of its labeling platform available for inspection. “We’ve built a data architecture that keeps data private on the customer’s storage, separating the data plane and the control plane,” Malyuk added. “As for the team and their location, we’re a very international team with no members located in Russia.”

Geopolitical affiliations aside, Heartex aims to tackle what Malyuk sees as a major hurdle in the enterprise: extracting value from data by leveraging AI. There’s a growing wave of companies looking to become “data-driven,” and Gartner recently reported that enterprise use of AI grew by a whopping 270% over the past several years. But many organizations are struggling to use AI to its fullest.

“Having reached the point of diminishing returns on algorithm-specific development, enterprises are investing in improving data labeling as part of their strategic, data-centric initiatives,” Malyuk said. “This is a progression from earlier development practices that focused almost exclusively on developing and tuning the algorithm.”

If, as Malyuk asserts, data labeling is getting more attention from companies pursuing AI, it’s because labeling is a core part of the AI development process. Many AI systems “learn” to make sense of images, videos, text and audio from examples that have been labeled by teams of human annotators. The labels enable the systems to extrapolate the relationships between the examples (e.g., the link between the caption “kitchen sink” and a photo of a kitchen sink) to data the systems haven’t seen before (e.g., photos of kitchen sinks that weren’t included in the data used to “train” the model).

The trouble is that not all labels are created equal. Labeling data such as legal contracts, medical images and scientific literature requires domain expertise that not every annotator has. And, being human, annotators make mistakes. In an MIT analysis of popular AI datasets, researchers found mislabeled data, such as one dog breed confused for another and an Ariana Grande high note categorized as a whistle.

Malyuk doesn’t claim that Heartex completely solves these issues. But in an interview, he explained that the platform is designed to support labeling workflows for a range of AI use cases, with features that touch on data quality management, reporting and analytics. For example, data engineers using Heartex can see the names and email addresses of the annotators and reviewers associated with each label they’ve contributed or reviewed. This helps to monitor label quality and, ideally, fix problems before they affect the training data.

“The angle for the C-suite is pretty simple. It’s all about improving production AI model accuracy in service of achieving the project’s business objective,” Malyuk said. “We find that most C-suite executives with AI, machine learning and/or data science responsibilities have confirmed through experience that, with more strategic investment in people, processes, technology and data, AI can deliver tremendous value to the business across many different use cases. We also see that success has a snowball effect. Successful teams can more quickly create additional high-value models, building not only on their early learnings but also on additional data generated by using the production models.”

In the data labeling tools arena, Heartex competes with startups including AIMMO, Labelbox, Scale AI and Snorkel AI, as well as Google and Amazon (which offer data labeling products through Google Cloud and SageMaker, respectively). But Malyuk believes that Heartex’s focus on software rather than services sets it apart from the rest. Unlike many of its competitors, the startup doesn’t sell labeling services through its platform.

“Because we’ve built a truly horizontal solution, our customers come from a variety of industries. We have small startups as customers, as well as several Fortune 100 companies. [Our platform] has been adopted by more than 100,000 data scientists around the world,” Malyuk said, declining to disclose revenue figures. “[Our customers] are establishing internal data annotation teams and buying [our product] because their production AI models aren’t performing well and they recognize that poor training data quality is the main cause.”
