Humans: the Final AI Frontier

Artificial intelligence (AI) is not only about large models, and it is certainly not only about language. If AI is to reach human-level performance, it will have to learn from rich-context, high-bandwidth signals such as video. It will also need to detect, express, and manifest human-centric subjective notions such as artistic values, ambition, trust, stress, and happiness. Training such human-centered machines will require large volumes of appropriately human-labeled data that simply does not exist today. Our vision is to offer the tools that enable this transition to human-level AI. If human-centered AI is ever to exist, human feedback is the only way to get there: it is the final frontier for AI.

Human Feedback is Hard

From philosophical arguments about human values and norms to psychology's long-standing struggle to find ways of measuring human responses, one thing is certain: human responses are hard to get and even harder to get right! There is, however, growing scientific evidence from behavioral economics, neuroscience, data science, and AI that humans can give remarkably reliable1 responses, even when asked to express their stress or engagement levels while watching a film, or their overall sentiment during a therapy session.

AI and Human Feedback

There is a strong, long-standing belief that AI performs better when tasked with clearly defined goals, such as recognizing an object, moving a robot to a particular location, or writing code. Perhaps this is because we, as humans, find it hard to trust AI with anything human-centric, such as recognizing our emotions, critiquing our artistic style, or expressing views about our living spaces. Or perhaps we are right: even the very best AI models available today, such as large language models (LLMs), outperform us on many tasks yet perform poorly on many of the simple, subjectively defined things we ask of them.

Current large foundation models are far from human-level performance: they fail to understand our intentions, goals, emotions, artistic styles, and preferences, and when they do, we are impressed! Such models also rely largely on low-bandwidth language observations and other static representations of our world, such as text descriptions of images. As a result, even the most impressive multimodal models (e.g., GPT-4o) cannot predict our emotional patterns as we watch a movie or play a video game2. This is primarily because temporal labels over video were never available to train these models. Evidently, the performance of current AI systems is constrained by the lack of reliable subjective labels for multimodal temporal signals.

While human feedback is critical for the development of human-centered AI, it currently exists mostly in textual and static forms, such as tags and captions of images. The next generation of AI systems, however, will rely not only on language and semantic descriptions but also on multimodal signals3. These models will need to learn from temporal human labels over media content such as video, so that they perceive and act upon a world closer to ours. This push towards human-centered AI will require large volumes of such labeled spatiotemporal data of a subjective nature for training entirely new models or fine-tuning existing ones.

HumanFeedback.ai

At HumanFeedback.ai we are building the definitive platform for the subjective, temporal labeling of multimodal signals such as video, in support of the next generation of human-centered AI algorithms. We build tools that empower AI engineers by offering rapid, reliable, large-scale human feedback for their models. Our patented technology builds on the methods behind recent unprecedented progress in AI (e.g., foundation models, reinforcement learning from human feedback) and combines them with insights from affective computing, behavioral economics, and the psychology of bounded rationality.

At HumanFeedback.ai we believe that less (but more reliable) data is more. With our tools you can solicit human feedback rapidly, reliably, and at a competitive cost in human labor and time. Our tools let you recognize when annotators are confused, tired, or ambiguous in their responses, how many annotators a task needs, which parts of a video require annotation, and which parts are not worth annotating. With our platform you can detect unreliable annotators early, at the cost of only a few minutes of crowdsourcing, derive the ground truth of any subjective label, and obtain reliable data from the fewest possible annotations. The resulting data can then be used to build human-centered models end to end or to fine-tune existing foundation models.
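As a purely illustrative sketch of the kind of reliability screening this involves (not our actual algorithm), one could compare each annotator's ordinal ratings of video segments against a consensus ranking and flag those who diverge; the annotator data and the 0.5 threshold below are hypothetical.

# Hypothetical illustration: flag unreliable annotators by how well their
# ordinal (rank-based) labels of video segments agree with the consensus.
# The ratings and the 0.5 threshold are made up for this sketch.
import numpy as np
from scipy.stats import kendalltau, rankdata

# ratings[annotator, segment]: engagement ratings for five video segments
ratings = np.array([
    [1, 3, 2, 5, 4],   # annotator A
    [2, 3, 1, 5, 4],   # annotator B (largely agrees with A)
    [5, 1, 4, 2, 3],   # annotator C (mostly reversed: likely unreliable)
])

# Consensus = mean rank of each segment across all annotators
consensus = rankdata(ratings, axis=1).mean(axis=0)

# Kendall's tau between each annotator's ranking and the consensus ranking
for name, row in zip("ABC", ratings):
    tau, _ = kendalltau(row, consensus)
    verdict = "flag for review" if tau < 0.5 else "ok"
    print(f"annotator {name}: tau = {tau:+.2f} -> {verdict}")

In practice, checks of this kind would run continuously during crowdsourcing, which is how unreliable annotators can be caught within minutes rather than after a full annotation campaign.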

Beyond the AI industry, any industrial or research sector that requires subjective human data is a potential use case for our technology, including, but not limited to, the automotive industry, healthcare, manufacturing, construction, architecture and engineering, and video games.

1Yannakakis, G. N., Cowie, R., & Busso, C. (2018). The ordinal nature of emotions: An emerging approach. IEEE Transactions on Affective Computing, 12(1), 16-35. [Outstanding IEEE TAC Paper, Nominated Best of IEEE Paper]

2Melhart, D., Barthet, M., & Yannakakis, G. N. (2025). Can Large Language Models Capture Video Game Engagement? arXiv preprint arXiv:2502.04379.

3Bardes, A., Garrido, Q., Ponce, J., Chen, X., Rabbat, M., LeCun, Y., ... & Ballas, N. (2024). Revisiting feature prediction for learning visual representations from video. arXiv preprint arXiv:2404.08471.

Georgios N. Yannakakis
Founder, FIEEE Affective Computing

Get in touch

We'd love to hear from you! Whether you have a question, have feedback, or need assistance, our team is here to help. Please reach out to us by email!