With a lifelong passion for connecting content creators with their audiences, Andrey Doronichev has dedicated his career to exploring new frontiers. From his early days at an internet service provider to his pivotal role at YouTube, and now as founder & CEO of the content fraud detection engine Optic, Doronichev’s journey has been one of innovation and entrepreneurial spirit.
Doronichev’s foray into technology began in the nascent days of the web. Witnessing the transformative power of this newfound connectivity, he became captivated by the potential of the internet to bridge the gap between creators and consumers. This drive led him to establish a mobile content startup that distributed games before the advent of the iPhone, laying the foundation for his future endeavors.
Recognizing the rising importance of mobile platforms, Doronichev joined YouTube, where he spearheaded the growth of its mobile team. Under his leadership, YouTube’s mobile app amassed over a billion users, accounting for more than 50% of the platform’s total traffic. Along the way, Andrey witnessed the evolution of media consumption as YouTube shifted from a website to a dominant app in the digital landscape.
Later on, Doronichev’s attention turned to the emerging frontiers of immersive media and the metaverse. As a founding member of the Google VR team, he played an important role in the development of Google Cardboard. However, as distributing VR proved challenging, Andrey recognized that metaverse-like experiences had already been widely adopted in the form of games, social platforms, and content creation ecosystems. Determined to make these interactive 3D experiences more accessible, he embarked on his final project at Google: Stadia, a cloud gaming platform aimed at making gaming instantly accessible.
As the founder & CEO of Optic, Doronichev is now dedicated to building solutions focused on content authenticity and safety. In this interview, Doronichev and Metaverse Post co-founder Sergei Medvedev unpack the technology behind Optic and its content understanding system for blockchain.
I want to say that I love Stadia. When I tried it back when it existed, it was really cool. I liked the UI/UX, especially how easily you could use your controller and have it respond to everything in the game. It was all synchronized. In my opinion, it was the best virtualized gaming software.
A lot of work went into this. Thank you.
Could you describe yourself? What are your interests in general, and what are you passionate about?
Well, I am a technologist and an entrepreneur. I’ve spent most of my life building things that I’m excited about, and most of them are in the field of technology. Specifically, I’ve always been passionate about connecting people who create media and new forms of media with people who consume this media.
I was one of the founding members of the Google VR team working on the Google Cardboard product that you probably remember. We turned it into a team and a whole VR initiative, with a bunch of apps, software, and hardware that Google launched in this area. Later on, it became pretty clear that distributing immersive VR experiences is really challenging: it takes extra hardware to make interactive 3D immersive. At the same time, millions and millions of people were already using the metaverse; we just call it games. There are social experiences and economies there; there are content creator platforms like Roblox, and for some reason, we call them games. That’s just the wrong name for these new social worlds. Some of them are way more than games.
Stadia was about making those interactive 3D experiences more accessible rather than more immersive. YouTube made video way more accessible than getting a DVD or downloading a ginormous video file, because we just stream it. Similarly, we felt games were not accessible to most people because they required expensive hardware: you need a computer, a game console, or whatnot. Even if you have those, you need many hours of download time before you can enjoy the game. Stadia made gaming instant. That was the idea behind the platform. I was the Director of Product responsible for the consumer-facing part.
After that, I left Google to explore my own projects. Since then, I’ve been doing a bunch of creative work, including as a creator on social media. Lately, in the last year or so, I got back to my core craft, which is entrepreneurship, and I started a company called Optic, an AI company focused first and foremost on digital media safety and authenticity.
Let’s discuss Optic, which initially began as a content recognition engine for web3, specifically designed to identify NFT copymints, remixes, and inappropriate content. At the time, it was a trendy topic, but it seems that you are now shifting more towards AI. Is this a pivot in your strategy, or simply a diversification of your product to meet user demand and offer more functionality to a wider user base than the NFT focus allowed?
Optic started around the thesis that digital content authenticity is becoming increasingly important, and that remains true to this day. We’re a team that will be solving digital content authenticity and safety using AI. We consume all sorts of digital media. There’s news, there are images your friends post on social, there are videos on YouTube, there’s digital art, and there’s a specific subset of digital art that is NFTs. All of these areas are digital content, and in our view, they will be increasingly pressed to invest in the authenticity and safety of content because the amount of content being generated is accelerating. It’s easier to create and distribute, so there is more of it, and there’s more malicious content.
With this thesis, we set out to build an AI that helps humans understand which content is good and which is bad. We needed to start somewhere, so we started with a very small segment that had very clear, immediate economic value: digital art. It was the easiest way for us to start on our vision because there was a very clear way to explain why people should pay for authenticity: if you buy an inauthentic NFT, you immediately lose money. If you consume inauthentic news, you probably lose more than money, but over a much longer period of time. That’s a way harder sell, so that’s why we started where we did.
In a year, we cleaned millions of inauthentic NFTs out of the space and built the most precise, fastest, and most scalable content understanding system for blockchain. It works right now across nine blockchains, and it has detected over 100 million fake NFTs. It runs as a real-time system with under a second of delay in most cases. It’s relied upon by major marketplaces like OpenSea, which accounts for pretty much the majority of the secondary sales market, where most of the fraud appears. You can see our results at insights.optic.xyz, a public-facing dashboard showing the number of bad NFTs detected per collection.
Now, with generative AI becoming an explosive topic, I think there is another problem that is way bigger than digital art counterfeits: soon enough, humans won’t be able to tell what’s real and what’s imaginary. Take, for example, those first attempts at political influence with the fake photos of Trump being handcuffed. I believe we are entering a new era that is going to be really scary for people because AI will be used in all sorts of misinformation campaigns.
Frankly, when we started Optic, AI was already doing a lot of damage through AI-generated recommendations, which create echo chambers on social media where people’s beliefs get reinforced, causing societal polarization. But now, with generative AI, that is multiplied, because suddenly those echo chambers can not only relay evidence obtained elsewhere but also create fake evidence and alternative realities within those little groups of believers. It’s going to be increasingly important to have public tools that allow anyone to check whether what they’re looking at is real or imaginary. And, of course, tools at the institutional level, which is what our monetization is built around: providing APIs.
I wanted to ask how it works because, at Mpost, we have an AI writer that scans a lot of news sources. Our editors write the lede, but the article itself is generated by a couple of AI models to make it read like human-written text. As a platform delivering solutions that detect fake and misleading content, will Optic be able to recognize AI-generated text as not authentic?
Let’s separate text and media. To be very clear, we don’t have a product for AI-generated text detection at the moment because, frankly, it’s extremely hard to do, as AI-written texts are not very different from human-written content. As long as it’s factually correct, it doesn’t even matter whether it was written by AI, unless you’re a school teacher.
However, it does matter a lot when it comes to photographs and videos, like when something is presented as photographic evidence of an event that didn’t happen, such as Trump being handcuffed or the Pope in a puffy jacket. Or when someone takes your voice, your likeness, or your face and creates something you didn’t say or didn’t do, but it appears to be you. The recent AI-generated track imitating Drake and The Weeknd, which, by the way, is pretty good, is an example of what’s to come. But if you’re Drake, you can fight it and get all the platforms to remove it.
I personally have a pretty popular Instagram account as a content creator, and I’ve been sent ads where my face is talking about some bullshit product that is clearly a scam, advertising it to an audience that trusts me. There are a few hundred thousand people in the world who know my name and my face, and someone is using me to sell scams to them.
I think world-renowned artists will have some tools to fight it off. They can make a statement that it’s not them, and everyone will hear it. But if you’re an influencer with 100,000 or a million subscribers and someone is using your face or voice to say things you don’t mean, you might not even find out until it’s too late. And that’s the reality we’re all going to live in for the foreseeable future.
So that’s where we’re focusing first and foremost:
- First, is this photograph real, or is it AI-generated? This is a hot topic right now.
- Second, is this video of a person likely to be a deepfake?
- Third, is this audio recording the real voice of a person, or an AI-generated version of that person’s voice?
That said, AI-generated media might be completely legit; I might use my own voice. Here’s an example: I am a co-founder of a startup that created a voice-guided breathing meditation app. My co-founder, a breathing instructor, records the guides in her own voice. Now, with AI, she can create way more content more easily because she can train an AI to reproduce her voice. She can just generate scripts in many languages, and the AI can create versions of the track in her voice in those languages. It’s a completely legit use case; it’s just a way to scale content production.
The problem comes when you can’t tell real media from AI-generated media. For example, when someone calls you on the phone, claims to be a loved one in trouble, and tells you that you need to send them money. There are tons of reports on social media right now about these cloned-voice scams, where someone sounds like your loved one. People fall for it and lose money. Our job is to help humans stay safe in the world of AI-generated content. And by safe, I mean giving humans tools that provide transparency about what is authentic, what is AI-altered, and what is AI-generated. As long as you can tell them apart, you can make your own decisions.
For individuals who may not have extensive knowledge of AI, how does Optic protect their voice or detect whether a photograph is authentic or copied? As an ordinary person, what assurances can Optic provide, in terms of indicators, to verify the authenticity of a photograph?
We’re in the early stages. We launched a web tool, aiornot.org. Let’s say someone sent you a picture of Trump handcuffed, or a picture of you doing things you don’t normally do, and you’re like, “What the hell is that?” You can upload that picture on aiornot.org, and it tells you, with about 80% probability, whether it’s AI-generated. You can also send it to our Twitter account with the hashtag #AIornot, and we have an AIornot bot in Telegram to which you can just forward the file, and it’ll get back to you with its answer.
We don’t have a live product for voice and video at the moment, but those are the things we are researching and working on.
You have two significant milestones in your roadmap, namely voice and video fraud detection tools.
Yes. We are exploring all sorts of places where safety and authenticity can be endangered. Digital art was one of those, and we solved that. AI-generated images are a problem; we’re working on a solution. We expect that video and voice will become a problem, and we’ll be solving those. If there’s a different, bigger problem, we’ll be solving it instead.
Things are changing super fast right now with AI. For example, I can imagine that maybe a bigger problem will be AI agents that will pretend to be humans and will talk to you on social or on messengers, and you will not know if it’s real or not. So maybe if that is the case, we’ll focus on that. But it’s all connected by this common theme: Optic is an AI company solving issues surrounding content authenticity and safety.
What do you think is the most important skill people should develop nowadays to have better job prospects in the future or maintain their job security now?
By now, I think, we can probably agree that there is more than one form of intelligence. Until recently, we all thought the human brain was so unique that it was the only way to be intelligent. It’s like flight: for thousands of years, flapping wings like a bird was considered the only way to fly, and humans tried to build flying machines with flapping wings. Then the Wright Brothers proved that there are different ways of flying that are, in fact, way simpler mechanically but way harder technologically than what we were trying to do. Now we’re all flying.
Similarly, with intelligence, the brain was the only known form of intelligence for many years, and now, suddenly, we see that there’s a different form. The transformer model is way simpler than your brain. However, given way more compute and way more data, it can actually produce intelligence comparable to, or soon exceeding, that of humans. So in this world where we’re competing with something potentially way smarter than us, I think there are two ways the human brain, for now, can still be competitive:
- Agility. Being resilient, flexible, and less specialized is probably the most important skill anyone should be training right now, because we’ll have to maneuver a lot as a species to outmaneuver this new form of life if we create AGI in the next five years.
- Sensory experience. This is the one thing AI does not have. It cannot feel, it doesn’t have our senses, and it cannot experience life. That’s what makes humans very special. The human condition is a condition of experiencing life: feeling all the emotions of sadness and happiness and love and hatred and all the things we feel every time we breathe in and breathe out. Nobody can take that away from us. If anything, we should learn to feel more because, in many cases, we will be outsourcing thinking going forward.