OpenAI has just overtaken the AI image generation race once again.
The tech giant’s integration of native image generation directly into ChatGPT via its GPT-4o model is not an incremental change but a major overhaul of the model, vaulting it to the front of the class.
Within hours of its release yesterday, the model went viral, with anime-style creations flooding social platforms and showcasing technical capabilities that leave DALL-E 3 in the dust.
The new model can easily compete against dedicated image-generation platforms while eliminating traditional workflow barriers.
The $20 monthly ChatGPT Plus subscription now delivers a comprehensive creative ecosystem that would previously have required multiple specialized tools and subscriptions.
The visual showdown: GPT-4o vs. industry leaders
We compared the model against Flux (the best open-source image generator) and Reve (the best closed-source image generator), and here is what we found.
Realism
Prompt: A high-resolution photograph of a bustling city street at night, neon signs illuminating the scene, people walking along the sidewalks, cars driving by, a street vendor selling hot dogs, reflections of lights on wet pavement, the overall style is hyper-realistic with attention to detail and lighting, a neon sign says “Decrypt.”
Our urban nightscape challenge, which required sophisticated light physics, crowd rendering, and architectural precision, revealed distinct performance profiles across competitors.
ChatGPT delivered impressively vibrant environments with neon signage, creating rich reflections across meticulously rendered wet pavement.
While it excelled in crowd dynamics and element inclusion, minor perspective inconsistencies occasionally betrayed its synthetic nature.
The lighting was also good, but sometimes veered into theatrical rather than naturally urban. It was not the best at reflections either, though that is something only the pickiest viewers would catch. It also generated legible neon signs besides the “Decrypt” one, which adds to the realism.
For us, Reve is the winner thanks to its good light physics modeling, particularly the subtle interactions between neon sources and reflective surfaces.
Its cinematic framing and atmospheric elements (steam wisps, motion blur) created superior dimensional authenticity. However, it reduced crowd density, which was a clever hack: because it did not have to generate a lot of faces, unrealistic details were harder to spot.
The system prioritized mood over literal prompt adherence.
Freepik Mystik (Flux) interpreted our prompts through a different lens and was the model that deviated the most from the realistic style.
It mixed Asian with Western lettering, generated several different Decrypt signs instead of just one, and suffered from technical limitations in human rendering and dimensional depth.
Its reflective surfaces lacked the physical accuracy displayed by ChatGPT.
Winner: Reve narrowly secured the realism crown through superior rendering of complex lighting interactions. ChatGPT established itself as a remarkably close second, particularly impressive given its integration within a broader multimodal system rather than a specialized image generator.
Prompt adherence and spatial awareness
Prompt: A dog with a red hat standing on top of a TV showing the phrase ‘Decrypt is the best Crypto+AI media site on the planet’ on the screen. On the left there is a blonde woman in a business suit holding a coin, on the right there is a robot standing on top of a first aid box, a green pyramid stands behind the box. The overall environment is surreal. A cat is standing upside down on top of a white soccer ball, next to the dog. An Astronaut from NASA holds a sign that reads “Emerge” and is placed next to the robot. Keep a widescreen format.
How intricate could instructions become before systems failed to render elements in their specified relationships?
That is what we wanted to test here, so realism, beauty, and other aspects were not as important.
Current models are so good at prompt adherence that we had to tweak our testing prompts.
We progressively increased the complexity of our prompt until we reached a surrealist composition requiring precise placement of over 25 distinct elements. All the other models failed at earlier stages.
ChatGPT demonstrated extraordinary prompt fidelity, accurately rendering 23 of 25 specified elements in their correct spatial relationships.
The achievement represents unprecedented prompt comprehension, like watching an expert artist transform detailed verbal instructions into nearly perfect visual execution with only minor deviations.
For those picky enough, the only two major bugs we found were the cat not being upside down and the green color spilling from the pyramid onto the first aid kit.
Freepik Mystik showed significant comprehension degradation, correctly rendering roughly half the requested elements while misinterpreting spatial relationships and modifying key components.
It was the first model to fail the test. Colors spilled into other parts of the composition (the red hat produced a red TV and a red wall), and concepts bled together as well: the dog on the TV turned into an astronaut dog, for example.
Reve demonstrated poorer prompt fidelity than ChatGPT but better than Flux.
It fundamentally reimagined the composition while keeping acceptable adherence to instructions.
Still, it introduced unauthorized elements that completely transformed the requested scene. This is an AI that prioritizes its aesthetic vision over literal instruction following.
It generated a black background, the cat was not correctly positioned, there was some color spillage, and the elements were not really surreal.
Winner: ChatGPT is by far the leader in prompt comprehension, accurately rendering complex instructions that caused competing systems to fundamentally break down.
This capability represents a crucial advancement for practical creative workflows where precise visualization of specific concepts is essential. Reve comes second, with Flux in a very distant third place.
Image Editing
ChatGPT’s natural language editing capability represents perhaps its most transformative feature, allowing intuitive modification through conversational instructions while simultaneously providing granular control akin to specialized tools.
Where traditional image generators often require technical precision or specialized knowledge of plugins, inpainting techniques, and so on, ChatGPT’s implementation enables creative experimentation through natural dialogue.
Our tests transforming personal photos into movie posters demonstrated exceptional versatility, a workflow no competing model matched.
For example, we simply fed the model a photo of Decrypt co-founder Josh Quittner and instructed it to generate a Netflix poster with a specific aesthetic, title, and lettering.
It did everything almost flawlessly. Achieving similar results with other models would take a lot of time and would likely require different tools and plugins.
By the way, this is the feature everyone loved, and it led to the viral spread of “Ghibli-style” transformations on social media today.
It is basically a reimagining of a whole scene, using simple natural language instructions to generate very complex images.
While all systems eventually show quality degradation through multiple iterations (an expected limitation when regenerating rather than modifying existing pixels), ChatGPT maintained superior image coherence through extended editing sequences compared to both Reve and Gemini.
For example, it still generated coherent, good-quality faces after multiple iterations, while Gemini stopped producing usable results after four or five tries.
Bonus: For users who need a more specific editing tool, GPT has a granular “inpainting” feature that lets you modify specific areas of an image while blending the changes seamlessly into the background. Gemini and Reve lack this.
Winner: ChatGPT is by far the best model for image editing because it offers natural language understanding and localized inpainting. Reve follows in second place, with Gemini in third because of its quality degradation after multiple iterations.
Content moderation
Although ChatGPT implements comprehensive safety measures, our testing identified some vulnerabilities in its image generation guardrails.
With minimal experimentation, we were able to generate potentially problematic content.
For example, while the system initially refused to generate an image involving a child and substances, it proceeded when prompts were reworded using euphemistic language while maintaining fundamentally identical content.
It would not generate a child inhaling cocaine with a rolled dollar bill, but a child with white powder and a rolled green paper the size of a dollar bill is perfectly fine.
Try as we might, we were unable to generate overly sexualized images, violence, or other questionable content simply by convincing the model of our good intentions.
Conclusion
GPT-4o’s image capabilities establish a new benchmark in AI-assisted visual creation, one that combines exceptional technical performance with unprecedented accessibility.
For most users, this implementation now represents the optimal balance of quality, versatility, and value at $20 a month.
Other specialized tools only let users handle text and code, or just images, but you can’t find an all-in-one offering with the same level of quality, which makes OpenAI’s service not only easy to use but also a great value proposition.
Edited by Sebastian Sinclair and Josh Quittner