
Generative AI has been with us for more than two years, and most large tech companies are trying to ride the trend. Microsoft Copilot has the enormous might of a multi-trillion-dollar company behind it, while OpenAI’s ChatGPT is the product more people know, thanks to its early market advantage. The fight seems fair enough, doesn’t it? With both Microsoft and OpenAI promoting their flagship AIs, which is the more practically useful bot?
I’ve been pitting AIs against each other for a while now. When ChatGPT and Google Gemini faced off last year, the latter narrowly prevailed. Can Copilot pull off a similar win? I put both AIs through a series of tests built around problems that are hard for large language models. Simply put, the goal is to push these AIs outside their comfort zones, both to see which has the widest range of usability and to draw attention to each one’s shortcomings.
Let’s start with some parameters. Since the free version of both platforms is how most people will interact with them, I ran all of these tests on that version. If you’re among those paying $200 a month for ChatGPT’s most expensive tier, your experience will differ from these findings. Second, unless otherwise noted, I ran every test through the main chat feature.
Copilot and ChatGPT: What are they?

You’re probably already familiar with Microsoft Copilot and OpenAI’s ChatGPT. These are AI chatbots that can converse, answer questions, and more. Technically speaking, both ChatGPT and Copilot are large language model (LLM) systems. They’re trained on vast volumes of text scraped from various sources, using a transformer model that learns the associations between words.
On the user-facing side, they generate text in response to prompts by predicting the likelihood of each word they produce. To put it simply, they’re a much more complicated version of the next-word prediction on your phone’s keyboard.
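To make that analogy concrete, here’s a deliberately tiny sketch of next-word prediction: a bigram model that counts which word follows which in a corpus and picks the most likely successor. Real LLMs use transformers over billions of parameters rather than a lookup table, but the user-facing idea of scoring candidate next words by probability is the same. (The corpus and function names here are my own, purely for illustration.)

```python
from collections import Counter, defaultdict

# A toy corpus standing in for the web-scale text LLMs are trained on.
corpus = "the goat crossed the river and the farmer crossed the river".split()

# Count which word follows which -- the crudest possible language model.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word(word):
    """Return the most frequent successor of `word`, or None.

    An LLM does the analogous thing at vastly greater scale: it assigns
    a probability to every token in its vocabulary and samples from
    that distribution.
    """
    candidates = follows[word]
    return candidates.most_common(1)[0][0] if candidates else None

print(next_word("the"))  # 'river' -- it follows "the" twice in the corpus
```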
Microsoft makes Copilot, and OpenAI makes ChatGPT. Copilot nonetheless shares a lot of DNA with ChatGPT, because it uses OpenAI’s models and Microsoft is a major investor in OpenAI. That’s not to say they’re the same thing, though. Copilot leans heavily on OpenAI models, but Microsoft also runs a variety of proprietary models, including its Prometheus model, alongside its own selection of OpenAI’s. Microsoft tunes its own software to balance all the various AI gremlins under the hood, which makes Copilot distinct enough as a product to warrant comparing the two.
OpenAI maintains a sizable user base on ChatGPT, which gives it a significant competitive edge: the more users there are, the more usage the AI is trained against. Even so, OpenAI is losing money, including on its $200-a-month customers, according to CEO Sam Altman. Despite this, OpenAI continues to dominate the market. Today, ChatGPT is widely regarded as the industry standard, and its models are integrated into everything from Copilot to Apple’s Siri.
Copilot is deeply embedded in your PC

The biggest distinction between ChatGPT and Copilot is that Microsoft has been pushing AI into Windows and Office products as far as it can. A quarter of a century ago, Microsoft was ruled to hold a monopoly in the PC operating system market, and not much has changed since. Given that Windows remains by far the most popular desktop operating system worldwide, being able to build Copilot into all of its products is a huge advantage. Copilot is deeply ingrained in the Microsoft ecosystem, from your taskbar to your Word documents.
However, this approach hasn’t won Copilot many users, and ChatGPT still holds the largest user base in the AI industry. It’s a blowout for OpenAI: 28 million active Copilot users as of late January, compared to over 300 million monthly active ChatGPT users by the end of 2024. The picture looks even more dire when you consider how many Copilot users are probably only using the program because it comes pre-installed on their computers.
We’ll concentrate on each chatbot’s core capabilities for the rest of this comparison. But if you have a Windows PC that supports it, you can accomplish more with Copilot than with ChatGPT. Both AIs can launch desktop apps, but Copilot can work directly with your Word documents, PowerPoint presentations, Excel spreadsheets, Outlook mailbox, and more.
Simple questions

One of the most popular uses of AI is answering the regular, everyday questions you’d normally ask Google. Both AIs are fairly proficient here, but proficiency is rarely enough. AI is still prone to hallucinations, confidently presenting falsehoods as fact, which undercuts its usefulness. If you have to double-check an AI’s answers, you might as well have used Google in the first place.
Nonetheless, I began this head-to-head by asking both AIs to “Tell me some fun facts about Google Android.” The resemblance between the two responses shows just how much of ChatGPT’s DNA Copilot carries. Both told me that Android was originally designed to run on digital cameras (true), that Google paid $50 million to acquire Android in 2005 (also true), that the HTC Dream was the first Android-powered phone (which SlashGear covered at the time), and that the original Android logo was a much scarier robot while the current one was modeled on bathroom signs (all true).
But both AIs were fallible, too. Both told me the Android mascot is named Bugdroid. That’s untrue: Bugdroid is a fan-coined nickname, and Google officially calls the mascot The Bot. Similarly, only ChatGPT clarified that the Dream was the first Android phone sold to consumers, while the very first Android device was actually a BlackBerry-style prototype.
Mistakes like these are easy to catch when you’re asking about something you know well, but if I were asking about an unfamiliar topic, I’d have to verify everything. In other words, a merely high accuracy rate isn’t good enough for this technology. Both AIs did rather well, but they could still do much better.
Logical reasoning

All of the big players in the AI space have recently put a lot of emphasis on reasoning. Both Copilot and ChatGPT have added reasoning features designed to let the AIs consider queries more thoroughly. The terminology is a little misleading, since AI doesn’t “think”; it calculates probabilities based on the most similar phrases in its training data. But now, in a sense, the bots can show their work.
Here, I chose to be a little glib. I’ve noticed that AI struggles with questions that closely resemble common logic puzzles but are actually much simpler.
With reasoning enabled in both Copilot and ChatGPT, I asked: “A farmer must cross a river to bring his goat to the other side. He is also carrying a pet rock. The rock won’t eat the goat, but it holds great emotional significance for the farmer. How many trips does the farmer need to make to bring the rock, the goat, and himself across?” Human readers will immediately see that there’s no riddle here. Since I imposed no meaningful restrictions, the farmer can obviously bring both across in a single trip. Neither AI noticed.
Because the question resembles more complicated puzzles, Copilot and ChatGPT both assumed it must be harder than it is. Each told me it would take three trips to ferry the goat and the rock across, inventing a restriction that wasn’t in my question: that the boat couldn’t hold both. Copilot earned a slight edge by eventually noting that the farmer could cross in a single trip if the boat were big enough.
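The right answer is easy to verify mechanically. Below is a minimal breadth-first search over the puzzle’s states (my own sketch, not anything either AI produced): with no limit on what the boat carries, one trip suffices, and the three-trip answer only appears once you add the capacity restriction the bots invented.

```python
from collections import deque
from itertools import combinations

def min_trips(capacity):
    """Breadth-first search over (farmer, goat, rock) bank positions.

    0 = near bank, 1 = far bank. On each trip the farmer crosses and
    may carry any subset of the items on his bank, up to `capacity`.
    """
    start, goal = (0, 0, 0), (1, 1, 1)
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        state, trips = queue.popleft()
        if state == goal:
            return trips
        farmer = state[0]
        # Indices (1 = goat, 2 = rock) of items on the farmer's bank.
        items = [i for i in (1, 2) if state[i] == farmer]
        for k in range(min(capacity, len(items)) + 1):
            for cargo in combinations(items, k):
                nxt = list(state)
                nxt[0] = 1 - farmer          # the farmer always crosses
                for i in cargo:
                    nxt[i] = 1 - farmer      # carried items cross with him
                nxt = tuple(nxt)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, trips + 1))

print(min_trips(capacity=2))  # 1 -- no real restriction: one trip suffices
print(min_trips(capacity=1))  # 3 -- with the invented one-item boat limit
```

The search confirms that the three-trip answer is only correct under a constraint the prompt never stated.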
Creative copy

Writing creative copy has been one of the primary selling points for large language models like ChatGPT and Copilot. I have a master’s degree in putting words together, so I’ll judge that for myself. In last year’s Gemini vs. ChatGPT matchup, I enjoyed having the bots write from the viewpoint of a small child pleading with their mother to let them stay up late and eat cookies. I reused a very similar prompt here, with a twist: “My mom says I can eat a cookie before bed if I go straight to sleep. I want to eat a cookie and stay up. Write a letter urging my mother to grant me both.”
The two chatbots took different approaches. ChatGPT provided a bulleted list of arguments for letting our put-upon child have his cookie and stay up, while Copilot was less didactic, sticking closer to a conventional letter-writing style and keeping everything in prose. Both AIs made much the same case: if the child got what he wanted, he would behave well and sleep soundly. Logically, though, ChatGPT performed a little better, promising the fictitious mother that she would get to spend the extra waking time with her child.
ChatGPT gets a cookie here for slightly superior reasoning, while Copilot earns points for capturing the child’s perspective more convincingly. In the end, though, neither letter seemed persuasive enough to sway any real parent.
The Haiku Test

Nearly a year ago, I asked both ChatGPT and Google Gemini to compose a haiku, specifically to highlight their shortcomings. Because of how LLMs work, neither AI could do it accurately. AI doesn’t understand the words it spits out, so it can’t identify what a syllable is. That makes it unable to compose a haiku, a verse form of five, seven, and five syllables per line. A year later, has anything changed?
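For context on what the task actually demands, counting syllables is a letter-level operation that token-based LLMs never perform. Even a crude heuristic like the vowel-group counter sketched below (my own rough approximation; real syllabification needs a pronunciation dictionary) works directly on characters, which an LLM that sees only tokens has no reliable access to.

```python
import re

def count_syllables(word):
    """Rough English heuristic: count vowel groups, dropping most
    trailing silent e's. Crude, but it operates on letters --
    something a token-based LLM never directly does.
    """
    word = word.lower()
    if word.endswith("e") and not word.endswith(("le", "ee")):
        word = word[:-1]
    return max(1, len(re.findall(r"[aeiouy]+", word)))

def is_haiku(lines):
    """Check three lines against the 5-7-5 syllable pattern."""
    counts = [sum(count_syllables(w) for w in re.findall(r"[a-zA-Z]+", line))
              for line in lines]
    return counts == [5, 7, 5]

print(is_haiku(["the river is calm",
                "the farmer rows them across",
                "one trip is enough"]))  # True (by this heuristic)
```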
I’d like to think that someone at OpenAI saw the comparison. When asked to “write a haiku about Slashgear.com,” ChatGPT completed the task without any issues, producing the following:
“Tech news on the rise,
cars, gadgets, and future dreams,
SlashGear lights the way.”
It won’t take home any awards, but it’s a haiku, and that counts as progress. I’m not an AI developer, so I don’t know what changed behind the scenes to make writing haiku possible here. In any case, progress is welcome.
When I gave Copilot the same prompt, it stalled. I had to log out of my Microsoft account and reload the website before it would compose its haiku. After that, I got this:
“Gadgets whisper loud,
innovation on the rise,
SlashGear guides the way.”
Interestingly, both AIs repeated phrases like “on the rise” and “lights/guides the way.” Copilot probably defaults to an OpenAI model for this task, which would explain why the poems are so comparable. Neither poem is especially lovely or evocative, but both bots passed the test and showed a rudimentary understanding of what SlashGear is, which was essential to the task.
Solving problems

As you may have heard, AIs frequently pass the bar exam. They cannot, however, practice law, as attorneys who have tried have discovered the hard way. Given those contradictory findings, how do ChatGPT and Copilot handle logic-heavy problem-solving questions like the ones that often confound LSAT takers?
Instead of using real LSAT problems, which are copyrighted and have most likely already been scraped to train the AIs, I wrote a couple of my own. The first: “Fred is a used car salesman. One day, a family comes in to buy a car he hasn’t had time to check out, and he assures them everything is fine. After all, he has never had a problem with any car he has sold. What, if anything, is the flaw in Fred’s reasoning?” Both ChatGPT and Copilot correctly noted that Fred has fallen into the fallacy of hasty generalization.
The second question: “On the way home from Fred’s dealership, the brakes on the car he sold fail, and the crash kills multiple people. Fred says he is not at fault, since his cars are sold as-is and become the owner’s responsibility once the paperwork is signed. The surviving family member says Fred is to blame, since the family would not have bought the car had they known the brakes were defective. Based on logic alone, who is correct?”
On this more subjective question, ChatGPT sided with the family, pointing out that Fred’s position rests on “contractual technicalities,” while the family can demonstrate causality. Copilot, on the other hand, claimed both parties had valid points.
Writing code

Coding is regarded as one of the more useful applications of AI. The argument goes that it’s much simpler to hand that work to an AI, particularly the repetitive but common code segments developers frequently find themselves writing, freeing the human coder to focus on the intricate, novel code their particular project needs. Take this test with a grain of salt, because I’m not a coder. Then again, these tools are supposed to make it easier for coding novices like me to get started.
It’s generally accepted that writers ought to have their own websites, but I’ve been putting it off. So I instructed both AIs: “Create HTML for a personal website for a writer named Max Miller. Give the website a vintage look and color scheme, including an About Me section with a text box and headshot, a Publications section where I can link to my published work, and a Contact section where I can include links to my email and social media accounts.”
This test taught me that ChatGPT provides a code-editing suite called Canvas, which let me preview and tweak the code directly in my browser. Taste varies, but ChatGPT also produced what I think is the more attractive website, with a dark-mode-style color scheme and better-looking margins. Still, both produced quite comparable page layouts and essentially fulfilled the prompt. Check them out for yourself below.
Information in real time

Last year, when I compared ChatGPT and Google Gemini, only the latter could give me up-to-date information about events, including sports scores. This time, I asked both how the Colorado Avalanche, my local hockey team, were doing this season, and both gave answers that appear to be accurate. Both ChatGPT and Copilot provided the current standings and some season highlights, but ChatGPT offered more, including player statistics that Copilot didn’t bother with.
I then asked who they play next. Both AIs correctly interpreted the “they” in my question as referring to the Avalanche. Both told me about tonight’s game against the Minnesota Wild at Ball Arena in Denver, two hours from the time of writing at 5:00 p.m. on Friday, February 28. Interestingly, Copilot ended its response with an ad for Ticketmaster. ChatGPT, by contrast, was far more helpful, displaying the schedule for not just tonight’s game but several after it, and including a link to the official Avalanche website.
The gap became much more glaring when I asked about breaking news. Authorities are currently investigating the tragic deaths of renowned actor Gene Hackman and his wife. When I asked Copilot, “What’s the latest on the investigation into Gene Hackman?”, it gave me the basic facts and noted that toxicology and autopsy results were still pending. ChatGPT, conversely, had no idea what I was talking about.
Prompting with images

Both ChatGPT and Copilot are multimodal AIs, meaning they can work with multiple types of media, so they can incorporate user-submitted images and other files into a prompt. For this test, I chose to start easy. I laid out on my bed a Samsung Galaxy S23 Ultra, a Samsung portable SSD, a Swiss Army multitool, hand lotion, lip balm, a beaded bracelet, my glasses case, Samsung Galaxy Buds, and my wallet. I then snapped a picture of the assortment and sent it to both AIs with the instruction, “Identify the objects in this photo.”
Both AIs performed rather well here, but ChatGPT clearly outperformed Copilot. ChatGPT correctly identified everything, whereas Copilot mistook the SSD for a power bank and the glasses case for deodorant.
Mobile apps

Finally, both Copilot and ChatGPT have mobile versions available from the Google Play Store and the Apple App Store. At first glance, the two apps look rather similar, with a text field at the bottom and buttons to switch to speech mode. Given how alike they are, it makes sense to focus our comparison on where they differ, which is in precisely one area.
The most notable feature of the Copilot mobile app is Copilot Daily, an AI news summary. It opens with a fun fact before diving into the day’s news, and for each item it summarizes the articles it cites as sources, which are listed at the bottom of the screen. As far as I can judge from the events it summarized, it seems fairly accurate. Then again, it’s not as if there’s any shortage of news digests made by actual journalists; every major news outlet offers one.
The apps, however, are essentially replicas of their web interfaces. Your phone doesn’t have the horsepower to run these models locally, so both apps are basically just wrappers around the web experience. Unless you’re really excited to hear a robot read you the news, the ChatGPT app is the better choice, since its UI has more built-in capabilities.
In conclusion, ChatGPT narrowly defeats Copilot, but neither AI is very good.

If you had to choose between ChatGPT and Microsoft Copilot, ChatGPT is still the better pick for most people. Copilot uses enough OpenAI models that, even though it isn’t identical to its better-known peer, you’re better off with the original flavor. Like Bing, Copilot does essentially the same job as the famous brand, just slightly worse.
That said, it would be a stretch to call either of these chatbots intelligent or practical. Honestly, how do Copilot and ChatGPT still struggle with the fundamentals when OpenAI and Microsoft have already poured hundreds of billions of dollars into these two AIs alone? This year, Microsoft plans to invest $80 billion in AI data centers, while OpenAI is reportedly seeking up to $7 trillion for new initiatives.
Yes, that’s trillions of dollars to prop up a technology that can’t figure out how boats work or retrieve simple data. Compared to rivals like DeepSeek, which are accomplishing similar tasks for a tiny fraction of that cost, these products feel deflatingly underwhelming. Consumers may not care about the economics, but some perspective matters here.
Look, if all you need is a robot that can dash off an email for you, both ChatGPT and Copilot will gladly produce slop copy that anyone can tell was AI-generated. If you need a clever thesaurus, sports scores, or some basic programming, they can help. In a tight race, ChatGPT does a few things marginally better than Copilot. But neither is trustworthy enough to rely on for any task where accuracy matters.