Guide to detecting AI-generated text

21/04/2023 · 06/02/2023 by Alexandre Marotel

Artificial intelligence has been in the news in recent weeks. The reality is that AI is becoming more and more advanced and the line between human-written and program-generated content is becoming increasingly blurred.

While many analysts worry that the Internet will be flooded with lower-quality content, the more immediate problem is that sites could be removed from Google’s index.

Indeed, Google considers texts coming from AIs as automatically generated content, which the search engine forbids in its guidelines.

In other words, far from being an El Dorado, ChatGPT and other text generators can bring trouble to your site and jeopardize your online business.

After scouring the web for the past 5 days to find a way to tell whether my writers’ content comes primarily from AIs, I discovered a lot of things I didn’t expect.

In this guide, I’ll share with you my findings on detecting AI-generated text, including:

The impact of these texts on a site’s SEO;
The impact of these texts on the Web;
The different ways to detect AI-generated content (tools and manual methods);
How to make AI-generated text unique, fluid and authentic so that it can be used safely on your site.

Ready to go? Let’s get started!


Chapter 1: AI-generated text and its consequences

The year 2022 may have marked a crucial turning point in the history of AI. Technological advances have allowed companies to develop increasingly sophisticated and accessible AI tools.

Among these tools are AI text generators, which have boomed in recent years. This chapter is dedicated to AI text generators.

1.1. What is an AI-generated text?

AI-generated text is text that is created automatically by a computer program, rather than by a human being.

These programs mostly use natural language processing algorithms to generate content that looks like it was written by a human.

They create texts using natural language generation (NLG) and natural language processing (NLP) technology.

These texts have many applications on the web, including:

Automated content creation for websites;
Generating automatic answers to FAQs;
Or the creation of automatic document summaries.

Text-generating AIs can now automatically produce content on a variety of topics, ranging from news articles to movie scripts to character dialogue.

Not only that, they are also used for machine translation and text understanding.

Despite the immense potential of AIs, it should be noted that as these tools grow in popularity, the amount of unwanted AI-generated content will grow with them.

It is therefore important to implement tools to verify the authenticity of AI-generated text to ensure its quality and relevance.

1.2. What are these text-generating AIs?

There are many text-generating AIs and we will just mention the most popular ones.

ChatGPT: This AI has surprised many users since its release. It can generate text on a wide variety of topics and handle many other tasks.


It can already perform many tasks which you can find in our detailed guide on text generation AI and its use cases.

Jasper: This is the ultimate tool for creative teams looking for inspiration. With its advanced artificial intelligence, it helps you overcome creative blocks and generate incredibly original and high-quality content in just half the time.

It can help you create various types of content without spending endless hours thinking of content ideas.

Copy.ai: This AI is used to automatically generate blog post titles, video scripts, product descriptions, etc. It also helps in the creation of marketing content and is meant to make marketers’ lives easier.

Textio: This AI is used to improve the quality of writing by detecting gender bias, outdated expressions, stereotypes, etc. It is used to write articles, emails, cover letters, etc.

Articoolo: It is used for creating marketing content, including news articles, blog posts, product descriptions, etc.

AI Dungeon: This AI is used to generate fictional stories based on instructions given by the user. It is used for content creation, role-playing games, interactive fiction games, and general entertainment content.

1.3. The impact of AI-generated texts on the web: advantages and disadvantages

AI-generated texts have a significant impact on the web, with both positive and negative consequences.

1.3.1. The advantages of AI-generated texts

Some text generators are real experts at producing short texts like tweets or headlines, while others are geniuses at producing long texts like articles or blog posts.

There are even AI text generators that have their own search engine and can produce images or videos.

You can tailor AI-generated content to your visitors and create custom product descriptions and other types of content.

In addition, text-generating AIs can help automate content creation, reducing production costs and increasing the amount of content available.

AI-generated text can also help automatically translate content into different languages, making it easier for users around the world to access information.

1.3.2. Disadvantages of AI-generated texts

AI-generated texts can also cause problems on the web. These texts can mislead users by passing as texts written by humans.

They can also be used to automate content creation, reducing the quality and validity of content. AI-generated texts can also be used to spread false information or to deceive users online.

It is therefore important to be aware of the advantages and disadvantages of AI-generated text on the web, and to take steps to detect and manage AI-generated text to protect users from deception and abuse.

1.4. AI-generated text and SEO: Why is it bad for your site?

Content automatically generated by AI tools is a hotly debated topic in the SEO industry. Google search advocate John Mueller recently stated that this type of content is considered spam under Google’s webmaster guidelines.

However, this statement has raised many questions about Google’s ability to detect AI-generated content and the acceptable uses of these tools.

According to Mueller, any automatically generated content is against Google guidelines. So that includes content generated by AI tools such as GPT-3.

However, it is also important to note that Google does not claim to have the ability to automatically detect this type of content, but they can take action if the webspam team finds it.

There are practical uses for these tools and many organizations use them effectively. Ideally, you should ensure that content created using these tools is of high quality and authentic before publishing it online.

To avoid being penalized, it is important to use these text-generating AIs very carefully and follow Google’s guidelines.

Chapter 2: 7 ways AI-generated texts could harm the Internet

AI-generated texts don’t just harm your site, they can harm the entire web. In this chapter we look at 7 ways in which these texts can make the internet an unlivable space.

2.1. The proliferation of fake news

The use of AI-generated text threatens the authenticity of online content. Fake news and outdated information will spread at a rapid rate, as this type of content is cheap to produce and has relevant keywords.


However, like most AI-generated text, it has superficial meaning and little connection to the real world.

Indeed, AIs like ChatGPT and Jasper are capable of generating texts that may seem real and believable, but are not always based on real facts.

This can cause serious problems for users who rely on this information to make important decisions. These problems can be serious for individuals, businesses, and even governments.

2.2. Increase in marketing spam

The use of AI-generated text leads to an increase in marketing spam, as these ads or messages initially appear to be real and include generic introductions and quotes from various types of marketing.

However, upon reading them more carefully, it becomes apparent that they reference non-existent magazines and people.

AI text generators can be used to create automated messages that look like emails or commercials written by humans.

Undoubtedly, this will cause problems for users who receive these messages, as they may be fooled into thinking that they were sent by a real company or person.

2.3. Copyright infringement

Artists have been furious since the introduction of AI content generators, claiming that the models plagiarize them by incorporating many of their original works without any payment or acknowledgement.

In fact, artists create original works that reflect their personal vision and expression. When these works are used without their consent or compensation, it constitutes a violation of their copyright.

2.4. Proliferation of fake content

AI-generated text, sometimes described as a textual “deepfake,” is one of the most worrying forms of fake content. It is able to mimic the form and style of human writing so convincingly that it is difficult to distinguish it from texts written by humans.

This ability to mimic human writing makes AI-generated text ubiquitous on the Internet and in our social communication environment.

Not only that, the increasing use of these content-generating AIs has alarming consequences for society.

Generated text can be used to spread false information, mislead consumers, and manipulate public opinion.

It can also cause economic damage by creating misleading content for businesses. Therefore, it is important to remain vigilant and know methods that can detect and counter fake content generated by AI.

2.5. Appearance of unknown influencers

Machine-generated content can be used to build complete AI-generated profiles for non-existent, but influential people.

These fake profiles can be used to promote products, ideas, or opinions, deceiving users into believing they have real influence on social networks.

2.6. Lack of depth and complexity

Because AI-generated content is created by a machine, it may not have the same depth and complexity as content produced by a human.

This poses a problem for producing in-depth, quality information. Programs may not have the same understanding of the nuances of meaning and context that are so important to understanding complex topics.

2.7. Lack of emotional content

Because of the artistry involved in expressing thought, human writing often evokes particular feelings that machines cannot replicate, no matter how sophisticated they are.

Machines can generate text that looks real, but cannot convey the emotions and feelings that are so important in connecting people to a story or a piece of information.

Chapter 3: Techniques for detecting automatically generated text using AI

This chapter is dedicated to how to detect AI-generated text and the tools to do so.

3.1. How to detect texts generated with AI?

Techniques for detecting AI-generated text are becoming increasingly important as AI-generated automated content production becomes more common.

Indeed, being able to distinguish between human and machine-generated text is crucial, as it can have significant academic, professional, and informal implications.

There are several techniques for detecting AI-generated texts. One common method used is statistical content analysis.

This involves looking for characteristics such as sentence length, grammatical complexity, and vocabulary used to detect signs of automated writing.
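To make this kind of statistical analysis more concrete, here is a minimal Python sketch that computes two of the characteristics mentioned above: average sentence length and vocabulary variety (the share of distinct words, often called the type-token ratio). The function names, the naive sentence splitter and the sample text are all assumptions made for illustration; real detectors rely on far richer signals, and no threshold on these numbers proves that a text is machine-written.

```python
import re
from statistics import mean


def split_sentences(text):
    """Naive sentence splitter: breaks after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def words_of(text):
    """Lowercased word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def style_features(text):
    """Return a few simple statistics about sentence length and vocabulary."""
    sentences = split_sentences(text)
    words = words_of(text)
    lengths = [len(words_of(s)) for s in sentences]
    return {
        "sentences": len(sentences),
        "avg_sentence_length": mean(lengths) if lengths else 0.0,
        # Distinct words divided by total words: lower means more repetition.
        "type_token_ratio": len(set(words)) / len(words) if words else 0.0,
    }


# Made-up sample text, used only to show the call.
sample = "The cat sat. The cat sat again. Then the cat left the room quietly!"
features = style_features(sample)
print(features)
```

Comparing these figures for a suspicious text and for a text you know was written by a human gives you a rough first impression, nothing more.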

Other techniques include using online plagiarism detection tools to check whether content has been copied from other sources, or analyzing sentence structure for signs of automated writing.

Manual methods can also be used to detect AI-generated text. For example, you can carefully read the content and look for poorly worded sentences, inconsistencies, or unnecessary repetition that may indicate automated copywriting.

We will elaborate on each of these techniques to actually see what they are.

3.2. AI-generated text detection with online tools

You will find a multitude of tools online to detect AI generated content.

3.2.1. Giant Language model Test Room commonly known as GLTR

Giant Language model Test Room (GLTR) is an online tool that detects text generated by large-scale natural language processing (NLP) models.

It uses statistical techniques to identify signs of automated writing in a given text, such as word patterns and syntactic features.

GLTR can be used to detect text generated by models such as GPT-2 and GPT-3, which are capable of generating high quality text that may be difficult to distinguish from that written by humans. In fact, the GPT-3 family of models is the basis for the chatbot ChatGPT.

GLTR allows you to visualize the detection results in graphical form for better understanding. To learn more about this tool, you can read our tool description dedicated to GLTR.

To use Giant Language model Test Room (GLTR), you will need to follow these steps:

Go to the GLTR website;
Paste the text you wish to test into the text entry box;
Press the “Analyze” button to start the analysis of your text.

GLTR will then use statistical techniques to detect signs of automated writing in the text.

It will display a results graph that provides an overview of the detection results. You will also get an estimate of the probability that the text is generated by an AI.

So we’ll submit a text generated by ChatGPT to GLTR to see how well it can detect the text as being written by an AI.

The tool analyzes each word in context to determine the probability that it is the predicted word (since this is how AIs work).

If the word used is among the top 10 predicted words (those an AI would most likely suggest), its background is colored green; if it is in the top 100, yellow; and if it is in the top 1,000, red.

Otherwise, it is colored purple. In our case, green, yellow and red dominate, which suggests a high chance that the text we analyzed was written by an AI.
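The color rule described above is easy to express in code. The following Python sketch only reproduces the bucketing (top 10, top 100, top 1,000, everything else); it does not run a language model. The ranks in `example_ranks` are hypothetical values invented for the example, and the function names are mine, not GLTR’s.

```python
from collections import Counter


def gltr_color(rank):
    """Map a word's 1-based rank among the model's predictions to a color bucket."""
    if rank <= 10:
        return "green"
    if rank <= 100:
        return "yellow"
    if rank <= 1000:
        return "red"
    return "purple"


def color_histogram(ranks):
    """Count how many words fall into each color bucket."""
    return Counter(gltr_color(r) for r in ranks)


# Hypothetical ranks for an eight-word sentence (not real model output).
example_ranks = [1, 3, 7, 42, 2, 950, 5000, 9]
histogram = color_histogram(example_ranks)
green_share = histogram["green"] / len(example_ranks)

print(histogram)
print(f"share of green words: {green_share:.2f}")
```

The more words land in the green bucket, the more predictable the text is for the model, which is the signal you are reading when you look at the colored output.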

Note that GLTR also provides graphs that show statistics about the text, including the number of times each color appears.

When you move the mouse over each word, you get detailed information about it.

You can also try analyzing several texts with this tool to see the difference between a machine-generated text and a text written by a human.

Since it is also a computer tool, it is possible that its results contain errors. It is up to you to consult several sources to check whether a text is generated by an AI.

3.2.2. GPT-2 Output Detector

GPT-2 Output Detector is a tool for detecting text generated by the GPT-2 natural language processing model. It uses statistical techniques to identify signs of automated writing in a given text.

GPT-2 is a large-scale natural language processing model that OpenAI developed. Like its successor GPT-3, this program is also capable of writing high quality texts that you will not be able to distinguish from a human’s.

The good news is that OpenAI also offers this GPT-2 Output Detector tool, which will detect texts generated by this particular model, helping us to identify undesirable content.

The use of this tool is similar to the tool presented before. You just have to go to GPT-2 Output Detector to paste your text and wait for the tool to detect its nature.

It then shows the probability that the text you entered is from a human or an AI.

We also have an article entirely dedicated to this tool that you can browse to learn how to use the auto-generated text detection tool. Also keep in mind that this tool cannot be 100% reliable.

3.2.3. GPTZero

GPTZero is a tool that was created a month after the release of ChatGPT to detect texts generated by this chatbot. It uses two indicators for this: perplexity and burstiness.

The first measures how predictable the text is for a language model. If GPTZero finds a high perplexity, the text is hard for the model to predict and is therefore more likely to have been written by a human.

On the other hand, if the text is more familiar to the model, its perplexity is low and it is more likely to have been generated by AI.
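Perplexity has a standard definition: the exponential of the average negative log-probability that a language model assigns to each token. The Python sketch below applies that formula to two lists of probabilities that I made up for the example; in practice those probabilities would come from a language model, and GPTZero’s exact implementation is not public, so treat this as an illustration of the concept only.

```python
import math


def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


# Hypothetical per-token probabilities (not produced by a real model).
predictable = [0.5, 0.5, 0.5, 0.5]     # the model expected every word
surprising = [0.5, 0.01, 0.2, 0.05]    # several words were unexpected

print(perplexity(predictable))
print(perplexity(surprising))
```

The second list contains words the model did not expect, so its perplexity comes out higher, which is the pattern associated with human writing in the explanation above.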

For burstiness, GPTZero compares how much sentences vary. Humans tend to write with more spontaneity, mixing long, complex sentences with short ones. AI sentences tend to be more uniform.
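One simple way to approximate this idea is to measure how much sentence lengths vary within a text. The Python sketch below uses the coefficient of variation (standard deviation divided by the mean) of sentence lengths. This is my own stand-in for burstiness, chosen because it is easy to compute; it is not GPTZero’s actual formula, and both sample texts are invented.

```python
import re
from statistics import mean, pstdev


def sentence_lengths(text):
    """Word count of each sentence, using a naive split on '.', '!' and '?'."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def length_variation(text):
    """Coefficient of variation of sentence lengths (0.0 means perfectly uniform)."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return pstdev(lengths) / mean(lengths)


# Invented sample texts.
uniform = "The tool reads text. The tool scores text. The tool shows results."
varied = ("It works. After pasting the article into the box and waiting "
          "a few seconds, I got a score. Odd.")

print(length_variation(uniform))
print(length_variation(varied))
```

A value close to zero means the sentences all have roughly the same length, while a larger value reflects the mix of long and short sentences described above.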

To use GPTZero, paste your text into the tool and let it compute the perplexity and burstiness scores.

Like the other tools, keep in mind that GPTZero is not infallible, but it does contribute to a critical mission of bringing transparency to the use of ChatGPT.

3.2.4. Originality.ai

To achieve detection of automatically generated text using AI, Originality.ai uses the latest natural language models. Like GLTR, it seeks to detect predictable sentences and thus determine whether the content is authentic or not.

This plagiarism and AI detection tool is considered one of the most accurate on the market, especially for the most advanced text generation models, such as ChatGPT and GPT-3.5.

Originality.ai is an ideal choice for professionals and businesses looking for an industry-level content verification tool at a reasonable cost, with pricing starting at $0.01 per 100 words.
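To get an idea of what that starting price means for your own volume, here is a small Python calculation. It assumes the cost is charged strictly pro rata (no rounding up to blocks of 100 words, no minimum purchase), which is my simplification and not something the pricing page is quoted as saying here.

```python
from decimal import Decimal

# Starting price quoted above: $0.01 per 100 words.
PRICE_PER_100_WORDS = Decimal("0.01")


def estimated_cost(word_count):
    """Pro-rata cost in dollars (assumption: no rounding to 100-word blocks)."""
    return PRICE_PER_100_WORDS * Decimal(word_count) / Decimal(100)


print(estimated_cost(1500))      # one 1,500-word article
print(estimated_cost(250_000))   # a backlog of 250,000 words
```

`Decimal` is used instead of floating-point numbers so that small amounts of money are not distorted by rounding errors.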

3.2.5. AI Content Detector by Writer

If you’re looking for a simple and effective tool to detect AI-generated text, Writer.com offers AI Content Detector. It is a free tool that allows you to check texts either by URL or by pasting them directly into the tool.

Unlike some other tools, Writer.com does not disclose the parameters it uses to detect AI-generated content, but users have reported satisfactory results with this tool.

To use it, all you have to do is copy and paste your text into the tool.

3.2.6. AI Content Detector by Copyleaks

AI Content Detector by Copyleaks is a free GPT text detection tool with which you can detect texts generated by ChatGPT.

It is specially designed to quickly identify whether the text was partially or completely created using a GPT-3 algorithm.

This will obviously help you to easily verify the authenticity and accuracy of a text. It is also very useful for academics and professionals who are looking to avoid plagiarism.

This tool is extremely easy to use and provides real-time feedback on the percentage level of sentences from AIs in a given text.

All you have to do is paste the text into the tool, which will do the rest for you. It uses advanced algorithms to analyze the content, comparing the sentences to a database of existing texts to identify similarities and differences.

It is able to detect texts generated by automatic content generation tools, as well as plagiarism. With this free tool, you can be sure that your texts are of the highest quality and authenticity before sharing them with others.

It is also useful for companies and organizations that are looking to protect their brand by avoiding generic AI content as well as content from other sites.

In addition, AI Content Detector by Copyleaks can also identify plagiarized content in a body of text, giving you peace of mind when publishing them online or elsewhere.

This makes it an important tool for publishing and writing professionals looking to publish original and authentic content.

3.3. Technical signs to detect AI-generated text

Technical signs can include things like sentence or word repetition, inaccurate grammar and punctuation, and a lack of nuance or context.

AI-generated texts tend to use similar sentences or predefined sentence patterns, rather than constructing sentences fluidly as a human would.

They may also have grammatical or punctuation errors, as AIs are not yet able to understand these nuances of language.

It’s also important to note that AI-generated text can lack context and nuance, as they don’t really understand the meaning of words or phrases.

If you encounter text that seems to lack meaning or logic, it was likely generated by an AI. In other words, there are plenty of ways to manually detect if a text is generated by an AI.

Let’s take a closer look at the signs you should consider when it comes to detecting text from AIs.

3.3.1. Sentence length

AI-generated texts are often peppered with short, simple sentences. This is because the algorithms are trying to mimic human writing, but they have not yet mastered the art of constructing longer, more complex sentences.

This is especially obvious when you are reading a technical article or detailed instructions. Although we are constantly evolving towards more and more advanced AI, we are not yet at the point where it can pass the Turing test.

In fact, the Turing test is an artificial intelligence (AI) recognition test that involves asking a human subject and a computer program questions to compare the answers.

If one cannot say with certainty who is the human and who is the machine, then the computer program is considered to have passed the Turing test and therefore to have artificial intelligence.

Even if it is difficult to evaluate some sentences written by today’s AIs, it is still possible to clearly differentiate whether a body of text comes from a human or a machine.

In short, if you check your content with tools like GLTR or Originality.ai and the results point to a human author, and the content is creative and unique, it is probably authentic. It’s when content looks suspicious that it’s best to scrutinize it more closely.

3.3.2. Repetition of words and phrases

If you feel like you are hearing the same words and phrases over and over again when you read a text, there is a good chance that it was generated by an AI.

It’s the result of a program that tries to fill the space with relevant keywords, but doesn’t really understand the topic.

SEO content generation tools love articles filled with keywords, but this makes for unpleasant reading for humans.

Excessive repetition of words or phrases can be especially egregious in technical articles, or when the target keyword is present in almost every sentence. If this is obvious to you, it is likely that the content was generated by an AI.
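If you want to put numbers on this impression, you can count how often a keyword appears and which word sequences come back several times. Here is a minimal Python sketch; the function names, the three-word window and the sample text are choices made for this example, and a high score only tells you a text is repetitive, not who wrote it.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercased word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_density(text, keyword):
    """Share of the words in the text that are exactly the (single-word) keyword."""
    words = tokens(text)
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


def repeated_phrases(text, n=3, min_count=2):
    """Return the n-word sequences that occur at least min_count times."""
    words = tokens(text)
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return {" ".join(g): c for g, c in grams.items() if c >= min_count}


# Invented, deliberately keyword-stuffed sample.
sample = ("Best running shoes for beginners. Our best running shoes are light. "
          "Buy the best running shoes today.")

print(keyword_density(sample, "shoes"))
print(repeated_phrases(sample))
```

On this sample, the same three-word phrase is flagged in every sentence, which is exactly the kind of pattern that makes a text unpleasant to read.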

3.3.3. Lack of analysis

A third way to determine if a text was generated by an AI is to check the quality of its analysis. Machines are good at collecting data, but they still struggle to interpret it in a meaningful way.

If you look at an article and notice that it just contains a simple list of facts without any perspective or in-depth analysis, there’s a good chance it was generated by an AI.

Language models like ChatGPT are becoming more capable of analysis, but they still have limitations. People are increasingly using AIs to respond to tweets, but they don’t always realize how stereotypical those responses are or how little analysis they contain.

3.3.4. Inaccurate data

Another way to spot AI-generated text is to check for factual errors. Machines often collect data from a variety of sources, so they may make mistakes.

If you notice inconsistencies in the numbers or facts presented in a text, chances are it was written by an AI.

This error is especially common in automatically generated product descriptions, but it can also be found in blog posts and articles.

If you happen to come across such questionable content, don’t hesitate to report it to Google so that others don’t waste their time reading it.

3.3.5. Check the sources and credibility of the authors

There is something different about human writing, something more natural, more fluid. If you read an article and have reservations about some parts, listen to your intuition.

Check the sources and authors, look at the quality of the analysis and the complexity of the sentences, but above all, trust your judgment. If something seems too good to be true, chances are it is.

Remember that machines are capable of generating very compelling texts, but they still can’t replace the creativity and authenticity of human writing.

Chapter 4: How do we make AI-generated texts usable online?

Outside of small-volume texts, AI-generated texts are not usable as is… unless you don’t care about your brand image.

AI-generated text is full of little flaws like the ones we discussed in the previous chapter, including repetition, sentences of the same pattern, inaccurate data, etc.

In this chapter, you’ll learn about the different ways to make text from AIs usable, fluid, and unique.

4.1. Strengthen AI-generated text

One of the main reasons why AI-generated text may not seem fluent is that the models behind it are trained on generic and sometimes low-quality data.

To make AI-generated text more authentic and fluent, it is crucial to reinforce it with richer and more specific information.

This can include texts written by professional authors, newspaper articles, and transcripts of actual speeches.

The more your content includes information that is specific to your topic and comes from real people or organizations, such as quotes, the closer the generated text will be to human writing.

You can even include this information directly in your AI queries. For example, you can ask ChatGPT to write you a text about SEO by accompanying your query with additional information like:

Recent statistics;
Quotes from experts;
The difficulties that companies face in terms of SEO;
Etc.

In fact, the more detail you provide, the more likely the AI will be to use the information you’ve provided instead of generating a text with generic data it’s trained from.
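To illustrate, here is a small Python sketch that assembles such an enriched query from the elements listed above. It only builds the text of the prompt; it does not call ChatGPT or any other service. The function name `build_prompt` and the wording of the instructions are my own, and the values in angle brackets are placeholders for you to replace with your real statistics, quotes and challenges.

```python
def build_prompt(topic, statistics=(), quotes=(), challenges=()):
    """Assemble a prompt asking for an article grounded in the supplied facts."""
    lines = [
        f"Write an article about {topic}.",
        "Use only the facts listed below and do not invent figures or quotes.",
    ]
    sections = [
        ("Recent statistics", statistics),
        ("Expert quotes", quotes),
        ("Challenges companies face", challenges),
    ]
    for title, items in sections:
        if items:  # skip sections for which nothing was provided
            lines.append("")
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)


# Placeholders only: replace them with your own verified material.
prompt = build_prompt(
    "SEO",
    statistics=["<statistic 1, with its source>"],
    quotes=["<quote from an expert you interviewed>"],
    challenges=["<a difficulty your clients actually face>"],
)
print(prompt)
```

You can then paste the resulting text into the AI tool of your choice. Keeping the facts in separate lists also makes it easier to check afterwards that the generated article actually used them.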

Note that you need to provide it with varied and diverse data to ensure that the AI model can generate sentences in different contexts and writing styles.

4.2. Use automatic revision tools

AIs do not always generate error-free text and fluent sentences. In order to improve the quality and fluency of your AI-generated texts, it is therefore important to use automatic revision tools.

These tools can detect and correct grammatical errors, typos and inconsistencies in the text. They can also help improve the clarity and conciseness of the writing.

While there are many tools you can use, they each have their own features. Google Docs, for example, is a text editor that can quickly check for errors and offer suggestions for correction.

Antidote is another powerful tool that can identify spelling, grammatical and typographical errors.

Yoast for WordPress is an extension that makes your content easier to read.

These tools are also useful for checking the spelling of proper names, companies, brands and products. They can also check for overused AI terms or phrases and suggest synonyms.

These tools can be used to review text before publishing or sharing, ensuring that the content generated by the AI is quality and authentic.

4.3. Use human editors

Regardless of the AI, human editors must step in to fully review the content proposed by the AI. Human editors can add a personal touch and nuance to machine-generated text, which is often too formal and unnatural.

Human editors can also correct grammar, punctuation and syntax errors that may be present in AI-generated text.

As mentioned earlier, they can also add additional information and details to give more depth and context to the text.

By doing so, you can create content that is both authentic and fluid, while still maintaining the benefits of automatic content generation.

In addition, human editors can also help you ensure that AI-generated content complies with your company’s ethical standards and privacy policies as well as Google’s guidelines.

4.4. Use writing style techniques

Another way to improve the quality and flow of AI-generated text is to use writing style techniques.

This includes things like the “three-unit” rule (describing an idea in three sentences or short phrases) and the “show, don’t tell” method (showing rather than describing).

You can also use metaphors and similes to make your writing more vivid and immersive. These techniques can help give AI-generated text a more human and authentic feel, which can make the content easier to read and more enjoyable for users.

It’s important to note that these techniques should be used sparingly and strategically to avoid making the content too complex or difficult to understand.

4.5. Give context to the ideas contained in AI texts

This is about understanding the context in which the content will be used and making sure that the AI-generated content is relevant to that context.

There are different ways to provide context to create AI-generated content that feels authentic.

You can add context to your queries through your prompts, which can help improve the AI’s understanding of what you need. For example, if you want to generate content for a cooking blog, it’s important to use prompts that focus on cooking, such as recipes, blog posts about cooking, etc.

It’s about giving more detail in your query, for example asking for a recipe suitable for a pregnant woman, or one that leaves out a given ingredient.

You can go even further and ask the AI to include information about ingredients, preparation steps, cooking times, etc.

4.6. Use plagiarism checking tools to ensure that the text generated by the AI is unique

Plagiarism checking tools are an effective way to ensure that AI-generated content is unique.

By using these tools, writers can verify that AI-generated content has not been copied from other existing sources. This ensures that the content is original and not simply a copy of something that has been written before.

When you discover passages that come from other sites, you only have to rewrite them to get unique content. That’s the whole point of having a human editor.
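To give an idea of what such a check does, here is a simplified Python sketch that measures how many five-word sequences of a draft also appear in a source text. Real plagiarism checkers compare your text against huge indexes of web pages and use more robust matching; the function names, the five-word window and the sample sentences here are assumptions made for the example.

```python
import re


def shingles(text, n=5):
    """Set of all n-word sequences ("shingles") in the text, lowercased."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(draft, source, n=5):
    """Fraction of the draft's n-word sequences that also occur in the source."""
    draft_shingles = shingles(draft, n)
    if not draft_shingles:
        return 0.0
    return len(draft_shingles & shingles(source, n)) / len(draft_shingles)


# Invented sample sentences.
source = "Plagiarism checking tools compare a text with pages that already exist online."
copied = "Plagiarism checking tools compare a text with pages that already exist online."
rewritten = "A plagiarism checker looks for passages of your draft on existing web pages."

print(overlap_ratio(copied, source))
print(overlap_ratio(rewritten, source))
```

A ratio close to 1 means the passage is essentially copied and should be rewritten, while a ratio close to 0 means no long word sequence is shared with that particular source.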

FAQ

What is AI-generated text detection?

AI-generated text detection is the process of identifying whether a piece of content was created by a human or by an Artificial Intelligence.

This ensures the quality and originality of the content, as well as compliance with the rules established by search engines.

Why is it important to detect texts generated using AI?

It is important to detect AI-generated text because it can be considered as automatically generated content by search engines, which can lead to penalties or exclusion of your site from the index.

In addition, AI-generated content can also lack quality and originality, which can damage your site’s user experience and reputation.

How can we detect AI-generated text?

There are several methods to detect AI-generated text, including using automatic detection tools, manual content analysis, and checking the quality and credibility of sources.

Can AI-generated text be used legitimately?

Yes, it is possible to use AI-generated text legitimately by editing and modifying it to make it unique, fluid, and authentic.

How can I make my AI-generated texts unique and authentic?

There are several methods to make AI-generated texts more unique and authentic, including the use of human editors, writing style techniques, and plagiarism checking. It is also important to continue to constantly monitor and improve AI results to avoid errors or suspicious content.

Can plagiarism detection tools help detect AI-generated text?

Yes, to some extent. Plagiarism detection tools can help detect AI-generated texts by comparing the content to other existing texts on the internet.

However, it is important to note that these tools cannot always detect AI-generated texts, as these texts may be slightly modified so that they do not exactly match other texts.

Therefore, it is important to use plagiarism detection tools in combination with other verification methods to be sure to detect all AI generated texts.

Can Google detect AI-generated text?

Yes and no. Google can detect AI-generated text by using algorithms to spot specific patterns or characteristics associated with automatically generated content.

However, it is not always possible for Google to detect AI-generated content without the help of human reviewers.

Therefore, it is important to note that if Google’s webspam team finds AI-generated content, they are authorized to take action on it. So you need to make sure that the AI-generated content is unique and authentic to avoid any risk of penalty.

In summary

It is important to remember that identifying AI-generated content is now necessary to ensure the quality and authenticity of the information that is read and that your site offers to its visitors.

The text verification tools we have shared in this article will allow you to detect texts automatically generated with AI.

These tools will not only prevent users from being inundated with AI-generated content, but they will also have an impact on the safety, quality and transparency of online information.

Not only will they be able to protect Internet users from erroneous or misleading information, but they will also allow websites to avoid being penalized for automatic text generation.

In addition, detection techniques, such as analyzing syntactic structure, looking for repetition of words and phrases, and verifying sources and author credibility can help identify such content.

Sites looking to take advantage of the rise of AI to get help with web writing tasks should use automatically generated text detection tools in conjunction with human editors.

This can help them create and publish unique, fluid and authentic content without getting caught by Google. Site owners need to be vigilant and know how to use different methods to identify AI-generated content so they can use it effectively.

Feel free to mention in the comments any other questions about AI that you would like us to answer.
