AI and information retrieval

AI-generated image of a person sitting at a desk with a laptop and a stack of books. A robot is standing behind the person.

AI-generated image from Bing Image Creator.

AI tools can support and prepare information retrieval in several ways. You can, for instance, use them to find ideas for your thesis, suggestions for search terms, and articles.

Generative AI, ChatGPT and information retrieval

Tools based on generative AI such as ChatGPT may help you to obtain a quick overview of a topic. You can get help with the terminology used in the subject area, ask for summaries from different perspectives, and test your ideas.

Generative AI like ChatGPT is largely based on probability: given the start of a sentence, the model estimates how it is most likely to continue. These tools are usually unable to provide sources for their information because they do not search for sources. When they do indicate sources, these may be completely fabricated.
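The idea of continuing a sentence based on probability can be illustrated with a minimal sketch. The corpus and the function name below are made up for illustration, and real language models are far more sophisticated than simple word-pair counts.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; a real model is trained on vastly more text.
corpus = (
    "the library has many books . "
    "the library has a database . "
    "the library has many articles ."
).split()

# Count which word follows which word in the corpus.
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1


def next_word_probabilities(word):
    """Estimate the probability of each word that followed `word` in the corpus."""
    counts = following[word]
    total = sum(counts.values())
    return {candidate: n / total for candidate, n in counts.items()}


probs = next_word_probabilities("has")
most_likely = max(probs, key=probs.get)
```

Note that nothing in this process looks up a source: the continuation is chosen only because it was frequent in the training text.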

When you are looking for scholarly articles as part of your studies, it is important that the information is accurate and that its source can be traced. For this reason, it is better to use a database.

Information retrieval tools

Several tools use AI or other methods to help you to find similar material when you have already found some articles to start with. These tools may use citation analysis and semantic similarity to suggest additional articles. Some tools also use AI to interpret questions asked in natural language.

In citation analysis, further articles are suggested if they cite the same source or sources as the article you found.
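A minimal sketch of this kind of suggestion, assuming each article's reference list is available as a set. The article names, reference names and the function are hypothetical, and actual tools may weigh citations quite differently.

```python
# Hypothetical reference lists: article -> set of works it cites.
references = {
    "found_article": {"ref_a", "ref_b", "ref_c"},
    "candidate_1": {"ref_a", "ref_b", "ref_x"},
    "candidate_2": {"ref_c", "ref_y"},
    "candidate_3": {"ref_z"},
}


def suggest_by_shared_references(start, references):
    """Rank other articles by how many references they share with `start`."""
    start_refs = references[start]
    scored = []
    for article, refs in references.items():
        if article == start:
            continue
        shared = len(start_refs & refs)  # size of the set intersection
        if shared > 0:
            scored.append((article, shared))
    # Articles sharing the most references come first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


suggestions = suggest_by_shared_references("found_article", references)
```

Here candidate_1 shares two references with the starting article and candidate_2 shares one, while candidate_3 shares none and is not suggested.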

In semantic searches, a language model is used to attempt to understand the meaning of a document or research question and to match it with similar content that does not necessarily use the same formulations. Instead of retrieving the results as a list of hits, some services present their results graphically and you can click your way to more related material.
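One common way to compare meaning is to represent each text as a list of numbers (a vector) and measure how closely the vectors point in the same direction. The sketch below assumes this approach. The three-dimensional vectors are made up by hand for illustration, whereas a real service would obtain much longer vectors from a language model.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for what a language model would produce.
document_vectors = {
    "Effects of exercise on heart health": [0.9, 0.1, 0.0],
    "Physical activity and cardiovascular risk": [0.8, 0.2, 0.1],
    "Medieval trade routes in Northern Europe": [0.0, 0.1, 0.9],
}

# Stands in for the question "Does training protect the heart?"
query_vector = [0.85, 0.15, 0.05]

scores = {
    title: cosine_similarity(query_vector, vector)
    for title, vector in document_vectors.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
```

The two health-related titles score close to 1 even though they share almost no words with the question, while the unrelated title ends up last.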

Some tools use a technique called Retrieval Augmented Generation (RAG) to take advantage of language modelling and generative AI in information retrieval. The idea behind RAG is to use generative AI for understanding language but also to retrieve factual content from an external source. When searching, this could mean that a question is asked and answered in natural language while a background search is done in an academic database. The language model transforms the question into a query that the database understands and uses the articles found as context when answering the original question.
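The flow described above can be sketched roughly as follows. Everything here is a hypothetical stand-in: the small list replaces an academic database, simple keyword filtering replaces the language model that builds the query, and the final answer is only assembled as text rather than generated.

```python
# Hypothetical stand-in for an academic database.
DATABASE = [
    {"title": "Sleep and memory consolidation", "keywords": {"sleep", "memory"}},
    {"title": "Caffeine and attention", "keywords": {"caffeine", "attention"}},
    {"title": "Sleep deprivation in students", "keywords": {"sleep", "students"}},
]

# Words ignored when building the query (illustrative, not exhaustive).
STOPWORDS = {"how", "does", "do", "the", "a", "of", "in", "affect", "what", "is"}


def question_to_query(question):
    """Stand-in for the language model step that turns a question into search terms."""
    words = question.lower().strip("?").split()
    return {word for word in words if word not in STOPWORDS}


def search(query_terms):
    """Return records sharing at least one keyword with the query."""
    return [record for record in DATABASE if record["keywords"] & query_terms]


def answer(question):
    """Retrieve articles first, then build the answer from them."""
    articles = search(question_to_query(question))
    context = "; ".join(record["title"] for record in articles)
    # A real tool would pass the question and this context to a language model.
    return f"Question: {question}\nBased on: {context}"


result = answer("How does sleep affect memory?")
```

The point of the design is that the answer is tied to records actually retrieved from the database, so the sources behind it can be listed and checked.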

AI technology has already been incorporated into many of the more traditional databases the library subscribes to and several database providers plan to increase the use of AI in their services. AI can be used to extend search results, but also to rank results according to relevance or in some databases to generate summaries.

Which materials are searchable using the tool?

You should always ask this question when doing searches, but it is especially important when using new tools from new providers. Some tools only include freely available (open access) materials, which means that materials the library subscribes to cannot be found there. A specific tool may cover several broad subject areas or be appropriate only for finding information on certain topics. The type of material retrieved may also vary: some content is scholarly, while other materials may not be. As always when retrieving information as part of your studies, you need to evaluate the sources you find.

On which data were the incorporated AI models trained?

A model may, for instance, be trained on a large amount of data to find similarities between articles. Even when very large quantities of data are involved, the training data is always limited in some respect, for example in size, scope and currency. Sometimes the type of training data used is not described.

Imbalances in the training data may be reflected in the model and risk generating biases. If training was done on data from certain sources during a particular period, generalisations from this period and the sources will be seen in the delivered results. A model trained on technical literature may for example be less accurate when searching for related articles in the humanities and social sciences.

How is the information you enter used?

The prompts you enter or information you upload to the tool may be saved and used in a way beyond your control. Tools may be continually improving their models based on the data received from users. You should never enter sensitive data, personal data, or data that may not be freely distributed.

What transparency and replicability criteria should the search fulfil?

Transparency, openly accounting for how a result was achieved, is one of the fundamental principles of research and science and should always be applied in your studies. Replicability is another fundamental principle entailing that someone else should be able to repeat your process and achieve the same results.

Many AI tools have been described as ‘black box’ models. You provide input in the form of, for example, a research question or a number of articles, and the tool delivers a result. However, it may be difficult to pinpoint what exactly led to the specific result. Using AI to discover articles is therefore very different from using subject headings and search terms that you can see in the descriptions of articles.

When AI models are trained, probability calculations are used, which means that the same input may generate different results. This affects both the transparency of these tools and how replicable the results themselves are. Irrespective of the chosen tool, you need to be open about which tools you used and the results you obtained.
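One way such variation can arise is when a tool picks its output by drawing at random from a set of weighted alternatives. The sketch below assumes this, using made-up alternatives and weights. The fixed seed is used here only to show when results can be repeated; whether a given tool offers anything similar is not something this page can confirm.

```python
import random

# Made-up alternatives and their probabilities.
candidates = ["articles", "books", "journals"]
weights = [0.5, 0.3, 0.2]


def sample_continuation(rng):
    """Draw one alternative at random according to the weights."""
    return rng.choices(candidates, weights=weights, k=1)[0]


def run_with_seed(seed, n=5):
    """Draw n alternatives from a random generator started at a fixed seed."""
    rng = random.Random(seed)
    return [sample_continuation(rng) for _ in range(n)]


# Without a fixed seed, the result may differ each time the program is run.
unseeded_result = sample_continuation(random.Random())

# With the same seed, the sequence of draws is repeated exactly.
first_run = run_with_seed(42)
second_run = run_with_seed(42)
```

Because most users cannot control this randomness, noting the tool, the date and the exact input is usually the most that can be done to document a search.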

Some examples of AI-based tools for searching and finding scholarly articles are listed below. Please note that the university library neither subscribes to nor provides support for any of the listed tools.

Semantic Scholar

Semantic Scholar is an AI-driven search tool from the Allen Institute for AI that identifies related articles using both citations and subject similarity. Originally it focused on computer science, neuroscience and geoscience, but today it covers a wide range of subjects.

According to the service provider, the tool includes data on more than 200 million scientific publications. Data is collected in partnership with publishers, but also by using crawlers. Many articles are summarised in a sentence. Some articles may be read in Semantic Reader, which includes functions to mark the most important parts of the article related to aim, method, results, and innovations. AI-generated definitions are also available for some terms.

Inciteful

Inciteful starts the search process with one or two articles. If one article is used, a graph with similar articles is generated. Important articles, authors and review articles in the network are also identified. If two articles are used, a graph is generated that indicates the relations between these articles via citations.

Inciteful uses data from OpenAlex, Semantic Scholar, Crossref, and OpenCitations.

ResearchRabbit

ResearchRabbit starts from one or more articles to suggest further articles. The relations between the articles are graphically visualised and you can click your way to new material.

The tool is free but registration is needed.

According to the service provider the tool includes hundreds of millions of articles.

Connected Papers

Connected Papers is a tool that creates a graph with related articles based on one article. The articles are grouped in clusters where those showing the greatest degree of similarity are displayed more closely together and those with fewer similarities are shown further from each other.

Connected Papers uses data from Semantic Scholar.

An account is needed to create a limited number of free graphs per month. A subscription is required to generate more graphs.

Elicit

Elicit starts from either a research question or a number of articles. Further articles can be retrieved using a function to identify similar research. The tool bases its answer to the question on the research retrieved. Each article is also briefly summarised in a sentence, and different parts of the articles may be identified.

Elicit uses data from Semantic Scholar.

According to the service provider, Elicit works best for domains with empirical, experimental research. Two subject areas mentioned are biomedicine and machine learning.

Registration is needed to use the tool. With a free account, you can access basic features. For additional functionality, a subscription can be purchased.