
Hana Metzger's e-Portfolio

Competency E: Information Retrieval Systems

Section 1
Design, query, and evaluate information retrieval systems.

Section 1A: Competency Description and Scope
Information retrieval (IR) systems are a way of organizing information to enable access to that information. For example, library users may search an IR system known as an online public access catalog (OPAC) in order to find a book at their public library. IR systems are not limited to OPACs, however; other common types include internet search engines, online databases, and digital libraries (Chowdhury, 2010, pp. 8-9). In this section of this competency, I will discuss the design, use (querying), and evaluation of information retrieval systems.

Design
To design an IR system, one must first research the two sides of the equation: the users and the information. Weedman (2018) calls this "defining the problem to be solved" and recommends gleaning as much knowledge as possible about one's user base in particular (p. 175).

The next step is to decide on representation, or metadata. The simplest definition of metadata is that it is "data about data" (Riley, 2017, p. 1). I find metadata easiest to understand by example: metadata about an e-book might include its title and author ("descriptive metadata"), its file size ("technical metadata"), and its copyright holder and date ("rights metadata"). When designing an IR system, it is important to define which types of metadata you will include or exclude.
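
To make these categories concrete, here is a minimal sketch in Python of how one record's metadata might be grouped; the e-book, field names, and values are hypothetical and chosen purely for illustration, not drawn from any particular metadata standard.

```python
# A hypothetical e-book record grouped into the three metadata categories
# described above (descriptive, technical, rights). Illustrative only.
ebook_record = {
    "descriptive": {          # how the item is identified and described
        "title": "An Example Novel",
        "author": "A. Writer",
        "subjects": ["Fiction", "Example subject heading"],
    },
    "technical": {            # how the file itself is built
        "file_format": "EPUB",
        "file_size_mb": 2.4,
    },
    "rights": {               # who holds the rights, and since when
        "copyright_holder": "Example Publishing House",
        "copyright_date": 2020,
    },
}

# Deciding which of these fields the IR system will store, and which it will
# make searchable, is part of the design step described above.
for category, fields in ebook_record.items():
    print(category, "->", ", ".join(fields))
```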

Next, one must create a search engine, whether analog (e.g., a file cabinet) or electronic (e.g., an OPAC or an internet search engine such as Google). Most search engines also need an index to search, rather than searching each document in its entirety (Weedman, 2018). The final step in designing an IR system is to test and evaluate it.
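
As a rough illustration of what such an index looks like, the sketch below builds a tiny inverted index over a few hypothetical documents. Real search engines are far more sophisticated, but the underlying idea of mapping terms to the documents that contain them, so that queries never have to reread every document in full, is the same.

```python
# A minimal inverted index: each term points to the set of document IDs
# containing it. The sample documents are hypothetical.
from collections import defaultdict

documents = {
    1: "introduction to information retrieval systems",
    2: "designing an online public access catalog",
    3: "evaluating retrieval results for library users",
}

inverted_index = defaultdict(set)
for doc_id, text in documents.items():
    for term in text.lower().split():
        inverted_index[term].add(doc_id)

# A query is answered by looking terms up in the index rather than
# scanning the documents themselves.
print(inverted_index["retrieval"])  # {1, 3}
```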

Querying
How one queries an IR system is determined by the type of system itself. Internet search engines such as Google or Bing allow users to search in natural language, whereas databases are often structured around controlled vocabularies (Harpring, 2010). The former is easier for people to use without any training, but the latter can provide more focused and relevant results once users learn the controlled vocabulary.
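
The sketch below illustrates, in a very simplified way, why a controlled vocabulary can focus results: variant user terms are mapped to a single preferred heading before the search is run, so records described consistently are retrieved together. The headings and synonyms here are hypothetical and not taken from any real vocabulary.

```python
# Hypothetical preferred headings and the variant terms users might type.
preferred_terms = {
    "automobiles": ["cars", "autos", "motor vehicles"],
    "motion pictures": ["movies", "films", "cinema"],
}

# Build a lookup from every variant (and the heading itself) to the heading.
entry_to_heading = {}
for heading, variants in preferred_terms.items():
    entry_to_heading[heading] = heading
    for variant in variants:
        entry_to_heading[variant] = heading

def normalize_query(user_term: str) -> str:
    """Return the controlled heading for a user's term, if one exists."""
    return entry_to_heading.get(user_term.lower(), user_term)

print(normalize_query("Movies"))  # "motion pictures"
```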

Additionally, IR systems vary greatly in scope. Some, such as Google, are known for crawling a large portion of the open web; conversely, databases tend to be limited in scope to specific subjects or collections. A large scope does not necessarily mean more useful information, of course: five excellent results are more useful than 5,000 irrelevant ones.

Finally, it helps to know which fields are searchable and which types of searching are allowed. Advanced search options and Boolean operators can be quite helpful in performing a more accurate search (Weedman, 2018, pp. 181-182). Some IR systems also include tools such as an index for browsing or a list of the controlled vocabulary being used. Web search engines tend to have fewer advanced options and therefore do not allow for as much search customization and functionality as many databases and OPACs.
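
As a simple illustration, Boolean operators can be thought of as set operations on the documents that match each term. The terms and document numbers below are hypothetical.

```python
# Each term is associated with the set of matching document IDs.
postings = {
    "libraries": {1, 2, 3, 5},
    "history": {2, 4, 5},
    "california": {3, 5, 6},
}

# libraries AND history -> only documents matching both terms
print(postings["libraries"] & postings["history"])     # {2, 5}

# libraries OR california -> documents matching either term
print(postings["libraries"] | postings["california"])  # {1, 2, 3, 5, 6}

# libraries NOT history -> matches "libraries" but excludes "history"
print(postings["libraries"] - postings["history"])     # {1, 3}
```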

Evaluation
Evaluating IR systems consists of identifying criteria to evaluate and weighing them against the needs of your users. Not all IR systems and users call for the same criteria. A person looking up the population of California out of curiosity is a quite different user from a scholar who needs the most recent and accurate population count for research purposes. For this reason, user-centered design, which puts the needs of the user at the center of the process, is an important guiding principle when evaluating IR systems (Toms, 2012). For librarians creating IR systems, user-centered design may entail multiple rounds of user testing at different stages in order to create the best possible product.

Some of the criteria that an information professional may wish to test include the organization system used, the information itself, and the efficiency and accuracy of results. As mentioned earlier, different organization systems may be used, including natural language or a controlled vocabulary. Another possible type of metadata, most frequently seen on social media, is the use of uncontrolled keywords, or tagging (e.g., #libraries). The benefit of this method is that, unlike with a controlled vocabulary, any user can tag documents with keywords. The downside is that users may use many different keywords (e.g., #library, #libraries, or #publiclibrary) or even misspell the tag. This may create a system with a greater amount of metadata but less controlled searching by users. Similarly, searching an internet search engine such as Google generally yields thousands more results than searching an OPAC, but the results may be irrelevant or even factually incorrect. Librarians must use criteria such as these to evaluate IR systems.
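
One common way the IR literature quantifies the "accuracy of results" criterion is with precision (the share of retrieved results that are relevant) and recall (the share of all relevant documents that were retrieved). The small sketch below uses hypothetical document numbers to show the calculation; which trade-off matters more depends on the user and the task.

```python
# Hypothetical evaluation of one query against a known set of relevant items.
retrieved = {1, 2, 3, 4, 5}   # what the system returned for the query
relevant = {2, 5, 7}          # what actually answers the user's need

precision = len(retrieved & relevant) / len(retrieved)  # 2 / 5 = 0.40
recall = len(retrieved & relevant) / len(relevant)       # 2 / 3 ≈ 0.67

print(f"precision={precision:.2f}, recall={recall:.2f}")

# A system returning thousands of results may achieve high recall while
# precision stays very low; five excellent results can serve a user better.
```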

Section 1B: Importance to the Profession
Through the use of internet search engines, IR systems have become commonplace in American life. With information so readily available to anyone with internet access, one might wonder why librarians need to be able to confidently design, query, and evaluate IR systems. In fact, it is not just necessary but vital. Misinformation is rife on the internet (Kozyreva, Lewandowsky, & Hertwig, 2020). Searching for information on internet search engines such as Google often yields unofficial results from sources that have not been fact-checked or peer-reviewed (Torres, 2023). As information professionals, librarians are part of the fight against misinformation. Understanding and evaluating IR systems is fundamental to that fight: it helps librarians provide accurate information (when they seek information on behalf of a patron) and teach patrons how to distinguish accurate information from inaccurate information (when showing patrons where and how to search).

Librarians who understand IR systems will also excel at providing patrons with accurate information because they know which tools will retrieve it. When searching a database with a controlled vocabulary, librarians trained in IR systems will be able to use the correct vocabulary to find the best results. This is an invaluable skill in our current era of (mis)information.

Section 2
Here I will provide three evidentiary items for Competency E.

Section 2A: Preparation
To prepare for this competency, I took INFO 202: Information Retrieval System Design with Professor Alison Johnson. This class taught me a lot about IR systems and the concepts associated with them, from controlled vocabularies to Boolean searches, and it gave me the opportunity to design my own IR system and evaluate my classmates' creations. I also took INFO 210: Reference and Information Services with Dr. José A. Aguiñaga, who taught me a great deal about searching databases and the internet for information. This class was focused on reference, and I gained extensive experience searching IR systems that I will carry forward with me as a librarian. Finally, I took INFO 256: Archives and Manuscripts with Dr. David D. Lorenzo. In this class, I had the opportunity to learn about and evaluate archival reference resources and to familiarize myself with the concepts of archival metadata.

Section 2B: Evidence

Evidentiary Object 1: INFO 202 Database Design Prototype

For this assignment, I worked with four classmates to design a database that would categorize chairs for sale online. Our group met frequently and did the majority of the work collaboratively through these discussions. I contributed during the meetings and also wrote the introduction to our assignment. This assignment shows my competency at designing an information retrieval system. As I wrote in the introduction, we first had to research our database's potential users. A fundamental part of designing IR systems is deciding upon metadata, and this assignment shows my competency at creating metadata and deciding upon the best way to search it. For example, one piece of metadata is labeled "chair arms," that is, whether or not a chair has arms. This is searchable by users with a simple checkbox; a check indicates that arms are present. Conversely, for the "colors" metadata, a controlled vocabulary was used, and users had to enter the color that best matched their desired color. As discussed above, designing an IR system involves simultaneous consideration of both the potential user and the data at hand.

Although our group used chairs to fulfill this assignment, the concepts of user and metadata transfer easily to a more typical library IR system, such as an OPAC. In an online library catalog, advanced searches typically offer searchable metadata. For example, San José State's King Library online catalog allows users to check a box to limit results to peer-reviewed documents or to specify a publication date range by year.

Evidentiary Object 2: INFO 210 Search Activity

In this assignment for INFO 210, I performed a series of different searches using specified criteria in different IR systems. I then provided screenshots of each search as well as some light analysis. This assignment shows that I am competent at querying IR systems. I performed searches using a database (ProQuest ABI/Inform), internet search engines (Google and DuckDuckGo), an OPAC (San José State's King Library online), a union catalog (WorldCat), and other databases and IR systems (Google Books, OneSearch, and Google Scholar). I also demonstrated my competency at querying IR systems by accurately using filters, keywords, and Boolean operators to narrow my results. At the end of each of the four sections, I evaluated my own searches and results, thus demonstrating my understanding of the varying uses and benefits of different IR systems depending on the user and the user's need. For example, I compared the merits of using Google Books, OneSearch, and WorldCat to find books related to the Atacama Desert, discussing the results in relation to accessibility and user preference. Understanding the role of users and their needs is, for me, an inseparable part of being competent at selecting and querying IR systems.

Evidentiary Object 3: INFO 256 Reference Resources Evaluation

For this assignment, I evaluated six online archival reference resources: the UK National Archives, the Library of Congress Finding Aids, OCLC WorldCat, the Mountain West Digital Library, the Online Archive of California, and ArchiveGrid. I first described each IR system using criteria such as the number and type of objects it describes as well as the search functionality of the site. Then I performed the same search at each resource, analyzing the scope and quality of the results as well as the ease of site navigation.

This essay demonstrates my competence at evaluating information retrieval systems. To evaluate IR systems, one must first choose criteria to compare; in this case, I selected the scope and quality of the results and the ease of navigation. Understanding the user's needs is also an essential part of evaluation. Some users may want a large number of results, for example, whereas others need only one or two excellent results. I find that the former is often true in the early stages of research in particular. Knowing this, I evaluated the IR systems with the understanding that some would be better suited to certain users at certain times. Expressing this thought in the essay demonstrates my knowledge that different IR systems are appropriate at different times and for different users.

Section 3: Conclusion
Designing, querying, and evaluating information retrieval systems was one of the most interesting and useful skills that I learned at San José State's iSchool. Competency with IR systems is an invaluable professional skill that will help librarians provide better information and services to their communities. Going forward, I plan to stay up to date on IR systems by reading relevant books (such as Safiya Noble's Algorithms of Oppression), performing my own search tests of different IR systems, and staying abreast of the news. I am curious to see how the recent growth of AI will affect IR systems and information sources in general.

 
References
Chowdhury, G. G. (2010). Introduction to modern information retrieval (3rd ed.). Facet Publishing. https://www.alastore.ala.org/sites/default/files/pdfs/chowdhuryIR1.pdf

Harpring, P. (2010). What are controlled vocabularies? In M. Baca (Ed.), Introduction to controlled vocabularies: Terminology for art, architecture, and other cultural works (pp. 12–26). https://www.getty.edu/research/publications/electronic_publications/intro_controlled_vocab/what.pdf

Kozyreva, A., Lewandowsky, S., & Hertwig, R. (2020). Citizens versus the internet: Confronting digital challenges with cognitive tools. Psychological Science in the Public Interest, 21(3). https://doi.org/10.1177/1529100620946707

Riley, J. (2017). Understanding metadata: What is metadata, and what is it for? National Information Standards Organization. https://groups.niso.org/higherlogic/ws/public/download/17446/Understanding%20Metadata.pdf

Toms, E. (2012). User-centered design of information systems. In M. J. Bates (Ed.), Understanding information retrieval systems. Taylor & Francis Group. https://learning.oreilly.com/library/view/understanding-information-retrieval/9781439891995/OEBPS/9781466551350_epub_c05_r1.xhtml#sec5_3_1_5

Torres, G. (2023). Problematic information in Google web search? Scrutinizing the results from U.S. election-related queries. In R. Rogers (Ed.), The propagation of misinformation in social media (pp. 33–47). https://doi.org/10.1515/9789048554249

Weedman, J. (2018). Information retrieval: Designing, querying, and evaluating information systems. In K. Haycock & M.-J. Romaniuk (Eds.), The portable MLIS: Insights from the experts (2nd ed., pp. 171–185). Libraries Unlimited.
