KWICer: Producing an annotated bibliography from a set of PDFs by quantifying keywords

March 3, 2026

This code can be used to rapidly create an annotated bibliography that helps users navigate and synthesize a large body of literature. Users supply a set of PDF or DOCX files that they have identified as relevant to their research question, along with lists of relevant search terms. The code converts the input documents into TXT files, trims each file to exclude extraneous text such as the references section of a scientific paper, and splits the body of each document into tokens that are easily searchable. It then counts the number of times each supplied search term occurs in each source, identifies occurrences of North American states and provinces, and writes these counts to a CSV file. Sorting this CSV file by the frequency of relevant search terms helps identify the documents most likely to address specific aspects of a research question.

The code also generates a set of figures that characterize the nature and content of the sources: 1) a map showing the number of sources that referenced each North American state or province, 2) a graph showing the number of sources that referenced different ecosystem types, and 3) a heatmap showing the number of sources that mention intersecting search terms. Users can easily customize their search term lists to apply this code to their own research question.
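The trimming, tokenizing, and counting steps described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not KWICer's actual implementation: it assumes the PDF/DOCX-to-TXT conversion has already been done, that the references section begins at a line reading "References", "Literature Cited", or "Bibliography", and it uses a short illustrative list of states and provinces. The function names (`trim_references`, `tokenize`, `count_terms`, `cooccurrence`, `write_counts_csv`) and the sample documents are hypothetical.

```python
import csv
import io
import re
from collections import Counter
from itertools import combinations

# Assumption: the references section starts at a line containing only one of these headings.
REF_HEADING = re.compile(
    r"^[ \t]*(?:references|literature cited|bibliography)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)

# Illustrative subset only; a full run would list every state and province.
REGIONS = ["Colorado", "Wyoming", "New Mexico", "Alberta", "British Columbia"]


def trim_references(text: str) -> str:
    """Drop everything from the last references-style heading onward.

    Using the last match avoids cutting at a heading that appears earlier,
    for example in a table of contents.
    """
    matches = list(REF_HEADING.finditer(text))
    return text[: matches[-1].start()] if matches else text


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (keeps internal ' and -)."""
    return re.findall(r"[a-z0-9]+(?:['-][a-z0-9]+)*", text.lower())


def count_terms(tokens: list[str], terms: list[str]) -> Counter:
    """Count occurrences of each (possibly multi-word) term in the token list."""
    counts = Counter()
    for term in terms:
        words = tokenize(term)
        if not words:  # a term with no word characters cannot match anything
            counts[term] = 0
            continue
        n = len(words)
        counts[term] = sum(
            1 for i in range(len(tokens) - n + 1) if tokens[i : i + n] == words
        )
    return counts


def cooccurrence(per_doc: dict[str, Counter], terms: list[str]) -> Counter:
    """For each pair of terms, count the sources that mention both at least once."""
    pairs = Counter()
    for counts in per_doc.values():
        present = [t for t in terms if counts[t] > 0]
        for a, b in combinations(present, 2):
            pairs[(a, b)] += 1
    return pairs


def write_counts_csv(per_doc: dict[str, Counter], columns: list[str], out) -> None:
    """Write one row per source and one column per term or region."""
    writer = csv.writer(out)
    writer.writerow(["source", *columns])
    for source, counts in per_doc.items():
        writer.writerow([source, *(counts[c] for c in columns)])


if __name__ == "__main__":
    # Hypothetical stand-in for text already extracted from PDF/DOCX files.
    docs = {
        "paper_a.txt": "Sagebrush steppe in Wyoming and Colorado.\nReferences\nSmith 2020, Alberta.",
        "paper_b.txt": "Fire effects on sagebrush steppe and riparian areas in Alberta.",
    }
    terms = ["sagebrush steppe", "riparian", "fire"]

    per_doc = {
        name: count_terms(tokenize(trim_references(text)), terms + REGIONS)
        for name, text in docs.items()
    }

    buf = io.StringIO()
    write_counts_csv(per_doc, terms + REGIONS, buf)
    print(buf.getvalue())
    print(cooccurrence(per_doc, terms))
```

In this sketch, "Alberta" in `paper_a.txt` is not counted because it appears only after the references heading. The pair counts returned by `cooccurrence` are the kind of source-by-term-pair table from which a heatmap like the one described above could be drawn; how KWICer itself builds that figure is not specified here.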

Publication Year: 2026
Title: KWICer: Producing an annotated bibliography from a set of PDFs by quantifying keywords
DOI: 10.5066/P1476GUY
Authors: Lydia N Bailey, Dana M Varner, Sarah E Whipple
Product Type: Software Release
Record Source: USGS Asset Identifier Service (AIS)
USGS Organization: Fort Collins Science Center
Rights: This work is licensed under CC BY 4.0