Python scripts to reconcile a library's local holdings in Ex Libris Alma with OCLC's WorldCat database.
Some of the scripts require an API key:
For the update_alma_records.py
script, you will need an Ex Libris Developer
Network account and an API key (see the
Alma API documentation for
more details).
Once logged into the Ex Libris Developer Network, follow these instructions for creating an API key.
You should create both a Production and a Sandbox API key. That way, you can use
your Sandbox API key when testing the update_alma_records.py
script.
When you click the "Add Permission" button, choose "Bibs" for Area, either "Production" or "Sandbox" for Env (depending on which key you're creating), and "Read/write" for Permissions.
For the search_worldcat.py
and process_worldcat_records.py
scripts, you will
need an OCLC web service key (aka WSKey) with access to the WorldCat Metadata
API service. Follow these
instructions to request one.
When filling out the request form, be sure to choose "WorldCat Metadata API" under Services. Note that, at the time of writing, the WorldCat Metadata API service was only available when requesting a Production WSKey. If your Production WSKey request is approved and you do not also receive a Sandbox WSKey for the WorldCat Metadata API service, reach out to OCLC to ask for one, as this is very helpful when testing the scripts.
Even when using your Sandbox WSKey, you should still be careful when using the WorldCat Metadata API, as explained here.
For example, when running the process_worldcat_records.py
script with the
set_holding
or unset_holding
operation, your
Sandbox WSKey can update your institution's actual holdings in WorldCat. To
avoid this, make sure your input_file
consists exclusively of Test Sandbox
Records. (Your WSKey approval email from OCLC should include the OCLC numbers
for these Test Sandbox Records.)
In contrast, it is safe to use real WorldCat records when testing this script's
get_current_oclc_number
operation because it does not update the records.
All other content is released under CC-BY-4.0.
- Go into the `oclc-reclamation` folder (i.e. the root folder of the repository)
- Create and activate a virtual environment:
  - `python -m venv venv`
  - `source venv/bin/activate`
- Install Python dependencies:
  - `pip install -r requirements.txt`
- Add a `.env` file to the root folder (you can copy `.env-example`)
- To use the `update_alma_records.py` script, initialize these variables:
  - `ALMA_API_KEY`
    - See the Alma API Key section for how to request one.
  - `ALMA_API_BASE_URL`
    - If you're in North America, use `https://api-na.hosted.exlibrisgroup.com`
    - If not, look here for the base URL for your geographic region
- To use the `search_worldcat.py` and `process_worldcat_records.py` scripts, initialize these variables:
  - `OCLC_INSTITUTION_SYMBOL`
  - `WORLDCAT_METADATA_API_KEY`
  - `WORLDCAT_METADATA_API_SECRET`
    - Your OCLC WSKey for the WorldCat Metadata API service will include both a key and a secret. See the OCLC Web Service Key section for how to request one.
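
To sanity-check your `.env` file, a snippet like the following will read the variables. It assumes the python-dotenv package is installed and is an illustration only, not code from this repository:

```python
# Illustration only (not code from this repository): a quick sanity check that
# the variables in your .env file are visible to Python. Assumes python-dotenv.
import os
from dotenv import load_dotenv

load_dotenv()  # loads key=value pairs from .env into the process environment

for name in (
    "ALMA_API_KEY",
    "ALMA_API_BASE_URL",
    "OCLC_INSTITUTION_SYMBOL",
    "WORLDCAT_METADATA_API_KEY",
    "WORLDCAT_METADATA_API_SECRET",
):
    print(f"{name} is {'set' if os.getenv(name) else 'NOT set'}")
```
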
These scripts can take some time to complete, especially if they are processing many records. So be sure to disable your system's sleep settings prior to running any of the scripts. Otherwise, the scripts could get interrupted during execution.
For Mac users: You can prevent idle sleep while a script is running by using the `caffeinate` command-line tool. From the Terminal, just prepend `caffeinate -i` to the desired script command. For example:

    caffeinate -i python update_alma_records.py inputs/update_alma_records/filename.csv
With this approach, you won't have to adjust your sleep settings.
usage: search_worldcat.py [-h] [-v] [--search_my_library_holdings_first] input_file
positional arguments:
input_file the name and path of the input file, which must be in either
CSV (.csv) or Excel (.xlsx or .xls) format (e.g.
inputs/search_worldcat/filename.csv)
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
--search_my_library_holdings_first
whether to first search WorldCat for your library's holdings.
- Use this option if you want to search in the following order:
1) Search with "held by" filter.
2) If there are no WorldCat search results held by your library,
then search without "held by" filter.
- Without this option, the default search order is as follows:
1) Search without "held by" filter.
2) If there is more than one WorldCat search result, then search with
"held by" filter to narrow down the results.
examples:
python search_worldcat.py inputs/search_worldcat/filename.csv
python search_worldcat.py --search_my_library_holdings_first inputs/search_worldcat/filename.csv
For required format of input file, see either:
- `inputs/search_worldcat/example.csv`
- `inputs/search_worldcat/example.xlsx`
Note that including the --search_my_library_holdings_first
optional argument
may increase or decrease the number of WorldCat Metadata API requests required
by the script. If you have many records to process and wish to minimize the
number of API requests made by the script, then consider running the script
with and without the --search_my_library_holdings_first
argument on an
input file containing a subset of your records. The script results will tell
you how many total API requests were made, as well as how many records needed a
single WorldCat API request vs. two WorldCat API requests. Based on these
results, you can predict whether the --search_my_library_holdings_first
argument will result in fewer API requests for your entire dataset.
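
For example, here is the back-of-the-envelope arithmetic you could do with the counts reported by a trial run (the numbers below are made up for illustration):

```python
# Hypothetical counts from a trial run on a subset of records
records_needing_one_request = 80
records_needing_two_requests = 20
sample_size = records_needing_one_request + records_needing_two_requests
sample_total_requests = records_needing_one_request + 2 * records_needing_two_requests  # 120

# Scale the per-record average up to the full dataset
full_dataset_size = 5000
predicted_requests = sample_total_requests / sample_size * full_dataset_size
print(f"Predicted API requests for full dataset: {predicted_requests:.0f}")  # ~6000
```
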
Searches WorldCat for each record in the input file and saves the OCLC Number.
For each row in the input file, a WorldCat search is performed using the first available record identifier, in this order (see the sketch after this list):
- `lccn_fixed` (i.e. a corrected version of the `lccn` value; this is a column you would add to your input spreadsheet if needed in order to correct the `lccn` value from the Alma record)
- `lccn`
- `isbn` (accepts multiple values separated by a semicolon)
- `issn` (accepts multiple values separated by a semicolon)
- `gov_doc_class_num_086` (i.e. MARC field 086: Government Document Classification Number)
  - If `gpo_item_num_074` (i.e. MARC field 074: GPO Item Number) is also available, then a combined search is performed (`gov_doc_class_num_086` AND `gpo_item_num_074`).
  - If only `gpo_item_num_074` is available, then no search is performed.
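
The sketch below expresses that selection order in code. It is illustrative only (the script's actual implementation may differ) and assumes each row is a dict keyed by the column headings above:

```python
# Illustrative only: pick the first available identifier in the order above.
def choose_search_identifier(row):
    """Return (identifier_name, value) for the first available identifier, or None."""
    for field in ("lccn_fixed", "lccn", "isbn", "issn"):
        value = (row.get(field) or "").strip()
        if value:
            return field, value
    class_num = (row.get("gov_doc_class_num_086") or "").strip()
    item_num = (row.get("gpo_item_num_074") or "").strip()
    if class_num and item_num:
        return "gov_doc_class_num_086 AND gpo_item_num_074", (class_num, item_num)
    if class_num:
        return "gov_doc_class_num_086", class_num
    return None  # only gpo_item_num_074 (or nothing) is available, so no search is performed
```
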
Outputs the following files:
- `outputs/search_worldcat/records_with_oclc_num.csv`: Records with one WorldCat match; hence, the OCLC Number has been found
- `outputs/search_worldcat/records_with_zero_or_multiple_worldcat_matches.csv`: Records whose search returned zero or multiple WorldCat matches
- `outputs/search_worldcat/records_with_errors_when_searching_worldcat.csv`: Records where an error was encountered
- If any of the above output files already exists in the directory, then it is overwritten.
usage: update_alma_records.py [-h] [-v] [--batch_size BATCH_SIZE] input_file
positional arguments:
input_file the name and path of the input file, which must be in either CSV (.csv) or
Excel (.xlsx or .xls) format (e.g. inputs/update_alma_records/filename.csv)
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
--batch_size BATCH_SIZE
the number of records to batch together when making each GET request to
retrieve Alma records. Must be between 1 and 100, inclusive (default is 1).
Larger batch sizes will result in fewer total Alma API requests.
examples:
python update_alma_records.py inputs/update_alma_records/filename.csv
python update_alma_records.py --batch_size 10 inputs/update_alma_records/filename.csv
For required format of input file, see either:
- `inputs/update_alma_records/example.csv`
- `inputs/update_alma_records/example.xlsx`
For batch sizes greater than 1, note the following:
- The script will make fewer `GET` requests to the Alma API because it will gather together multiple MMS IDs (up to the batch size) and then make a single `GET` request for all Alma records in that particular batch. This will reduce the total number of `GET` requests by a factor of the batch size.
- However, if any MMS ID in the batch is invalid, then the entire `GET` request will fail and none of the Alma records from that batch will be updated.
- Unlike `GET` requests, the `PUT` request for updating an Alma record cannot be batched. So the script will make the same number of `PUT` requests regardless of the batch size.
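
Here is a rough sketch of that arithmetic (illustrative only, with made-up MMS IDs):

```python
import math

def batches(mms_ids, batch_size):
    """Yield successive groups of MMS IDs, one group per batched GET request."""
    for i in range(0, len(mms_ids), batch_size):
        yield mms_ids[i:i + batch_size]

mms_ids = [f"99{n:013d}" for n in range(1, 251)]  # 250 made-up MMS IDs
batch_size = 10
get_requests = math.ceil(len(mms_ids) / batch_size)  # 25 batched GET requests
put_requests = len(mms_ids)                          # still 250 individual PUT requests
```
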
Updates Alma records to have the corresponding OCLC number.
For each row in the input file, the corresponding OCLC number is added to the specified Alma record (indicated by the MMS ID), unless the Alma record already contains that OCLC number. If the Alma record contains non-matching OCLC numbers in an 035 field (in the subfield $a), those OCLC numbers are moved to the 019 field (as long as they are valid).
When processing each Alma record:
- The original record is saved in XML format as: `outputs/update_alma_records/xml/{mms_id}_original.xml`
- If the record is updated, then it is added to `outputs/update_alma_records/records_updated.csv` and the modified Alma record is saved in XML format as: `outputs/update_alma_records/xml/{mms_id}_modified.xml`
- If the record is not updated because it already has the current OCLC number, then it is added to: `outputs/update_alma_records/records_with_no_update_needed.csv`
- If an error is encountered, then the record is added to: `outputs/update_alma_records/records_with_errors.csv`
- For the above output files, if an XML file with the same name already exists in the directory, then it is overwritten. If a CSV file with the same name already exists, then it is appended to.
See the `main()` function's docstring (within `update_alma_records.py`) to learn about:
- how OCLC numbers are recognized and extracted
- what constitutes a valid OCLC number for the purposes of this script
- how invalid OCLC numbers are handled
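
As a purely generic illustration (the script's actual recognition and validation rules are the ones documented in that docstring), an "(OCoLC)"-prefixed 035 $a value reduces to its numeric portion like so:

```python
import re

def numeric_part(value):
    """Return the digits from a value like '(OCoLC)00000001', or None if it doesn't match."""
    match = re.fullmatch(r"\(OCoLC\)\s*(\d+)", value.strip())
    return match.group(1) if match else None

print(numeric_part("(OCoLC)00000001"))  # '00000001'
```
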
usage: extract_record_identifiers.py [option] directory_with_xml_files [alma_records_with_current_oclc_num]
positional arguments:
directory_with_xml_files
the path to the directory containing the XML files to process
alma_records_with_current_oclc_num
the name and path of the CSV file containing the MMS IDs
of all Alma records with a current OCLC number
(e.g. inputs/extract_record_identifiers/alma_records_with_current_oclc_num.csv)
example: python extract_record_identifiers.py inputs/extract_record_identifiers/xml_files_to_extract_from/ inputs/extract_record_identifiers/alma_records_with_current_oclc_num.csv
To create/populate the `directory_with_xml_files`, you will need to export the XML files from Alma. Here's one approach:
- Recommended: Create the following directory for these XML files: `inputs/extract_record_identifiers/xml_files_to_extract_from/`
- Create sets in Alma that contain the records whose holdings should be set in WorldCat. Begin each set name with the same prefix (e.g. "OCLC Reclamation") to facilitate easy retrieval of all sets.
- Export each set as an XML file:
  - You can do this by running a job on each set.
  - For Select Job to Run, choose "Export Bibliographic Records".
  - Select the set you want to export.
  - Choose "MARC21 Bibliographic" as the Output Format and "XML" as the Physical Format.
  - If you want the XML file to be downloadable by others in your institution, choose "Institution" for Export Into Folder. Otherwise, leave it as "Private".
  - When the job is complete, download the XML file to the desired directory, e.g. `inputs/extract_record_identifiers/xml_files_to_extract_from/`.
For required format of the `alma_records_with_current_oclc_num` input file, see: `inputs/extract_record_identifiers/example_file_for_alma_records_with_current_oclc_num.csv`
For each XML file in the specified directory, the MMS ID and OCLC Number(s) from each Alma record are extracted and appended to the appropriate `outputs/extract_record_identifiers/master_list_records` CSV file:
- If an error is encountered, then the record is added to: `outputs/extract_record_identifiers/master_list_records_with_errors.csv`
- If the record's MMS ID appears in the optional `alma_records_with_current_oclc_num` input file, then the record is added to: `outputs/extract_record_identifiers/master_list_records_with_current_oclc_num.csv`
- Otherwise, the record is added to: `outputs/extract_record_identifiers/master_list_records_with_potentially_old_oclc_num.csv`
- If any of the above output files already exists in the directory, then it is appended to (not overwritten).
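
For a sense of what that extraction involves, here is an illustrative sketch (not the script's actual code). It assumes the exported MARCXML uses the standard MARC21 slim namespace and stores the MMS ID in controlfield 001:

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

def extract_identifiers(xml_path):
    """Yield (mms_id, [035 $a values starting with '(OCoLC)']) for each record."""
    root = ET.parse(xml_path).getroot()
    for record in root.iter("{http://www.loc.gov/MARC21/slim}record"):
        mms_id = record.findtext("marc:controlfield[@tag='001']", namespaces=MARC_NS)
        oclc_nums = [
            sf.text
            for sf in record.findall(
                "marc:datafield[@tag='035']/marc:subfield[@code='a']", namespaces=MARC_NS
            )
            if sf.text and sf.text.startswith("(OCoLC)")
        ]
        yield mms_id, oclc_nums
```
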
usage: process_worldcat_records.py [-h] [-v] [--cascade {0,1}] operation input_file
positional arguments:
operation the operation to be performed on each row of the input file
(either get_current_oclc_number, set_holding, or unset_holding)
input_file the name and path of the file to be processed, which must be in CSV format
(e.g. inputs/process_worldcat_records/set_holding/filename.csv)
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
--cascade {0,1} only applicable to the unset_holding operation: whether or not to execute
the operation if a local holdings record or local bibliographic record exists.
Choose either 0 or 1 (default is 0).
0 - don't unset holding if local holdings record or local bibliographic
record exists;
1 - unset holding and delete local holdings record and local bibliographic
record (if one exists)
examples:
python process_worldcat_records.py get_current_oclc_number inputs/process_worldcat_records/get_current_oclc_number/filename.csv
python process_worldcat_records.py set_holding inputs/process_worldcat_records/set_holding/filename.csv
python process_worldcat_records.py --cascade 0 unset_holding inputs/process_worldcat_records/unset_holding/filename.csv
Required format of input file:
- For the `get_current_oclc_number` operation, see: `inputs/process_worldcat_records/get_current_oclc_number/example.csv`
- For the `set_holding` operation, see: `inputs/process_worldcat_records/set_holding/example.csv`
- For the `unset_holding` operation, see: `inputs/process_worldcat_records/unset_holding/example.csv`
- Important note: The `set_holding` and `unset_holding` operations can update your institution's holdings even when using your OCLC Sandbox WSKey. To avoid this during testing, only include the OCLC numbers of Test Sandbox Records in this input file (see above section for more details).
Performs the specified operation on every record in the input file.
Gathers as many OCLC numbers as possible into each request before sending it to the WorldCat Metadata API.
What each `operation` does:
- `get_current_oclc_number`: For each row, check whether the given OCLC number is the current one.
  - If so, then add the record to: `outputs/process_worldcat_records/get_current_oclc_number/already_has_current_oclc_number.csv`
  - If not, then add the record to: `outputs/process_worldcat_records/get_current_oclc_number/needs_current_oclc_number.csv`
  - If an error is encountered, then add the record to: `outputs/process_worldcat_records/get_current_oclc_number/records_with_errors_when_getting_current_oclc_number.csv`
- `set_holding`: For each row, set holding for the given OCLC number.
  - If holding is set successfully, then add the record to: `outputs/process_worldcat_records/set_holding/records_with_holding_successfully_set.csv`
  - If holding was already set, then add the record to: `outputs/process_worldcat_records/set_holding/records_with_holding_already_set.csv`
  - If an error is encountered, then add the record to: `outputs/process_worldcat_records/set_holding/records_with_errors_when_setting_holding.csv`
- `unset_holding`: For each row, unset holding for the given OCLC number.
  - If holding is unset successfully, then add the record to: `outputs/process_worldcat_records/unset_holding/records_with_holding_successfully_unset.csv`
  - If holding was already unset, then add the record to: `outputs/process_worldcat_records/unset_holding/records_with_holding_already_unset.csv`
  - If an error is encountered, then add the record to: `outputs/process_worldcat_records/unset_holding/records_with_errors_when_unsetting_holding.csv`
  - Important note: Be careful when running the `unset_holding` operation with `--cascade 1`. According to the WorldCat Metadata API documentation (search this page for the Institution Holdings section, then look for the `DELETE` request on the `/ih/datalist` endpoint), `cascade` with value `1` will unset the holding and delete the local holdings record and local bibliographic record (if one exists).
- If any of the above output files already exists in the directory, then it is appended to (not overwritten).
usage: compare_alma_to_worldcat.py [option] alma_records_file worldcat_records_directory
positional arguments:
alma_records_file the name and path of the CSV file containing the records
in Alma whose holdings **should be set in WorldCat**
(e.g. inputs/compare_alma_to_worldcat/alma_master_list.csv);
this file should consist of a single column with one
OCLC number per row
worldcat_records_directory
the path to the directory of files containing the records
whose holdings **are currently set in WorldCat** for your
institution; each file should be in text (.txt) or
CSV (.csv) format and consist of a single column with one
OCLC number per row
example: python compare_alma_to_worldcat.py inputs/compare_alma_to_worldcat/alma_records_file.csv inputs/compare_alma_to_worldcat/worldcat_records/
For required format of the `alma_records_file` input file, see: `inputs/compare_alma_to_worldcat/example_alma_records_file.csv`
To create/populate the `worldcat_records_directory`:
- Use OCLC WorldShare to export the bibliographic records for all your institution's WorldCat holdings.
- Use MarcEdit (which you'll need to download and install locally) to pull only the OCLC number (035 $a) out of these records.
- This should leave you with a directory of text (.txt) files in the following format:

      035$a
      "(OCoLC)00000001"
      "(OCoLC)00000002"
      "(OCoLC)00000003"

- Use this directory as the `worldcat_records_directory`.
- See these instructions for more details.
Compares the Alma records which should be set in WorldCat to the current WorldCat holdings.
Outputs the following files:
- `outputs/compare_alma_to_worldcat/records_with_no_action_needed.csv`: The OCLC numbers found in both the `alma_records_file` and the `worldcat_records_directory`
- `outputs/compare_alma_to_worldcat/records_to_set_in_worldcat.csv`: The OCLC numbers found in the `alma_records_file` but not the `worldcat_records_directory`
- `outputs/compare_alma_to_worldcat/records_to_unset_in_worldcat.csv`: The OCLC numbers found in the `worldcat_records_directory` but not the `alma_records_file`
- If any of the above output files already exists in the directory, then it is overwritten.
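
Conceptually, the comparison is a set difference. Here is an illustrative sketch (not the script's actual code), assuming each input file is a single column of OCLC numbers, possibly quoted and prefixed with "(OCoLC)" as in the MarcEdit output shown above:

```python
from pathlib import Path

def read_oclc_numbers(path):
    """Collect OCLC numbers from a single-column file, normalizing quotes and the (OCoLC) prefix."""
    numbers = set()
    for line in Path(path).read_text().splitlines():
        value = line.strip().strip('"').replace("(OCoLC)", "").strip()
        if value and not value.startswith("035$a") and value.lower() != "oclc number":
            numbers.add(value)
    return numbers

alma = read_oclc_numbers("inputs/compare_alma_to_worldcat/alma_master_list.csv")
worldcat = set()
for path in Path("inputs/compare_alma_to_worldcat/worldcat_records/").iterdir():
    if path.is_file():
        worldcat |= read_oclc_numbers(path)

records_with_no_action_needed = alma & worldcat
records_to_set_in_worldcat = alma - worldcat
records_to_unset_in_worldcat = worldcat - alma
```
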
Here is one way you can use these scripts to perform an OCLC reclamation:
- For all relevant Alma records without an OCLC Number, prepare input spreadsheet(s) for the `search_worldcat.py` script.
  - Each row of the input spreadsheet must contain the record's MMS ID and at least one of the following record identifiers: LCCN, ISBN, ISSN, Government Document Classification Number (MARC field 086).
  - For the correct column headings, see `inputs/search_worldcat/example.csv`.
- Run the `search_worldcat.py` script using the input spreadsheet(s) created in the previous step.
  - Review the 3 spreadsheets output by the script.
  - If relevant, send the following 2 spreadsheets to your Cataloging Team (they'll need to manually add the OCLC Number to these Alma records):
    - `outputs/search_worldcat/records_with_zero_or_multiple_worldcat_matches.csv`
    - `outputs/search_worldcat/records_with_errors_when_searching_worldcat.csv`
- Run the `update_alma_records.py` script using the following input file: `outputs/search_worldcat/records_with_oclc_num.csv` (one of the spreadsheets output by `search_worldcat.py`).
  - Review the 3 spreadsheets output by the script, and then rename them (that way, when you run this script again later, it will output new spreadsheets rather than append to these existing spreadsheets).
  - If relevant, send `outputs/update_alma_records/records_with_errors.csv` to your Cataloging Team. They'll need to manually add the OCLC Number to these Alma records.
- Run the `extract_record_identifiers.py` script.
  - For the `directory_with_xml_files` input, follow these instructions. You'll have to finalize the reclamation project sets (i.e. the sets containing all the Alma records that should be in WorldCat) before you export them as XML files.
  - For the `alma_records_with_current_oclc_num` input file, combine the MMS ID column from `outputs/update_alma_records/records_updated.csv` and `outputs/update_alma_records/records_with_no_update_needed.csv` (two of the spreadsheets output by `update_alma_records.py`). The resulting CSV file should have a single column named "MMS ID" (see the sketch after this step).
  - Review the 3 spreadsheets output by the script.
  - If relevant, send `outputs/extract_record_identifiers/master_list_records_with_errors.csv` to your Cataloging Team. They'll need to manually fix these Alma records (some possible problems might include multiple OCLC Numbers, invalid OCLC Numbers, or no OCLC Number at all).
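
A minimal sketch of that combining step (not part of the repository), assuming pandas is available and that both spreadsheets use an "MMS ID" column heading (adjust the name if the actual headings differ):

```python
import pandas as pd

frames = [
    pd.read_csv("outputs/update_alma_records/records_updated.csv"),
    pd.read_csv("outputs/update_alma_records/records_with_no_update_needed.csv"),
]
combined = pd.concat(frames)[["MMS ID"]].drop_duplicates()
combined.to_csv(
    "inputs/extract_record_identifiers/alma_records_with_current_oclc_num.csv",
    index=False,
)
```
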
- Run the `process_worldcat_records.py` script using the `get_current_oclc_number` operation and the following input spreadsheet: `outputs/extract_record_identifiers/master_list_records_with_potentially_old_oclc_num.csv` (one of the spreadsheets output by `extract_record_identifiers.py`).
  - You'll need to make sure that this input spreadsheet adheres to `inputs/process_worldcat_records/get_current_oclc_number/example.csv` (in terms of the column headings).
  - Review the 3 spreadsheets output by the script.
  - If relevant, send `outputs/process_worldcat_records/get_current_oclc_number/records_with_errors_when_getting_current_oclc_number.csv` to your Cataloging Team. They'll need to manually fix these Alma records.
- Run the `update_alma_records.py` script using the following input file: `outputs/process_worldcat_records/get_current_oclc_number/needs_current_oclc_number.csv` (one of the spreadsheets output by `process_worldcat_records.py` in the previous step).
  - Review the 3 spreadsheets output by the script.
  - If relevant, send `outputs/update_alma_records/records_with_errors.csv` to your Cataloging Team. They'll need to manually add the OCLC Number to these Alma records.
- Create the Alma Master List spreadsheet, which contains the OCLC number of each Alma record whose holding should be set in WorldCat for your institution (this CSV file should have a single column named "OCLC Number"). Populate this spreadsheet as follows:
  - Add all OCLC numbers from `outputs/extract_record_identifiers/master_list_records_with_current_oclc_num.csv` (one of the spreadsheets output by `extract_record_identifiers.py`).
  - Add all OCLC numbers from `outputs/process_worldcat_records/get_current_oclc_number/already_has_current_oclc_number.csv` (one of the spreadsheets output by `process_worldcat_records.py` using the `get_current_oclc_number` operation).
  - Add all OCLC numbers from the following spreadsheets (which were output by `update_alma_records.py` in the previous step):
    - `outputs/update_alma_records/records_updated.csv`
    - `outputs/update_alma_records/records_with_no_update_needed.csv`
- Create the WorldCat Holdings List, a directory of `.txt` files containing the OCLC number for all records whose holdings are currently set in WorldCat for your institution (each file should contain a single column named "035$a").
  - To do this, use OCLC WorldShare to export the bibliographic records for all your institution's holdings. See these instructions for more details.
- Run the `compare_alma_to_worldcat.py` script using the Alma Master List spreadsheet as the `alma_records_file` input and the WorldCat Holdings List directory as the `worldcat_records_directory` input.
  - Review the 3 spreadsheets output by the script.
- Run the `process_worldcat_records.py` script using the `set_holding` operation and the following input spreadsheet: `outputs/compare_alma_to_worldcat/records_to_set_in_worldcat.csv` (one of the spreadsheets output by `compare_alma_to_worldcat.py`).
  - Review the 3 spreadsheets output by the script.
  - If relevant, send `outputs/process_worldcat_records/set_holding/records_with_errors_when_setting_holding.csv` to your Cataloging Team. For each record:
    - They may need to manually set the holding in WorldCat.
    - They may also want to find the corresponding Alma record and fix it.
- Decide whether `outputs/compare_alma_to_worldcat/records_to_unset_in_worldcat.csv` (one of the spreadsheets output by `compare_alma_to_worldcat.py`) represents the records you truly want to unset.
  - This spreadsheet represents the OCLC numbers found in the `worldcat_records_directory` but not the `alma_records_file`. So you have to be sure that the `alma_records_file` (i.e. the Alma Master List) contains all records whose holdings should be set in WorldCat for your institution.
  - If the `alma_records_file` is missing relevant records (perhaps because your Cataloging Team is manually fixing these Alma records), then `outputs/compare_alma_to_worldcat/records_to_unset_in_worldcat.csv` will contain records that should not be unset.
  - There are different reasons why the `alma_records_file` might be missing relevant records. For example, scripts may have encountered errors with certain records.
  - So in addition to reviewing `outputs/compare_alma_to_worldcat/records_to_unset_in_worldcat.csv`, you may want to manually review the other scripts' outputs (especially the error spreadsheets).
- If you have a `records_to_unset_in_worldcat.csv` file that you are confident is accurate, then run the `process_worldcat_records.py` script using the `unset_holding` operation with this input file.
  - See these instructions for more details.
  - Review the 3 spreadsheets output by the script.
  - If relevant, send `outputs/process_worldcat_records/unset_holding/records_with_errors_when_unsetting_holding.csv` to your Cataloging Team. For each record:
    - They may need to manually unset the holding in WorldCat.
    - They may also want to find the corresponding Alma record and fix it.
@scottsalvaggio
@freyesdulib, @jrynhart, @kimpham54
Ways to get in touch:
- Contact the Digital Infrastructure & Technology Coordinator at University of Denver, Library Technology Services
- Create an issue in this repository