TBDBpy

Python API to TalkBankDB

The TBDBpy package provides access to TalkBankDB data from Python.

TalkBankDB (www.talkbank.org/DB) is a database and set of tools for exploring TalkBank’s media and transcripts, specifying data to be extracted, and passing these data on to statistical programs for further analysis. The TBDBpy package (TalkBankDataBase - Python) provides access to all information within TalkBankDB, including clinical collections. Clinical banks are password protected. Visit www.talkbank.org to learn about gaining access to these collections.

Installation

You can install TBDBpy from GitHub using pip:

pip install git+https://github.com/TalkBank/TBDBpy.git

Then import tbdb:

import tbdb

Functionality

TBDBpy allows access to data from TalkBankDB through several functions. For example, to get the metadata for a particular transcript in the childes/Eng-NA/MacWhinney collection:

import tbdb
transcripts = tbdb.getTranscripts({"corpusName": "childes", "corpora": [["childes", "Eng-NA", "MacWhinney", "010411a"]]})
transcripts
{
'colHeadings': ['path', 'filename', 'languages', 'media', 'date', 'pid', 'designType', 'activityType', 'groupType'],
'data': [['childes/Eng-NA/MacWhinney/010411a', '010411a', 'eng', 'audio', '1979-05-06', '11312/c-00016447-1', 'long', 'toyplay', 'TD']]
}
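The request dictionary in the example above pairs a corpusName with a list of paths, each path being a list of its components. A minimal sketch of building that dictionary from slash-separated path strings follows. The helper build_request is hypothetical and not part of TBDBpy, and it assumes that corpusName is always the first component of each path, as it is in the examples in this README.

```python
def build_request(*paths):
    """Build a TalkBankDB request dict from slash-separated paths.

    Hypothetical helper, not part of TBDBpy. Assumes corpusName is the
    first component of every path, as in the README examples.
    """
    if not paths:
        raise ValueError("at least one path is required")

    # 'childes/Eng-NA/MacWhinney' -> ['childes', 'Eng-NA', 'MacWhinney']
    corpora = [path.strip("/").split("/") for path in paths]

    # All paths in a single request should share one corpus name.
    names = {parts[0] for parts in corpora}
    if len(names) != 1:
        raise ValueError("all paths must belong to the same corpus")

    return {"corpusName": names.pop(), "corpora": corpora}


request = build_request("childes/Eng-NA/MacWhinney/010411a")
print(request)
```

The resulting dictionary has the same shape as the one passed to tbdb.getTranscripts() above, so it could be passed in its place.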

The functions for accessing different data sets are listed below. Each function is documented through help(functionName), for example:

import tbdb

# View docs for tbdb module:
help(tbdb)

# View docs for getTranscripts:
help(tbdb.getTranscripts)

Functions to extract data from TalkBankDB are:

tbdb.getTranscripts()
tbdb.getParticipants()
tbdb.getTokens()
tbdb.getTokenTypes()
tbdb.getUtterances()
tbdb.getNgrams()
tbdb.getCQL()

Each of these functions takes a dictionary parameter that defines a corpusName and a set of optional fields describing the TalkBankDB request. Each returns a dictionary with two members:

{'colHeadings': [], 'data': [[]]}
  • colHeadings: List of strings describing columns in data.
  • data: List of lists, where each list represents a table row.
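Because colHeadings and data are parallel, each row can be paired with the headings to get one dictionary per row. The sketch below uses only the standard library and a sample result copied from the getTranscripts output shown earlier in this README, so it runs without a network connection. The helper name rows_as_dicts is illustrative and not part of TBDBpy.

```python
def rows_as_dicts(result):
    """Pair each row in result['data'] with result['colHeadings'].

    Illustrative helper, not part of TBDBpy.
    """
    headings = result["colHeadings"]
    return [dict(zip(headings, row)) for row in result["data"]]


# Sample result copied from the getTranscripts example in this README.
sample = {
    "colHeadings": ["path", "filename", "languages", "media", "date",
                    "pid", "designType", "activityType", "groupType"],
    "data": [["childes/Eng-NA/MacWhinney/010411a", "010411a", "eng",
              "audio", "1979-05-06", "11312/c-00016447-1", "long",
              "toyplay", "TD"]],
}

records = rows_as_dicts(sample)
for record in records:
    print(record["filename"], record["date"], record["activityType"])
```

The same two members map directly onto a table constructor in a data-frame library, with data as the rows and colHeadings as the column names.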

Additional functions return metadata about TalkBankDB:

tbdb.getPathTrees()
tbdb.validPath()

For troubleshooting, validPath() reports whether a given path is valid:

tbdb.validPath(['childes', 'childes', 'Clinical'])

If the path is not valid, it returns the level of the query that is incorrect:

tbdb.validPath(['childes', 'childes', 'somethingThatDoesNotExist'])

To access protected collections, pass True as a final argument (auth). A dialog then asks which protected collection you are trying to access and prompts for its username and password. If the credentials are incorrect, a response describing the error is returned.

aphasia_transcripts = tbdb.getTranscripts({'corpusName': 'aphasia', 'corpora': [['aphasia', 'English', 'Aphasia', 'Adler']]}, True)
