API Reference

This page provides detailed documentation for the BICAM Python API.

Main Functions

Downloader Class

Configuration

Datasets

Dataset definitions and metadata.

Utilities

Command Line Interface

The BICAM command-line interface exposes the same operations as the Python API: listing datasets, downloading them, inspecting their metadata, and managing the local cache.

Main Commands

bicam --help                    # Show help
bicam --version                 # Show version
bicam list-datasets             # List datasets
bicam download <dataset>        # Download dataset
bicam info <dataset>            # Show dataset info
bicam cache                     # Show cache info
bicam clear <dataset>           # Clear cached data for a dataset

Download Options

bicam download <dataset> [OPTIONS]

Options:
  --force, -f              Force re-download
  --cache-dir PATH         Custom cache directory
  --no-extract             Download only, do not extract
  --confirm                Skip confirmation for large datasets
  --quiet, -q              Suppress log outputs
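
When the CLI is driven from a script, the download options above can be assembled programmatically. The following is a minimal sketch, not part of BICAM: build_download_command and run_download are hypothetical helpers, and only the flags listed above are used. It assumes the bicam executable is on PATH when run_download is called.

```python
import subprocess
from typing import List, Optional


def build_download_command(
    dataset: str,
    force: bool = False,
    cache_dir: Optional[str] = None,
    extract: bool = True,
    confirm: bool = False,
    quiet: bool = False,
) -> List[str]:
    """Assemble the argv list for `bicam download` (hypothetical helper)."""
    cmd = ["bicam", "download", dataset]
    if force:
        cmd.append("--force")
    if cache_dir is not None:
        cmd.extend(["--cache-dir", cache_dir])
    if not extract:
        cmd.append("--no-extract")
    if confirm:
        cmd.append("--confirm")
    if quiet:
        cmd.append("--quiet")
    return cmd


def run_download(dataset: str, **options) -> None:
    """Run the CLI; raises CalledProcessError if the command exits non-zero."""
    subprocess.run(build_download_command(dataset, **options), check=True)
```

Passing the command as a list avoids shell quoting issues with cache paths that contain spaces.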

List Options

bicam list-datasets [OPTIONS]

Options:
  --detailed, -d           Show detailed information
  --quiet, -q              Suppress log outputs

Info Options

bicam info <dataset> [OPTIONS]

Options:
  --quiet, -q              Suppress log outputs

Cache Options

bicam cache [OPTIONS]

Options:
  --quiet, -q              Suppress log outputs

Clear Options

bicam clear [OPTIONS] [DATASET]

Options:
  --all                    Clear all cached data
  --yes                    Confirm cache clear without prompt
  --quiet, -q              Suppress log outputs

Function Reference

download_dataset

bicam.download_dataset(dataset_type, force_download=False, cache_dir=None, confirm=False, quiet=False)

Download a BICAM dataset, or reuse the cached copy, and return the path to its extracted directory.

Parameters:

dataset_type (str) – Type of dataset to download. Options: ‘bills’, ‘amendments’, ‘members’, ‘nominations’, ‘committees’, ‘committeereports’, ‘committeemeetings’, ‘committeeprints’, ‘hearings’, ‘treaties’, ‘congresses’, ‘complete’
force_download (bool, optional) – Force re-download even if cached. Default: False
cache_dir (str or Path, optional) – Custom cache directory. Default: ~/.bicam/
confirm (bool, optional) – Skip confirmation prompts for large datasets (>1GB). Default: False
quiet (bool, optional) – Suppress log outputs. Default: False

Returns:

Path – Path to the extracted dataset directory

Examples:

import bicam
bills_path = bicam.download_dataset('bills')
print(f"Bills data available at: {bills_path}")
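
Because download_dataset returns a directory path, a common next step is to see which files it contains. The sketch below is illustrative only: list_csv_files is a hypothetical helper, and it assumes the CSV files sit directly under the returned directory, which this page does not confirm.

```python
from pathlib import Path
from typing import List


def list_csv_files(dataset_dir: Path) -> List[str]:
    """Return the sorted names of CSV files found directly under dataset_dir."""
    return sorted(p.name for p in Path(dataset_dir).glob("*.csv"))


# Intended use (requires bicam to be installed and the dataset to be available):
#   import bicam
#   print(list_csv_files(bicam.download_dataset('bills')))
```

The names returned this way are candidates for the file_name argument of load_dataframe.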

load_dataframe

bicam.load_dataframe(dataset_type, file_name=None, download=False, cache_dir=None, confirm=None, quiet=False, df_engine='pandas')

Load a BICAM dataset directly into a DataFrame. The result is a pandas DataFrame unless another engine is selected with df_engine.

Parameters:

dataset_type (str) – Type of dataset to load. Options: ‘bills’, ‘amendments’, ‘members’, ‘nominations’, ‘committees’, ‘committeereports’, ‘committeemeetings’, ‘committeeprints’, ‘hearings’, ‘treaties’, ‘congresses’, ‘complete’
file_name (str, optional) – Specific CSV file to load. If None, loads the first available CSV file. For example: ‘bills_metadata.csv’, ‘members_current.csv’
download (bool, optional) – If True, download the dataset when it is not cached. If False, raise an error when the dataset is not cached. Default: False
cache_dir (str or Path, optional) – Custom cache directory. Default: ~/.bicam/
confirm (bool, optional) – Controls the confirmation prompt for large datasets (>1GB). If True, the prompt is skipped. If None and download=True, large datasets are confirmed automatically. If False, the prompt is shown even for large datasets. Default: None
quiet (bool, optional) – Suppress log outputs. Default: False
df_engine (str, optional) – DataFrame engine to use. Options: ‘pandas’, ‘polars’, ‘dask’, ‘spark’, ‘duckdb’. The dask, spark, and duckdb engines require the respective packages to be installed. Default: ‘pandas’

Returns:

DataFrame – Loaded dataset as a DataFrame in the specified engine format

Raises:

ValueError – If the dataset is not cached and download=False, or if file_name is invalid
FileNotFoundError – If the specified file does not exist in the dataset
ImportError – If the package required by the chosen df_engine is not installed

Examples:

import bicam

# Load bills metadata (will download if not cached, auto-confirm for large datasets)
bills_df = bicam.load_dataframe('bills', 'bills_metadata.csv', download=True)
print(f"Loaded {len(bills_df)} bills")

# Load members data (will raise error if not cached)
try:
    members_df = bicam.load_dataframe('members', 'members_current.csv')
except ValueError as e:
    print(f"Dataset not cached: {e}")

# Force confirmation prompt even for large datasets
bills_df = bicam.load_dataframe('bills', download=True, confirm=False)

# Suppress all output during download
bills_df = bicam.load_dataframe('bills', download=True, quiet=True)

# Use polars engine (included by default)
bills_df = bicam.load_dataframe('bills', df_engine='polars')

# Use dask engine (requires dask installed)
bills_df = bicam.load_dataframe('bills', df_engine='dask')

# Use spark engine (requires pyspark installed)
bills_df = bicam.load_dataframe('bills', df_engine='spark')

# Use duckdb engine (requires duckdb installed)
bills_df = bicam.load_dataframe('bills', df_engine='duckdb')

# Load first available CSV file
df = bicam.load_dataframe('bills', download=True)

list_datasets

bicam.list_datasets()

List all available dataset types.

Returns:

list – List of available dataset names

Examples:

import bicam
datasets = bicam.list_datasets()
print(f"Available datasets: {datasets}")

get_dataset_info

bicam.get_dataset_info(dataset_type)

Get information about a specific dataset.

Parameters:

dataset_type (str) – Name of the dataset

Returns:

dict – Dataset information including size, description, and file list

Examples:

import bicam
info = bicam.get_dataset_info('bills')
print(f"Size: {info['size_mb']} MB")
print(f"Files: {info['files']}")

clear_cache

bicam.clear_cache(dataset_type=None)

Clear cached data.

Parameters:

dataset_type (str, optional) – Specific dataset to clear. If None, clears all cached data.

Examples:

import bicam

# Clear specific dataset
bicam.clear_cache('bills')

# Clear all cached data
bicam.clear_cache()

get_cache_size

bicam.get_cache_size()

Get cache size information.

Returns:

dict – Cache size information including total size and per-dataset breakdown

Examples:

import bicam
cache_info = bicam.get_cache_size()
print(f"Total cache size: {cache_info['total']}")

BICAMDownloader Class

The main downloader class for BICAM datasets.

class bicam.downloader.BICAMDownloader(cache_dir=None)

Parameters:

cache_dir (Path, optional) – Custom cache directory

Methods:

download(dataset_type, force_download=False, cache_dir=None, confirm=False, quiet=False)

Download and extract a dataset.

Parameters:

dataset_type (str) – Type of dataset to download
force_download (bool) – Force re-download even if cached
cache_dir (str) – Custom cache directory
confirm (bool) – Skip confirmation for large datasets
quiet (bool) – Suppress log outputs

Returns:

Path – Path to the extracted dataset directory

get_info(dataset_type)

Get information about a dataset.

Parameters:

dataset_type (str) – Name of the dataset

Returns:

dict – Dataset information

clear_cache(dataset_type=None)

Clear cached data.

Parameters:

dataset_type (str, optional) – Specific dataset to clear

get_cache_size()

Get cache size information.

Returns:

dict – Cache size information
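
A single downloader instance can be reused across several datasets. The sketch below relies only on the download method documented above. download_many is a hypothetical helper, written against any object that offers that method so it can be exercised without network access.

```python
from pathlib import Path
from typing import Dict, Iterable


def download_many(downloader, dataset_types: Iterable[str],
                  quiet: bool = True) -> Dict[str, Path]:
    """Download several datasets with one downloader and map each name to its path."""
    paths = {}
    for name in dataset_types:
        # download() is documented above to return the extracted dataset directory.
        paths[name] = downloader.download(name, quiet=quiet)
    return paths


# Intended use with the real class (requires bicam to be installed):
#   from bicam.downloader import BICAMDownloader
#   paths = download_many(BICAMDownloader(), ["members", "committees"])
```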

Configuration

bicam.config.DEFAULT_CACHE_DIR: Default cache directory path.

bicam.config.MAX_RETRIES: Maximum number of retry attempts for downloads.

bicam.config.RETRY_DELAY: Delay between retry attempts in seconds.
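
To illustrate how a retry limit and a delay typically interact, here is a minimal sketch of a retry loop. It is not BICAM's implementation: with_retries is a hypothetical helper, the constant values are placeholders rather than the library's defaults, and counting retries separately from the first attempt is an assumption.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Placeholder values for illustration; the real defaults live in bicam.config.
MAX_RETRIES = 3
RETRY_DELAY = 0.1  # seconds


def with_retries(action: Callable[[], T], max_retries: int = MAX_RETRIES,
                 retry_delay: float = RETRY_DELAY) -> T:
    """Call action(), retrying on OSError up to max_retries times after the first attempt."""
    attempt = 0
    while True:
        try:
            return action()
        except OSError:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            attempt += 1
            time.sleep(retry_delay)
```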

Utility Functions

bicam.utils.format_bytes(bytes_value)

Format bytes into human-readable string.

Parameters:

bytes_value (int) – Number of bytes

Returns:

str – Formatted string (e.g., “1.5 MB”)
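
For readers who want to see the kind of conversion involved, the following is a rough sketch of a byte formatter. format_bytes_sketch is a hypothetical stand-in, not the library function, and its use of 1024-based units and one decimal place is an assumption.

```python
def format_bytes_sketch(bytes_value: int) -> str:
    """Format a byte count with 1024-based units and one decimal place."""
    units = ("B", "KB", "MB", "GB", "TB")
    value = float(bytes_value)
    index = 0
    # Step up one unit at a time until the value fits or units run out.
    while value >= 1024 and index < len(units) - 1:
        value /= 1024
        index += 1
    return f"{value:.1f} {units[index]}"


print(format_bytes_sketch(1572864))  # 1572864 / 1024 / 1024 = 1.5, so "1.5 MB"
```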

bicam.utils.estimate_download_time(size_mb, speed_mbps=10)

Estimate download time for a dataset.

Parameters:

size_mb (float) – Size in megabytes
speed_mbps (float) – Download speed in Mbps

Returns:

str – Estimated time string
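
The arithmetic behind such an estimate is short. The sketch below assumes that size_mb is in megabytes and that Mbps means megabits per second, so the size is multiplied by 8 before dividing. estimate_seconds is a hypothetical helper that returns raw seconds, whereas the library function returns a formatted string whose exact format is not documented here.

```python
def estimate_seconds(size_mb: float, speed_mbps: float = 10) -> float:
    """Estimate transfer time in seconds: megabytes * 8 bits / megabits per second."""
    if speed_mbps <= 0:
        raise ValueError("speed_mbps must be positive")
    return size_mb * 8 / speed_mbps


print(estimate_seconds(100))  # 100 MB * 8 / 10 Mbps = 80.0 seconds
```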

bicam.utils.check_disk_space(path, required_mb)

Check if sufficient disk space is available.

Parameters:

path (Path) – Path to check
required_mb (float) – Required space in MB

Returns:

bool – True if sufficient space available
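
A check of this kind can be built on the standard library. The sketch below shows one possible approach using shutil.disk_usage. has_free_space is a hypothetical helper, not the library function, and it assumes that path already exists and that 1 MB means 1024 * 1024 bytes.

```python
import shutil
from pathlib import Path


def has_free_space(path: Path, required_mb: float) -> bool:
    """Return True if the filesystem holding path has at least required_mb free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_mb * 1024 * 1024
```

Running a check like this before a large download lets a script fail early instead of midway through extraction.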

bicam.utils.verify_checksum(file_path, algorithm='sha256')

Compute the checksum of a file so that it can be verified against an expected value.

Parameters:

file_path (Path) – Path to file
algorithm (str) – Hash algorithm

Returns:

str – Checksum string
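
As an illustration of how a file checksum is usually computed, the sketch below hashes a file in chunks with hashlib. file_checksum is a hypothetical helper and may differ from what the library does internally. The chunk_size parameter is an addition for this example.

```python
import hashlib
from pathlib import Path


def file_checksum(file_path: Path, algorithm: str = "sha256",
                  chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(file_path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the returned string with a published digest confirms that a download arrived intact.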

Error Handling

BICAM functions may raise the following standard Python exceptions:

ValueError – Raised when an invalid dataset type is provided or when dataset information is not found.

OSError – Raised when there are file system or network issues.

Exception – Raised for other errors such as authentication failures or corrupted downloads.

Example Error Handling:

import bicam

try:
    bills_path = bicam.download_dataset('bills')
except ValueError as e:
    print(f"Invalid dataset: {e}")
except OSError as e:
    print(f"System error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

Environment Variables

BICAM recognizes the following environment variables:

BICAM_DATA – Custom cache directory path
BICAM_LOG_LEVEL – Logging level (DEBUG, INFO, WARNING, ERROR)
BICAM_CHECK_VERSION – Enable/disable version checking

Example:

export BICAM_DATA=/custom/cache/path
export BICAM_LOG_LEVEL=DEBUG
python -c "import bicam; bicam.download_dataset('bills')"
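
When a script needs to know which cache directory will be used, the lookup can be sketched as below. The order shown (explicit argument first, then BICAM_DATA, then ~/.bicam) is an assumption for illustration, not documented behavior, and resolve_cache_dir is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_cache_dir(explicit: Optional[str] = None,
                      env: Mapping[str, str] = os.environ) -> Path:
    """Pick a cache directory: explicit argument, then BICAM_DATA, then ~/.bicam."""
    if explicit:
        return Path(explicit).expanduser()
    from_env = env.get("BICAM_DATA")
    if from_env:
        return Path(from_env).expanduser()
    return Path.home() / ".bicam"
```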