API Reference

This page documents the BICAM Python API and its command-line interface.

Command Line Interface

The BICAM command-line interface covers listing, downloading, and inspecting datasets, as well as managing the cache.

Main Commands

bicam --help                    # Show help
bicam --version                 # Show version
bicam list-datasets             # List datasets
bicam download <dataset>        # Download dataset
bicam info <dataset>            # Show dataset info
bicam cache                     # Show cache info
bicam clear [dataset]           # Clear cache (one dataset, or --all)

Download Options

bicam download <dataset> [OPTIONS]

Options:
  --force, -f              Force re-download
  --cache-dir PATH         Custom cache directory
  --no-extract             Download only, do not extract
  --confirm                Skip confirmation for large datasets
  --quiet, -q              Suppress log outputs
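
When the CLI is driven from a script, the documented download flags can be assembled into an argument list. The following is a minimal sketch; `build_download_command` is a hypothetical helper that is not part of BICAM, and it uses only the flags listed above.

```python
import shlex
from typing import List, Optional


def build_download_command(
    dataset: str,
    force: bool = False,
    cache_dir: Optional[str] = None,
    no_extract: bool = False,
    confirm: bool = False,
    quiet: bool = False,
) -> List[str]:
    """Build the argv list for `bicam download` from the documented flags."""
    cmd = ["bicam", "download", dataset]
    if force:
        cmd.append("--force")
    if cache_dir is not None:
        cmd.extend(["--cache-dir", str(cache_dir)])
    if no_extract:
        cmd.append("--no-extract")
    if confirm:
        cmd.append("--confirm")
    if quiet:
        cmd.append("--quiet")
    return cmd


# Example: force a re-download into a custom cache directory, without log output.
cmd = build_download_command("bills", force=True, cache_dir="/tmp/bicam", quiet=True)
print(shlex.join(cmd))
```

With BICAM installed, the resulting list can be passed to `subprocess.run(cmd, check=True)`.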

List Options

bicam list-datasets [OPTIONS]

Options:
  --detailed, -d           Show detailed information
  --quiet, -q              Suppress log outputs

Info Options

bicam info <dataset> [OPTIONS]

Options:
  --quiet, -q              Suppress log outputs

Cache Options

bicam cache [OPTIONS]

Options:
  --quiet, -q              Suppress log outputs

Clear Options

bicam clear [OPTIONS] [DATASET]

Options:
  --all                    Clear all cached data
  --yes                    Confirm cache clear without prompt
  --quiet, -q              Suppress log outputs

Function Reference

download_dataset

bicam.download_dataset(dataset_type, force_download=False, cache_dir=None, confirm=False, quiet=False)

Download and load a BICAM dataset.

Parameters:

  • dataset_type (str) – Type of dataset to download. Options: 'bills', 'amendments', 'members', 'nominations', 'committees', 'committeereports', 'committeemeetings', 'committeeprints', 'hearings', 'treaties', 'congresses', 'complete'

  • force_download (bool, optional) – Force re-download even if cached. Default: False

  • cache_dir (str or Path, optional) – Custom cache directory. Default: ~/.bicam/

  • confirm (bool, optional) – Skip confirmation prompts for large datasets (>1GB). Default: False

  • quiet (bool, optional) – Suppress log outputs. Default: False

Returns:

  • Path – Path to the extracted dataset directory

Examples:

import bicam
bills_path = bicam.download_dataset('bills')
print(f"Bills data available at: {bills_path}")
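
Since `download_dataset` returns the extracted dataset directory, a natural next step is to see which files it contains. The sketch below assumes the directory holds CSV files, possibly in subfolders; `list_csv_files` is a hypothetical helper, and a temporary directory stands in for the returned path so the example runs without downloading anything.

```python
import tempfile
from pathlib import Path
from typing import List


def list_csv_files(dataset_dir: Path) -> List[Path]:
    """Return every CSV file under dataset_dir, sorted by path."""
    root = Path(dataset_dir)
    return sorted(p for p in root.rglob("*.csv") if p.is_file())


# A temporary directory plays the role of the path returned by download_dataset().
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "nested").mkdir()
    (root / "bills_metadata.csv").write_text("id\n1\n")
    (root / "nested" / "extra.csv").write_text("id\n2\n")
    (root / "readme.txt").write_text("not a CSV\n")

    found = [p.relative_to(root).as_posix() for p in list_csv_files(root)]

print(found)
```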

load_dataframe

bicam.load_dataframe(dataset_type, file_name=None, download=False, cache_dir=None, confirm=None, quiet=False, df_engine='pandas')

Load a BICAM dataset directly into a DataFrame (pandas by default; see df_engine).

Parameters:

  • dataset_type (str) – Type of dataset to load. Options: 'bills', 'amendments', 'members', 'nominations', 'committees', 'committeereports', 'committeemeetings', 'committeeprints', 'hearings', 'treaties', 'congresses', 'complete'

  • file_name (str, optional) – Specific CSV file to load, for example 'bills_metadata.csv' or 'members_current.csv'. If None, loads the first available CSV file.

  • download (bool, optional) – If True, download the dataset when it is not cached. If False, raise an error when the dataset is not cached. Default: False

  • cache_dir (str or Path, optional) – Custom cache directory. Default: ~/.bicam/

  • confirm (bool, optional) – Controls the confirmation prompt for large datasets (>1GB). If True, the prompt is skipped. If None and download=True, large downloads are confirmed automatically. If False, the prompt is shown even for large datasets. Default: None

  • quiet (bool, optional) – Suppress log outputs. Default: False

  • df_engine (str, optional) – DataFrame engine to use. Options: 'pandas', 'polars', 'dask', 'spark', 'duckdb'. The dask, spark, and duckdb engines require the respective packages to be installed. Default: 'pandas'

Returns:

  • DataFrame – Loaded dataset as a DataFrame in the specified engine format

Raises:

  • ValueError – If dataset is not cached and download=False, or if file_name is invalid

  • FileNotFoundError – If the specified file doesn’t exist in the dataset

  • ImportError – If the specified df_engine is not available

Examples:

import bicam

# Load bills metadata (will download if not cached, auto-confirm for large datasets)
bills_df = bicam.load_dataframe('bills', 'bills_metadata.csv', download=True)
print(f"Loaded {len(bills_df)} bills")

# Load members data (will raise error if not cached)
try:
    members_df = bicam.load_dataframe('members', 'members_current.csv')
except ValueError as e:
    print(f"Dataset not cached: {e}")

# Force confirmation prompt even for large datasets
bills_df = bicam.load_dataframe('bills', download=True, confirm=False)

# Suppress all output during download
bills_df = bicam.load_dataframe('bills', download=True, quiet=True)

# Use polars engine (included by default)
bills_df = bicam.load_dataframe('bills', df_engine='polars')

# Use dask engine (requires dask installed)
bills_df = bicam.load_dataframe('bills', df_engine='dask')

# Use spark engine (requires pyspark installed)
bills_df = bicam.load_dataframe('bills', df_engine='spark')

# Use duckdb engine (requires duckdb installed)
bills_df = bicam.load_dataframe('bills', df_engine='duckdb')

# Load first available CSV file
df = bicam.load_dataframe('bills', download=True)
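
Because an unavailable df_engine raises ImportError, it can help to check which engines are importable before calling `load_dataframe`. The following is a rough sketch, not BICAM's own logic: the module names in `ENGINE_MODULES` are assumptions (for example, that the 'spark' engine needs `pyspark`), and `choose_engine` is a hypothetical helper.

```python
import importlib.util
from typing import Callable, Iterable

# Assumed mapping from df_engine names to the module each engine needs.
ENGINE_MODULES = {
    "pandas": "pandas",
    "polars": "polars",
    "dask": "dask",
    "spark": "pyspark",
    "duckdb": "duckdb",
}


def engine_available(engine: str) -> bool:
    """Return True if the module assumed to back `engine` can be imported."""
    module = ENGINE_MODULES.get(engine)
    if module is None:
        return False
    return importlib.util.find_spec(module) is not None


def choose_engine(
    preferred: Iterable[str],
    default: str = "pandas",
    is_available: Callable[[str], bool] = engine_available,
) -> str:
    """Return the first available engine from `preferred`, else `default`."""
    for engine in preferred:
        if is_available(engine):
            return engine
    return default


# Prefer duckdb, then polars; fall back to pandas if neither is installed.
engine = choose_engine(["duckdb", "polars"])
print(f"Selected engine: {engine}")
```

The selected name can then be passed as `df_engine=engine` to `bicam.load_dataframe`.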

list_datasets

bicam.list_datasets()

List all available dataset types.

Returns:

  • list – List of available dataset names

Examples:

import bicam
datasets = bicam.list_datasets()
print(f"Available datasets: {datasets}")

get_dataset_info

bicam.get_dataset_info(dataset_type)

Get information about a specific dataset.

Parameters:

  • dataset_type (str) – Name of the dataset

Returns:

  • dict – Dataset information including size, description, and file list

Examples:

import bicam
info = bicam.get_dataset_info('bills')
print(f"Size: {info['size_mb']} MB")
print(f"Files: {info['files']}")
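
The `size_mb` field shown above can be used to decide ahead of time whether a download counts as large. This is a sketch under stated assumptions: the 1024 MB threshold is one reading of the ">1GB" note in this reference, `is_large_dataset` is a hypothetical helper, and the dictionary below is made-up sample data rather than real BICAM output.

```python
# Assumption: "large" means more than 1 GB, taken here as 1024 MB.
LARGE_DATASET_MB = 1024


def is_large_dataset(info: dict, threshold_mb: float = LARGE_DATASET_MB) -> bool:
    """Return True if the dataset's reported size exceeds the threshold."""
    return float(info["size_mb"]) > threshold_mb


# Made-up info dictionary with the keys used in the example above.
sample_info = {"size_mb": 2500, "files": ["bills_metadata.csv"]}
print(is_large_dataset(sample_info))
```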

clear_cache

bicam.clear_cache(dataset_type=None)

Clear cached data.

Parameters:

  • dataset_type (str, optional) – Specific dataset to clear. If None, clears all cached data.

Examples:

import bicam

# Clear specific dataset
bicam.clear_cache('bills')

# Clear all cached data
bicam.clear_cache()

get_cache_size

bicam.get_cache_size()

Get cache size information.

Returns:

  • dict – Cache size information including total size and per-dataset breakdown

Examples:

import bicam
cache_info = bicam.get_cache_size()
print(f"Total cache size: {cache_info['total']}")

BICAMDownloader Class

The main downloader class for BICAM datasets.

class bicam.downloader.BICAMDownloader(cache_dir=None)

Parameters:

  • cache_dir (Path, optional) – Custom cache directory

Methods:

download(dataset_type, force_download=False, cache_dir=None, confirm=False, quiet=False)

Download and extract a dataset.

Parameters:

  • dataset_type (str) – Type of dataset to download

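  • force_download (bool, optional) – Force re-download even if cached. Default: False

  • cache_dir (str, optional) – Custom cache directory. Default: None

  • confirm (bool, optional) – Skip confirmation for large datasets. Default: False

  • quiet (bool, optional) – Suppress log outputs. Default: False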

Returns:

  • Path – Path to the extracted dataset directory

get_info(dataset_type)

Get information about a dataset.

Parameters:

  • dataset_type (str) – Name of the dataset

Returns:

  • dict – Dataset information

clear_cache(dataset_type=None)

Clear cached data.

Parameters:

  • dataset_type (str, optional) – Specific dataset to clear

get_cache_size()

Get cache size information.

Returns:

  • dict – Cache size information
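
One downloader instance can be reused for several datasets. The sketch below assumes only the `download(dataset_type, ..., quiet=...)` method documented above; `download_many` is a hypothetical helper, and `FakeDownloader` is a stand-in so the example runs without BICAM or network access.

```python
from pathlib import Path
from typing import Dict, Iterable, Tuple


def download_many(
    downloader, dataset_types: Iterable[str]
) -> Tuple[Dict[str, Path], Dict[str, str]]:
    """Download each dataset, collecting paths and per-dataset error messages."""
    paths: Dict[str, Path] = {}
    errors: Dict[str, str] = {}
    for dataset_type in dataset_types:
        try:
            paths[dataset_type] = downloader.download(dataset_type, quiet=True)
        except (ValueError, OSError) as exc:
            errors[dataset_type] = str(exc)
    return paths, errors


class FakeDownloader:
    """Illustrative stand-in that mimics the documented download() signature."""

    def download(self, dataset_type, force_download=False, cache_dir=None,
                 confirm=False, quiet=False):
        if dataset_type not in {"bills", "members"}:
            raise ValueError(f"Unknown dataset type: {dataset_type}")
        return Path("/tmp/bicam") / dataset_type


paths, errors = download_many(FakeDownloader(), ["bills", "not-a-dataset"])
print(paths)
print(errors)
```

In real use, `FakeDownloader()` would be replaced by `bicam.downloader.BICAMDownloader(cache_dir=...)`.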

Configuration

bicam.config.DEFAULT_CACHE_DIR

Default cache directory path.

bicam.config.MAX_RETRIES

Maximum number of retry attempts for downloads.

bicam.config.RETRY_DELAY

Delay between retry attempts in seconds.
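
To illustrate how a retry count and a retry delay typically interact, here is a minimal retry loop. It is not BICAM's implementation: the default values, the choice to retry only on OSError, and the reading of the count as "retries after the first attempt" are all assumptions, and `call_with_retries` is a hypothetical helper.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,       # assumed value, not read from bicam.config
    retry_delay: float = 1.0,   # assumed value in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn once, then retry up to max_retries times on OSError."""
    retries_used = 0
    while True:
        try:
            return fn()
        except OSError:
            if retries_used >= max_retries:
                raise  # out of retries: surface the last error
            retries_used += 1
            sleep(retry_delay)


# Demo: an operation that fails twice before succeeding.
attempts = {"count": 0}


def flaky_download() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise OSError("temporary network error")
    return "done"


result = call_with_retries(flaky_download, max_retries=3, retry_delay=0.0)
print(result, attempts["count"])
```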

Utility Functions

bicam.utils.format_bytes(bytes_value)

Format bytes into human-readable string.

Parameters:

  • bytes_value (int) – Number of bytes

Returns:

  • str – Formatted string (e.g., “1.5 MB”)
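
For reference, a formatter of this kind can be written in a few lines. The sketch below is an independent illustration, not the code behind `bicam.utils.format_bytes`; it assumes 1024-based units and one decimal place, which reproduces the "1.5 MB" style shown above.

```python
def format_bytes_sketch(bytes_value: int) -> str:
    """Format a byte count with 1024-based units and one decimal place."""
    size = float(bytes_value)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TB"


print(format_bytes_sketch(1_572_864))  # 1.5 * 1024 * 1024 bytes
```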

bicam.utils.estimate_download_time(size_mb, speed_mbps=10)

Estimate download time for a dataset.

Parameters:

  • size_mb (float) – Size in megabytes

  • speed_mbps (float) – Download speed in Mbps. Default: 10

Returns:

  • str – Estimated time string
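
The arithmetic behind such an estimate is straightforward if Mbps is read as megabits per second: a size in megabytes is multiplied by 8 and divided by the speed. The following sketch works under that assumption; both function names and the output format are hypothetical and may differ from what `estimate_download_time` returns.

```python
def estimate_seconds(size_mb: float, speed_mbps: float = 10) -> float:
    """Seconds needed to transfer size_mb megabytes at speed_mbps megabits/s."""
    if speed_mbps <= 0:
        raise ValueError("speed_mbps must be positive")
    return size_mb * 8 / speed_mbps


def describe_duration(seconds: float) -> str:
    """Render a duration as minutes and seconds (illustrative format only)."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes} min {secs} s" if minutes else f"{secs} s"


seconds = estimate_seconds(100, speed_mbps=10)  # 100 MB over a 10 Mbps link
print(describe_duration(seconds))
```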

bicam.utils.check_disk_space(path, required_mb)

Check if sufficient disk space is available.

Parameters:

  • path (Path) – Path to check

  • required_mb (float) – Required space in MB

Returns:

  • bool – True if sufficient space available
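
A check like this can be built on the standard library's `shutil.disk_usage`. The sketch below is an illustration rather than BICAM's implementation; `has_free_space` is a hypothetical name, and it assumes 1 MB = 1024 * 1024 bytes.

```python
import shutil
from pathlib import Path


def has_free_space(path, required_mb: float) -> bool:
    """Return True if the filesystem holding `path` has required_mb free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_mb * 1024 * 1024


# Check the current working directory for 500 MB of free space.
print(has_free_space(Path.cwd(), 500))
```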

bicam.utils.verify_checksum(file_path, algorithm='sha256')

Compute a file's checksum so that it can be verified against an expected value.

Parameters:

  • file_path (Path) – Path to file

  • algorithm (str) – Hash algorithm. Default: 'sha256'

Returns:

  • str – Checksum string
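
Checksums of large files are usually computed in chunks so the whole file never has to sit in memory. The following sketch uses the standard `hashlib` module; `file_checksum` and `checksum_matches` are hypothetical names, and the code is not taken from BICAM.

```python
import hashlib
import os
import tempfile


def file_checksum(file_path, algorithm: str = "sha256",
                  chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.new(algorithm)
    with open(file_path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(file_path, expected: str, algorithm: str = "sha256") -> bool:
    """Compare a file's digest with an expected hex string (case-insensitive)."""
    return file_checksum(file_path, algorithm) == expected.lower()


# Demo on a small temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello bicam")
    demo_path = tmp.name

demo_digest = file_checksum(demo_path)
demo_ok = checksum_matches(demo_path, demo_digest.upper())
os.remove(demo_path)
print(demo_digest, demo_ok)
```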

Error Handling

BICAM functions may raise the following built-in Python exceptions:

exception ValueError

Raised when an invalid dataset type is provided or when dataset information is not found.

exception OSError

Raised when there are file system or network issues.

exception Exception

Raised for other errors such as authentication failures or corrupted downloads.

Example Error Handling:

import bicam

try:
    bills_path = bicam.download_dataset('bills')
except ValueError as e:
    print(f"Invalid dataset: {e}")
except OSError as e:
    print(f"System error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

Environment Variables

BICAM recognizes the following environment variables:

  • BICAM_DATA – Custom cache directory path

  • BICAM_LOG_LEVEL – Logging level (DEBUG, INFO, WARNING, ERROR)

  • BICAM_CHECK_VERSION – Enable/disable version checking

Example:

export BICAM_DATA=/custom/cache/path
export BICAM_LOG_LEVEL=DEBUG
python -c "import bicam; bicam.download_dataset('bills')"
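
The interaction between an explicit cache_dir argument, the BICAM_DATA variable, and the default ~/.bicam/ directory can be pictured as a simple lookup chain. The precedence used below (argument first, then environment, then default) is an assumption for illustration, not confirmed BICAM behavior, and `resolve_cache_dir` is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Union


def resolve_cache_dir(
    cache_dir: Optional[Union[str, Path]] = None,
    environ: Mapping[str, str] = os.environ,
) -> Path:
    """Pick a cache directory: explicit argument, then BICAM_DATA, then ~/.bicam."""
    if cache_dir is not None:
        return Path(cache_dir).expanduser()
    env_value = environ.get("BICAM_DATA")
    if env_value:
        return Path(env_value).expanduser()
    return Path.home() / ".bicam"


# A plain dict stands in for the process environment in this demo.
print(resolve_cache_dir(environ={"BICAM_DATA": "/custom/cache/path"}))
print(resolve_cache_dir(environ={}))
```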