Automatic Data Classification

Discover and classify sensitive data across all your databases. Identify PII, financial data, health records, and credentials automatically with intelligent pattern matching and machine learning.

Know What Data You Have

You can't protect what you don't know about. DB Audit's classification engine automatically scans your databases to discover and categorize sensitive data, helping you understand your data landscape and meet compliance requirements.

200+ built-in data patterns
99.2% classification accuracy
Minutes to complete a full scan

What We Detect

Personally Identifiable Information (PII)

Automatically detect names, addresses, phone numbers, email addresses, and other personal data that can identify individuals.

Full names and aliases
Home and work addresses
Phone numbers and email addresses
Date of birth and age

Financial Data

Identify credit card numbers, bank accounts, transaction data, and other sensitive financial information.

Credit card numbers (PCI DSS)
Bank account and routing numbers
Transaction amounts and history
Tax identification numbers

Protected Health Information (PHI)

Detect medical records, diagnoses, treatment plans, and other HIPAA-protected health data.

Medical record numbers
Diagnoses and conditions
Prescription information
Insurance policy details

Authentication Credentials

Find passwords, API keys, tokens, and other authentication secrets that should never be stored in plain text.

Passwords and password hashes
API keys and tokens
SSH keys and certificates
OAuth secrets

Classification Levels

Data is automatically assigned to sensitivity levels based on its content and context. Use these levels to enforce appropriate access controls and audit requirements.

Public: Data that can be freely shared without risk
Internal: Business data for internal use only
Confidential: Sensitive data requiring access controls
Restricted: Highly sensitive data with strict access limits
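
Because the four levels form an ordered scale, they map naturally onto an ordered enum. The sketch below is illustrative only: the `Sensitivity` type and the `can_read` rule are assumptions made for this example, not part of DB Audit's API.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """The four levels described above, ordered least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def can_read(clearance: Sensitivity, column_level: Sensitivity) -> bool:
    # Hypothetical rule: a reader may see data at or below their clearance.
    return clearance >= column_level
```

Ordering the levels numerically lets an access check reduce to a single comparison, and `max()` over several levels yields the strictest one.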

How Classification Works

Schema Discovery

DB Audit connects to your databases and catalogs all tables, columns, and their data types. This metadata is used to prioritize scanning.
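
To make the cataloging step concrete, here is a minimal sketch that lists tables and columns for an in-memory SQLite database using Python's standard `sqlite3` module. It only illustrates the idea; the `discover_schema` helper is hypothetical, and how DB Audit itself reads metadata is not described here.

```python
import sqlite3


def discover_schema(conn: sqlite3.Connection) -> dict:
    """Return {table: [(column, declared_type), ...]} for user tables."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # Quote the identifier so unusual table names are handled safely.
        quoted = '"' + table.replace('"', '""') + '"'
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        schema[table] = [(col[1], col[2]) for col in columns]
    return schema


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (email VARCHAR, ssn VARCHAR)")
conn.execute("CREATE TABLE audit_log (created_at TIMESTAMP)")
catalog = discover_schema(conn)
```

Other database engines expose the same information through their own catalogs, commonly `information_schema.columns`.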

Data Sampling

A representative sample of each column's data is analyzed using configurable sampling strategies. Full scans are available for smaller datasets or when compliance requirements call for them.
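
One common way to draw a uniform sample from a table of unknown size in a single pass is reservoir sampling. The sketch below assumes that strategy purely for illustration; the text does not say which strategies DB Audit offers, and `reservoir_sample` is a hypothetical helper.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Keep a uniform random sample of up to k rows from a stream."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample
```

When the table has fewer than `k` rows, every row is returned, which corresponds to the full-scan case.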

Pattern Matching

Over 200 built-in patterns detect common sensitive data formats. Custom patterns can be added for organization-specific data.
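
As a rough illustration of pattern matching over sampled values, the sketch below measures how often each pattern matches within a column. The two expressions are the SSN and email examples from the Configuration section; the `match_rates` helper and the match-rate measure are assumptions for this example.

```python
import re

# Regexes copied from the SSN and email examples in the Configuration section.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"),
}


def match_rates(values):
    """Return the fraction of non-empty sampled values matching each pattern."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return {name: 0.0 for name in PATTERNS}
    return {
        name: sum(1 for v in non_empty if rx.search(v)) / len(non_empty)
        for name, rx in PATTERNS.items()
    }
```

Scoring by match rate rather than by a single hit keeps one stray email address in a free-text column from labeling the whole column.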

ML Enhancement

Machine learning models improve classification accuracy by understanding context and detecting patterns that regex alone would miss.
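
To show why context matters, the sketch below blends a pattern's match rate with a hint taken from the column name. This is a fixed-weight heuristic written for illustration, not machine learning and not DB Audit's model; the `NAME_HINTS` table, the weight, and the `context_score` function are all assumptions.

```python
# Hypothetical name fragments that suggest a column holds a given data type.
NAME_HINTS = {
    "ssn": ("ssn", "social"),
    "email": ("email", "mail"),
}


def context_score(column_name, pattern_name, match_rate, name_weight=0.3):
    """Combine a regex match rate with a column-name hint into a 0..1 score."""
    hints = NAME_HINTS.get(pattern_name, ())
    hinted = any(h in column_name.lower() for h in hints)
    # The name hint contributes a fixed bonus; the match rate fills the rest.
    return (1 - name_weight) * match_rate + (name_weight if hinted else 0.0)
```

With the same 50% match rate, a column named `customer_ssn` scores higher than one named `order_ref`, which is the kind of signal a trained model could learn from data instead of from a hand-written table.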

Classification Assignment

Each column is assigned a data category and sensitivity level. Results are stored and used to automatically apply protection policies.
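
A minimal sketch of the assignment step might pick the strictest level among the patterns that match a column often enough. The rule table reuses the pattern names and levels from the Configuration section; the threshold, the `assign` function, and the fallback to `public` when nothing matches are assumptions.

```python
LEVELS = ("public", "internal", "confidential", "restricted")

# pattern name -> (category, level), taken from the Configuration examples
RULES = {
    "ssn": ("PII", "restricted"),
    "email": ("PII", "confidential"),
    "credit_card": ("Financial", "restricted"),
}


def assign(match_rates, threshold=0.5):
    """Return (category, level) for a column from its per-pattern match rates."""
    hits = [
        RULES[name]
        for name, rate in match_rates.items()
        if name in RULES and rate >= threshold
    ]
    if not hits:
        # Assumed default: columns with no qualifying match are public.
        return (None, "public")
    # Choose the hit with the strictest level.
    return max(hits, key=lambda hit: LEVELS.index(hit[1]))
```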

Configuration

Configure classification rules, custom patterns, and scanning schedules through YAML configuration or the dashboard.

# Data Classification Configuration
classification:
  enabled: true
  scan_schedule: "0 2 * * *"  # Daily at 2 AM

  # Built-in classifiers
  classifiers:
    - type: pii
      enabled: true
      sensitivity: high
      patterns:
        - name: ssn
          regex: '\b\d{3}-\d{2}-\d{4}\b'
          classification: restricted
        - name: email
          regex: '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
          classification: confidential

    - type: financial
      enabled: true
      sensitivity: high
      patterns:
        - name: credit_card
          regex: '\b(?:\d{4}[- ]?){3}\d{4}\b'
          classification: restricted
          validate: luhn  # Validate using Luhn algorithm

    - type: phi
      enabled: true
      sensitivity: high
      hipaa_compliant: true

    - type: credentials
      enabled: true
      sensitivity: critical
      alert_on_detection: true

  # Custom classifiers
  custom_classifiers:
    - name: internal_ids
      description: "Internal employee and project IDs"
      patterns:
        - regex: 'EMP-\d{6}'
          classification: internal
        - regex: 'PROJ-[A-Z]{3}-\d{4}'
          classification: confidential

  # Exclusions
  exclusions:
    tables:
      - audit_log
      - system_metadata
    columns:
      - created_at
      - updated_at
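
The `validate: luhn` setting on the credit card pattern refers to the Luhn checksum, which rejects most digit strings that merely look like card numbers. The standard algorithm can be sketched as follows; the `luhn_valid` function is written for this example and is not DB Audit's code.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Ignore separators such as spaces and dashes.
    digits = [int(ch) for ch in number if "0" <= ch <= "9"]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

Applying the checksum after the regex match reduces false positives from order numbers or other 16-digit identifiers.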

Scan Results

Run classification scans on-demand or on a schedule. Results are available in the dashboard, CLI, and as exportable reports.

# Classification Scan Results
dbaudit classify scan --database production

Scanning database: production
Tables scanned: 47
Columns analyzed: 312
Rows sampled: 1,000,000

Classification Results:
=======================

Table: customers
  - email (VARCHAR)        -> PII (Confidential)
  - phone (VARCHAR)        -> PII (Confidential)
  - ssn (VARCHAR)          -> PII (Restricted) [!]
  - credit_card (VARCHAR)  -> Financial (Restricted) [!]

Table: employees
  - full_name (VARCHAR)    -> PII (Confidential)
  - salary (DECIMAL)       -> Financial (Confidential)
  - employee_id (VARCHAR)  -> Internal

Table: medical_records
  - diagnosis (TEXT)       -> PHI (Restricted) [!]
  - treatment (TEXT)       -> PHI (Restricted) [!]
  - insurance_id (VARCHAR) -> PHI (Confidential)

[!] Restricted data detected - Review recommended

Summary:
  Public: 156 columns
  Internal: 89 columns
  Confidential: 52 columns
  Restricted: 15 columns

Report saved to: ./classification-report-2024-01-15.json
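
If you only have the console output and need the column-level findings in a script, the result lines can be parsed. The sketch below infers the line format from the sample output above, so treat it as an assumption that may not hold across versions. The JSON report is likely the better source, but its schema is not shown here.

```python
import re

# Matches lines such as "  - ssn (VARCHAR)  -> PII (Restricted) [!]"
# and "  - employee_id (VARCHAR)  -> Internal".
LINE = re.compile(
    r"^\s*-\s+(?P<column>\w+)\s+\((?P<type>\w+)\)\s+->\s+"
    r"(?P<label>[^()\[\]]+?)"
    r"(?:\s+\((?P<level>\w+)\))?"
    r"(?P<flag>\s+\[!\])?\s*$"
)


def parse_findings(output: str) -> list:
    """Extract per-column findings from the textual scan output."""
    findings = []
    table = None
    for line in output.splitlines():
        if line.startswith("Table: "):
            table = line[len("Table: "):].strip()
            continue
        m = LINE.match(line)
        if not m or table is None:
            continue
        # A line with no parenthesized level carries only the level itself.
        category = m.group("label") if m.group("level") else None
        level = m.group("level") or m.group("label")
        findings.append({
            "table": table,
            "column": m.group("column"),
            "type": m.group("type"),
            "category": category,
            "level": level,
            "flagged": m.group("flag") is not None,
        })
    return findings
```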

Automatic Policy Integration

Classification results feed directly into your security policies. When sensitive data is discovered, masking, access controls, and audit logging can be applied automatically.

Policy-Based Actions

Data Masking

Automatically mask classified data in query results based on user roles.

Access Controls

Restrict access to classified columns based on sensitivity levels.

Audit Logging

Enhanced logging of all access to classified data, to support compliance.

Alerting

Real-time alerts when classified data is accessed or exported.
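
As a rough sketch of role-based masking driven by classification results, the example below hides values in columns whose level exceeds the caller's clearance. The role names, the clearance table, the keep-last-four masking style, and the `apply_masking` function are all hypothetical; actual policies are defined in DB Audit's security policy configuration.

```python
LEVELS = ("public", "internal", "confidential", "restricted")

# Hypothetical mapping from role to the highest level that role may see.
ROLE_CLEARANCE = {
    "analyst": "internal",
    "support": "confidential",
    "dba": "restricted",
}


def mask_value(value: str, keep_last: int = 4) -> str:
    """Replace all but the last `keep_last` characters with asterisks."""
    if len(value) <= keep_last:
        return "*" * len(value)
    return "*" * (len(value) - keep_last) + value[-keep_last:]


def apply_masking(row: dict, column_levels: dict, role: str) -> dict:
    """Return a copy of `row` with over-clearance columns masked."""
    clearance = LEVELS.index(ROLE_CLEARANCE[role])
    masked = {}
    for column, value in row.items():
        # Assumption: unclassified columns are treated as public.
        level = LEVELS.index(column_levels.get(column, "public"))
        masked[column] = value if level <= clearance else mask_value(str(value))
    return masked
```

A stricter deployment might treat unclassified columns as restricted instead, so that data is hidden until a scan has reviewed it.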

Next Steps

Security Policies

Threat Detection

Quick Start Guide

Discover Your Sensitive Data

Start classifying your data today. Know exactly what sensitive information exists in your databases and protect it automatically.
