provisionesta/datadumper

Local and cloud file parsing and saving helpers for Laravel with diff changelog and manifest features

dev-main 2024-07-14 16:56 UTC

This package is auto-updated.

Last update: 2024-12-14 19:17:54 UTC



[[TOC]]

Overview

The Datadumper package helps you parse and save flat files in CSV, JSON, and YAML formats to local disk or an S3 bucket, or commit them to a Git repository. You can also push an array of data to a Google Sheet, either replacing the existing data or adding new rows.

This package includes manifest creation with changelog append functionality.

This is maintained by the open source community and is not maintained by any company. Please use at your own risk and create merge requests for any bugs that you encounter.

Problem Statement

When working with large arrays of data in flat files, you may need to support multiple formats (ex. CSV, JSON, YAML). As you work to keep those files up to date with the latest array data, you also need to be able to update existing files and check for any differences that should be logged.

This package was originally created for getting a directory of users from the Okta API and monitoring for changes to user attributes. However, it may be helpful for a wide variety of generic use cases.

Issue Tracking and Bug Reports

We do not maintain a roadmap of feature requests. However, we invite you to contribute and we will gladly review your merge requests.

Please create an issue for bug reports.

Contributing

Please see CONTRIBUTING.md to learn more about how to contribute.

Maintainers

| Name | GitLab Handle | Email |
|------|---------------|-------|
| Jeff Martin | @jeffersonmartin | provisionesta [at] jeffersonmartin [dot] com |

Contributor Credit

Installation

Requirements

| Requirement | Version |
|-------------|---------|
| PHP | ^8.1 |
| Laravel | ^10.0, ^11.0 |

Upgrade Guide

See the changelog for release notes.

Add Composer Package

composer require provisionesta/datadumper

If you are contributing to this package, see CONTRIBUTING.md for instructions on configuring a local composer package with symlinks.

Publish the configuration file

This is optional. The configuration file specifies which .env variable names are used for each configuration value.

php artisan vendor:publish --tag=datadumper

Parsing Files

When parsing CSV, JSON, or YAML files, the contents are returned as an array. If you set the key_by argument, a collection is created and the resulting array is keyed by the named field instead of integers.
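To illustrate what keying by a field means, here is a minimal sketch in plain PHP. It does not call the package; the sample rows are hypothetical, and `array_column` is used only to show the shape of the result you might expect when `key_by: 'id'` is set.

```php
<?php

// Rows as they might come back from a parsed file (hypothetical sample data).
$rows = [
    ['id' => 'a1', 'name' => 'Alice'],
    ['id' => 'b2', 'name' => 'Bob'],
];

// Re-key the list by the `id` field of each row instead of 0, 1, 2, ...
$keyed = array_column($rows, null, 'id');

// Records can now be looked up directly by ID.
echo $keyed['b2']['name'] . PHP_EOL; // Bob
```

Keying by a stable identifier makes it cheap to look up a single record and to compare two versions of the same dataset.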

Parsing CSV Files

use Provisionesta\Datadumper\Csv;

$data = Csv::parse(
    file_path: 'folder-name/file-name.csv',
    event_type: null,
    key_by: 'id'
);

Parsing JSON Files

use Provisionesta\Datadumper\Json;

$data = Json::parse(
    file_path: 'folder-name/file-name.json',
    event_type: null,
    key_by: 'id'
);

Parsing YAML Files

use Provisionesta\Datadumper\Yaml;

$data = Yaml::parse(
    file_path: 'folder-name/file-name.yaml',
    event_type: null,
    key_by: 'id'
);

Parsing Google Sheets

use Provisionesta\Datadumper\GoogleSheet;

$data = GoogleSheet::parse(
    sheet: '1ga1b2c3d4e5f6g7h8i9h0i1j2k3l4m5n6o7p8q9r0s1',
    tab: 'Sheet 1'
    // connection:
);

Parsing GitLab Repository Files

use Provisionesta\Datadumper\Gitlab;
use Symfony\Component\Yaml\Yaml as SymfonyYaml;

$file_contents = Gitlab::parse(
    project: 'group-path/child-group-path/project-path',
    file_path: 'folder/filename.json',
    branch: 'main',
    // connection:
);

// Parse JSON File
$data = json_decode($file_contents);

// Parse YAML File
$data = collect(SymfonyYaml::parse($file_contents));

Saving Files

You can save the same array to multiple formats simultaneously with this package, or save each format separately. When parsing existing data for manifests, the JSON format is used and the updated CSV and YAML files are regenerated and saved.
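As a rough illustration of writing one array out in more than one format, the sketch below serializes the same records to JSON and CSV using only PHP built-ins. The function names `toJson` and `toCsv` are hypothetical and are not part of this package; YAML is left out because it needs an external library.

```php
<?php

// Hypothetical sample records.
$records = [
    ['id' => 1, 'email' => 'a@example.com'],
    ['id' => 2, 'email' => 'b@example.com'],
];

// Encode the records as pretty-printed JSON.
function toJson(array $records): string
{
    return json_encode(
        $records,
        JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR
    );
}

// Encode the records as CSV rows (values only, no header row).
function toCsv(array $records): string
{
    $handle = fopen('php://memory', 'w+');
    foreach ($records as $record) {
        fputcsv($handle, array_values($record), ',', '"', '\\');
    }
    rewind($handle);
    $csv = stream_get_contents($handle);
    fclose($handle);

    return $csv;
}

echo toJson($records) . PHP_EOL;
echo toCsv($records);
```

JSON round-trips the full structure, which is one plausible reason to treat it as the source of truth and regenerate the other formats from it.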

Saving Multiple Formats to Disk

use Provisionesta\Datadumper\Disk;

// $data =

// Overwrite existing files
Disk::save(
    data: $data,
    file_path: 'folder/filename',
    event_type: null,
    key_by: 'id',
    csv: true,
    json: true,
    yaml: true
);

use Provisionesta\Datadumper\Disk;

// $data =

// Append to existing files
Disk::append(
    data: $data,
    file_path: 'folder/filename',
    event_type: null,
    key_by: 'id',
    csv: true,
    json: true,
    yaml: true
);

Saving Multiple Formats to GitLab Repository

use Provisionesta\Datadumper\GitlabCommit;

// $data =

GitlabCommit::save(
    data: $data,
    file_path: 'folder/filename',
    project: 'group-path/child-group-path/project-path',
    commit_branch: 'main', // A new branch will be created if it doesn't exist
    source_branch: 'main',
    commit_message: 'Auto-generated commit by the datadumper package',
    // connection:
    event_type: null,
    key_by: 'id',
    csv: true,
    json: true,
    yaml: true
);

Saving CSV Files

use Provisionesta\Datadumper\Csv;

// You can create an array however you want. CSV files should have values only.
$data = collect($records)->transform(fn($item) => array_values($item))->toArray();

$file_contents = Csv::save(
    file_path: 'folder-name/file-name.csv',
    data: $data,
    event_type: null
);

Saving JSON Files

use Provisionesta\Datadumper\Json;

// $data =

$file_contents = Json::save(
    file_path: 'folder-name/file-name.json',
    data: $data,
);

Saving YAML Files

use Provisionesta\Datadumper\Yaml;

// $data =

$file_contents = Yaml::save(
    file_path: 'folder-name/file-name.yaml',
    data: $data,
);

Manifest and Changelog

This package includes comprehensive diff detection and changelog generation. This is useful when doing state change comparison for specific attribute fields when getting updated data from an API and comparing it against existing flat file data.

use Provisionesta\Datadumper\Manifest;
use Provisionesta\Okta\ApiClient;

// An array of attribute keys to compare for changes
$attributes = ['title', 'department', 'manager'];

$data = ApiClient::get('users')->data;

// Local or S3 Disk Files
Manifest::make()->handle(
    attributes: $attributes,
    data: $data,
    file_path: 'okta/users',
    key_by: 'id',
    reference_key: 'email',
    driver: 'disk',
    csv: true,
    json: true,
    yaml: true
);

// GitLab Repository Files
Manifest::make()->handle(
    attributes: $attributes,
    data: $data,
    file_path: 'okta/users',
    key_by: 'id',
    reference_key: 'email',
    git_commit_branch: 'main',
    git_commit_message: 'Auto-generated commit by the datadumper package',
    git_source_branch: 'main',
    // connection:
    driver: 'gitlab',
    gitlab_project: '12345678',
    csv: true,
    json: true,
    yaml: true
);
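The attribute comparison described in this section can be sketched in plain PHP as follows. This is not the package's implementation; `diffRecords` and the shape of its return value are assumptions made for illustration. Both record sets are assumed to be keyed by ID already.

```php
<?php

/**
 * Compare two record sets keyed by ID and report added, removed,
 * and changed records for the watched attributes only.
 */
function diffRecords(array $existing, array $incoming, array $attributes): array
{
    $changes = ['added' => [], 'removed' => [], 'changed' => []];

    foreach ($incoming as $id => $record) {
        // A record that is not in the existing data is new.
        if (!array_key_exists($id, $existing)) {
            $changes['added'][] = $id;
            continue;
        }

        // Compare only the attributes we were asked to watch.
        foreach ($attributes as $attribute) {
            $old = $existing[$id][$attribute] ?? null;
            $new = $record[$attribute] ?? null;
            if ($old !== $new) {
                $changes['changed'][$id][$attribute] = ['old' => $old, 'new' => $new];
            }
        }
    }

    // Anything in the existing data that is missing from the new data was removed.
    $changes['removed'] = array_keys(array_diff_key($existing, $incoming));

    return $changes;
}

// Hypothetical sample data.
$existing = [
    'u1' => ['title' => 'Engineer', 'department' => 'IT'],
    'u2' => ['title' => 'Analyst', 'department' => 'Finance'],
];
$incoming = [
    'u1' => ['title' => 'Senior Engineer', 'department' => 'IT'],
    'u3' => ['title' => 'Designer', 'department' => 'Product'],
];

$changes = diffRecords($existing, $incoming, ['title', 'department', 'manager']);
print_r($changes);
```

Restricting the comparison to a list of attributes keeps noisy fields (for example, last login timestamps) out of the changelog.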

Environment Variables

Google Sheets are enabled by default and can be disabled if desired. This uses the provisionesta/google-api-client package.

DATADUMPER_GOOGLE_SHEET_ENABLED=true

When working with Google Sheets, you will use a JSON API key that has been granted one or more OAuth2 scopes, including one of the required scopes for working with Google Sheets (ex. https://www.googleapis.com/auth/{scope_suffix}). Set this variable to the suffix of the scope that your key has been granted.

DATADUMPER_GOOGLE_SHEET_SCOPE="drive"
# DATADUMPER_GOOGLE_SHEET_SCOPE="drive.file"
# DATADUMPER_GOOGLE_SHEET_SCOPE="spreadsheets"
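As a small illustration of how a scope suffix maps to a full scope URL, here is a sketch in plain PHP. The helper name `googleScopeUrl` is hypothetical, and the allowed list simply mirrors the three suffixes shown in the examples above.

```php
<?php

// Scope suffixes shown in the environment variable examples.
const SHEET_SCOPE_SUFFIXES = ['drive', 'drive.file', 'spreadsheets'];

// Build the full OAuth2 scope URL from a suffix, rejecting unknown values.
function googleScopeUrl(string $suffix): string
{
    if (!in_array($suffix, SHEET_SCOPE_SUFFIXES, true)) {
        throw new InvalidArgumentException("Unsupported scope suffix: {$suffix}");
    }

    return 'https://www.googleapis.com/auth/' . $suffix;
}

echo googleScopeUrl('spreadsheets') . PHP_EOL; // https://www.googleapis.com/auth/spreadsheets
```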

Changelog Date Format

A changelog file is generated in the same directory as the manifest, at changelog/{date_format}.csv|json|yml. By default, the date format is Y-m (ex. 2024-01). You can customize this to any format that you want (ex. daily, quarterly, etc.). When a changelog is generated, Carbon produces a timestamp in that format, and the package checks whether a file with that name already exists; if it does not, a new file is created.

DATADUMPER_MANIFEST_CHANGELOG_DATE_FORMAT="Y-m"
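To show how the date format drives the changelog file name, here is a minimal sketch using PHP's built-in `DateTimeImmutable` (Carbon extends PHP's date classes and accepts the same format characters). The function `changelogPath`, the directory name, and the `.json` extension are assumptions for illustration, not the package's internals.

```php
<?php

// Build a changelog path such as "okta/changelog/2024-01.json".
function changelogPath(
    string $manifestDirectory,
    string $dateFormat,
    DateTimeImmutable $now,
    string $extension = 'json'
): string {
    return sprintf(
        '%s/changelog/%s.%s',
        rtrim($manifestDirectory, '/'),
        $now->format($dateFormat),
        $extension
    );
}

// A fixed timestamp so the output is predictable.
$now = new DateTimeImmutable('2024-01-15 09:30:00', new DateTimeZone('UTC'));

echo changelogPath('okta', 'Y-m', $now) . PHP_EOL;   // okta/changelog/2024-01.json
echo changelogPath('okta', 'Y-m-d', $now) . PHP_EOL; // okta/changelog/2024-01-15.json
```

With `Y-m`, every change in the same month lands in one file; switching to `Y-m-d` would start a new file each day.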

Filesystem Driver

When a manifest is generated, you can choose whether to save the manifest to the default Laravel filesystem disk (ex. FILESYSTEM_DISK=local or FILESYSTEM_DISK=s3) or to a GitLab repository. It is recommended to start with disk for initial testing.

DATADUMPER_MANIFEST_FILE_DRIVER=disk
# DATADUMPER_MANIFEST_FILE_DRIVER=gitlab

GitLab Connection

If you use the GitLab driver, you need to specify API credentials that have access to the GitLab project where files will be committed. In most cases, this should be a Project Access Token, unless you have a bot or service account user that follows least privilege and can only access the projects used by your application. For security reasons, do not use a personal access token.

If you are not performing other GitLab API calls with different projects and permissions, you can use the GitLab API Client variables.

See Security Best Practices before creating an API token.

GITLAB_API_URL="https://gitlab.com"
GITLAB_API_TOKEN="glpat-a1b2c3d4e5f6g7h8i9j0"

If you already make GitLab API calls and want to use a different user account or API token for manifest read/write changes or on a different GitLab instance, you can use the DATADUMPER_MANIFEST_GITLAB_* variables. These will use the GITLAB_API_* variables automatically if not set.

DATADUMPER_MANIFEST_GITLAB_URL="https://gitlab.com"
# DATADUMPER_MANIFEST_GITLAB_URL="https://gitlab.example.com"
DATADUMPER_MANIFEST_GITLAB_TOKEN="glpat-a1b2c3d4e5f6g7h8i9j0"
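The fallback behavior described above can be sketched as follows. This is an illustration only: `resolveSetting` is a hypothetical helper, and the environment is passed in as a plain array so the example runs without Laravel.

```php
<?php

// Return the primary setting if it is set and non-empty, otherwise the fallback.
function resolveSetting(array $env, string $primary, string $fallback): ?string
{
    $value = $env[$primary] ?? null;
    if ($value === null || $value === '') {
        $value = $env[$fallback] ?? null;
    }

    return $value === '' ? null : $value;
}

// Only the generic GitLab variables are set, so they are used.
$env = [
    'GITLAB_API_URL' => 'https://gitlab.com',
    'GITLAB_API_TOKEN' => 'glpat-a1b2c3d4e5f6g7h8i9j0',
];
echo resolveSetting($env, 'DATADUMPER_MANIFEST_GITLAB_URL', 'GITLAB_API_URL') . PHP_EOL;
// https://gitlab.com

// A manifest-specific URL takes precedence when present.
$env['DATADUMPER_MANIFEST_GITLAB_URL'] = 'https://gitlab.example.com';
echo resolveSetting($env, 'DATADUMPER_MANIFEST_GITLAB_URL', 'GITLAB_API_URL') . PHP_EOL;
// https://gitlab.example.com
```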

For advanced connection use cases, you can also pass an array that is configured elsewhere into the connection: or gitlab_connection: parameter. For security reasons, this array should be accessing environment variables, secrets manager values, or encrypted values in your database.

// config/services.php
[
    // ...
    'gitlab' => [
        'manifests' => [
            'url' => env('GITLAB_MANIFESTS_URL'),
            'token' => env('GITLAB_MANIFESTS_TOKEN')
        ]
    ]
    // ...
]

use Provisionesta\Datadumper\Gitlab;

$gitlab_file = Gitlab::parse(
    project: 'group-path/child-group-path/project-name',
    file_path: 'folder-name/filename.json',
    branch: 'main',
    connection: config('services.gitlab.manifests')
);

// Pass true as the second argument to decode into an associative array
$file_array = json_decode($gitlab_file, true);