provisionesta / datadumper
Local and cloud file parsing and saving helpers for Laravel with diff changelog and manifest features
Requires
- php: ^8.1
- illuminate/config: ^10.0 || ^11.0
- illuminate/http: ^10.0 || ^11.0
- illuminate/log: ^10.0 || ^11.0
- illuminate/support: ^10.0 || ^11.0
- nesbot/carbon: ^2.67 || ^3.0
- provisionesta/audit: ^1.1
- provisionesta/gitlab-api-client: ^4.1
- provisionesta/google-api-client: ^4.1
- spatie/simple-excel: ^3.3
- symfony/yaml: ^6.0 || ^7.0
Requires (Dev)
- larastan/larastan: ^2.7
- orchestra/testbench: ^6.23 || ^7.0 || ^8.0 || ^9.0
Last update: 2024-12-14 19:17:54 UTC
README
Overview
The Datadumper package helps you parse and save CSV, JSON, and YAML flat files on a local disk or in an S3 bucket, or commit them to a GitLab repository. You can also push an array of data to a Google Sheet and either replace the existing data or add new rows.
This package includes manifest creation with changelog append functionality.
This is maintained by the open source community and is not maintained by any company. Please use at your own risk and create merge requests for any bugs that you encounter.
Problem Statement
When working with large arrays of data in flat files, you may need to support multiple formats (ex. CSV, JSON, YAML). As you work to keep those files up to date with the latest array data, you also need to be able to update existing files and check for any differences that should be logged.
This package was originally created for retrieving a directory of users from the Okta API and monitoring for changes to user attributes; however, there is a wide variety of generic use cases where it may be helpful.
Issue Tracking and Bug Reports
We do not maintain a roadmap of feature requests, however we invite you to contribute and we will gladly review your merge requests.
Please create an issue for bug reports.
Contributing
Please see CONTRIBUTING.md to learn more about how to contribute.
Maintainers
Name | GitLab Handle | Email |
---|---|---|
Jeff Martin | @jeffersonmartin | provisionesta [at] jeffersonmartin [dot] com |
Contributor Credit
Installation
Requirements
Requirement | Version |
---|---|
PHP | ^8.1 |
Laravel | ^10.0 , ^11.0 |
Upgrade Guide
See the changelog for release notes.
Add Composer Package
composer require provisionesta/datadumper
If you are contributing to this package, see CONTRIBUTING.md for instructions on configuring a local composer package with symlinks.
Publish the configuration file
This is optional. The configuration file specifies which .env variable names are used for each configuration option.
php artisan vendor:publish --tag=datadumper
Parsing Files
When parsing CSV, JSON, or YAML files, the contents are returned as an array. If you set the key_by argument, a collection is created and the array is keyed by the value of that attribute (ex. id) instead of by integers.
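To make the key_by behavior concrete, here is a plain-PHP sketch (not the package's code) of what keying a parsed array by an attribute looks like. It uses the built-in array_column() with a null column key, which returns whole rows indexed by the named field, similar in spirit to Laravel's Collection::keyBy(). The sample rows are hypothetical.

```php
<?php

// Hypothetical rows, shaped like a parsed file before key_by is applied.
$rows = [
    ['id' => 'u100', 'email' => 'ada@example.com', 'title' => 'Engineer'],
    ['id' => 'u200', 'email' => 'alan@example.com', 'title' => 'Analyst'],
];

// A null column key returns each full row, indexed by the value of 'id'.
// If two rows share the same id, the later row wins.
$keyed = array_column($rows, null, 'id');

echo implode(', ', array_keys($keyed)), PHP_EOL; // u100, u200
```

Because later rows overwrite earlier ones with the same key, the key_by attribute should be unique in your data.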
Parsing CSV Files
use Provisionesta\Datadumper\Csv;
$data = Csv::parse(
file_path: 'folder-name/file-name.csv',
event_type: null,
key_by: 'id'
);
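spatie/simple-excel is among the package's requirements, and CSV readers of this kind commonly treat the first row as a header. As a rough plain-PHP illustration of header-based CSV parsing followed by keying (an assumption about the general approach, not the package's implementation; parseCsvWithHeader is a hypothetical name), consider:

```php
<?php

/**
 * Hypothetical helper: parse CSV text whose first line is a header row into
 * associative rows. Simplifications: quoted fields containing line breaks are
 * not supported, and every data row must have as many fields as the header.
 */
function parseCsvWithHeader(string $csv): array
{
    $lines = preg_split('/\R/', trim($csv));
    // Pass an empty escape character explicitly to use standard CSV quoting.
    $header = str_getcsv(array_shift($lines), ',', '"', '');

    $rows = [];
    foreach ($lines as $line) {
        if ($line === '') {
            continue; // skip blank lines
        }
        $rows[] = array_combine($header, str_getcsv($line, ',', '"', ''));
    }

    return $rows;
}

$csv = "id,email,title\nu100,ada@example.com,Engineer\nu200,alan@example.com,Analyst\n";

// Key the parsed rows by id, mirroring what key_by: 'id' is described as doing.
$data = array_column(parseCsvWithHeader($csv), null, 'id');
```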
Parsing JSON Files
use Provisionesta\Datadumper\Json;
$data = Json::parse(
file_path: 'folder-name/file-name.json',
event_type: null,
key_by: 'id'
);
Parsing YAML Files
use Provisionesta\Datadumper\Yaml;
$data = Yaml::parse(
file_path: 'folder-name/file-name.yaml',
event_type: null,
key_by: 'id'
);
Parsing Google Sheets
use Provisionesta\Datadumper\GoogleSheet;
$data = GoogleSheet::parse(
sheet: '1ga1b2c3d4e5f6g7h8i9h0i1j2k3l4m5n6o7p8q9r0s1',
tab: 'Sheet 1'
// connection:
);
Parsing GitLab Repository Files
use Provisionesta\Datadumper\Gitlab;
use Symfony\Component\Yaml\Yaml as SymfonyYaml;
$file_contents = Gitlab::parse(
project: 'group-path/child-group-path/project-path',
file_path: 'folder/filename.json',
branch: 'main'
// connection:
);
// Parse a JSON file into an associative array
$data = json_decode($file_contents, true);
// Or parse a YAML file into a collection
$data = collect(SymfonyYaml::parse($file_contents));
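When decoding the raw file contents yourself, note that PHP's json_decode() returns stdClass objects for JSON objects unless its second argument is true. A short plain-PHP illustration with a hypothetical payload; JSON_THROW_ON_ERROR is optional and makes malformed input throw a JsonException instead of silently returning null:

```php
<?php

$json = '{"u100": {"email": "ada@example.com", "title": "Engineer"}}';

// Default: JSON objects become stdClass instances.
$asObject = json_decode($json);

// With true as the second argument, JSON objects become associative arrays,
// which is the shape that array functions and collect() work with most naturally.
$asArray = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

echo $asArray['u100']['title'], PHP_EOL; // Engineer
```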
Saving Files
You can save the same array to multiple formats simultaneously with this package, or save each format separately. When parsing existing data for manifests, the JSON format is used, and the updated CSV and YAML files are regenerated and saved.
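Treating the JSON file as the source of truth means the other formats can always be rebuilt from it. As a plain-PHP sketch of that idea (CSV only, since YAML output would need symfony/yaml; regenerateCsvFromJson is a hypothetical name and not part of the package):

```php
<?php

/**
 * Hypothetical helper: decode a JSON manifest and render it as CSV text,
 * using the keys of the first record as the header row. Assumes every
 * record has the same keys in the same order.
 */
function regenerateCsvFromJson(string $json): string
{
    $records = array_values(json_decode($json, true, 512, JSON_THROW_ON_ERROR));

    $stream = fopen('php://temp', 'r+');
    if ($records !== []) {
        fputcsv($stream, array_keys($records[0]), ',', '"', '');
        foreach ($records as $record) {
            // Values only, in the same order as the header.
            fputcsv($stream, array_values($record), ',', '"', '');
        }
    }
    rewind($stream);
    $csv = stream_get_contents($stream);
    fclose($stream);

    return $csv;
}

$json = '{"u100":{"id":"u100","title":"Engineer"},"u200":{"id":"u200","title":"Analyst"}}';
echo regenerateCsvFromJson($json);
```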
Saving Multiple Formats to Disk
use Provisionesta\Datadumper\Disk;
// $data =
// Overwrite existing files
Disk::save(
data: $data,
file_path: 'folder/filename',
event_type: null,
key_by: 'id',
csv: true,
json: true,
yaml: true
);
use Provisionesta\Datadumper\Disk;
// $data =
// Append to existing files
Disk::append(
data: $data,
file_path: 'folder/filename',
event_type: null,
key_by: 'id',
csv: true,
json: true,
yaml: true
);
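This README does not spell out how Disk::append() merges rows. Assuming it merges new rows into the existing ones by their key_by value (an assumption, not confirmed behavior), the plain-PHP sketch below shows why array_replace() is a safer primitive than array_merge() for keyed data: array_merge() renumbers integer keys, so numeric ids would be lost.

```php
<?php

// Existing rows keyed by a numeric id, and a new batch that updates id 2 and adds id 3.
$existing = [1 => ['name' => 'ada'], 2 => ['name' => 'alan']];
$incoming = [2 => ['name' => 'alan turing'], 3 => ['name' => 'grace']];

// array_merge() reindexes integer keys to 0..3, so the ids are lost and
// both the old and new versions of record 2 survive.
$merged = array_merge($existing, $incoming);

// array_replace() keeps keys; a row with the same key is overwritten by the newer one.
$replaced = array_replace($existing, $incoming);
```

With string keys such as 'u100', both functions overwrite by key; the difference only shows up with integer keys.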
Saving Multiple Formats to GitLab Repository
use Provisionesta\Datadumper\GitlabCommit;
// $data =
GitlabCommit::save(
data: $data,
file_path: 'folder/filename',
project: 'group-path/child-group-path/project-path',
commit_branch: 'main', // A new branch will be created if it doesn't exist
source_branch: 'main',
commit_message: 'Auto-generated commit by the datadumper package',
// connection:
event_type: null,
key_by: 'id',
csv: true,
json: true,
yaml: true
);
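Committing several formats at once maps naturally onto GitLab's "create a commit with multiple files and actions" endpoint (POST /projects/:id/repository/commits), which accepts branch, start_branch, commit_message, and an actions array. Whether the package uses this exact endpoint is an assumption; the sketch below only builds a request body of that shape, and buildCommitPayload is a hypothetical name. Check the GitLab Commits API documentation before relying on the field semantics.

```php
<?php

/**
 * Hypothetical helper: build a JSON body for GitLab's multi-file commit endpoint.
 * $files maps repository paths to file contents; $existingPaths lists the paths
 * that already exist on the branch, so they are updated instead of created.
 */
function buildCommitPayload(
    string $branch,
    string $startBranch,
    string $message,
    array $files,
    array $existingPaths
): array {
    $actions = [];
    foreach ($files as $path => $content) {
        $actions[] = [
            'action' => in_array($path, $existingPaths, true) ? 'update' : 'create',
            'file_path' => $path,
            'content' => $content,
        ];
    }

    return [
        'branch' => $branch,
        // The branch that a new 'branch' is started from if it does not exist yet.
        'start_branch' => $startBranch,
        'commit_message' => $message,
        'actions' => $actions,
    ];
}

$payload = buildCommitPayload(
    'main',
    'main',
    'Auto-generated commit by the datadumper package',
    ['folder/filename.json' => '[]', 'folder/filename.csv' => ''],
    ['folder/filename.json']
);

echo json_encode($payload, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), PHP_EOL;
```

Sending all formats as actions in one commit keeps the CSV, JSON, and YAML files in sync in the repository history.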
Saving CSV Files
use Provisionesta\Datadumper\Csv;
// You can create an array however you want. CSV files should have values only.
$data = collect($records)->transform(fn($item) => array_values($item))->toArray();
$file_contents = Csv::save(
file_path: 'folder-name/file-name.csv',
data: $data,
event_type: null
);
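The collect()->transform() line above strips the string keys from each record. The same transformation in plain PHP, with hypothetical records, is a single array_map() call. Whether Csv::save() writes a header row for you is not stated in this README, so the optional header line below is an assumption you can drop.

```php
<?php

// Hypothetical associative records, e.g. decoded from an API response.
$records = [
    ['id' => 'u100', 'email' => 'ada@example.com'],
    ['id' => 'u200', 'email' => 'alan@example.com'],
];

// Plain-PHP equivalent of collect($records)->transform(fn ($item) => array_values($item))->toArray():
// every row keeps its values in order but loses its string keys.
$data = array_map('array_values', $records);

// Optional (assumption): prepend a header row built from the first record's keys.
$withHeader = array_merge([array_keys($records[0])], $data);
```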
Saving JSON Files
use Provisionesta\Datadumper\Json;
// $data =
$file_contents = Json::save(
file_path: 'folder-name/file-name.json',
data: $data,
);
Saving YAML Files
use Provisionesta\Datadumper\Yaml;
// $data =
$file_contents = Yaml::save(
file_path: 'folder-name/file-name.yaml',
data: $data,
);
Manifest and Changelog
This package includes comprehensive diff detection and changelog generation. This is useful when doing state change comparison for specific attribute fields when getting updated data from an API and comparing it against existing flat file data.
use Provisionesta\Datadumper\Manifest;
use Provisionesta\Okta\ApiClient;
// An array of attribute keys to compare for changes
$attributes = ['title', 'department', 'manager'];
$data = ApiClient::get('users')->data;
// Local or S3 Disk Files
Manifest::make()->handle(
attributes: $attributes,
data: $data,
file_path: 'okta/users',
key_by: 'id',
reference_key: 'email',
driver: 'disk',
csv: true,
json: true,
yaml: true
);
// GitLab Repository Files
Manifest::make()->handle(
attributes: $attributes,
data: $data,
file_path: 'okta/users',
key_by: 'id',
reference_key: 'email',
git_commit_branch: 'main',
git_commit_message: 'Auto-generated commit by the datadumper package',
git_source_branch: 'main',
// connection:
driver: 'gitlab',
gitlab_project: '12345678',
csv: true,
json: true,
yaml: true
);
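To illustrate the idea behind attribute-level diff detection, here is a plain-PHP sketch (not the package's implementation) that compares existing manifest rows with freshly fetched rows, both keyed by the key_by value. Only the listed attributes are compared, and each change is labeled with the reference_key value so a human can recognize the record. The changelog row shape, the function name diffAttributes, and the reporting of created and deleted records are assumptions.

```php
<?php

/**
 * Hypothetical sketch: compare two keyed arrays on selected attributes and
 * return a flat list of change entries.
 */
function diffAttributes(array $existing, array $incoming, array $attributes, string $referenceKey): array
{
    $changes = [];

    foreach ($incoming as $key => $record) {
        if (!array_key_exists($key, $existing)) {
            $changes[] = ['key' => $key, 'reference' => $record[$referenceKey] ?? null, 'event' => 'created'];
            continue;
        }

        foreach ($attributes as $attribute) {
            $old = $existing[$key][$attribute] ?? null;
            $new = $record[$attribute] ?? null;
            if ($old !== $new) {
                $changes[] = [
                    'key' => $key,
                    'reference' => $record[$referenceKey] ?? null,
                    'event' => 'updated',
                    'attribute' => $attribute,
                    'old' => $old,
                    'new' => $new,
                ];
            }
        }
    }

    // Records present in the existing manifest but missing from the new data.
    foreach (array_diff_key($existing, $incoming) as $key => $record) {
        $changes[] = ['key' => $key, 'reference' => $record[$referenceKey] ?? null, 'event' => 'deleted'];
    }

    return $changes;
}

$existing = [
    'u100' => ['email' => 'ada@example.com', 'title' => 'Engineer', 'department' => 'R&D'],
    'u200' => ['email' => 'alan@example.com', 'title' => 'Analyst', 'department' => 'Ops'],
];
$incoming = [
    'u100' => ['email' => 'ada@example.com', 'title' => 'Staff Engineer', 'department' => 'R&D'],
    'u300' => ['email' => 'grace@example.com', 'title' => 'Admiral', 'department' => 'Navy'],
];

$changes = diffAttributes($existing, $incoming, ['title', 'department'], 'email');
```

Restricting the comparison to a list of attributes keeps noisy fields (ex. last login timestamps) out of the changelog.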
Environment Variables
Google Sheets are enabled by default and can be disabled if desired. This uses the provisionesta/google-api-client package.
DATADUMPER_GOOGLE_SHEET_ENABLED=true
When working with Google Sheets, you use a JSON API key that has been granted one or more OAuth2 scopes, including at least one of the scopes required for working with Google Sheets (ex. https://www.googleapis.com/auth/{scope_suffix}). Set DATADUMPER_GOOGLE_SHEET_SCOPE to the suffix of the scope that your key has been granted.
DATADUMPER_GOOGLE_SHEET_SCOPE="drive"
# DATADUMPER_GOOGLE_SHEET_SCOPE="drive.file"
# DATADUMPER_GOOGLE_SHEET_SCOPE="spreadsheets"
Changelog Date Format
When a changelog is generated, it is saved in the same directory as the manifest at changelog/{date_format}.csv|json|yml. By default, the date format is Y-m (ex. 2024-01). You can customize this to any format that you want (ex. daily, quarterly, etc.). When a changelog entry is generated, Carbon formats the current timestamp with this format, and the package checks whether a file with that name already exists or needs to be created.
DATADUMPER_MANIFEST_CHANGELOG_DATE_FORMAT="Y-m"
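The date format effectively controls file rotation: every entry written while the formatted date stays the same lands in the same file. A plain-PHP sketch of building such a path follows; changelogPath is a hypothetical name, and we read "the same directory as the manifest" as dirname(file_path), which may differ from the package's exact layout. Carbon extends PHP's DateTime, so format() accepts the same format characters.

```php
<?php

/**
 * Hypothetical helper: build a changelog path of the form
 * {manifest directory}/changelog/{formatted date}.{extension}.
 */
function changelogPath(string $manifestPath, string $dateFormat, DateTimeInterface $now, string $extension): string
{
    return dirname($manifestPath) . '/changelog/' . $now->format($dateFormat) . '.' . $extension;
}

$now = new DateTimeImmutable('2024-01-15 09:30:00');

echo changelogPath('okta/users', 'Y-m', $now, 'json'), PHP_EOL;  // okta/changelog/2024-01.json
echo changelogPath('okta/users', 'Y-m-d', $now, 'csv'), PHP_EOL; // okta/changelog/2024-01-15.csv
```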
Filesystem Driver
When a manifest is generated, you can choose whether to save it to the default Laravel filesystem disk (ex. FILESYSTEM_DISK=local or FILESYSTEM_DISK=s3) or to a GitLab repository. It is recommended to start with disk for initial testing.
DATADUMPER_MANIFEST_FILE_DRIVER=disk
# DATADUMPER_MANIFEST_FILE_DRIVER=gitlab
GitLab Connection
If you use the GitLab driver, you need to specify API credentials that have access to the GitLab project where files will be committed. In most cases this should be a Project Access Token, unless you have a bot or service account user with least-privilege access limited to only the projects used by your application. For security reasons, do not use a personal access token.
If you are not performing other GitLab API calls with different projects and permissions, you can use the GitLab API Client variables.
See Security Best Practices before creating an API token.
GITLAB_API_URL="https://gitlab.com"
GITLAB_API_TOKEN="glpat-a1b2c3d4e5f6g7h8i9j0"
If you already make GitLab API calls and want to use a different user account or API token for manifest read/write changes, or a different GitLab instance, you can use the DATADUMPER_MANIFEST_GITLAB_* variables. If these are not set, the GITLAB_API_* variables are used automatically.
DATADUMPER_MANIFEST_GITLAB_URL="https://gitlab.com"
# DATADUMPER_MANIFEST_GITLAB_URL="https://gitlab.example.com"
DATADUMPER_MANIFEST_GITLAB_TOKEN="glpat-a1b2c3d4e5f6g7h8i9j0"
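The fallback behavior can be pictured with a small plain-PHP sketch. This is an illustration only (manifestGitlabConnection is a hypothetical name, and the package's published config may implement it differently, for example with nested env() calls in a Laravel config file).

```php
<?php

/**
 * Hypothetical sketch: prefer the DATADUMPER_MANIFEST_GITLAB_* variables and
 * fall back to GITLAB_API_*. getenv() returns false for unset variables, and
 * ?: also skips empty strings. A value is false if neither variable is set.
 */
function manifestGitlabConnection(): array
{
    return [
        'url' => getenv('DATADUMPER_MANIFEST_GITLAB_URL') ?: getenv('GITLAB_API_URL'),
        'token' => getenv('DATADUMPER_MANIFEST_GITLAB_TOKEN') ?: getenv('GITLAB_API_TOKEN'),
    ];
}

// Example: only the generic GitLab variables are set for this process.
putenv('GITLAB_API_URL=https://gitlab.com');
putenv('GITLAB_API_TOKEN=glpat-example');
$connection = manifestGitlabConnection();
```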
For advanced connection use cases, you can also pass an array that is configured elsewhere into the connection: or gitlab_connection: parameter. For security reasons, this array should read its values from environment variables, a secrets manager, or encrypted values in your database.
// config/services.php
return [
// ...
'gitlab' => [
'manifests' => [
'url' => env('GITLAB_MANIFESTS_URL'),
'token' => env('GITLAB_MANIFESTS_TOKEN'),
],
],
// ...
];
use Provisionesta\Datadumper\Gitlab;
$gitlab_file = Gitlab::parse(
project: 'group-path/child-group-path/project-name',
file_path: 'folder-name/filename.json',
branch: 'main',
connection: config('services.gitlab.manifests')
);
// Decode the JSON file contents into an associative array
$file_array = json_decode($gitlab_file, true);