Simple includes implementation #2483

Merged
merged 42 commits into from
Jul 9, 2025

marc-gr
Contributor

@marc-gr marc-gr commented Mar 18, 2025

This is an initial implementation for the elastic/package-spec#89 proposal.

package-spec branch with the changes in https://github.com/elastic/package-spec/compare/main...marc-gr:package-spec:feat/includes?expand=1

With the current changes, a _dev/shared/includes.yml file can describe files to copy from other packages/data_streams at build time.

Instead of making this an invisible process I initially opted to commit the files explicitly, to ease debugging, and elastic-package check takes care of noticing if they are out of sync, while build copies them.

The initial layout of the file is quite naive and just to prove the point.

Summary:

  • Adds an includes.yml that describes which files to copy, from other packages or from the same one, and where to copy them
  • Allows a _dev/shared/files/* path for arbitrary files that can be shared between data streams, e.g. field definitions
  • Adds a new step to the check command to notice any out of date files.
  • Adds a new step to the build command to copy the files.

Considerations:

  • Since this will always get files from the latest versions, we need to add tooling/CI steps to trigger tests for any integration that depends on a package

Example usage:

With an includes.yml in windows/_dev/shared like

- package: system
  from: data_stream/security/elasticsearch/ingest_pipeline/default.yml
  to: data_stream/forwarded/elasticsearch/ingest_pipeline/security_default.yml
- package: system
  from: data_stream/security/elasticsearch/ingest_pipeline/standard.yml
  to: data_stream/forwarded/elasticsearch/ingest_pipeline/security_standard.yml
elastic-package check
Lint the package
data_stream/forwarded/elasticsearch/ingest_pipeline/security_default.yml is outdated. Rebuild the package with 'elastic-package build'
--- want
+++ got
@@ -8,3 +8,3 @@
   - pipeline:
-      name: '{{ IngestPipeline "security_standard" }}'
+      name: '{{ IngestPipeline "standard" }}'
       if: 'ctx.winlog?.provider_name != null && ["Microsoft-Windows-Eventlog", "Microsoft-Windows-Security-Auditing"].contains(ctx.winlog.provider_name)'
@@ -52,3 +52,3 @@
       field: ecs.version
-      value: '8.17.0'
+      value: '8.11.0'
   - set:
data_stream/forwarded/elasticsearch/ingest_pipeline/security_standard.yml is outdated. Rebuild the package with 'elastic-package build'
Error: checking package failed: checking included files are up-to-date failed: files do not match

marc@tp:~/integrations/packages/windows$ elastic-package build
Build the package
system/data_stream/security/elasticsearch/ingest_pipeline/default.yml file copied to: data_stream/forwarded/elasticsearch/ingest_pipeline/security_default.yml
system/data_stream/security/elasticsearch/ingest_pipeline/standard.yml file copied to: data_stream/forwarded/elasticsearch/ingest_pipeline/security_standard.yml
README.md file rendered: /home/marc/integrations/packages/windows/docs/README.md
2025/03/18 11:44:09  INFO License text found in "/home/marc/integrations/LICENSE.txt" will be included in package
Package built: /home/marc/integrations/build/packages/windows-2.5.2.zip
Done

marc@tp:~/integrations/packages/windows$ elastic-package check
Lint the package
Done
Build the package
system/data_stream/security/elasticsearch/ingest_pipeline/default.yml file copied to: data_stream/forwarded/elasticsearch/ingest_pipeline/security_default.yml
system/data_stream/security/elasticsearch/ingest_pipeline/standard.yml file copied to: data_stream/forwarded/elasticsearch/ingest_pipeline/security_standard.yml
README.md file rendered: /home/marc/integrations/packages/windows/docs/README.md
2025/03/18 11:44:25  INFO License text found in "/home/marc/integrations/LICENSE.txt" will be included in package
Package built: /home/marc/integrations/build/packages/windows-2.5.2.zip
Done
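
The check and build steps shown in this example can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `include` type, `upToDate`, `copyInclude` and `newDemoPackages` are hypothetical names, the sketch assumes a plain byte-for-byte copy between sibling package directories, and it prints no diff.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// include mirrors one entry of the includes.yml shown above.
// The type and its field names are illustrative only.
type include struct {
	Package string // source package, a sibling directory under packagesDir
	From    string // path inside the source package
	To      string // path inside the including package
}

// upToDate reports whether the committed copy matches its source.
// A missing copy counts as outdated.
func upToDate(packagesDir, pkg string, inc include) (bool, error) {
	src, err := os.ReadFile(filepath.Join(packagesDir, inc.Package, inc.From))
	if err != nil {
		return false, err
	}
	dst, err := os.ReadFile(filepath.Join(packagesDir, pkg, inc.To))
	if errors.Is(err, fs.ErrNotExist) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	return bytes.Equal(src, dst), nil
}

// copyInclude overwrites the copy with the current content of the source.
func copyInclude(packagesDir, pkg string, inc include) error {
	src, err := os.ReadFile(filepath.Join(packagesDir, inc.Package, inc.From))
	if err != nil {
		return err
	}
	dst := filepath.Join(packagesDir, pkg, inc.To)
	if err := os.MkdirAll(filepath.Dir(dst), 0o755); err != nil {
		return err
	}
	return os.WriteFile(dst, src, 0o644)
}

// newDemoPackages creates a throwaway packages directory with one source file.
func newDemoPackages() (string, error) {
	dir, err := os.MkdirTemp("", "includes-demo")
	if err != nil {
		return "", err
	}
	src := filepath.Join(dir, "system", "pipeline", "default.yml")
	if err := os.MkdirAll(filepath.Dir(src), 0o755); err != nil {
		return "", err
	}
	return dir, os.WriteFile(src, []byte("processors: []\n"), 0o644)
}

func main() {
	dir, err := newDemoPackages()
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	inc := include{Package: "system", From: "pipeline/default.yml", To: "pipeline/security_default.yml"}
	ok, _ := upToDate(dir, "windows", inc)
	fmt.Println("before build, up to date:", ok)
	if err := copyInclude(dir, "windows", inc); err != nil {
		panic(err)
	}
	ok, _ = upToDate(dir, "windows", inc)
	fmt.Println("after build, up to date:", ok)
}
```

In this sketch the check step only reads files and the build step is the only one that writes, which matches the split described in the summary.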

@marc-gr marc-gr added the enhancement New feature or request label Mar 18, 2025
@marc-gr marc-gr requested a review from jsoriano March 18, 2025 10:44
@marc-gr
Contributor Author

marc-gr commented Mar 18, 2025

I would like to see if the approach makes sense and any other considerations before moving along with a more complete solution

@jsoriano
Member

jsoriano commented Mar 18, 2025

Thanks, in general this approach looks good to me. I like that it doesn't need any change on the installation process as everything happens during build.

Let me discuss some details.

With the current changes, a _dev/shared/includes.yml file can describe files to copy from other packages/data_streams at build time.

I think we could place this file under _dev/build, and the shared files directly under _dev/build/shared.
We could even consider including the information in _dev/build/build.yml, though not a strong opinion about this.

Instead of making this an invisible process I initially opted to commit the files explicitly, to ease debugging, and elastic-package check takes care of noticing if they are out of sync, while build copies them.

In principle I would prefer to make this invisible, during build. We already have some "invisible" steps during builds, such as resolution of ECS fields, or some processing in dashboards.
Having to keep files in sync is always a source of papercuts.

Though if we make this completely invisible, we would have to check that everything uses the built packages. I am not sure right now whether fields validation and pipeline tests work with built packages or with the source files.

Maybe a third option is to keep some list of checksums of imported files, something similar to go.mod, but this would also require some sync, so I am not sure it is worth considering.

Another problem we may have is that CI in the integrations repository is aware of packages now, and on PRs builds are only executed for modified packages. We need to make CI aware of these includes, so if some packages use a file from the system package, and this file changes, all affected packages are tested too.

- package: system
  from: data_stream/security/elasticsearch/ingest_pipeline/default.yml
  to: data_stream/forwarded/elasticsearch/ingest_pipeline/security_default.yml

Not sure about package. This is a key that assumes a repository structure, and will only be useful in repositories that follow it (ok, this is our main use case with the integrations repository :D but this is circumstantial).

The configuration above could be also expressed like the following, that doesn't need to make any assumption on the structure of the repository, and is not so different:

- from: ../system/data_stream/security/elasticsearch/ingest_pipeline/default.yml
  to: data_stream/forwarded/elasticsearch/ingest_pipeline/security_default.yml

This would also allow including files located in other parts of the repository. For example, the APM package used to live in the repository of the APM server; it could have shared files with this approach.

In any case, we have to be very careful not to allow path traversal out of the root of the repository. Maybe we can leverage the new OpenRoot here.
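
As a rough illustration of that idea, the sketch below uses `os.OpenRoot` (available in the Go standard library since Go 1.24) to read a file relative to a root directory, so that a `from` path with enough `..` elements is rejected instead of resolved. `readInRoot` and `newDemoRepo` are hypothetical helpers written for this example; they are not part of elastic-package.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// readInRoot reads name, a path relative to rootDir, and fails if the
// path resolves outside rootDir. os.OpenRoot requires Go 1.24 or later.
func readInRoot(rootDir, name string) ([]byte, error) {
	root, err := os.OpenRoot(rootDir)
	if err != nil {
		return nil, err
	}
	defer root.Close()

	f, err := root.Open(name)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	return io.ReadAll(f)
}

// newDemoRepo creates a hypothetical layout for the demo: a "repo"
// directory with one shared file, and a secret file next to the repo.
func newDemoRepo() (string, error) {
	base, err := os.MkdirTemp("", "openroot-demo")
	if err != nil {
		return "", err
	}
	repo := filepath.Join(base, "repo")
	if err := os.MkdirAll(filepath.Join(repo, "system"), 0o755); err != nil {
		return "", err
	}
	shared := filepath.Join(repo, "system", "fields.yml")
	if err := os.WriteFile(shared, []byte("- name: a\n"), 0o644); err != nil {
		return "", err
	}
	return repo, os.WriteFile(filepath.Join(base, "secret.txt"), []byte("secret\n"), 0o644)
}

func main() {
	repo, err := newDemoRepo()
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(filepath.Dir(repo))

	_, err = readInRoot(repo, "system/fields.yml")
	fmt.Println("inside the root, error:", err)
	_, err = readInRoot(repo, "../secret.txt")
	fmt.Println("outside the root, error:", err)
}
```

Which directory to use as the root (the package, `packages/`, or the repository root) is a separate decision.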

@marc-gr
Contributor Author

marc-gr commented Mar 18, 2025

Thanks for taking a look!

I think we could place this file under _dev/build, and the shared files directly under _dev/build/shared. We could even consider including the information in _dev/build/build.yml, though not a strong opinion about this.

I like this one 👍

In principle I would prefer to make this invisible, during build. We already have some "invisible" steps during builds, such as resolution of ECS fields, or some processing in dashboards. Having to keep files in sync is always a source of papercuts.

I favored making it explicit with the dev experience in mind. Generally we will use this with file definitions and pipelines (mostly). I can imagine some frustration figuring out what went wrong when things do not work as expected. On top of that, it felt kind of cumbersome to copy files on the fly or clean them up after packaging, and prone to polluting the repo if things go wrong at some point. But it is mostly a personal preference, so I am not against any other option if it is the preferred approach.

Another problem we may have is that CI in the integrations repository is aware of packages now, and on PRs builds are only executed for modified packages. We need to make CI aware of these includes, so if some packages use a file from the system package, and this file changes, all affected packages are tested too.

Correct, I was thinking about adding a new command to elastic-package, like elastic-package included or similar, that lists any "importers", so that we can then trigger the appropriate steps.

About removing the package key, sounds good to me 👍

Will do the mentioned changes while we discuss the rest of things.

Contributor

@chrisberkhout chrisberkhout left a comment

Thanks Marc, I really want something like this.

In addition to sharing field definitions between data streams, I would have liked to have this functionality for

  • input configuration shared across data streams
  • field definitions shared between source data stream and transform

I don't have any strong recommendations here, just some thoughts...

Inline vs whole file

Some of the issue discussion talks about inline includes, but the implementation here is on the whole file level. I think that's enough. It keeps things simple while allowing use with files of different types.

Dev experience positives and negatives

In terms of dev experience, I see these positives currently:

  • copying the file into its final destination means you can read it in context or find its content with grep.
  • showing diffs means you save some time figuring out the mismatch

However, if each data stream has a certain file and I want to modify it, I may not realize before my edit that it is supposed to stay in sync with the others, and I need to consult the includes.yml to know which of the files is the source and which are the destinations.

Currently there's no help with conflict resolution. I may overwrite my new changes to a destination file by rebuilding the package.

An alternative to includes.yml

An alternative would be to have the copied files only go into the package, and instead of includes.yml, there could be filename.ext.link files wherever we want filename.ext (with the content of .link files saying where the source is). It would make it clearer when the source content is from elsewhere, but there are drawbacks too.

Secure references

I think there's a potential security issue currently for something like this:

- package: system
  from: ../../../../../../../../etc/passwd
  to: LICENSE.txt

It would be good to lock it down a bit more.

Use outside of the integrations repo

I think the functionality we settle on should work and be useful for packages that are developed outside of the integrations repo. Currently there is an assumption that sibling directories of the package root are other packages.

@marc-gr
Contributor Author

marc-gr commented Mar 18, 2025

In addition to sharing field definitions between data streams, I would have liked to have this functionality for

  • input configuration shared across data streams
  • field definitions shared between source data stream and transform

This should also be enough for these, as long as we do not enforce any particular file type.

An alternative to includes.yml

An alternative would be to have the copied files only go into the package, and instead of includes.yml, there could be filename.ext.link files wherever we want filename.ext (with the content of .link files saying where the source is). It would make it clearer when the source content is from elsewhere, but there are drawbacks too.

I think this is a good middle ground, as it makes it clear which file to look at. Will give it a go and see how it looks 👍

Secure references

I think there's a potential security issue currently for something like this:

- package: system
  from: ../../../../../../../../etc/passwd
  to: LICENSE.txt

This is addressed by the use of OpenRoot: as of my last commit, access is already scoped to packages/ as the topmost directory.

Use outside of the integrations repo

I think the functionality we settle on should work and be useful for packages that are developed outside of the integrations repo. Currently there is an assumption that sibling directories of the package root are other packages.

I think this might be something we do as a second iteration, not sure how to solve this generally (maybe by allowing referencing git repositories, for example) since most use cases for this currently fall in this initial case.

@marc-gr
Contributor Author

marc-gr commented Mar 19, 2025

I made changes to the spec and elastic-package, so now you can add a file.ext.link with a single line containing a reference to a file to include. The referenced file is included transparently during build.

Considerations

  • We need to review tests as this will require changes in at least pipeline tests
  • Still only works with packages in the same repository
  • Paths in .link files are expected to be relative to ./packages/ to avoid issues escaping the root
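
To make the build-time behaviour concrete, here is a minimal sketch of the expansion over an in-memory file tree. It assumes that the first whitespace-separated field of a `.link` file is the path of the included file and that this path is relative to the root of the tree; `expandLinks` and `demoTree` are hypothetical names, and the actual implementation in this PR may differ.

```go
package main

import (
	"fmt"
	"io/fs"
	"strings"
	"testing/fstest"
)

// expandLinks returns the files of src as they would appear in a built
// package: every "name.ext.link" is replaced by "name.ext" holding the
// content of the file the link points to. Link targets are resolved
// relative to the root of src (an assumption of this sketch).
func expandLinks(src fs.FS) (map[string]string, error) {
	out := map[string]string{}
	err := fs.WalkDir(src, ".", func(p string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() {
			return err
		}
		data, err := fs.ReadFile(src, p)
		if err != nil {
			return err
		}
		if name, ok := strings.CutSuffix(p, ".link"); ok {
			fields := strings.Fields(string(data))
			if len(fields) == 0 {
				return fmt.Errorf("%s: empty link", p)
			}
			// fs.ValidPath rejects absolute paths and ".." elements.
			if !fs.ValidPath(fields[0]) {
				return fmt.Errorf("%s: invalid target %q", p, fields[0])
			}
			if data, err = fs.ReadFile(src, fields[0]); err != nil {
				return fmt.Errorf("%s: %w", p, err)
			}
			p = name
		}
		out[p] = string(data)
		return nil
	})
	return out, err
}

// demoTree is a made-up layout with one shared file and one link to it.
func demoTree() fs.FS {
	return fstest.MapFS{
		"system/fields/base.yml":       {Data: []byte("- name: event.id\n")},
		"windows/fields/base.yml.link": {Data: []byte("system/fields/base.yml\n")},
	}
}

func main() {
	files, err := expandLinks(demoTree())
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", files["windows/fields/base.yml"])
}
```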

@marc-gr marc-gr requested a review from chrisberkhout March 19, 2025 12:39
@marc-gr
Contributor Author

marc-gr commented Mar 20, 2025

Made a last change so that, during pipeline benchmarks/tests, the included pipelines are expanded before installing them so they can run properly.

@jsoriano
Member

We need to review tests as this will require changes in at least pipeline tests

We would need to modify at least the is_pr_affected function. It will need to look for .ext.link files, and check if the linked files are modified in the PR.
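
The logic such a check needs can be sketched as below. This is only an illustration in Go, not the CI function itself: `affectedPackages` and `packageOf` are hypothetical names, the map from link file to target is assumed to have been collected beforehand, and treating the first path element as the package name is a simplification.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// packageOf returns the first path element, which this sketch treats as
// the package name.
func packageOf(p string) string {
	name, _, _ := strings.Cut(p, "/")
	return name
}

// affectedPackages returns the sorted names of the packages to test:
// packages with files changed in the PR, plus packages that contain a
// link whose target changed. links maps each .link file to its target.
func affectedPackages(changed []string, links map[string]string) []string {
	changedSet := map[string]bool{}
	affected := map[string]bool{}
	for _, f := range changed {
		changedSet[f] = true
		affected[packageOf(f)] = true
	}
	for linkFile, target := range links {
		if changedSet[target] {
			affected[packageOf(linkFile)] = true
		}
	}
	out := make([]string, 0, len(affected))
	for p := range affected {
		out = append(out, p)
	}
	sort.Strings(out)
	return out
}

func main() {
	links := map[string]string{
		"windows/fields/base.yml.link": "system/fields/base.yml",
		"nginx/fields/base.yml.link":   "apache/fields/base.yml",
	}
	fmt.Println(affectedPackages([]string{"system/fields/base.yml"}, links))
}
```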

Maybe we should also add a check that requires a changelog entry for packages whose linked files have been modified, to avoid overlooking publication of fixes in dependent packages.

Maybe we should store a checksum of the linked file apart from its path, so there is some track of modifications in the dependent packages. What do you think about this option?

Still only works with packages in the same repository

👍

Paths in .link files are expected to be relative to ./packages/ to avoid issues escaping the root

This couples the feature to the structure of the integrations repository. We have packages in their own repositories, and packages in other paths in other repositories.

I would only require the path to be under the root of the repository.

@marc-gr
Contributor Author

marc-gr commented Mar 26, 2025

Added the link commands we discussed:

  • All commands are relative to the current path, so they work for packages and at the repo level
  • elastic-package links check: will fail if any of the links in the tree has an outdated checksum
  • elastic-package links update: will update all links in the tree to their current checksums
  • elastic-package links list: will list any packages that contain links with references to the current package path. If outside a package, it will list any links referencing files outside any package
~/elastic-package/test/packages/other/pipeline_tests$ elastic-package links list
with_includes
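
The checksum bookkeeping behind `links check` and `links update` can be sketched as follows. The sketch assumes a `.link` file holds one line of the form `<path> <sha256 hex>`, with the checksum missing until the first update; that format, and the names `link`, `parseLink`, `upToDate` and `update`, are assumptions made for illustration and may not match the real implementation.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// link is the parsed content of a .link file. This sketch assumes one
// line of the form "<path> [<sha256 hex>]"; the real format may differ.
type link struct {
	Target   string
	Checksum string // empty if the link has never been updated
}

// parseLink splits the content of a .link file into target and checksum.
func parseLink(content string) (link, error) {
	fields := strings.Fields(content)
	switch len(fields) {
	case 1:
		return link{Target: fields[0]}, nil
	case 2:
		return link{Target: fields[0], Checksum: fields[1]}, nil
	}
	return link{}, fmt.Errorf("malformed link %q", content)
}

// checksum returns the hex-encoded SHA-256 of data.
func checksum(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// upToDate is the per-link comparison a "check" command would run.
func (l link) upToDate(target []byte) bool {
	return l.Checksum == checksum(target)
}

// update returns the link content an "update" command would write back.
func (l link) update(target []byte) string {
	return l.Target + " " + checksum(target) + "\n"
}

func main() {
	shared := []byte("- name: event.id\n")
	l, _ := parseLink("../system/fields/base.yml\n")
	fmt.Println("fresh link up to date:", l.upToDate(shared))

	l, _ = parseLink(l.update(shared))
	fmt.Println("after update:", l.upToDate(shared))

	edited := append([]byte{}, shared...)
	edited = append(edited, "- name: event.kind\n"...)
	fmt.Println("after editing the shared file:", l.upToDate(edited))
}
```

Storing the checksum in the link makes an edit to a shared file show up as a change in every package that links to it.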

Member

@jsoriano jsoriano left a comment

Overall the feature looks good to me as proposed. Added some questions and comments about the proposal.

One thing that could be a bit confusing is the mixed scope. Maybe this feature should be agnostic of packages and always work at the global level, taking the current directory into account.

@marc-gr marc-gr requested a review from jsoriano March 27, 2025 16:03
Member

@jsoriano jsoriano left a comment

Looks good, we will have to wait for the change in package spec before merging.

@jsoriano
Member

Well, one thing we have to remember is the use of linked files in tests that don't use built packages, at least pipelines in pipeline tests.

@marc-gr
Contributor Author

marc-gr commented Mar 28, 2025

Well, one thing we have to remember is the use of linked files in tests that don't use built packages, at least pipelines in pipeline tests.

This is already taken into account in https://github.com/elastic/elastic-package/pull/2483/files#diff-6d6115d2523865659d234b34e7eaf8cf1aa35ecf8378cfa32b1bb14d8781aaf9R75 where we copy them locally to install the pipelines and then remove them. Probably not the cleanest approach, but I could not think of anything simpler without implementing something more complete such as #1743, which seemed out of scope for this one.
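
The copy-then-remove pattern described here could look roughly like the sketch below. It is not the code at the linked diff: `materialize`, `exists` and `newDemoLink` are hypothetical helpers, and the content of the included file is passed in directly rather than resolved from the link.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// materialize writes content next to linkPath, under the same name minus
// the ".link" suffix, and returns a function that removes the file again.
func materialize(linkPath string, content []byte) (func() error, error) {
	target, ok := strings.CutSuffix(linkPath, ".link")
	if !ok {
		return nil, fmt.Errorf("%s is not a .link file", linkPath)
	}
	if err := os.WriteFile(target, content, 0o644); err != nil {
		return nil, err
	}
	return func() error { return os.Remove(target) }, nil
}

// exists reports whether path is present on disk.
func exists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

// newDemoLink creates a temporary directory holding one .link file.
func newDemoLink() (string, error) {
	dir, err := os.MkdirTemp("", "link-demo")
	if err != nil {
		return "", err
	}
	linkPath := filepath.Join(dir, "default.yml.link")
	return linkPath, os.WriteFile(linkPath, []byte("../shared/default.yml\n"), 0o644)
}

func main() {
	linkPath, err := newDemoLink()
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(filepath.Dir(linkPath))

	cleanup, err := materialize(linkPath, []byte("processors: []\n"))
	if err != nil {
		panic(err)
	}
	pipeline := strings.TrimSuffix(linkPath, ".link")
	fmt.Println("present while tests run:", exists(pipeline))
	// Pipelines would be installed and tests would run at this point.
	if err := cleanup(); err != nil {
		panic(err)
	}
	fmt.Println("present after cleanup:", exists(pipeline))
}
```

Returning the cleanup function lets the caller defer it, so the temporary copy is removed even if a test fails early.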

…ults

- Add top-level convenience functions for linkedfiles operations
- Introduce PackageLinks struct for ordered, structured results
- Replace map-based returns with structured PackageLinks slice
- Update CLI commands to use new API structure
- Fix test to work with new PackageLinks structure
- Improve documentation with workflow examples
@marc-gr marc-gr requested a review from jsoriano June 26, 2025 08:33
marc-gr added 11 commits June 26, 2025 11:13
- Ensure LinksFS always uses absolute paths for os.DirFS
- Add validation to prevent workDir outside repository root
- Update NewLinksFS to return error for invalid configurations
- Add comprehensive test coverage for all path handling scenarios
- Handle both absolute and relative workDir inputs securely
The merge from upstream/main removed linked files functionality
from ingest pipeline processing. This restores:
- files import and LinksFS usage
- *.link pattern in file globbing
- .link extension handling
- linked file variants in pipeline detection

Fixes pipeline test failures across all packages.
Member

@jsoriano jsoriano left a comment

LGTM overall. Only a question about the new methods and a possible reorganization.

```bash
# After editing a shared file, update all links that reference it
elastic-package links update
```
Member

👍

@marc-gr marc-gr requested a review from jsoriano July 8, 2025 07:18
Member

@jsoriano jsoriano left a comment

👍

@@ -124,17 +60,35 @@ type LinksFS struct {
func NewLinksFS(repoRoot *os.Root, workDir string) (*LinksFS, error) {
// Ensure workDir is absolute for os.DirFS
var absWorkDir string
var err error
Member

Nit. Why is it needed to define this error?

@marc-gr marc-gr enabled auto-merge (squash) July 9, 2025 07:18
@marc-gr marc-gr merged commit eabdc28 into elastic:main Jul 9, 2025
3 checks passed
@elasticmachine
Collaborator

💚 Build Succeeded

@marc-gr marc-gr deleted the feat/includes branch July 9, 2025 08:26
@andrewkroh
Member

andrewkroh commented Jul 16, 2025

We would need to modify at least the is_pr_affected function. It will need to look for .ext.link files, and check if the linked files are modified in the PR.

@marc-gr It was mentioned that CI would need to be updated to detect which dependent packages require building. Has this occurred?

This makes me consider how we should handle the release of changes in dependent packages. Would you expect users to simultaneously push a new release for each dependent package? Alternatively, how can we ensure that a release is pushed to windows when system changes? Or at least be able to identify the packages that have pending unreleased changes.

@marc-gr
Contributor Author

marc-gr commented Jul 17, 2025

We would need to modify at least the is_pr_affected function. It will need to look for .ext.link files, and check if the linked files are modified in the PR.

@marc-gr It was mentioned that CI would need to be updated to detect which dependent packages require building. Has this occurred?

Not yet

This makes me consider how we should handle the release of changes in dependent packages. Would you expect users to simultaneously push a new release for each dependent package?

I think this is the safest expectation, yeah, in case CI tells you dependent packages break because of your change.

Alternatively, how can we ensure that a release is pushed to windows when system changes? Or at least be able to identify the packages that have pending unreleased changes.

I would go for the simplest approach of not letting you update if dependent packages are not passing. I do not think this is going to be so widely used that a more complex solution is needed atm.

@jsoriano
Member

We would need to modify at least the is_pr_affected function. It will need to look for .ext.link files, and check if the linked files are modified in the PR.

@marc-gr It was mentioned that CI would need to be updated to detect which dependent packages require building. Has this occurred?

Not yet

I think we can run elastic-package links check as a check step at the repository level. We have to remember to add this when updating elastic-package in the integrations repository. cc @mrodm

Regarding updates on other packages, changes will be needed as detected by links check, and on review the developers and reviewers will have to consider it as any other change. Evaluate if a changelog entry is needed, with a -next suffix or not, and so on.

@jsoriano
Member

Adding checks in the PR that updates elastic-package to the first version including this feature: elastic/integrations#14594
