Simple includes implementation #2483
I would like to see if the approach makes sense, and hear any other considerations, before moving along with a more complete solution.
Thanks, in general this approach looks good to me. I like that it doesn't need any change on the installation process as everything happens during build. Let me discuss some details.
I think we could place this file under
In principle I would prefer to make this invisible. Though if we make this completely invisible we would have to check that everything uses the built packages; I am not sure now if fields validation and pipeline tests work with built packages or with the source files. Maybe a third option is to keep some list of checksums of imported files. Another problem we may have is that CI in the integrations repository is aware of packages now, and on PRs builds are only executed for modified packages. We need to make CI aware of these includes, so that builds also run for packages that use a shared file when it changes.
Not sure about the `package` key. The configuration above could also be expressed like the following, which doesn't need to make any assumption about the structure of the repository, and is not so different:
This would also allow including files located in other parts of the repository. For example, the APM package used to live in the repository of the APM server; they could have shared files with this approach. In any case we have to be very careful not to allow traversing paths out of the root of the repository. Maybe we can leverage the new `os.Root`.
Thanks for taking a look!
I like this one 👍
I favored making it explicit thinking of the dev experience. Generally we will use this with field definitions and pipelines (mostly), and I can imagine some frustration figuring out why things do not work as expected. On top of that, it felt kind of cumbersome to copy files on the fly or clean them up after packaging, and prone to a polluted repo if things go wrong at some point. But it is mostly a personal preference, so I am not against any other option if it is the preferred approach.
Correct, I was thinking about adding a new command to elastic-package. About removing the `package` key, sounds good to me 👍 Will do the mentioned changes while we discuss the rest.
Thanks Marc, I really want something like this.
In addition to sharing field definitions between data streams, I would have liked to have this functionality for:
- input configuration shared across data streams
- field definitions shared between source data stream and transform
I don't have any strong recommendations here, just some thoughts...
Inline vs whole file
Some of the issue discussion talks about inline includes, but the implementation here is on the whole file level. I think that's enough. It keeps things simple while allowing use with files of different types.
Dev experience positives and negatives
In terms of dev experience, I see these positives currently:
- copying the file into its final destination means you can read it in context or find its content with grep.
- showing diffs means you save some time figuring out the mismatch
However, if each data stream has a certain file and I want to modify it, I may not realize before my edit that it is supposed to stay in sync with the others, and I need to consult the `includes.yml` to know which of the files is the source and which are the destinations.
Currently there's no help with conflict resolution. I may overwrite my new changes to a destination file by rebuilding the package.
An alternative to `includes.yml`
An alternative would be to have the copied files only go into the package, and instead of `includes.yml`, there could be `filename.ext.link` files wherever we want `filename.ext` (with the content of `.link` files saying where the source is). It would make it clearer when the source content is from elsewhere, but there are drawbacks too.
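For illustration, such a `.link` file might contain nothing more than a path to the source (this format and the file names are hypothetical, not from the PR):

```
# fields.yml.link — hypothetical content
../../shared/files/fields.yml
```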
Secure references
I think there's a potential security issue currently for something like this:
```yaml
- package: system
  from: ../../../../../../../../etc/passwd
  to: LICENSE.txt
```
It would be good to lock it down a bit more.
Use outside of the integrations repo
I think the functionality we settle on should work and be useful for packages that are developed outside of the integrations
repo. Currently there is an assumption that sibling directories of the package root are other packages.
This should be enough for this too, as long as we do not enforce any particular file type.
I think this is a good point as a middle ground, as it makes it clear which file to look at. Will give it a go and see how it looks 👍
This is addressed by the use of `OpenRoot`; in my last commit we already scope the topmost dir at the repository root.
I think this might be something we do as a second iteration; not sure how to solve this generally (maybe by allowing referencing git repositories, for example), since most use cases for this currently fall in this initial case.
I made changes to the spec and elastic-package, so now you can add `.link` files.

Considerations:
Made a last change so that during pipeline benchmarks/tests the included pipelines are expanded before installing them, so they can run properly.
We would need to modify at least the CI. Maybe we should also add a check that requires a changelog entry for packages whose linked files have been modified, to avoid overlooking publication of fixes in dependent packages. Maybe we should store a checksum of the linked file apart from its path, so there is some track of modifications in the dependent packages. What do you think about this option?
👍
This couples the feature to the structure of the integrations repository; we have packages in their own repositories, and packages in other paths in other repositories. I would only require the path to be under the root of the repository.
Added the commented link commands:
Overall the feature looks good to me as proposed. Added some questions and comments about the proposal.
One thing I see that could be a bit confusing is the mixed scope; maybe this feature should be agnostic of packages, and always work at the global level, taking into account the current directory.
Looks good, we will have to wait for the change in package spec before merging.
Well, one thing we have to remember is the use of linked files in tests that don't use built packages, at least pipelines in pipeline tests.
This is already taken into account in https://github.com/elastic/elastic-package/pull/2483/files#diff-6d6115d2523865659d234b34e7eaf8cf1aa35ecf8378cfa32b1bb14d8781aaf9R75 where we copy them locally to install the pipelines and then remove them. Probably not the cleanest approach, but I could not think of anything simpler without implementing something more complete such as #1743, which seemed out of scope for this one.
…ults
- Add top-level convenience functions for linkedfiles operations
- Introduce PackageLinks struct for ordered, structured results
- Replace map-based returns with structured PackageLinks slice
- Update CLI commands to use new API structure
- Fix test to work with new PackageLinks structure
- Improve documentation with workflow examples

- Ensure LinksFS always uses absolute paths for os.DirFS
- Add validation to prevent workDir outside repository root
- Update NewLinksFS to return error for invalid configurations
- Add comprehensive test coverage for all path handling scenarios
- Handle both absolute and relative workDir inputs securely

The merge from upstream/main removed linked files functionality from ingest pipeline processing. This restores:
- files import and LinksFS usage
- *.link pattern in file globbing
- .link extension handling
- linked file variants in pipeline detection

Fixes pipeline test failures across all packages.
LGTM overall. Only a question about the new methods and a possible reorganization.
```bash
# After editing a shared file, update all links that reference it
elastic-package links update
```
👍
👍
internal/files/linkedfiles.go
```go
@@ -124,17 +60,35 @@ type LinksFS struct {
func NewLinksFS(repoRoot *os.Root, workDir string) (*LinksFS, error) {
	// Ensure workDir is absolute for os.DirFS
	var absWorkDir string
	var err error
```
Nit. Why is it needed to define this error?
💚 Build Succeeded
@marc-gr It was mentioned that CI would need to be updated to detect which dependent packages require building. Has this happened? This makes me consider how we should handle the release of changes in dependent packages. Would you expect users to simultaneously push a new release for each dependent package? Alternatively, how can we ensure that a release is pushed for each of them?
Not yet
I think this is the safest expectation, yeah, in case CI tells you dependent packages break because of your change.
I would go for the simplest approach of not letting you update if dependent packages are not passing. I do not think this is going to be so widely used that a more complex solution is needed atm.
I think we can run these checks in CI. Regarding updates on other packages, changes will be needed there as detected by the checks.
Adding checks in the PR that updates elastic-package to the first version including this feature: elastic/integrations#14594
This is an initial implementation for the elastic/package-spec#89 proposal.
package-spec branch with the changes in https://github.com/elastic/package-spec/compare/main...marc-gr:package-spec:feat/includes?expand=1
With the current changes it allows to describe a `_dev/shared/includes.yml` file describing files to copy from other packages/data streams at build time. Instead of making this an invisible process I initially opted to commit the files explicitly, to ease debugging; `elastic-package check` takes care of noticing if they are out of sync, while `build` copies them. The initial layout of the file is quite naive and just to prove the point.

Summary:
- A `_dev/shared/files/*` path to put arbitrary files that can be shared from different data streams, eg: field definitions
- The `check` command notices any out-of-date files.
- The `build` command copies the files.

Considerations:

Example usage:

With an `includes.yml` in `windows/_dev/shared` like
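The example block itself did not survive extraction; based on the entry format shown earlier in the thread, it might look something like this (the file paths are hypothetical):

```yaml
# windows/_dev/shared/includes.yml — hypothetical example
- package: system
  from: _dev/shared/files/common-fields.yml
  to: data_stream/service/fields/common-fields.yml
```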