This repository was archived by the owner on Apr 20, 2023. It is now read-only.
Add download_carousel #125
Open
Description
Commits fb0c968 and b51f725:
Adds a `download_carousel` method for `Post`s that downloads all media in carousel posts, i.e. posts with multiple images/videos, as raised in #105. Since this is a batch operation, you specify an output directory and a function that computes the filename for each downloaded file, instead of a single output filename. See the method documentation for details.

Also added a couple of supporting methods, only one of which, `parse_carousel_urls`, is public. It returns the video and image URLs for each item in the carousel, or `None` if the post is not a carousel. Again, see the docstring for details.

Also added the beginnings of a demo Jupyter notebook.
Fixes #105
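To illustrate the batch-download idea (an output directory plus a per-file naming function, rather than one output filename), here is a minimal sketch. It is not the PR's implementation: `parse_carousel_urls_sketch`, `default_filename`, `plan_carousel_paths`, `download_carousel_sketch`, the `(index, url) -> name` signature of the filename function, and the `children`/`video_url`/`image_url` keys are all hypothetical placeholders. The real `download_carousel` and `parse_carousel_urls` signatures are described in their docstrings.

```python
import urllib.request
from pathlib import Path, PurePosixPath
from typing import Callable, List, Optional, Sequence
from urllib.parse import urlparse

# Hypothetical signature for the filename function: (index, url) -> file name.
FilenameFunc = Callable[[int, str], str]


def parse_carousel_urls_sketch(json_dict: dict) -> Optional[List[str]]:
    """Return one media URL per carousel item, or None if there are no items.

    The "children", "video_url" and "image_url" keys are placeholders,
    not the real Instagram JSON layout.
    """
    children = json_dict.get("children")
    if not children:
        return None
    # Prefer an item's video URL; fall back to its image URL.
    return [item.get("video_url") or item["image_url"] for item in children]


def default_filename(index: int, url: str) -> str:
    """Zero-padded index plus the extension found in the URL path."""
    suffix = PurePosixPath(urlparse(url).path).suffix or ".jpg"
    return f"{index:02d}{suffix}"


def plan_carousel_paths(
    urls: Sequence[str], output_dir: str, filename_func: FilenameFunc = default_filename
) -> List[Path]:
    """Map each URL to an output path inside output_dir."""
    out = Path(output_dir)
    return [out / filename_func(i, url) for i, url in enumerate(urls)]


def _fetch(url: str) -> bytes:
    """Download a URL's body over the network."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def download_carousel_sketch(
    urls: Sequence[str],
    output_dir: str,
    filename_func: FilenameFunc = default_filename,
    fetch: Callable[[str], bytes] = _fetch,
) -> List[Path]:
    """Write every URL's bytes to its planned path and return the paths."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    paths = plan_carousel_paths(urls, output_dir, filename_func)
    for url, path in zip(urls, paths):
        path.write_bytes(fetch(url))
    return paths
```

Passing the naming logic in as a callable keeps the batch loop generic: callers can, for example, prefix files with a post's shortcode without the download code needing to know about it. The injectable `fetch` parameter is only there so the sketch can be exercised without network access.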
Commit e88032e:
`Post.get_recent_comments` raised a `KeyError` when a `Post` was scraped using Selenium or a `requests.Session` object, because the resulting `json_dict` is structured slightly differently. I added an `except` block that catches this and retries with the alternative `json_dict` schema.
Fixes #124
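The fallback pattern described above can be sketched as follows: read the comments using one layout, and on `KeyError` retry with the other. This is a minimal illustration only; `dig`, `get_recent_comments_sketch`, and both key paths are hypothetical placeholders, not the actual schemas or code used in `Post.get_recent_comments`.

```python
from typing import Any, Mapping, Sequence

# Placeholder key paths for the two layouts; the real Instagram keys differ.
PRIMARY_PATH = ("schema_a", "comments")
FALLBACK_PATH = ("schema_b", "media", "comments")


def dig(data: Mapping[str, Any], path: Sequence[str]) -> Any:
    """Follow a sequence of keys into nested dicts; raises KeyError if one is missing."""
    for key in path:
        data = data[key]
    return data


def get_recent_comments_sketch(json_dict: Mapping[str, Any]) -> Any:
    try:
        return dig(json_dict, PRIMARY_PATH)
    except KeyError:
        # The first layout is absent (e.g. Selenium or requests.Session output),
        # so try the alternative; a KeyError here still propagates to the caller.
        return dig(json_dict, FALLBACK_PATH)
```

Letting the second lookup's `KeyError` propagate keeps genuinely malformed input from being silently treated as "no comments".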
Commit f443435:
Add `Profile.iter_posts` to get a lazy iterator over posts, and reimplement `Profile.get_posts` (with the same API) using `iter_posts`.
Fixes #127
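As a rough sketch of the lazy-iterator approach, the generator below fetches the next page of posts only when the consumer asks for more, and the eager list-returning method is built on top of it with `itertools.islice`. The cursor-based `fetch_page` callable, the `ProfileSketch` class, and the `amount` parameter are hypothetical; they are not the PR's actual `Profile` code or the real `get_posts` signature.

```python
from itertools import islice
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical page fetcher: takes a cursor (None for the first page) and
# returns (posts_on_page, next_cursor), where next_cursor is None on the last page.
PageFetcher = Callable[[Optional[str]], Tuple[List[str], Optional[str]]]


class ProfileSketch:
    def __init__(self, fetch_page: PageFetcher) -> None:
        self._fetch_page = fetch_page

    def iter_posts(self) -> Iterator[str]:
        """Yield posts one at a time, fetching a new page only when needed."""
        cursor: Optional[str] = None
        while True:
            posts, cursor = self._fetch_page(cursor)
            yield from posts
            if cursor is None:
                return

    def get_posts(self, amount: Optional[int] = None) -> List[str]:
        """Eager wrapper: the first `amount` posts as a list (all if None)."""
        return list(islice(self.iter_posts(), amount))
```

Because `islice` stops pulling once it has `amount` items, `get_posts(3)` in this sketch never requests pages beyond the one containing the third post.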
Additional notes (optional)
I haven't written automated tests yet; I will add them soon.