This repository was archived by the owner on Apr 20, 2023. It is now read-only.

@stefco stefco commented Jun 18, 2021

Description

Commits fb0c968 and b51f725:

Adds a download_carousel method to Post that downloads all media in a carousel post (a post with multiple images or videos), as requested in #105. Because this is a batch operation, you specify an output directory and a function that computes the filename for each output file, rather than a single output filename. See the method's docstring for details.
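To illustrate the "output directory plus filename function" design, here is a minimal sketch. It is not the method's actual signature: `plan_carousel_downloads`, `name_for`, and `numbered_name` are hypothetical names, and the sketch only computes target paths without downloading anything.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def plan_carousel_downloads(
    urls: List[str],
    output_dir: str,
    name_for: Callable[[int, str], str],
) -> List[Tuple[str, Path]]:
    """Pair each media URL with the path it would be saved to."""
    out = Path(output_dir)
    # The caller-supplied function decides the filename for each item.
    return [(url, out / name_for(index, url)) for index, url in enumerate(urls)]


def numbered_name(index: int, url: str) -> str:
    """Hypothetical naming rule: zero-padded position plus the URL's extension."""
    # Strip any query string before reading the extension; fall back to .jpg.
    suffix = Path(url.split("?")[0]).suffix or ".jpg"
    return f"{index:02d}{suffix}"
```

Passing a function rather than a single filename lets the caller choose any naming scheme (by position, by media type, by timestamp) without the batch routine needing to know about it.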

Also adds a couple of supporting methods, of which only parse_carousel_urls is public. It returns the video and image URLs for each item in the carousel, or None if the post is not a carousel. Again, see the docstring for details.

Also adds the beginnings of a demo Jupyter notebook.

Fixes #105

Commit e88032e:

Post.get_recent_comments raised a KeyError when a Post was scraped using Selenium or a requests.Session object, because the resulting json_dict has a slightly different structure. I added an except block that handles this and tries the alternative json_dict schema.
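The fallback pattern described here can be sketched roughly as follows. This is an illustration under assumptions, not the code in the commit: only the `entry_data` key comes from the linked issue title, while `post`, `comments`, and the function name `extract_comments` are placeholders.

```python
from typing import Any, Dict, List


def extract_comments(json_dict: Dict[str, Any]) -> List[Any]:
    """Return the comment list, trying a second layout if the first is missing."""
    try:
        # Layout A (assumed): payload wrapped in an "entry_data" object.
        post = json_dict["entry_data"]["post"]
    except KeyError:
        # Layout B (assumed): the same payload without the wrapper.
        post = json_dict["post"]
    return post["comments"]
```

Catching only KeyError keeps unrelated failures visible, and a dictionary matching neither layout still raises from the second lookup.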

Fixes #124

Commit f443435:

Adds Profile.iter_posts, which returns a lazy iterator over posts, and reimplements Profile.get_posts (with the same API) on top of iter_posts.
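The relationship between the two methods can be sketched with a generator and `itertools.islice`. The class below is a stand-in, not the library's Profile: `ProfileSketch`, the `amount` parameter, and the string "posts" are assumptions made for illustration.

```python
from itertools import islice
from typing import Iterator, List, Optional


class ProfileSketch:
    """Stand-in for a profile; holds post shortcodes instead of scraping."""

    def __init__(self, shortcodes: List[str]) -> None:
        self._shortcodes = shortcodes

    def iter_posts(self) -> Iterator[str]:
        """Yield posts one at a time, so work happens only as the caller consumes."""
        for code in self._shortcodes:
            # A real implementation would fetch or parse the post here.
            yield f"post:{code}"

    def get_posts(self, amount: Optional[int] = None) -> List[str]:
        """Eager wrapper: collect up to `amount` posts from the lazy iterator."""
        return list(islice(self.iter_posts(), amount))
```

With this shape, a caller who wants only the first few posts can stop iterating early, while existing callers of the eager method still receive a list.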

Fixes #127

Checklist

  • I followed the guidelines in our Contributing document
  • I added an explanation of my changes
  • I have written new tests for my changes, as applicable
  • I successfully ran tests with my changes locally

Additional notes (optional)

I have not written automated tests yet; I will do so soon.

kyrlon commented Dec 30, 2021

I tried running the new download_carousel method on the following Google Instagram post, but came across a TypeError.

from instascrape import *
from pathlib import Path

def insta_scrape(ig_links):
    for link in ig_links:
        post = link.split("?")[0] if "copy_link" in link else link
        post_folder = Path(post.split("/")[-2])
        post_folder.mkdir(parents=True, exist_ok=True)

        google_post = Post(post)
        google_post.download_carousel(str(post_folder), allow_non_carousel=True)


if __name__ == "__main__":
    link_list = list()   
    ig_link = "https://www.instagram.com/p/CXuAeZ1ltCa/?utm_source=ig_web_copy_link"
    link_list.append(ig_link)
    insta_scrape(link_list)

Running the code above, I get the following traceback:

$ py insta_scrape.py 
Traceback (most recent call last):
  File "insta_scrape.py", line 18, in <module>
    insta_scrape(link_list)
  File "insta_scrape.py", line 11, in insta_scrape
    google_post.download_carousel(str(post_folder), allow_non_carousel=True)
  File "...\insta_download\py_venv\lib\site-packages\instascrape\scrapers\post.py", line 213, in download_carousel  
    urls = self.parse_carousel_urls()
  File "..\insta_download\py_venv\lib\site-packages\instascrape\scrapers\post.py", line 157, in parse_carousel_urls
    is_videos = self._filter_get(self.flat_json_dict, self._IS_VIDEO_KEYS)
  File "..\insta_download\py_venv\lib\site-packages\instascrape\scrapers\post.py", line 133, in _filter_get        
    return [(k, dic[k]) for k in keys if k in dic]
  File "..\insta_download\py_venv\lib\site-packages\instascrape\scrapers\post.py", line 133, in <listcomp>
    return [(k, dic[k]) for k in keys if k in dic]
TypeError: argument of type 'NoneType' is not iterable
(py_venv) 

Stepping through with the debugger, I noticed that self.flat_json_dict is None. I am not sure whether anyone else has come across this error.

kyrlon commented Dec 30, 2021

I had forgotten to call the scrape method. With the new line added, the function works as intended.

from pathlib import Path

from instascrape import Post


def insta_scrape(ig_links):
    for link in ig_links:
        # Drop the tracking query string that "copy link" appends.
        post = link.split("?")[0] if "copy_link" in link else link
        # Use the post shortcode (second-to-last path segment) as the folder name.
        post_folder = Path(post.split("/")[-2])
        post_folder.mkdir(parents=True, exist_ok=True)

        google_post = Post(post)
        # scrape() must run first; download_carousel reads the data it loads.
        google_post.scrape()
        google_post.download_carousel(str(post_folder), allow_non_carousel=True)


if __name__ == "__main__":
    ig_link = "https://www.instagram.com/p/CXuAeZ1ltCa/?utm_source=ig_web_copy_link"
    insta_scrape([ig_link])
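A small guard can turn this mistake into a clearer error than the TypeError raised deep inside the library. The sketch below is not part of instascrape: `require_scraped` and `NotScrapedError` are hypothetical names, and it assumes only that `flat_json_dict` stays None until scrape() has been called, as seen in the debugger.

```python
class NotScrapedError(RuntimeError):
    """Raised when data is requested before scrape() has populated it."""


def require_scraped(scraper) -> None:
    """Fail early with a clear message if the scraper has no parsed data yet."""
    # flat_json_dict is assumed to be None until scrape() has run.
    if getattr(scraper, "flat_json_dict", None) is None:
        raise NotScrapedError(
            f"{type(scraper).__name__} has no data yet; call scrape() first"
        )
```

Calling `require_scraped(google_post)` just before `download_carousel` would report the missing scrape() call directly.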

Successfully merging this pull request may close these issues:

  • Lazy iterator for post discovery
  • get_recent_comments() results in KeyError: 'entry_data'
  • How to download all photos if post have multiple photos