OSINTGraph is a tool for deep social analysis and OSINT investigations focused on Instagram targets. It uses Neo4j to map a target’s network, revealing connections, interests, and affiliations, and provides an interactive AI Agent to speed up investigations and simplify analysis.
OSINTGraph CLI gathers all public Instagram data from a target and maps their social connections, including profiles, followers, followees, posts, comments, and likes. It helps you thoroughly examine your target by gathering all relevant data and analyzing it for investigations.
Figure: Overview of the CLI interface for data collection.
Use natural language to query about your target. The AI Agent supports data retrieval, keyword and semantic searches, relationship queries, and template-driven analyses — helping you get focused answers without manually digging through data.
Figure: Overview of interacting with the agent, performing data retrieval, keyword and semantic searches, and template-based analyses.
Visualize your target’s social network, trace interactions, and query relationships directly.
Video (Overview.mp4): Example of tracing a target user’s close connection through their most commented post, then investigating mutual followers and all interactions between them.
- ✨ About OSINTGraph
- ⚡ What OSINTGraph Does
- 🚀 Getting Started
- ⚡ How OSINTGraph Works
- ⚙ Commands Reference
- 🧩 Data Model (Neo4j Schema)
- 🕵️ OSINTGraph AI Agent – Getting Started Guide
- 🚫 How to Avoid Account Suspension
- 📦 Dependencies
pipx install osintgraph
or
pip install osintgraph
Note
When using pip, it’s recommended to install inside a Python virtual environment to avoid dependency conflicts.
Before running `osintgraph setup`, make sure you have the following ready:
- Instagram Account: Preferably not your main account.
- Neo4j Database: For storing and visualizing data. (Sign up at Neo4j → Create an instance for free → Download admin credentials.) You’ll need these credentials for the connection.
- Gemini API Key: Enables data pre-analyses and the AI agent. (Sign up at Google AI Studio → Create or select a Google Cloud project → Get API Key for free)
- User Agent (Optional): Helps reduce Instagram detection risk. (Open the Firefox browser where you are logged in to Instagram, search “my user agent” on Google, and copy it.)
Then run
osintgraph setup
Start gathering data on your target:
osintgraph discover TARGET_INSTAGRAM_USERNAME --limit follower=100 followee=100 post=2
Launch the AI Agent to explore and analyze collected data:
osintgraph agent
Once the agent starts, try asking it:
Show the target user’s profile info
Explore your target’s network graph interactively.
- Go to the Neo4j Console.
- Click the Explore tab, then Connect.
- In the search bar, type "Show me a graph".
- You should now see the person you just collected, along with their relationships.
OSINTGraph runs in two main phases: Reconnaissance and Analysis & Investigation.
⚡PHASE 1: RECONNAISSANCE ⚡PHASE 2: ANALYSIS & INVESTIGATION
────────────────────────── ───────────────────────────────────
[ Data Collection ] (osintgraph discover <target>) [ Investigation ]
├─ Profile Metadata ├─ [AI Agent] (osintgraph agent)
├─ Followers │ • Retrieve Data
├─ Followees │ • Keyword Search
└─ Posts (with Comments) │ • Semantic Search
↓ │ • Graph Relationship Search
Posts Pre-Analysis │ • Run Template Analyses
├─ Uses: └─ [Neo4j Visualization]
│ • Post Metadata
│ • Comments
│ • Image Pre-Analyses
│ ├─ Uses:
│ │ • Post media (thumbnails & images)
│ └─ Generates:
│ • Structured Image Analysis Report
└─ Generates:
• Structured Post Analysis Report
↓
Account Pre-Analysis
├─ Uses:
│ • All Post Analyses
│ • Profile Metadata
└─ Generates:
• Structured Account Analysis Report
In this phase, you collect all public Instagram data for a target and their network. You’re building the raw intelligence database that you’ll investigate later.
What you do:
Run one of these commands to collect all public Instagram data for a target and their network:
- `osintgraph discover <target>`: Collect and (optionally) pre-analyze the target account’s data.
- `osintgraph explore <target>`: Recursively run `discover` on each followee of the target, prioritizing followees with the largest follower base in the Neo4j database.
What OSINTGraph does in the background:
- Scrapes the target’s profile, followers, followees, posts, and comments.
- If Gemini API is enabled, pre-analyzes:
  - Image Analysis: Each post’s media is examined for visual clues and details.
  - Post Analysis: Combines image findings, post metadata, and comments into a structured OSINT report.
  - Account Analysis: Summarizes patterns and behaviors across all posts for the account.

  Pre-analysis quickly examines posts and account data to give you early insights. It’s also useful for template-based investigations, because templates can use the pre-analyzed data immediately for deeper analysis.
- Maps all relationships (likes, follows, replies, etc.) into Neo4j. See how Instagram data is stored in Neo4j ↗
In this phase, you search, analyze, and visualize the intelligence gathered in Phase 1. Now you’re making sense of the network, activities, and patterns.
What you do:
- Query data using natural language, keyword/semantic search, and graph-relationship queries.
- Run analyses using predefined or custom templates.
- Explore and Visualize social networks interactively.
You have two main ways to do this:
1. AI Agent (`osintgraph agent`)
- Ask questions for data retrieval, keyword and semantic searches, graph-relationship based queries, and analyses using predefined or custom templates. Learn more about Agent ↗
2. Neo4j Visualization (Neo4j Console Browser)
- Explore and visualize the social network map interactively.
- See how people, posts, and interactions are connected.
Below is a breakdown of each command, what it does, and when to use it.
See Usage & options
Purpose:
Configures services and credentials so OSINTGraph can access Instagram, Neo4j, and Gemini.
Options:
- `all` (default): configure everything.
- `instagram`: configure Instagram scraping credentials (cookies/session).
- `neo4j`: set up your Neo4j database connection.
- `gemini`: set your Gemini API key for AI analysis.
- `user-agent`: customize the User-Agent string for scraping.
When to use: Run this the first time you install OSINTGraph or to set credentials.
Examples:
osintgraph setup
osintgraph setup instagram
See Usage & options
Purpose: Clears stored credentials for the chosen option and immediately re-runs setup for that option.
Options:
- `all` (default): reset everything and reconfigure.
- `instagram`: reset Instagram credentials.
- `neo4j`: reset Neo4j database connection settings.
- `gemini`: reset your Gemini API key.
- `user-agent`: reset the User-Agent string for scraping.
When to use: Use this when you need to change or update your credentials (e.g., expired Instagram session, new API key, changed Neo4j password).
Examples:
osintgraph reset
osintgraph reset instagram
See Usage & options
Purpose: Collects all public data for a single Instagram account.
What it does:
- Scrapes followers, followees, and posts (with comments).
- Runs AI-powered post analysis (`post_analysis`), if Gemini is configured.
- Runs AI-powered account analysis (`account_analysis`) after all posts are analyzed, if Gemini is configured.
- Saves everything in Neo4j.
Resumable runs
- If `discover` cannot finish scraping or analysis in one run (for example, a target has thousands of followers or many posts), the progress is saved automatically.
- Running the same command again with the same target will continue from where it left off until all data and analysis are complete.
- Finished sections are skipped on later runs, so no duplicate work is done.
- Use `--force` if you want to re-fetch or re-analyze any part (e.g., `--force follower`, `--force account-analysis`).

Limitation
- When scraping followers and followees, only username and basic profile info are collected. To get full profiles, posts, and comments, you need to run `discover` on each account separately.
- When scraping posts, likes and comments are collected, but only a partial amount may be available due to Instagram’s limitations.
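The resumable-run behaviour (saved progress, skipped finished sections, `--force`) can be pictured with a small planning function. This is only a sketch: the step names mirror the `--skip` / `--force` option values, while `plan_run` and the `completed` dictionary are hypothetical and not OSINTGraph's actual code. Letting `--skip` win over `--force` is also an assumption.

```python
# Hypothetical sketch of resumable step planning; not OSINTGraph's real code.
STEPS = ["follower", "followee", "post", "post-analysis", "account-analysis"]


def plan_run(completed, skip=(), force=()):
    """Return the steps a run would execute, in order.

    completed: dict mapping step name -> True once that step has finished.
    skip/force: iterables of step names, or ["all"].
    """
    skip = set(STEPS) if "all" in skip else set(skip)
    force = set(STEPS) if "all" in force else set(force)
    plan = []
    for step in STEPS:
        if step in skip:
            continue  # explicitly skipped (assumed to win over force)
        if completed.get(step, False) and step not in force:
            continue  # finished on an earlier run, so no duplicate work
        plan.append(step)
    return plan


state = {"follower": True, "followee": True, "post": False}
print(plan_run(state))                      # ['post', 'post-analysis', 'account-analysis']
print(plan_run(state, force=["follower"]))  # ['follower', 'post', 'post-analysis', 'account-analysis']
```

Keeping one completion flag per step is what allows a second run to pick up at the first unfinished step instead of starting over.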
Options:
- `--skip [parts]`: skip certain steps. (Options: all, follower, followee, post, post-analysis, account-analysis)
  Example: `--skip post-analysis` skips analyzing posts with AI.
- `--limit TYPE=NUMBER`: limit how many items to fetch per type (default: follower=1000, followee=1000, post=10). (Options: follower, followee, post)
  Example: `--limit post=5` fetches only 5 posts.
- `--rate-limit NUMBER`: pause for 8–10 minutes after every N requests to avoid detection.
  Example: `--rate-limit 500` waits 8–10 minutes after every 500 Instagram requests.
- `--force [parts]`: re-fetch or re-analyze even if already done. (Options: all, follower, followee, post, post-analysis, account-analysis)
  Example: `--force account-analysis` resets the progress and reruns the AI analysis on the account data.
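To make the `--limit TYPE=NUMBER` semantics concrete, here is a minimal sketch of how such pairs could be merged with the documented defaults. The function `parse_limits` is hypothetical and only illustrates the option format; it is not how the CLI is necessarily implemented.

```python
# Defaults as documented for --limit; parse_limits itself is a hypothetical helper.
DEFAULT_LIMITS = {"follower": 1000, "followee": 1000, "post": 10}


def parse_limits(pairs):
    """Merge TYPE=NUMBER strings (e.g. 'post=5') over the default limits."""
    limits = dict(DEFAULT_LIMITS)
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or key not in DEFAULT_LIMITS:
            raise ValueError(f"expected follower|followee|post=NUMBER, got {pair!r}")
        number = int(value)  # raises ValueError for non-numeric input
        if number < 0:
            raise ValueError(f"limit must not be negative: {pair!r}")
        limits[key] = number
    return limits


print(parse_limits(["follower=200", "post=15"]))
# {'follower': 200, 'followee': 1000, 'post': 15}
```

Types that are not mentioned keep their defaults, which matches the second `discover` example below, where only `follower` and `post` are overridden.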
When to use: First step of any investigation — gets all data for your primary target.
Example:
osintgraph discover "target_user"
osintgraph discover "target_user" --skip post-analysis account-analysis --limit follower=200 post=15 --force follower followee
See Usage & options
Purpose:
Recursive discovery — goes beyond your target to their network.
What it does:
- Runs `discover` on each followee of the target, prioritizing those with the largest follower counts in your Neo4j database.
Focuses on followees because they often reveal the target’s real interests, communities, and affiliations—such as local groups, news sources, favorite influencers, or close friends. Within these, accounts with larger follower bases in your Neo4j DB are explored first, increasing the chances of uncovering valuable insights.
- Stops after a set number of accounts.
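The prioritization described above amounts to sorting the target's followees by follower count and taking the first few. The sketch below assumes each followee is available as a dictionary with `username` and `followers` keys (mirroring the Person properties in the data model); `pick_followees` is a hypothetical name, and the real selection logic may differ.

```python
def pick_followees(followees, max_accounts):
    """Return up to max_accounts usernames, largest follower count first."""
    ranked = sorted(followees, key=lambda person: person["followers"], reverse=True)
    return [person["username"] for person in ranked[:max_accounts]]


# Toy data; in practice these values would come from the Neo4j database.
sample = [
    {"username": "local_news", "followers": 52000},
    {"username": "close_friend", "followers": 310},
    {"username": "city_gym", "followers": 4800},
]
print(pick_followees(sample, max_accounts=2))  # ['local_news', 'city_gym']
```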
Options:
- `--max NUMBER`: how many accounts to discover in total.
  Example: `--max 10` runs `discover` on up to 10 followees of the target, then stops.

(The following options work the same way as in `discover`.)
- `--skip [parts]`: skip certain steps (e.g., post-analysis). (Options: all, follower, followee, post, post-analysis, account-analysis)
- `--limit TYPE=NUMBER`: limit how many items to fetch per type (default: follower=1000, followee=1000, post=10). (Options: follower, followee, post)
- `--rate-limit NUMBER`: pause for 8–10 minutes after every N requests to avoid detection.
- `--force [parts]`: re-fetch or re-analyze even if already done. (Options: all, follower, followee, post, post-analysis, account-analysis)
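The `--rate-limit` behaviour (a pause of 8 to 10 minutes after every N requests) can be sketched as a small counter. `RequestPacer` is a hypothetical class written for illustration; the sleep function is injectable so the example runs instantly, and the uniform random pause length is an assumption about how the 8–10 minute window is chosen.

```python
import random
import time


class RequestPacer:
    """Pause for a random 8-10 minutes after every `every_n` requests (sketch)."""

    def __init__(self, every_n, sleep=time.sleep):
        if every_n <= 0:
            raise ValueError("every_n must be positive")
        self.every_n = every_n
        self.count = 0
        self._sleep = sleep

    def tick(self):
        """Call once per request; returns the pause taken, in seconds."""
        self.count += 1
        if self.count % self.every_n == 0:
            pause = random.uniform(8 * 60, 10 * 60)
            self._sleep(pause)
            return pause
        return 0.0


# Record pauses instead of actually sleeping.
pauses = []
pacer = RequestPacer(every_n=500, sleep=pauses.append)
for _ in range(1200):
    pacer.tick()
print(len(pauses))  # 2 (after request 500 and request 1000)
```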
When to use: To expand your investigation into the wider social network.
Example:
osintgraph explore "target_user"
osintgraph explore "target_user" --max 10 --limit follower=1000 followee=500
See Usage & options
Purpose:
Launches the OSINTGraph AI Agent for natural language investigations.
What it can do:
- Keyword search across your Neo4j database.
- Semantic search using AI embeddings.
- Auto-generate and run Cypher queries.
- Execute prebuilt or custom YAML investigation templates.
Key options:
- `--debug`: store detailed debug output for templates.
When to use:
After you’ve collected data, use the agent to ask questions, run analysis, or execute templates.
Example:
osintgraph agent --debug
After scraping, OSINTGraph stores Instagram data in Neo4j as interconnected nodes and relationships.
OSINTGraph Data Model (All Entities & Relationships)
See all properties
Property | Type | Description |
---|---|---|
id | INTEGER | Unique identifier for the person node. |
username | STRING | Instagram username. |
fullname | STRING | Full display name from profile. |
bio | STRING | Profile biography text. |
followers | INTEGER | Number of followers. |
followees | INTEGER | Number of accounts followed. |
mediacount | INTEGER | Number of posts uploaded. |
external_url | STRING | External link in profile bio. |
business_category_name | STRING | Business category if a business account. |
is_verified | BOOLEAN | True if the account has a verification badge. |
is_business_account | BOOLEAN | True if the account is marked as a business account. |
has_highlight_reels | BOOLEAN | True if the user has highlight stories. |
has_public_story | BOOLEAN | True if the account has public stories. |
is_private | BOOLEAN | True if the account is private. |
profile_pic_url | STRING | Profile picture URL. |
profile_pic_url_no_iphone | STRING | Alternate profile picture URL. |
biography_hashtags | LIST | Hashtags used in the bio. |
biography_mentions | LIST | Usernames mentioned in the bio. |
Property | Type | Description |
---|---|---|
account_analysis | STRING | AI-generated analysis of the account. (stringified JSON) |
Show account_analysis structure
{
"account_summary": {
"who_runs_this_account": {
"summary": "",
"confidence": ""
},
"what_type_of_account": {
"label": "",
"reasoning": "",
"confidence": ""
},
"why_this_account_exists": {
"main_purpose": "",
"supporting_signals": []
},
"who_is_the_target_audience": {
"summary": "",
"reasoning": ""
},
"what_it_posts_about": {
"topic_distribution": [
{
"topic": "",
"percentage": ""
}
]
},
"how_often_it_posts": {
"avg_posts_per_month": "",
"most_active_days": [],
"seasonal_patterns": ""
},
"who_comments_on_it": {
"audience_profile": {
"likely_age_range": "",
"languages_used": [],
"comment_style": "",
"emotional_tone": ""
},
"relationship_to_owner": ""
},
"how_comments_look": {
"comment_quality": "",
"reply_behavior": "",
"engagement_style": "",
"detected_bots_or_fake_activity": false
},
"notable_flags_or_anomalies": {
"inconsistencies": [],
"suspicious_behavior": [],
"possible_account_switch_history": false
},
"language_and_text_patterns": {
"caption_language": [],
"common_caption_themes": [],
"hashtags_usage": "",
"emoji_usage": "",
"comment_language_distribution": [],
"comment_length": ""
},
"summary_notes": ""
}
}
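Because `account_analysis` is stored as stringified JSON, it has to be decoded before individual sections can be read. A minimal sketch, assuming the string has already been fetched from the Person node; the sample value and the helper `summarize_account` are made up for illustration and follow the structure shown above.

```python
import json

# Hypothetical stored value, trimmed to two sections of the structure above.
raw = json.dumps({
    "account_summary": {
        "what_type_of_account": {"label": "personal", "reasoning": "...", "confidence": "medium"},
        "what_it_posts_about": {"topic_distribution": [
            {"topic": "travel", "percentage": "60%"},
            {"topic": "food", "percentage": "40%"},
        ]},
    }
})


def summarize_account(raw_analysis):
    """Return (account type label, list of topics) from the stringified JSON."""
    summary = json.loads(raw_analysis)["account_summary"]
    label = summary.get("what_type_of_account", {}).get("label", "")
    topics = [
        entry["topic"]
        for entry in summary.get("what_it_posts_about", {}).get("topic_distribution", [])
    ]
    return label, topics


print(summarize_account(raw))  # ('personal', ['travel', 'food'])
```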
Property | Type | Description |
---|---|---|
username_vector | LIST | Vector embedding of username for semantic search. |
bio_vector | LIST | Vector embedding of biography for semantic search. |
fullname_vector | LIST | Vector embedding of full name for semantic search. |
account_analysis_vector | LIST | Vector embedding of account analysis for semantic search. |
Property | Type | Description |
---|---|---|
_profile_complete | BOOLEAN | Internal flag: profile scrape completed. |
_followers_complete | BOOLEAN | Internal flag: follower list scrape completed. |
_followees_complete | BOOLEAN | Internal flag: followee list scrape completed. |
_posts_complete | BOOLEAN | Internal flag: posts scrape completed. |
_posts_analysis_complete | BOOLEAN | Internal flag: post analysis completed. |
_account_analysis_complete | BOOLEAN | Internal flag: account analysis completed. |
_followers_resume_hash | STRING | Internal resume state for follower scraping. |
_followees_resume_hash | STRING | Internal resume state for followee scraping. |
_posts_resume_hash | STRING | Internal resume state for posts scraping. |
See all properties
Property | Type | Description |
---|---|---|
id | INTEGER | Unique identifier for the post node. |
shortcode | STRING | Instagram post shortcode (URL-friendly ID). |
caption | STRING | Post caption text. |
pcaption | STRING | Preprocessed caption text (cleaned). |
title | STRING | Post title (if available). |
likes | INTEGER | Number of likes on the post. |
comments | INTEGER | Number of comments on the post. |
is_video | BOOLEAN | True if the post is a video. |
video_duration | INTEGER | Video length in seconds. |
video_view_count | INTEGER | Number of video views. |
is_pinned | BOOLEAN | True if the post is pinned on profile. |
is_sponsored | BOOLEAN | True if the post is marked as sponsored content. |
typename | STRING | Instagram media type name. |
mediacount | INTEGER | Number of media items (for carousel posts). |
accessibility_caption | STRING | Alt-text or accessibility caption. |
tagged_users | LIST | Usernames tagged in the post. |
caption_hashtags | LIST | Hashtags used in the post caption. |
caption_mentions | LIST | Mentions in the post caption. |
date_utc | DATE_TIME | UTC timestamp of post creation. |
date_local | DATE_TIME | Local timestamp of post creation. |
Property | Type | Description |
---|---|---|
post_analysis | STRING | AI-generated analysis of the post. (stringified JSON) |
image_analysis | STRING | AI-generated image analysis for the post. (stringified JSON array) |
Show post_analysis structure
{
"post_metadata_summary": {
"post_type": "",
"post_tone": "",
"post_intent": "",
"poster_role_or_affiliation": "",
"target_audience": "",
"posting_motivation": "",
"date_context": "",
"sponsored_or_promotional": false
},
"visual_analysis_summary": {
"key_findings": "",
"notable_objects_or_symbols": "",
"people_or_groups_shown": "",
"locations_or_geo_clues": "",
"emotion_or_energy_level": "",
"forensic_red_flags": []
},
"comment_section_analysis": {
"overall_sentiment": "",
"common_comment_behaviors": "",
"dominant_tones_or_emotions": "",
"top_words_or_emojis": [],
"interaction_patterns": "",
"bot_or_coordinated_activity": false,
"cultural_or_linguistic_signals": ""
},
"behavioral_and_social_insight": {
"likely_poster_motivation": "",
"social_group_affiliations": "",
"influence_or_recruitment_signs": "",
"propaganda_or_polarization_signals": "",
"deception_or_misinfo_signs": ""
},
"osint_value": {
"intelligence_usefulness": "",
"recommended_followup": "",
"confidence_level": "",
"summary_takeaways": ""
}
}
Show image_analysis structure
{
"image_type": "",
"image_tone": "",
"image_scenario": "",
"image_intent": "",
"people_count_visible": "",
"people_visibility_level": "",
"people_gender": "",
"people_age_range": "",
"people_ethnicity": "",
"people_clothing": "",
"people_accessories": "",
"people_hair_description": "",
"people_facial_hair": "",
"people_face_features": "",
"people_body_type": "",
"people_skin_tone": "",
"people_posture": "",
"people_actions": "",
"people_dominant_hand": "",
"people_walking_style": "",
"people_emotions": "",
"people_interaction": "",
"people_possible_role": "",
"people_items_carried": "",
"people_visible_tech": "",
"people_tattoos_piercings": "",
"people_symbols_or_badges": "",
"people_identity_clues": "",
"people_eye_color": "",
"people_glasses_or_contacts": "",
"people_mouth_expression": "",
"people_visible_injuries": "",
"people_makeup_or_face_paint": "",
"people_body_language": "",
"people_proximity": "",
"people_group_behavior": "",
"people_footwear": "",
"people_carry_method": "",
"people_visible_tattoos": "",
"people_eye_contact": "",
"people_accessory_details": "",
"people_disabilities_or_devices": "",
"people_behavior_notes": "",
"text_present": false,
"text_transcribed": "",
"text_language": "",
"text_font_style": "",
"text_meaning": "",
"clothing_style": "",
"clothing_colors": "",
"clothing_symbols_or_logos": "",
"facial_expressions": "",
"group_mood": "",
"scene_location_type": "",
"scene_background": "",
"scene_time_weather": "",
"notable_objects": "",
"tech_or_tools": "",
"vehicles_or_props": "",
"visible_text_on_objects": "",
"uniforms_or_insignia": "",
"environment_signs": "",
"editing_or_staging_signs": "",
"license_plate_number": "",
"license_plate_region": "",
"brands_or_product_names": "",
"unique_identifiers": "",
"safety_gear": "",
"weapon_type": "",
"vehicle_type_or_model": "",
"unusual_objects": "",
"animals_seen": "",
"activity_signs": "",
"time_displayed": "",
"image_quality": "",
"visual_style": "",
"filters_or_watermarks": "",
"geo_clues": "",
"primary_language_seen": "",
"regional_indicators": "",
"slang_or_dialect_detected": "",
"cultural_or_religious_signs": "",
"group_affiliations": "",
"flags_uniforms_gestures": "",
"deception_signs": "",
"hashtags_or_keywords": "",
"geo_political_relevance": "",
"game_detected": false,
"game_name": "",
"exif_device": "",
"watermark_found": false,
"original_image_source": "",
"poster_intent": "",
"target_audience": "",
"engagement_tricks": "",
"psychological_triggers": "",
"radical_language_or_symbols": "",
"call_to_action": "",
"recruiting_or_polarizing_content": "",
"misinfo_or_agenda_signals": "",
"summary_type": "",
"key_takeaways": "",
"cultural_or_geo_significance": "",
"poster_purpose": "",
"osint_value": "",
"confidence_in_analysis": ""
}
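`image_analysis` is a stringified JSON array with one report per image, so a common first step is to decode it and pull one field across all images of a post. The sample data and the helper `collect_field` below are hypothetical; only the field names come from the structure above.

```python
import json

# Hypothetical stored value for a two-image post, trimmed to a few fields.
raw = json.dumps([
    {"text_present": True, "text_transcribed": "Cafe Aurora", "geo_clues": "street sign"},
    {"text_present": False, "text_transcribed": "", "geo_clues": ""},
])


def collect_field(raw_image_analysis, field):
    """Return (image number, value) pairs for images where the field is non-empty."""
    reports = json.loads(raw_image_analysis)
    return [
        (index, report[field])
        for index, report in enumerate(reports, start=1)
        if report.get(field)
    ]


print(collect_field(raw, "text_transcribed"))  # [(1, 'Cafe Aurora')]
```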
Property | Type | Description |
---|---|---|
caption_vector | LIST | Vector embedding of caption text for semantic search. |
title_vector | LIST | Vector embedding of title text for semantic search. |
post_analysis_vector | LIST | Vector embedding of post analysis for semantic search. |
image_analysis_vector | LIST | Vector embedding of image analysis for semantic search. |
See all properties
Property | Type | Description |
---|---|---|
id | INTEGER | Unique identifier for the comment node. |
text | STRING | Comment text. |
likes_count | INTEGER | Number of likes on the comment. |
created_at_utc | DATE_TIME | UTC timestamp of comment creation. |
Property | Type | Description |
---|---|---|
text_vector | LIST | Vector embedding of comment text for semantic search. |
Relationship | Description |
---|---|
👤 Person - Follows -> 👤 Person | A person follows another person. |
👤 Person - Posted -> 📷 Post | A person created the post. |
👤 Person - Liked -> 📷 Post | A person liked a specific post. |
👤 Person - Commented -> 💬 Comment | A person authored the comment. |
💬 Comment - On -> 📷 Post | The comment is made on a specific post. |
💬 Comment - Reply To -> 💬 Comment | A comment is a reply to another comment. |
👤 Person - Liked -> 💬 Comment | A person liked a comment. |
The OSINTGraph Agent helps you explore, retrieve, and analyze your OSINT data in Neo4j. It works in two main ways:
- Data Retrieval & Simple Analysis – Fetch accounts, posts, comments, and relationships using filters, graph queries, and searches. You can also ask for quick insights (summaries, counts, highlights) on the retrieved data.
- Template-Based Analysis – For deeper investigations, use pre-built or custom templates. Templates guide the agent to retrieve the right data and apply structured analysis for more controlled, focused, and repeatable investigations.
This guide shows the two main ways to interact with the OSINTGraph AI Agent - Data Retrieval for quick questions, and Template-Based Analysis for deeper investigations. It also explains how to ask clear questions so you get the most accurate results.
Note
These example questions are just a guide — you can ask the agent in your own words, and it will understand.
Data Retrieval is best for direct queries and simple analysis questions. You can use it to fetch data based on filters, relationships, or searches.
Get data by filtering on straightforward criteria (e.g., usernames or dates).
Example:
- “Get John’s comments from 2025”
  (Returns all comments made by John during 2025)
- “How many comments has John made in 2025”
  (Returns the total number of comments John made during 2025)
Include social connections in your query — followers, likers, commenters, etc.
Example:
- “Find followers of John who commented on his posts in 2025”
(Returns users who follow John and commented on his posts during 2025)
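To show what a relationship-based question like the one above boils down to, here is a toy in-memory version of the same logic. It is not Cypher and not what the agent executes; the sets and tuples simply stand in for the Follows, Posted, Commented, and On relationships from the data model, and all names and data are invented.

```python
from datetime import datetime

# Toy stand-ins for the graph (all data invented for illustration).
follows = {("alice", "john"), ("bob", "john"), ("john", "alice")}  # (follower, followee)
posts = {"p1": "john", "p2": "alice"}                              # post id -> author
comments = [                                                       # (author, post id, created_at_utc)
    ("alice", "p1", datetime(2025, 3, 1)),
    ("carol", "p1", datetime(2025, 4, 2)),
    ("bob", "p1", datetime(2024, 12, 30)),
]


def followers_who_commented(target, year):
    """Followers of `target` who commented on one of their posts in `year`."""
    followers = {src for src, dst in follows if dst == target}
    commenters = {
        author
        for author, post_id, created in comments
        if posts.get(post_id) == target and created.year == year
    }
    return sorted(followers & commenters)


print(followers_who_commented("john", 2025))  # ['alice']
```

Here bob is excluded because his comment is from 2024, and carol because she does not follow john: the question combines a date filter with two relationship conditions.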
You can search data using two methods:
- Keyword Search (literal word match): Finds exact matches of words or phrases.
  Example: “Find John’s comments from 2025 with the word ‘conference’”
  (Returns John’s 2025 comments containing the exact word “conference”)
- Semantic Search (meaning-based): Finds content based on related meanings, including synonyms or related terms.
  Supported fields include:
  - Person: `username`, `fullname`, `bio`, `account_analysis`
  - Post: `caption`, `title`, `post_analysis`, `image_analysis`
  - Comment: `text`

  Example: “Show John’s comments from 2025 about startups”
  (Returns John’s 2025 comments whose text relates to “startups,” such as “new companies” or “ventures”)
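The difference between the two search methods can be illustrated in a few lines. This sketch assumes keyword search behaves like a whole-word match and that semantic search ranks stored vectors by cosine similarity; both are assumptions for illustration, and the three-dimensional vectors are invented stand-ins for real embeddings.

```python
import math
import re


def keyword_match(text, word):
    """True if `word` appears in `text` as a whole word (case-insensitive)."""
    return re.search(rf"\b{re.escape(word)}\b", text, flags=re.IGNORECASE) is not None


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Comment text -> invented embedding (stand-in for a stored text_vector).
comments = {
    "Great conference last week": [0.1, 0.9, 0.2],
    "Launching our new venture soon": [0.9, 0.1, 0.3],
}
query_vector = [1.0, 0.0, 0.2]  # pretend embedding of the query "startups"

print([text for text in comments if keyword_match(text, "conference")])
# ['Great conference last week']
print(max(comments, key=lambda text: cosine_similarity(comments[text], query_vector)))
# Launching our new venture soon
```

The second comment never contains the word “startups”, yet it ranks first for the semantic query because its vector points in a similar direction.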
You can mix filters, relationships, and content search for precise results:
- “Find followers of John who liked his posts about startups in 2025”
  - Filters posts by date (2025)
  - Traverses relationships to get John’s followers who liked those posts
  - Applies semantic search on post content to find those about startups
- “Find followers of John who liked his posts with the word ‘conference’ in 2025”
  - Filters posts by date (2025)
  - Traverses relationships to get John’s followers who liked those posts
  - Applies keyword search on post content for the exact word “conference”
Being precise makes your results more accurate and useful. Here are key ways to improve your queries:
Examples of precision:
- Vague: "Find posts about aura farming"
- Precise: "Use semantic search, find posts about aura farming."
- Vague: "Search for aura farming"
- Precise: "Use semantic search on post captions about aura farming."
- Vague: "Where is John?"
- Precise: "Which location might John be at based on post captions, post analysis, and person bio?"
- Vague: "Tell me about John"
- Precise: "Give John’s account analysis and follower count."
💡 Tip: When asking, think about:
- Which search method should be applied, if any? (semantic search, keyword search)
- Which data fields should be checked? (person bio, post analysis, post captions, etc.)
- What exactly do you want back? (summary, detailed context, related entities, relationships, etc.)
This will speed up your investigation and ensure the Agent looks in the right places.
Templates are blueprints that tell the AI how to analyze your data. Instead of manually going through posts, comments, likes, and social connections—which can take days—a template lets the OSINTGraph agent gather all the needed data, feed it into a fresh AI, and get clear answers.
Example scenario: You want to figure out where a person might be located. Doing it manually would take hours or days—looking through every post, comment, and followee. With a template, the AI can analyze all this data and summarize likely locations, saving you time and effort.
Each template run:
- Spawns a new AI instance with no memory of previous runs.
- Uses a system prompt (the AI’s “brain”) to guide reasoning.
- Injects the gathered data into a user prompt for analysis.
Templates are great because they let you:
- Control how the AI thinks and reasons.
- Get consistent, repeatable results.
- Analyze large datasets quickly without doing manual work.
- Reuse the same template across different targets or investigations.
Templates are written as `.yaml` files with the following structure:
name: <unique_template_name>
# Example: liked_post_analysis
# A unique identifier for the template. Used to select and run this template.
description: |
  <Brief explanation of what the template does, what kind of data it processes, and the type of output it produces.>
# Example:
#   Analyze liked posts to infer user interests and personality traits.
input_fields:
  # List of placeholders that will be replaced by actual data when running the template.
  # Each field defines a unique placeholder name and what data should be injected by OSINTGraph agent into that placeholder.
  - name: placeholder1
    description: |
      <Explain clearly what data this field should contain, and the exact format required.>
    # The agent will read these descriptions to automatically choose the correct Cypher queries, run them, and inject the results in the requested format.
    # Example:
    #   Provide User profile info including Person.username and Person.bio.
    #   Give results in this format:
    #     Username: ...
    #     Bio: ...
  - name: placeholder2
    description: |
      <Explain what this second input field should contain and its format.>
    # Describe what kind of data should be injected into this second placeholder when the template runs.
    # Example:
    #   A list of posts liked by the user, each with Post.caption and Post.post_analysis.
    #   Format in this way:
    #     Post:
    #       Caption: ...
    #       Post analysis: ...
system_prompt: |
  <Instructions defining the AI’s role, behavior, reasoning style, and output format>
# Defines the LLM style, tone, rules, how to reason, what to infer, and how to format results
# Example:
#   You are a social media analyst. Review the user's liked posts and infer behavioral patterns or thematic interests based on post content.
user_prompt: |
  <Task description with placeholders for injected data>
# The task request, with special placeholders `{placeholders}` for injected data
# Example:
#   Analyze the following profile and liked posts:
#   Profile Info:
#   {placeholder1}
#
#   Posts liked by the user:
#   {placeholder2}
See an example template here: location_analysis.yaml
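The placeholder mechanism in the template structure can be sketched with Python's standard string formatting. This is an illustration only: the template is represented as a plain dictionary, `fill_user_prompt` is a hypothetical helper, and the strict check that placeholders and `input_fields` names match exactly is an assumption rather than documented OSINTGraph behaviour.

```python
from string import Formatter


def placeholders_in(prompt):
    """Return the set of {placeholder} names used in a prompt string."""
    return {name for _, name, _, _ in Formatter().parse(prompt) if name}


def fill_user_prompt(template, data):
    """Check placeholders against input_fields, then inject the data."""
    expected = {field["name"] for field in template["input_fields"]}
    used = placeholders_in(template["user_prompt"])
    if used != expected:
        raise ValueError(f"placeholder mismatch: {sorted(used ^ expected)}")
    return template["user_prompt"].format(**data)


# Dictionary form of the structure above (only the keys needed here).
template = {
    "input_fields": [{"name": "placeholder1"}, {"name": "placeholder2"}],
    "user_prompt": "Profile Info:\n{placeholder1}\n\nPosts liked by the user:\n{placeholder2}\n",
}
print(fill_user_prompt(template, {
    "placeholder1": "Username: jane\nBio: hiker",
    "placeholder2": "Post:\n  Caption: Summit day",
}))
```

In the real tool the injected values come from Cypher queries the agent chooses based on each field's description; here they are hard-coded strings.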
OSINTGraph comes with several ready-to-use templates that cover common OSINT investigations. You can run them immediately without creating your own.
Examples include:
- location_analysis – Determine possible locations of the target user by analyzing posts, comments, likes, and their social graph.
- contact_info_extraction – Scan bios, captions, comments, and images for potential leaks of emails, phone numbers, or addresses.
- interests_hobbies_lifestyle_analysis – Uncover the target user’s interests, hobbies, and lifestyle preferences with supporting evidence from posts, likes, and network connections.
All predefined templates are maintained in this repository: https://github.com/XD-MHLOO/osintgraph-templates
👉 To see the full list of predefined templates:
Ask the agent to list all templates in the folder.
"list all templates"
👉 To view details of a specific one:
Ask the agent to show a template by name, or view the YAML file directly in your templates folder (run `osintgraph -h` to see the folder path).
"show template location_analysis"
👉 To run a predefined template:
Ask the agent to execute the template.
"Run location_analysis on target_username"
- You request a template to run
  Example template with required additional context (e.g., username): "Run location_analysis template on JohnDoe"
  Choose the template you want to run and provide the agent with any required context.
  If you're not sure what to provide, simply ask the agent (e.g. "How to use <the template>") and it will guide you.
- Agent collects required data automatically
  Based on the template’s input field descriptions, the agent automatically runs Cypher queries on your Neo4j database. It retrieves all required fields, formats the results, and fills the `{placeholders}` in the template's user prompt.
- Run Template and Get Output
  A new LLM instance is created internally, using the template’s system and user prompts to analyze the data, then returns the output (e.g., analysis, summaries, or explanations) depending on the template's system prompt.
Note
OSINTGraph is primarily built using free services (e.g. Gemini API), therefore template runs are rate-limited internally to ensure stability.
You can create a custom template by defining a `.yaml` file that controls how the AI analyzes your data.
Let’s say you want to analyze a user's bio, post captions, and comment texts to extract any possible contact details (such as emails, phone numbers, or addresses). You can build a custom template like this:
name: contact_info_extraction
description: |
  Analyze a user's profile bio, post captions, comment texts and image analysis
  (OCR and visual text) to detect any possible leaks of contact details such as emails, phone numbers, or addresses, and return them in a structured Markdown list with supporting context.
input_fields:
  - name: bio
    description: |
      The user’s Person.bio.
      Format:
      Bio:
        Text: ...
  - name: posts
    description: |
      List of all posts made by the user. Each post must include:
      - Post.shortcode
      - Post.caption
      - Post.image_analysis
      Format (One post per entry):
      User Post:
        Post Url: https://www.instagram.com/p/<Post.shortcode>/
        Caption: ...
        Image Analysis:
          Image 1:
            - People: [...]
            - Text/OCR: [...]
            - Summary: [...]
          Image 2:
            ...
  - name: comments
    description: |
      A list of Comment.text authored by the user.
      Format:
      Comment:
        Text: ...
system_prompt: |
  You are a digital privacy analyst. Your task is to carefully analyze the provided data to identify any possible leaks of contact details, including but not limited to:
  - Email addresses
  - Phone numbers
  - Addresses
  - Social media handles, usernames, or IDs
  - Any other identifiers that may reveal contact information
  - Use pattern recognition and contextual reasoning to flag potential contact details.
  - If detected, report each type of possible contact detail (email, phone, address, ..) in a structured Markdown format.
  - For each match, include:
    - The type of contact detail (Email, Phone, Address, etc.)
    - The exact string detected
    - The source field (bio, caption, comment, image OCR) (cite Post Url for Post image OCR )
    - Context / Possible Use — Based on surrounding information, what the contact might be
    - A brief reasoning (if the match is inferred and not explicit)
    - A confidence level (High / Medium / Low), with justification for the confidence
  - If nothing is found, return: "No possible contact details detected."
user_prompt: |
  Review the following content and extract any possible contact-related information:
  User Bio:
  {bio}
  List of User Posts:
  {posts}
  List of User Comments:
  {comments}
Steps to Create Your Template
- `name`: Choose a unique name to identify your template. This will be used to select and run the template.
- `description`: Briefly describe what your template does and the kind of output it produces. This helps the OSINTGraph agent better understand the intent and use of the template.
- `input_fields`: Define what data the agent should inject at runtime. Each input field includes:
  - `name`: Used as `{placeholder}` in the user prompt.
  - `description`: Explain exactly what data should be injected here and how it should be formatted.

  Note: For direct schema attributes (e.g., `Person.bio`, `Post.caption`), mention them explicitly so the agent knows to fetch them directly from the database.
- System Prompt: Write clear instructions defining the AI’s role, behavior, how to reason, and how to format its output. This controls how the AI thinks and processes the data.
- User Prompt: Write the actual task description, with `{placeholder_name}` tags for runtime data injection.
- Place your custom `.yaml` template file into your templates folder. (Run `osintgraph -h` to see where the folder is located.)
- Validate Your Template:
  "list all templates including invalid ones"
  The agent will display all templates in the folder. If your custom template has errors, it will show where; if no errors appear, your template is valid and ready to use. (No need to restart `osintgraph agent` if it’s already running; simply ask to "refresh and list all templates" again.)
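Before asking the agent to validate a template, a quick structural self-check can catch missing keys. The sketch below works on a template already loaded into a dictionary and checks only the five top-level keys and the per-field `name` / `description` described in the steps above. `find_template_errors` is hypothetical, and the agent's own validation may check more or report errors differently.

```python
REQUIRED_KEYS = ("name", "description", "input_fields", "system_prompt", "user_prompt")


def find_template_errors(template):
    """Return a list of human-readable problems; an empty list means none found."""
    errors = [f"missing key: {key}" for key in REQUIRED_KEYS if not template.get(key)]
    for index, field in enumerate(template.get("input_fields") or []):
        for key in ("name", "description"):
            if not field.get(key):
                errors.append(f"input_fields[{index}] missing {key}")
    return errors


# A draft with two deliberate mistakes: no user_prompt, and a field without a description.
draft = {
    "name": "contact_info_extraction",
    "description": "Detect leaked contact details.",
    "input_fields": [{"name": "bio"}],
    "system_prompt": "You are a digital privacy analyst.",
}
print(find_template_errors(draft))
# ['missing key: user_prompt', 'input_fields[0] missing description']
```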
- Use Your Browser Session: When running `osintgraph setup instagram`, choose login via Firefox session to make the login look natural. 🌐
- Use Your Real User-Agent: When running `osintgraph setup user-agent`, provide the exact user-agent from the browser you use to log in to your Instagram account. 🖥️
- Enable 2FA: Turn on 2FA for your Instagram account. It’s simple: just use an authenticator app, and it helps Instagram recognize that your account is legitimate. 🔒
- Build Account Reputation: Use your Instagram account normally (like posts, comment, watch stories) for a few days or weeks before scraping. 📈
- Warm Up Your Session: Spend time using Instagram before scraping, like a normal user, to avoid looking suspicious. ⏳
- Avoid VPNs: Don’t use VPNs. Instagram may flag accounts with mismatched or suspicious locations. 🚫🌍
- Don’t Use the Account for Other Activities While Scraping: When using this tool to collect data, avoid using the same Instagram account for any other activities. 🛑
- Limit Scraping Time: Don’t scrape for more than 6 hours straight. ⏰
- Thanks to @ahmdrz for these valuable insights on avoiding account suspension. 🙏
- Also see this useful comment on Instaloader's GitHub for more tips.
- Instaloader – Used to collect Instagram profile data, followers, and followees.
- Neo4j – Graph database used to store and visualize the Instagram social network.
- LangGraph – Handles structured multi-step LLM reasoning and ReAct-style agent execution.
- Gemini / Google Generative AI – Provides the LLM model used for AI-powered analysis and powers the AI agent.