AI, Software and Systems Engineer · Creator of TrueEye · Technical Thinker and Obsessive Builder
“Make the complex accessible. Make the invisible useful. And have a bit of fun along the way.”
This isn't a static portfolio.
This is my lab. My journal. My sandbox.
A place to document experiments, code, benchmarks, architectures, breakthroughs — and sometimes breakdowns.
Everything here is fueled by obsession, crafted from scratch, and tested in the wild (often with caffeine, sometimes with existential dread).
If it helps you learn, build, debug, or question something better — mission accomplished.
Hi, I’m Gonzalo — but online, I go by DeepRat.
I’m an Engineer in Artificial Intelligence, Software and Systems, trained formally and shaped obsessively by practice.
From backend APIs and retrieval architectures to agents that analyze language and intent, I build systems that think, connect, and explain.
When I was 6, a book fell into my hands like a prophecy:
Cosmos, by Carl Sagan.
Since that day, I’ve been running through every branch of science — physics, psychology, philosophy, biology, math — not out of duty, but compulsion.
If it had structure, I wanted to break it open and understand how it breathed.
Eventually, I found a domain where I could apply everything I knew, loved, and questioned: artificial intelligence.
And from there, I never stopped building.
- 🤖 TrueEye — Media Literacy AI that analyzes news for bias, audience, and intent.
- 🧠 Multi-Agent Systems — RAG pipelines with reasoning, memory, and task delegation.
- 📚 Educational Tools — Like Mole, a bilingual chatbot that learns your documents.
- 🧪 Model Benchmarks — Comparative tests on LLM performance, latency, and semantic accuracy.
- 🧬 Conceptual Prototypes — Like ConCiencia, a philosophical AI project exploring self-awareness.
From concept to production — I don’t just contribute parts.
I take raw ideas and shape them into deployable systems: I architect, code, integrate, optimize, and launch.
From backend APIs to RAG agents, from embeddings to fine-tuning, from prompt to product — I do it all, end-to-end.
- Prompt engineering frameworks, reasoning trees, and agent workflows
- Modular RAG pipelines with semantic search, hybrid routing, multi-agent delegation, and memory (see the sketch after this list)
- Multimodal architectures integrating text, vision, audio, and user context
- Tools that explain, contextualize, and adapt to real-world information
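To make "modular" concrete, here is a minimal, dependency-free sketch of the shape these pipelines take: a retriever module, a router that picks one, and a prompt builder. Every name here (`KeywordRetriever`, `route_query`, the toy corpus) is an illustrative placeholder, not code from any project above.

```python
# Illustrative RAG pipeline skeleton: every stage is a swappable module.
from dataclasses import dataclass

@dataclass
class Doc:
    text: str
    score: float

class KeywordRetriever:
    """Naive lexical retriever; stands in for BM25 or a dense-vector search."""
    def __init__(self, docs: list[str]):
        self.docs = docs

    def search(self, query: str, k: int = 2) -> list[Doc]:
        q = set(query.lower().split())
        scored = [Doc(d, len(q & set(d.lower().split()))) for d in self.docs]
        return sorted(scored, key=lambda d: d.score, reverse=True)[:k]

def route_query(query: str) -> str:
    # Hybrid routing: send the query to a specialist retriever/agent by shape.
    return "code" if "traceback" in query.lower() else "docs"

def build_prompt(query: str, docs: list[Doc]) -> str:
    context = "\n".join(f"- {d.text}" for d in docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

retrievers = {
    "docs": KeywordRetriever(["Semantic chunking splits text by meaning.",
                              "Hybrid retrievers mix lexical and dense scores."]),
    "code": KeywordRetriever(["A traceback shows the call stack at failure."]),
}

query = "What do hybrid retrievers mix?"
hits = retrievers[route_query(query)].search(query)
print(build_prompt(query, hits))
```

In the real pipelines the retriever module is backed by FAISS or Chroma and the router can delegate to other agents, but the shape stays the same.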
- API backends using FastAPI, LangChain, LangGraph, and tool-integrated logic
- Agents powered by open models (LLaMA, Phi, Granite, Qwen, DeepSeek, etc.) and hosted ones (Claude, GPT-4, Cohere), via Ollama, Transformers, or direct API
- Embedding engines with MiniLM, Instructor, Specter2, CLIP, GTE, and custom Sentence Transformers models
- Vector pipelines using FAISS, Chroma, hybrid retrievers and semantic chunking
- Local inference setups with Ollama, GGUF, AutoGPTQ, and quantized adapters (example call below)
- Tool-based reasoning using tool calling, LangGraph agents, and custom toolkits
- Interfaces and apps built with Gradio, HTML/CSS/JS, and Tailwind
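As a taste of the local-inference setup, this is roughly what a raw call to a local Ollama server looks like, assuming `ollama serve` is running and the model has been pulled (the model name is a placeholder):

```python
# Minimal local-inference call against an Ollama server (default port 11434).
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3",  # placeholder; any pulled model works
          "prompt": "Explain RAG in one sentence.",
          "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```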
- LoRA and QLoRA fine-tuning pipelines on Colab A100 and local Ollama (setup sketch after this list)
- Adapter-based optimization using PEFT, Transformers, and bitsandbytes
- Training-ready scripts for model personalization, RAG adaptation, and instruction tuning
- Quantized model deployment via GGUF, 4-bit inference, bitsandbytes, and AutoGPTQ
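A condensed sketch of what those pipelines set up, using Transformers, PEFT, and bitsandbytes. The model id, rank, and target modules are placeholders; in practice they depend on the base model:

```python
# QLoRA setup sketch: 4-bit base model + trainable LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder base model
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb,
                                             device_map="auto")
model = prepare_model_for_kbit_training(model)

lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the adapter weights will train
```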
- Live demos, APIs, and production apps on:
- Hugging Face Spaces (with Gradio or FastAPI)
- Google Cloud Platform (VMs with L4 GPU or T4)
- Local Linux servers via SSH, or containerized apps
- End-to-end architecture: from notebook to backend, from raw data to running app
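The serving side usually starts from something this small. Here is a stub of the kind of FastAPI app that runs on a Space or a GCP VM; the `/analyze` route and its logic are placeholders for a real pipeline:

```python
# Minimal FastAPI service of the kind deployed to Hugging Face Spaces or a VM.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="demo-api")

class Query(BaseModel):
    text: str

@app.post("/analyze")
def analyze(q: Query) -> dict:
    # A real app would call a model here; this just echoes a stub result.
    return {"input": q.text, "label": "placeholder"}

# Run with: uvicorn main:app --host 0.0.0.0 --port 7860  (7860 is the Spaces default)
```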
- 🧠 Open-source: LLaMA (all versions), Phi (2/3/4), Qwen (chat/code/VL), DeepSeek, Granite (IBM), Gemma, Mistral, Mixtral, Falcon, Dolly, Zephyr, OpenChat, Nous-Hermes, Orca, GPT-J
- 📡 API-accessible: Claude 3 (Opus, Sonnet), GPT-4 / GPT-3.5, Cohere, DeepSeek Cloud, OpenRouter
- 🖥 Local inference: Ollama, Transformers.js, GGUF (GGML), AutoGPTQ, llama.cpp, QLoRA deployments
- Fine-tuning with LoRA, QLoRA, and PEFT, using Transformers and bitsandbytes on Colab A100 or local GPU
- Instruction tuning, quantization, adapter merging, and deployment for lightweight inference
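The adapter-merging step mentioned above is short in code; a sketch with PEFT (paths and model id are placeholders):

```python
# Merge a trained LoRA adapter into its base model for lightweight deployment.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")          # placeholder path
merged = model.merge_and_unload()       # folds adapter weights into the base
merged.save_pretrained("merged-model")  # ready for GGUF conversion / quantized serving
```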
- Embeddings: MiniLM, Instructor-XL, GTE-base, Specter2, CLIP, E5, SBERT, Sentence-T5
- Vector DBs: FAISS (IndexFlatIP, HNSW), Chroma, Weaviate (test), Milvus (exploratory), Pinecone (basic)
- Custom retrievers: hybrid, weighted, multilingual, hierarchical memory
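A compact example of that stack end to end: MiniLM embeddings paired with a FAISS `IndexFlatIP` (the toy corpus is illustrative; normalizing the vectors makes inner product equal cosine similarity):

```python
# Semantic search sketch: MiniLM embeddings + FAISS IndexFlatIP.
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
corpus = ["FAISS does exact and approximate vector search.",
          "Chroma persists embeddings locally.",
          "Hybrid retrievers mix lexical and dense scores."]
emb = model.encode(corpus, normalize_embeddings=True)  # unit vectors -> IP == cosine

index = faiss.IndexFlatIP(emb.shape[1])
index.add(emb)

q = model.encode(["Which library searches dense vectors?"], normalize_embeddings=True)
scores, ids = index.search(q, 2)
for s, i in zip(scores[0], ids[0]):
    print(f"{s:.3f}  {corpus[i]}")
```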
- Vision: OpenCV, YOLOv8, MediaPipe, Vision Transformers (ViT, CLIP), BLIP, Qwen-VL (CLIP scoring sketch after this list)
- Generative: Stable Diffusion 1.5, SDXL, Realistic Vision, Dreamlike Photoreal, Prompt-to-Image pipelines
- Multimodal orchestration between text ↔ vision ↔ audio (proof of concept)
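For the CLIP scoring sketch promised above: the standard zero-shot pattern via Transformers. The image path and labels are placeholders:

```python
# Zero-shot image classification sketch with CLIP.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # placeholder path
labels = ["a news photo", "a meme", "a chart"]
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)[0]  # one score per label
for label, p in zip(labels, probs):
    print(f"{label}: {p.item():.2%}")
```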
- LangGraph (agent trees, memory, tool routing), LangChain (chains, retrievers, tools)
- Custom memory managers, tool builders, query decomposers, and document loaders
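The smallest LangGraph shape I keep reusing: a typed state passed through nodes. The node logic here is stubbed; real nodes call retrievers and models:

```python
# Minimal LangGraph sketch: a two-node graph over shared typed state.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class State(TypedDict):
    question: str
    context: str
    answer: str

def retrieve(state: State) -> dict:
    return {"context": f"docs about: {state['question']}"}   # stub retriever

def answer(state: State) -> dict:
    return {"answer": f"Based on {state['context']!r}: ..."}  # stub LLM call

g = StateGraph(State)
g.add_node("retrieve", retrieve)
g.add_node("answer", answer)
g.set_entry_point("retrieve")
g.add_edge("retrieve", "answer")
g.add_edge("answer", END)

app = g.compile()
print(app.invoke({"question": "What does TrueEye analyze?"}))
```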
- Backend services: FastAPI, Flask, Streamlit
- Gradio, HTML/CSS/JS, Tailwind, Markdown rendering, Jinja2
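And the Gradio side can be this thin; a stub interface of the kind behind the demos, with the handler standing in for a real model call:

```python
# Minimal Gradio app: text in, text out.
import gradio as gr

def classify(text: str) -> str:
    return f"stub label for: {text}"  # a real app would call a model here

gr.Interface(fn=classify, inputs="text", outputs="text", title="demo").launch()
```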
- Hugging Face Spaces (FastAPI & Gradio apps), Google Cloud VMs, Colab Pro
- Local deployment via Docker, Linux VM + SSH, Ollama agents
- Git, GitHub Actions, rclone, gdown, Notion, VS Code + SSH, bash scripting
- JupyterLab, Google Colab Pro, Markdown-based docs, GCP firewall + network setup
Everything I build is modular, reproducible, and functional.
Not just notebooks. Not just demos. Real systems built to think, adapt, and run.
“Reality is far too beautiful. There’s so much to discover that I knew I would never feel empty again.”
— DeepRat