ai/ml engineer

i train and fine-tune language models, publish open weights and datasets, and build tools for model evaluation and red teaming.

see selected work | browse open models | get in touch

about

i'm an ai/ml engineer working at the intersection of model training, red teaming, and open-source tooling.

day to day: fine-tuning and quantizing language models, running red teaming and refusal-direction work, and building tools around them. local-first models when they make sense, hosted apis when they don't.

research that ships. evaluations that don't lie. tools other engineers actually use.

core mission

ML engineering & research
red teaming & adversarial testing
agentic systems
dataset curation & synthetic data
local AI

tech arsenal

model optimization & deployment

LoRA / QLoRA, fine-tuning, DPO, ORPO, quantization (GGUF)

red teaming & adversarial testing

refusal ablation, jailbreak research, adversarial testing, compliance pairs

systems & core

C, holy C, python, typeScript

backend

fastAPI, node.js, express, django, flask

databases

postgreSQL, sqlite, redis, neon db

tools & cloud

prime intellect, cloudflare, backblaze, github, rest APIs, docker

ML & deep learning

PyTorch, transformers, hugging face, model training, fine-tuning, too many to mention

data engineering

dataset curation, synthetic data generation, data preprocessing, quality filtering, deduplication

current focus

ML engineering, red teaming, agentic systems, model evaluation

selected projects

DR-OPIC

ML framework for fine-tuning SLMs via Domain-Routed On-Policy Iterative Correction. Combines verified repair, delta-span subtraction, and ZPD-weighted curriculum scheduling. L = L_self + λ_r L_repair + λ_delta L_delta, where w_zpd = 4·p̃·(1−p̃) and p̃ = (s+0.5)/(K+1).

python, SLM training, PyTorch, verifier

source
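The DR-OPIC formulas above can be sketched in a few lines. This is an illustrative reading, not the framework's code: it assumes s is the number of passing samples out of K attempts on a prompt (the description does not define either symbol), and the function names are hypothetical. How w_zpd enters the loss is not stated, so it is computed separately here.

```python
def zpd_weight(s: int, k: int) -> float:
    """w_zpd = 4 * p * (1 - p) with smoothed p = (s + 0.5) / (k + 1).

    Assumption: s counts passing samples out of k attempts.
    """
    if k < 0 or not 0 <= s <= k:
        raise ValueError("expected 0 <= s <= k")
    p = (s + 0.5) / (k + 1)  # smoothing keeps p strictly between 0 and 1
    return 4.0 * p * (1.0 - p)


def total_loss(l_self: float, l_repair: float, l_delta: float,
               lam_r: float, lam_delta: float) -> float:
    """L = L_self + lambda_r * L_repair + lambda_delta * L_delta."""
    return l_self + lam_r * l_repair + lam_delta * l_delta


if __name__ == "__main__":
    # Weight per success count for K = 4; the values here are illustrative.
    for s in range(5):
        print(s, zpd_weight(s, 4))
```

The weight peaks at 1.0 when p̃ = 0.5 and falls off symmetrically toward prompts that are almost always solved or almost never solved, which is the usual reason for a ZPD-style curriculum.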

SWARMs Debate Primitive

Multi-agent debate and vote coordination system on Solana blockchain. Agents assume distinct personas to debate complex questions, with full session transcripts hashed and recorded on-chain for verifiable AI consensus.

Python, Solana, multi-agent, blockchain

source
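A minimal sketch of the transcript-hashing step, under stated assumptions: the project description does not name a hash function or serialization, so SHA-256 over canonical JSON is used here, and the `agent` / `text` field names are hypothetical. Writing the digest to Solana is not shown.

```python
import hashlib
import json


def transcript_digest(turns: list[dict]) -> str:
    """Return a hex SHA-256 digest of a debate transcript.

    Canonical JSON (sorted keys, fixed separators) keeps the digest
    identical for identical content regardless of key order.
    """
    canonical = json.dumps(
        turns, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical two-turn session.
    session = [
        {"agent": "skeptic", "text": "the claim lacks evidence."},
        {"agent": "advocate", "text": "here are two sources."},
    ]
    print(transcript_digest(session))
```

Anyone holding the full transcript can recompute the digest and compare it with the recorded one, which is what makes the session verifiable without storing the text itself on-chain.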

IntellectSafe

AI engine with multi-model LLM Council, Universal Proxy for frontier models, deepfake detection, and adversarial defense suite.

fastAPI, next.js, security

live demo | source

ModelFang

Graph-based adversarial testing framework for LLMs with multi-turn jailbreak attacks, FSM evaluator, and real-time analyst dashboard.

python, next.js, red teaming

live demo | source

Model Unfetter

Directional ablation engine for LLM unalignment. Projects and removes refusal directions from model weights while maintaining capabilities.

python, red teaming, research

source
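The core projection behind directional ablation can be sketched as W' = W − r̂ r̂ᵀ W, where r̂ is a unit refusal direction. This is a generic illustration in plain Python, not Model Unfetter's implementation: the description does not say which matrices are edited or how the direction is estimated (a common choice is a difference of mean activations), and `ablate_direction` is a hypothetical name. Real code would use tensors rather than nested lists.

```python
import math


def ablate_direction(weight: list[list[float]],
                     direction: list[float]) -> list[list[float]]:
    """Return W' = W - r_hat (r_hat^T W).

    weight is a d_out x d_in matrix as nested lists; direction has
    length d_out. After the edit, no input can produce an output with
    a component along r_hat.
    """
    if len(direction) != len(weight):
        raise ValueError("direction length must equal the number of rows")
    norm = math.sqrt(sum(x * x for x in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    r_hat = [x / norm for x in direction]
    d_out, d_in = len(weight), len(weight[0])
    # r_hat^T W: how much each column of W points along r_hat.
    coeffs = [sum(r_hat[i] * weight[i][j] for i in range(d_out))
              for j in range(d_in)]
    # Subtract that component from every column.
    return [[weight[i][j] - r_hat[i] * coeffs[j] for j in range(d_in)]
            for i in range(d_out)]


if __name__ == "__main__":
    w = [[1.0, 2.0], [3.0, 4.0]]
    print(ablate_direction(w, [0.0, 2.0]))
```

Because the projection only removes a rank-one component, the rest of the matrix is untouched, which is the intuition behind "maintaining capabilities"; whether that holds for a given model still has to be measured.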

Mayo

Autonomous triple-AI engine that analyzes codebases and opens validated PRs hourly with cross-repo global memory.

python, agentic AI, GitHub

source

View All Projects

models & datasets

huggingface.co/josephmayo →

Qwopus 9B Unfettered GGUF

quantized gguf version of qwopus 9b for efficient local inference with llama.cpp and ollama.

Qwopus 9B Unfettered

9B uncensored language model. directional ablation applied to remove refusal mechanisms while preserving general capability.

Refusal Compliance Pairs

200+ curated refusal-compliance prompt pairs for red teaming and adversarial evaluation.

ZAYA1-8B-Coder

merged coder model from Zyphra/ZAYA1-8B plus custom lora. +24% lift on python code evaluation gate.

ZAYA1-8B-Coder-GGUF

quantized gguf builds of ZAYA1-8B-Coder for local inference via llama.cpp, ollama, and lm studio.

ZAYA1-8B-Coder-LoRA

lora adapter for Zyphra/ZAYA1-8B focused on python code generation. +101% relative lift over base.

Fara-7B-Abliterated-v2

refusal-direction-orthogonalized variant of microsoft/Fara-7B. 98.75% compliance on held-out harmful evals.

Fara-7B-Abliterated-v2-GGUF

quantized gguf builds of Fara-7B-Abliterated-v2 for local inference via llama.cpp, ollama, and lm studio.

Public Curated Coding Data

mixed-origin public coding data with 2,700+ prompt/response pairs for llm training experiments.

Mellum2-12B-A2.5B-Thinking-Abliterated-GGUF

quantized gguf builds of Mellum2-12B ablated for refusal removal. MoE architecture with per-expert per-layer projected ablation.

Mellum2-12B-A2.5B-Thinking-Abliterated

abliterated Mellum2-12B thinking model from JetBrains. refusal-direction orthogonalized with CoT steering for reasoning tasks.

LFM2.5-8B-A1B-Coder-GGUF

quantized gguf builds of LFM2.5-8B-A1B Coder for local inference via llama.cpp, ollama, and lm studio.

LFM2.5-8B-A1B-Coder

fine-tuned LiquidAI LFM2.5-8B-A1B MoE model for real-world coding tasks. multilingual and conversation-optimized.

LFM2.5-8B-A1B-Coder-LoRA

lightweight lora adapter for LFM2.5-8B-A1B focused on real-world coding and multilingual tasks.

Holo-3.1-4B-Coder-GGUF

quantized gguf builds of Holo-3.1-4B-Coder for local inference via llama.cpp, ollama, and lm studio.

Holo-3.1-4B-Coder

fine-tuned Hcompany Holo-3.1-4B for coding tasks. merged model optimized for python and software development.

Holo-3.1-4B-Coder-LoRA

lora/qlora adapter for Holo-3.1-4B focused on coding and python development.

HRM-Text-1B-sft-code

fine-tuned sapientinc HRM-Text-1B for code generation. trained on HumanEval and MBPP benchmarks.

HRM-Text-1B-sft-code-LoRA

lora adapter for HRM-Text-1B focused on python code generation and coding benchmarks.

Curated OpenBMB Code/Math

31,909 rows of curated code/math post-training data derived from OpenBMB UltraData. includes SFT and think splits.