If you've used JogIt and wondered what's actually happening when you speak a search query and a photo from four years ago appears on screen in 1.4 seconds, this page is the answer.
I'm going to be more technical here than anywhere else on the site, because the most common question I get is "how is this private if it uses AI?" and the only honest answer to that question is to show you the pipeline. The short version is that none of your photos, none of your voice, and none of your face data ever leaves your phone. The longer version, which is the rest of this article, explains how that's possible — and how you can verify it yourself in a network monitor.
This is also the page I want the technically-minded among you to read before you trust the Privacy Pledge. The pledge is the promise. This page is the receipts.
1. What happens the first time you open JogIt
The first thing JogIt does on first launch, after you grant access to your photo library, is build an index. Indexing is a one-time process that runs while your phone is plugged in and idle. For a 5,000-photo library on an iPhone 13, indexing finishes in 11–15 minutes; on a Galaxy S23+, it takes about the same. On older hardware it takes longer, and we tell you that upfront on a one-time Compatibility Notice screen before the index starts.
What's happening during indexing, photo by photo:
- The photo gets read from the system Photos library. On iOS this uses PHAsset and the Photos framework; on Android, MediaStore. The photo data never leaves the process boundary of the JogIt app.
- The photo runs through a vision model. We use a quantized variant of MobileCLIP — a small, fast version of CLIP designed for mobile inference. The model produces a 768-dimensional embedding: a vector of 768 floating-point numbers that describes the contents of the photo in a way the search engine can compare to text descriptions.
- Faces in the photo run through a face model. MobileFaceNet produces a 512-dimensional embedding per detected face. These are mathematical fingerprints, not pixel data — and importantly, the source face crop is never written to disk. (More on this in Section 3.)
- OCR runs on text in the photo. Receipts, signs, screenshots, handwritten notes. JogIt uses the system OCR — Apple's Vision framework on iOS, Google's ML Kit Text Recognition on Android. The recognized text is added to the photo's searchable metadata.
- All of it is written to a local SQLite database. Vectors live in sqlite-vec tables. The database file is encrypted at rest with SQLCipher; the encryption key is held in the OS-managed secure key storage (iOS Keychain / Android Keystore).
Total disk footprint: roughly 5 KB per indexed photo. A 10,000-photo library takes around 50 MB. The original photos themselves live where they always lived — in the system Photos library — and JogIt never copies, moves, or modifies them.
That's the whole indexing pipeline. It runs on your phone, using your phone's processor, and nothing about your photos crosses the network boundary.
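As a rough sanity check on the disk footprint quoted above, the sketch below adds up the raw vector sizes. It assumes 32-bit floats and one detected face per photo, neither of which is stated on this page; quantized storage, OCR text, and SQLite overhead would all shift the real figure.

```python
# Back-of-envelope index size. Assumptions (not taken from the app itself):
# embeddings stored as 4-byte floats, one detected face per photo,
# and no allowance for OCR text or SQLite page overhead.
BYTES_PER_FLOAT = 4
IMAGE_DIMS = 768  # MobileCLIP image embedding
FACE_DIMS = 512   # MobileFaceNet embedding, per detected face


def bytes_per_photo(faces: int = 1) -> int:
    """Raw vector bytes for one photo with the given number of faces."""
    return BYTES_PER_FLOAT * (IMAGE_DIMS + faces * FACE_DIMS)


def library_megabytes(photos: int, faces_per_photo: int = 1) -> float:
    """Raw vector storage for a whole library, in decimal megabytes."""
    return photos * bytes_per_photo(faces_per_photo) / 1_000_000


print(bytes_per_photo())          # 5120 bytes, about 5 KB
print(library_megabytes(10_000))  # 51.2, in line with "around 50 MB"
```

Under those assumptions the two vectors alone account for the quoted per-photo figure, which is why the footprint stays small no matter how large the original image files are.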
2. The vision model: what it actually does
MobileCLIP is a smaller, faster member of the CLIP (Contrastive Language–Image Pretraining) family. OpenAI introduced CLIP in 2021, and Apple shipped a mobile-tuned version in 2024. The model learns to put images and text into the same vector space: an image of a sunset over a beach and the text string "sunset over a beach" end up at nearby points in that 768-dimensional space.
That's how JogIt's search works at the lowest level. When you type or speak "me and my wife at the Rebelution concert," JogIt:
- Encodes the query into a text embedding using the same MobileCLIP model.
- Finds the photos whose image embeddings have the highest cosine similarity to that text embedding.
- Returns them, ranked.
The same model that read your photos at indexing time is the model that reads your query at search time. They have to be the same model — that's the only way the vector space comparison works.
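The ranking step can be sketched in a few lines. This is a minimal illustration, not JogIt's code: the function names are made up for the example, and toy 3-dimensional vectors stand in for real 768-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_photos(query_vec, photo_vecs, top_k=3):
    """Return photo ids ordered from most to least similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), photo_id)
              for photo_id, vec in photo_vecs.items()]
    scored.sort(reverse=True)
    return [photo_id for _, photo_id in scored[:top_k]]


# Toy vectors standing in for image embeddings produced at indexing time.
photos = {
    "beach_sunset.jpg": [0.9, 0.1, 0.0],
    "receipt.jpg": [0.0, 0.2, 0.9],
    "concert.jpg": [0.1, 0.9, 0.2],
}
query = [0.2, 0.8, 0.1]  # pretend text embedding of the search query

print(rank_photos(query, photos))
```

Because the query vector points in nearly the same direction as the concert photo's vector, that photo ranks first. The comparison only makes sense when both vectors come from the same model.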
What this doesn't mean: JogIt is not "running a search through a cloud-hosted CLIP model." MobileCLIP is a model file that ships inside the JogIt app bundle, so it is already there the first time you open the app. It runs on your device's ML accelerator: the Apple Neural Engine on iPhones with the A12 chip or later, and the equivalent accelerator on Android (NNAPI on most devices, chipset-specific delegates on Pixel and Samsung).
Inference for a typical search query takes under 200 milliseconds, which is why the end-to-end search latency is around 1–2 seconds and not 10.
3. Face detection and tagging: why we don't store face images
If you tap a face in a photo and give it a name — say, "Sarah" — JogIt does something that surprises people when they hear it: it doesn't save the face image. It saves a math fingerprint of the face, and the name you gave it. That's it.
The math fingerprint is a 512-number vector produced by MobileFaceNet, the face equivalent of MobileCLIP. Two pictures of Sarah, taken three years apart in different lighting, produce two fingerprints that are very close to each other in the 512-dimensional space — close enough that JogIt can confidently say "this is Sarah" without ever seeing the two faces side by side.
What it means in practice:
- The source face image is never written to disk. It exists only in memory during indexing, while MobileFaceNet is producing the fingerprint, and is dropped immediately after.
- You can't reverse-engineer a face image out of the fingerprint. Not in a meaningful way. The fingerprint preserves the math features the model considers identity-discriminating; it does not preserve enough of the original pixel grid to reconstruct a face.
- The fingerprints live only on your phone, in the same SQLCipher-encrypted database as everything else. Uninstall JogIt and they're gone, encrypted bytes and all.
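To make the "fingerprint plus name" idea concrete, here is a hedged sketch of how a new face vector could be matched against named fingerprints. The threshold value, the function names, and the 3-dimensional toy vectors are all assumptions for illustration; the real vectors have 512 dimensions and the real cutoff would be tuned.

```python
import math

MATCH_THRESHOLD = 0.8  # hypothetical similarity cutoff, not JogIt's real value


def similarity(a, b):
    """Cosine similarity between two face fingerprints."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms


def identify(face_vec, named_fingerprints):
    """Return the best-matching name, or None if nothing clears the cutoff."""
    best_name, best_score = None, MATCH_THRESHOLD
    for name, reference in named_fingerprints.items():
        score = similarity(face_vec, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Only names and vectors are stored; no pixels appear anywhere in this table.
named = {
    "Sarah": [0.9, 0.1, 0.3],
    "Grandma": [0.1, 0.8, 0.5],
}

print(identify([0.85, 0.15, 0.35], named))  # close to Sarah's fingerprint
print(identify([0.0, 0.1, -0.9], named))    # matches nobody
```

The lookup never needs the original face crop: comparing two short lists of numbers is enough to decide whether a face belongs to someone you have already named.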
If you've ever turned on the "People" feature in Apple Photos or "Face groups" in Google Photos and felt slightly uneasy about it — fair. Apple's implementation is also on-device (and good). Google's is on a server. JogIt's is on-device by design and verifiable in network logs.
4. The query parser: how fuzzy time and people work
If you type "Grandma at Christmas a couple years ago," there are three things happening behind the scenes:
- "Grandma" is resolved to the named-face embedding for whoever you tagged as Grandma. If you haven't tagged anyone, the query falls back to general visual similarity.
- "Christmas" is resolved through MobileCLIP's text-image space — it matches photos containing Christmas trees, decorations, wrapped presents, holiday meals.
- "a couple years ago" is resolved to a soft time filter — JogIt parses common fuzzy time expressions ("last summer," "a few months ago," "around Thanksgiving 2023") into a time window with soft edges, so photos right outside the window can still appear if their content score is strong enough.
The query parser is a small, on-device language model — not an LLM, just a tagger that pulls names, time expressions, and visual concepts out of the query string. It runs in under 50 milliseconds.
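One way to picture a time window with soft edges is a weight that equals 1 inside the window and decays smoothly outside it. The sketch below is an assumption about how that could work, not the parser JogIt ships: the phrase table, the day ranges, the exponential decay, and the 60-day softness are all invented for the example.

```python
import math
from datetime import date, timedelta

# Hypothetical phrase table: (oldest, newest) expressed in days before today.
FUZZY_WINDOWS = {
    "a couple years ago": (912, 548),  # roughly 2.5 to 1.5 years back
    "a few months ago": (150, 60),
}


def time_window(phrase, today):
    """Turn a fuzzy phrase into a (start, end) date pair."""
    oldest, newest = FUZZY_WINDOWS[phrase]
    return today - timedelta(days=oldest), today - timedelta(days=newest)


def time_weight(taken, window, softness_days=60.0):
    """1.0 inside the window, decaying exponentially with distance outside."""
    start, end = window
    if start <= taken <= end:
        return 1.0
    gap_days = (start - taken).days if taken < start else (taken - end).days
    return math.exp(-gap_days / softness_days)


def combined_score(content_score, taken, window):
    """Scale the visual match score by how well the date fits the window."""
    return content_score * time_weight(taken, window)


window = time_window("a couple years ago", today=date(2025, 1, 1))
inside = combined_score(0.5, date(2022, 12, 25), window)
just_outside = combined_score(0.9, window[1] + timedelta(days=20), window)
print(inside, just_outside)  # the strong match just outside still scores higher
```

With these made-up numbers, a strong content match taken 20 days past the window still outranks a mediocre match inside it, which is the behavior the soft edges are meant to allow.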
5. Voice input: why on-device speech recognition is the only kind we use
When you tap the microphone, JogIt uses your device's on-device speech recognizer:
- iOS: SFSpeechRecognizer with requiresOnDeviceRecognition = true.
- Android 12+: SpeechRecognizer.createOnDeviceSpeechRecognizer.
Both of those APIs route speech-to-text through models that run on your phone. The audio never goes to a server. The transcription never goes to a server. The audio exists in RAM only while you're holding the mic button down, and is dropped the instant transcription is complete; it is never written to disk, not even as a temp file.
If your device doesn't support on-device speech recognition for your locale, the microphone button doesn't appear at all. We do not fall back to a cloud STT service under any circumstance, because the moment we did, JogIt's privacy claim would be a lie.
6. Search execution: sqlite-vec, not faiss
The actual search — given a query embedding, find the most similar photo embeddings — runs through sqlite-vec, which is a vector similarity extension for SQLite. We picked it over the more famous alternatives (faiss, USearch, ScaNN) for two reasons:
- Operational simplicity. SQLite is already a battle-tested embedded database. sqlite-vec adds vector similarity as a custom virtual table. It's one binary, no separate index file, no server.
- Cross-platform parity. The same SQL query runs identically on iOS and Android. The same database schema. The same index structure. There is no platform-specific behavior to debug.
Search execution against a 10,000-photo index takes well under 100 milliseconds on every device we've tested.
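To show what "vector search as a SQL query" looks like in general, here is a stand-in that uses only Python's built-in sqlite3 module. It is not sqlite-vec and does not use its virtual-table API: a user-defined SQL function does a brute-force scan instead. The table name, column names, and float32 blob layout are assumptions for the example.

```python
import math
import sqlite3
import struct


def pack(vec):
    """Serialize a list of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(vec)}f", *vec)


def cosine_distance(blob_a, blob_b):
    """1 - cosine similarity, computed on two packed float32 blobs."""
    a = struct.unpack(f"<{len(blob_a) // 4}f", blob_a)
    b = struct.unpack(f"<{len(blob_b) // 4}f", blob_b)
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


db = sqlite3.connect(":memory:")
# Stand-in for the extension: expose the distance function to SQL.
db.create_function("cosine_distance", 2, cosine_distance)
db.execute("CREATE TABLE photo_vectors (photo_id TEXT PRIMARY KEY, embedding BLOB)")
db.executemany(
    "INSERT INTO photo_vectors VALUES (?, ?)",
    [
        ("beach_sunset.jpg", pack([1.0, 0.0, 0.0])),
        ("concert.jpg", pack([0.0, 1.0, 0.0])),
        ("receipt.jpg", pack([0.0, 0.0, 1.0])),
    ],
)


def search(query_vec, limit=2):
    """Return the ids of the nearest photos, closest first."""
    rows = db.execute(
        "SELECT photo_id FROM photo_vectors "
        "ORDER BY cosine_distance(embedding, ?) LIMIT ?",
        (pack(query_vec), limit),
    )
    return [row[0] for row in rows]


print(search([0.1, 0.9, 0.3]))
```

The appeal of the SQL approach is visible even in this toy: the nearest-neighbor lookup is an ordinary query against an ordinary database file, so the same statement can run unchanged on any platform that embeds SQLite.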
7. What never happens
For completeness, here is the list of things JogIt explicitly does not do:
- No outbound network requests during photo indexing, voice capture, or search. Zero.
- No analytics SDK is included in the app — Firebase Analytics, Mixpanel, Amplitude, Segment, PostHog, and a dozen others are on a build-time deny-list that fails CI if imported.
- No notification permission is requested. The notification APIs are not linked into the binary.
- No location permission is requested.
- No camera permission is requested (JogIt reads photos from the library, it doesn't take new ones).
- No account, login, profile, or email is collected. JogIt has no concept of a user identity.
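The build-time deny-list mentioned above can be as simple as a script that scans source files for forbidden imports. The following is a hedged sketch of that idea rather than the actual CI check: the prefix list is illustrative and incomplete, and the regular expression only handles plain `import` lines in Swift and Kotlin.

```python
import re
from pathlib import Path

# Illustrative deny-list of import prefixes; the real list is longer.
DENIED_PREFIXES = [
    "FirebaseAnalytics",
    "com.google.firebase.analytics",
    "Mixpanel",
    "com.mixpanel",
    "Amplitude",
    "PostHog",
]
IMPORT_RE = re.compile(r"^\s*import\s+([\w.]+)", re.MULTILINE)


def find_violations(source):
    """Return every imported module that falls under a denied prefix."""
    hits = []
    for module in IMPORT_RE.findall(source):
        if any(module == p or module.startswith(p + ".") for p in DENIED_PREFIXES):
            hits.append(module)
    return hits


def scan_tree(root):
    """Map each offending .swift or .kt file under root to its denied imports."""
    report = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in {".swift", ".kt"}:
            hits = find_violations(path.read_text(encoding="utf-8"))
            if hits:
                report[str(path)] = hits
    return report


print(find_violations("import UIKit\nimport Mixpanel\n"))  # ['Mixpanel']
print(find_violations("import Photos\nimport Vision\n"))   # []
```

A CI job built on this would call `scan_tree` on the repository and exit with a non-zero status whenever the report is non-empty, so an analytics SDK cannot be merged by accident.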
The website you're reading this on uses Plausible Analytics, which is privacy-friendly and EU-hosted. That's the entirety of JogIt's analytics surface.
How to verify any of this yourself
You don't have to take my word for it. Here's a 10-minute verification you can run yourself.
- Install JogIt on an iPhone or Android phone connected to your Wi-Fi network.
- On your laptop, install Proxyman (or mitmproxy if you prefer command-line). Both are free for non-commercial use.
- Configure your phone's Wi-Fi to route through your laptop's proxy. Install the proxy's root certificate on the phone so HTTPS traffic is inspectable.
- Open JogIt. Run a few searches, both typed and spoken. Watch the proxy's request log.
- Count the JogIt-originating requests. The number should be zero.
If you see one, please email me — that's a P0 bug and I want to know about it.
The founder-signed commitment that this all stays true going forward is at the Privacy Pledge.
— Jack
Founder, JogIt