iOS & Mobile

Last updated May 2026

Project Q&A Knowledge Base

Overview

PixScan is a privacy-focused iOS app that helps users efficiently manage their photo library by combining swipe-based curation with on-device OCR text extraction. It's designed for anyone who has accumulated thousands of photos and wants a fast, intuitive way to sort through them — keeping what matters, deleting what doesn't, and extracting text from receipts, screenshots, or documents along the way.

Key Features

Interactive Onboarding: First-launch tutorial with practice swipe cards, mock sheets, and feature highlights. Users must perform each gesture (including opening the delete queue and OCR menu) to progress. Uses SF Symbol compositions on gradient backgrounds — zero bundled assets.
Swipe-Based Curation: Four-direction swipe gestures for rapid photo triage — left to delete, right to keep, up/down to extract text. Inspired by the speed of dating app interfaces.
On-Device OCR: Apple's Vision framework extracts text from photos using the .accurate recognition level with language correction, entirely on-device.
Smart Delete Queue: Photos aren't deleted immediately. They're queued with preview, metadata, and batch operations so nothing is lost by accident.
Full Session Persistence: Progress, OCR texts, and delete queue are all saved to UserDefaults automatically. Close the app, come back later, and pick up exactly where you left off — including your pending deletions and extracted text.
Text Export: Search, copy, share, or save all extracted text as a .txt file.
Full-Screen Photo Viewer: Long-press any photo to open it full-screen with pinch-to-zoom (1x–5x), pan when zoomed, and double-tap to toggle zoom level.
Compact Instruction Bar: Color-coded icon-based instruction panel showing all swipe actions at a glance.
Image Prefetching: PHCachingImageManager prefetches upcoming images for smooth, lag-free swiping through large libraries.
Progress Tracking: Live counter with animated progress bar showing photos processed, kept, deleted, and OCR'd.
Screenshot Filter Mode: Toggle between all photos and screenshots-only via toolbar menu. Processed IDs are shared across modes so nothing is re-processed.
Storage Savings Display: Auto-formatted GB/MB display on delete buttons showing exactly how much space will be freed.

Technical Highlights

Tinder-Style Photo Triage

I built the swipe interface using SwiftUI's DragGesture, mapping four distinct swipe directions to different actions. The card-style UI provides visual feedback — color-coded indicators, smooth animations, and haptic feedback — making it feel natural to process hundreds of photos in minutes.
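A minimal sketch of the direction mapping, assuming the drag is classified by its dominant axis once it passes a fixed distance. The `swipeDirection(for:threshold:)` function, the 100-point threshold, and this stand-in `SwipeDirection` enum are illustrative and not taken from the project.

```swift
import Foundation

/// Stand-in for the project's SwipeDirection enum; the case names are assumed.
enum SwipeDirection: Equatable {
    case left, right, up, down
}

/// Classifies a drag translation by its dominant axis.
/// Returns nil when the drag is shorter than `threshold` (an assumed value).
func swipeDirection(for translation: CGSize, threshold: CGFloat = 100) -> SwipeDirection? {
    let dx = translation.width
    let dy = translation.height
    if abs(dx) >= abs(dy) {
        guard abs(dx) >= threshold else { return nil }
        return dx < 0 ? .left : .right
    } else {
        guard abs(dy) >= threshold else { return nil }
        // In SwiftUI's coordinate space, a negative height means the finger moved up.
        return dy < 0 ? .up : .down
    }
}
```

In a view, this would typically be called from `DragGesture().onEnded { value in ... }` with `value.translation`, and a nil result would snap the card back to center.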

Vision Framework OCR Pipeline

The OCR pipeline loads full-resolution images on a background thread, runs VNRecognizeTextRequest with .accurate recognition level and language correction, then collects results back on the main thread. A caching mechanism (tracking processed photo IDs) prevents re-scanning photos that have already been processed.
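The pipeline described above can be sketched roughly as follows. The `recognizeText(in:completion:)` name and its completion signature are assumptions for illustration; only the Vision calls themselves are standard API.

```swift
import Foundation
import Vision

/// Runs text recognition off the main thread and reports the joined text on the main thread.
/// Hypothetical helper, not the app's actual method.
func recognizeText(in image: CGImage, completion: @escaping (String) -> Void) {
    DispatchQueue.global(qos: .userInitiated).async {
        let request = VNRecognizeTextRequest()
        request.recognitionLevel = .accurate
        request.usesLanguageCorrection = true

        let handler = VNImageRequestHandler(cgImage: image, options: [:])
        let text: String
        do {
            try handler.perform([request])
            // Each observation is one detected line; keep its best candidate.
            let lines = (request.results ?? []).compactMap { $0.topCandidates(1).first?.string }
            text = lines.joined(separator: "\n")
        } catch {
            // This sketch treats a failed request as "no text found".
            text = ""
        }
        DispatchQueue.main.async { completion(text) }
    }
}
```

A caller would check the processed-ID set before invoking this, which is how the caching described above avoids re-scanning.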

Lightweight Persistence Without a Database

Rather than introducing Core Data or SQLite, I used UserDefaults to store processed photo IDs as a serialized set, recognized OCR texts as JSON-encoded Codable structs, and the delete queue as an array of asset localIdentifiers. This keeps the app lightweight while ensuring users never lose their progress — not just which photos were swiped, but also their pending deletions and all extracted text.
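A minimal sketch of that storage layout, assuming three keys in one UserDefaults domain. `SessionStore`, the key names, and the `TextEntry` field names are placeholders rather than the project's real definitions.

```swift
import Foundation

/// Stand-in for the app's TextEntry; field names are assumed.
struct TextEntry: Codable, Equatable {
    let text: String
    let timestamp: Date
}

/// Hypothetical wrapper around the three persisted values.
struct SessionStore {
    let defaults: UserDefaults

    // Placeholder key names, not the app's real keys.
    private static let processedKey = "processedPhotoIds"
    private static let textsKey = "recognizedTexts"
    private static let queueKey = "deleteQueueIds"

    func save(processed: Set<String>, texts: [TextEntry], deleteQueue: [String]) throws {
        // A Set is not a property-list type, so it is stored as an array.
        defaults.set(Array(processed), forKey: Self.processedKey)
        defaults.set(try JSONEncoder().encode(texts), forKey: Self.textsKey)
        defaults.set(deleteQueue, forKey: Self.queueKey)
    }

    func load() -> (processed: Set<String>, texts: [TextEntry], deleteQueue: [String]) {
        let processed = Set(defaults.stringArray(forKey: Self.processedKey) ?? [])
        let texts = defaults.data(forKey: Self.textsKey)
            .flatMap { try? JSONDecoder().decode([TextEntry].self, from: $0) } ?? []
        let queue = defaults.stringArray(forKey: Self.queueKey) ?? []
        return (processed, texts, queue)
    }
}
```

Because every value is read and written as a whole, there is no query layer to maintain, which is the trade-off the paragraph above describes.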

Modular Codebase from a Monolith

The original ContentView was 1,561 lines handling everything. I refactored it into 13 focused files — each view component, utility, and modifier in its own file. Xcode 16's PBXFileSystemSynchronizedRootGroup auto-discovers new .swift files, so the extraction required zero project file changes.

Full-Screen Photo Viewer with Gesture Composition

The full-screen viewer combines MagnifyGesture and DragGesture using .simultaneously(with:) so users can pinch-to-zoom while panning, a common pattern in photo apps. Pan offset is boundary-clamped based on the current zoom level to prevent the image from drifting off-screen. Double-tap toggles between 1x and 2x zoom with animation.
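One way to express the boundary clamp, assuming the image is centered and fills its container at 1x. The exact rule the app uses is not stated, so both helper functions here are hypothetical.

```swift
import Foundation

/// Clamps a pan offset so a centered image scaled by `scale` keeps covering its container.
func clampedOffset(_ offset: CGSize, scale: CGFloat, container: CGSize) -> CGSize {
    // At scale s the image overhangs each edge by container * (s - 1) / 2.
    let maxX = max(0, container.width * (scale - 1) / 2)
    let maxY = max(0, container.height * (scale - 1) / 2)
    return CGSize(
        width: min(max(offset.width, -maxX), maxX),
        height: min(max(offset.height, -maxY), maxY)
    )
}

/// Keeps the zoom factor inside the 1x to 5x range the viewer supports.
func clampedScale(_ scale: CGFloat) -> CGFloat {
    min(max(scale, 1), 5)
}
```

At 1x the allowed offset collapses to zero, so the image cannot be dragged at all until the user zooms in.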

Interactive Onboarding with State Machine

I built the onboarding as a four-step state machine (OnboardingStep enum) gated by @AppStorage. The practice swipe phase uses a secondary PracticePhase state machine to guide users through post-swipe interactions — tapping the trash icon to open a mock delete queue, or the OCR icon to view mock extracted text. Pulsing toolbar overlays animate with .repeatForever(autoreverses: true) to draw attention. The entire flow uses zero bundled image assets — all sample cards are SF Symbol compositions on gradient backgrounds, keeping the binary minimal.
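A rough sketch of the two state machines. The `OnboardingStep` case names, the `PracticeSwipe` type, and the swipe-to-phase transitions are assumptions made for illustration; only the `PracticePhase` cases mirror names that appear elsewhere in this document.

```swift
/// Four onboarding steps in order; case names are assumed.
enum OnboardingStep: Int {
    case welcome, practiceSwipes, featureHighlights, permissionRequest

    /// The step that follows this one, or nil when onboarding is finished.
    var next: OnboardingStep? {
        OnboardingStep(rawValue: rawValue + 1)
    }
}

/// Post-swipe interaction phases during the practice step.
enum PracticePhase: Equatable {
    case swipe, tapTrash, tapOCR, confirmation
}

/// Hypothetical summary of what a practice swipe did.
enum PracticeSwipe {
    case keep, delete, ocr
}

/// Assumed transition rule: delete swipes prompt the trash icon,
/// OCR swipes prompt the OCR icon, and keep swipes go straight to confirmation.
func phase(after swipe: PracticeSwipe) -> PracticePhase {
    switch swipe {
    case .delete: return .tapTrash
    case .ocr: return .tapOCR
    case .keep: return .confirmation
    }
}
```

Reaching a nil `next` is the natural point to set the @AppStorage flag and hand off to the main view.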

Filter-Agnostic State Tracking

I designed the processedPhotoIds set to be filter-agnostic — tracked by unique localIdentifier rather than by filter context. This means adding new browse modes (like screenshot-only filtering) requires zero changes to the persistence layer. The FilterMode enum controls only the PHFetchOptions predicate, while the processing state works identically across all modes.
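The idea can be shown without PhotoKit. In this sketch `AssetStub` and `pendingAssets(in:mode:processed:)` are hypothetical stand-ins: the mode only narrows which assets are considered, while the processed set is consulted the same way in every mode.

```swift
import Foundation

/// Stand-in for a fetched asset; only the fields this sketch needs.
struct AssetStub {
    let localIdentifier: String
    let isScreenshot: Bool
}

enum FilterMode {
    case allPhotos, screenshots
}

/// Returns the assets still waiting for a decision in the given mode.
func pendingAssets(in library: [AssetStub], mode: FilterMode, processed: Set<String>) -> [AssetStub] {
    library
        .filter { mode == .allPhotos || $0.isScreenshot }   // the mode only narrows the fetch
        .filter { !processed.contains($0.localIdentifier) } // processing state ignores the mode
}
```

A screenshot swiped in one mode is excluded in the other because both lookups hit the same identifier set.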

Comprehensive Test Suite

I built 32 unit tests using Swift Testing (@Test, #expect) with @Suite(.serialized) to avoid UserDefaults race conditions. Tests cover model logic, persistence round-trips, onboarding data types (step progression, sample card configuration, flag preservation), view instantiation, and edge cases. The 25 UI tests use XCTSkipUnless to gracefully skip photo-dependent and onboarding-gated tests, while onboarding flow tests verify welcome screen elements, practice swipe prompts, feature highlights content, and permission screen layout.
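For illustration, a serialized Swift Testing suite over UserDefaults might look like the sketch below. The suite name, the key, and both test names are placeholders; they are not copied from the project's test target.

```swift
import Foundation
import Testing

// .serialized runs the tests in this suite one at a time, which matters
// because they all touch the same UserDefaults domain.
@Suite(.serialized)
struct ProcessedIDPersistenceTests {
    // Placeholder domain; a dedicated suite keeps test data out of .standard.
    static let suiteName = "PixScanSketchTests"

    private func freshDefaults() throws -> UserDefaults {
        let defaults = try #require(UserDefaults(suiteName: Self.suiteName))
        defaults.removePersistentDomain(forName: Self.suiteName)
        return defaults
    }

    @Test func processedIDsRoundTrip() throws {
        let defaults = try freshDefaults()
        let ids: Set<String> = ["asset-1", "asset-2"]
        defaults.set(Array(ids), forKey: "processedPhotoIds")

        let restored = Set(defaults.stringArray(forKey: "processedPhotoIds") ?? [])
        #expect(restored == ids)
    }

    @Test func missingKeyYieldsNil() throws {
        let defaults = try freshDefaults()
        #expect(defaults.stringArray(forKey: "processedPhotoIds") == nil)
    }
}
```

Without the serialized trait, Swift Testing may run these in parallel, and one test's reset could wipe the other's data mid-run.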

Engineering Decisions

On-device OCR vs. cloud API

Constraint: Photos are sensitive data; the app needs to work offline and feel instant.
Options: Cloud OCR (Google Vision, AWS Textract) for higher accuracy; Apple Vision on-device; a hybrid mode.
Choice: Apple Vision exclusively, run on a .userInitiated background queue.
Why: Vision's accuracy with .accurate recognition and language correction is good enough for receipts, screenshots, and documents. Cloud APIs would add latency, network failure modes, an upload privacy surface, and ongoing cost — none of which the product needs.

UserDefaults vs. Core Data / SwiftData

Constraint: Persist three things across launches — processed photo IDs, OCR results, and a delete queue.
Options: Core Data, SwiftData, SQLite via GRDB, plain UserDefaults with JSON encoding.
Choice: UserDefaults storing a serialized Set<String>, JSON-encoded Codable TextEntry array, and an array of localIdentifier strings.
Why: All three structures are small and accessed as a whole, not queried. Skipping a database removes a schema-migration axis and keeps cold-start cost trivial. If a future library scales past ~10,000 processed IDs, SwiftData becomes the natural next step.

Four-direction swipe vs. buttons

Constraint: Triaging a large photo library needs to feel fast — a couple of seconds per photo, one-handed.
Options: Buttons under each photo; long-press menu; two-direction swipe with a separate OCR mode; four-direction swipe.
Choice: Four directions mapped to keep / delete / OCR+keep / OCR+delete, with double-tap to undo and long-press for full-screen.
Why: Buttons make every action a deliberate tap. Folding OCR into the swipe itself (up and down) means the user never needs a mode switch to extract text from a receipt. Double-tap undo covers the misfire case.

Single ViewModel vs. multiple

Constraint: One primary screen, several sheet-based sub-views (delete queue, OCR notes, full-screen viewer) that all share state.
Options: Per-sheet view models with a coordinator passing state; a single shared ObservableObject.
Choice: One PhotoViewModel injected as @EnvironmentObject.
Why: All sub-views read or mutate the same underlying photo array, processed set, and delete queue. Splitting would force synchronization plumbing for no architectural payoff at this scope.

Frequently Asked Questions

How does the onboarding work?

On first launch, the app shows an interactive tutorial instead of the main photo view. Users practice all four swipe gestures on sample cards, open the mock delete queue and OCR notes sheets, and learn about undo, full-screen preview, and screenshot filtering. The tutorial ends with a photo library permission request. An @AppStorage flag ensures onboarding only shows once — subsequent launches go directly to the main app.

How does the OCR work?

PixScan uses Apple's Vision framework (VNRecognizeTextRequest) to perform optical character recognition entirely on-device. It loads the full-resolution image, runs text detection with the .accurate recognition level and language correction enabled, then stores the extracted text with a timestamp.

Why did you choose SwiftUI over UIKit?

SwiftUI's declarative syntax made it significantly faster to build the gesture-driven UI. The DragGesture API, combined with @Published properties on the view model, creates a reactive pipeline where swipe actions flow naturally into state changes and UI updates.

How does the app handle large photo libraries?

Photos are fetched as lightweight PHAsset references — not loaded into memory until displayed. Images are loaded on-demand with size constraints, and OCR runs on a background thread to keep the UI responsive. The processed-photo tracking prevents redundant work across sessions.

What happens if I accidentally swipe a photo to delete?

Double-tap to immediately undo and go back to the previous photo. Even if you don't catch it right away, photos aren't deleted immediately — they're added to a review queue where you can preview them, deselect specific ones, or cancel the deletion entirely.

Does the app upload my photos anywhere?

No. PixScan processes everything locally on your device using Apple's native frameworks. No photos, text, or metadata ever leave your device. There are no analytics, no cloud APIs, and no network calls.

How is progress saved?

Each processed photo's unique identifier is saved to UserDefaults after you swipe it. Additionally, recognized OCR texts are persisted as JSON-encoded structs, and the delete queue is stored as an array of asset localIdentifiers. When you relaunch the app, it restores all three: processed photo set, extracted texts, and pending deletions. You can reset all progress with the "Start Over" button.

How is the codebase organized?

The project follows MVVM with a single PhotoViewModel and 17 focused Swift files. The app entry point gates between OnboardingView (first launch) and ContentView (returning user) via @AppStorage. View components (OnboardingView, PhotoCard, FullScreenPhotoView, InstructionBar, PhotoPermissionView, DeleteQueueView, OCRNotesView, PhotoThumbnail, PhotoPreview, ProgressStatsView) are each in their own file. Shared utilities (SwipeDirection, ButtonPress, Collection+Safe, ViewControllerUtils) are extracted for reuse. Xcode 16's auto-discovery means no manual project file configuration when adding new files.

How is the app tested?

32 unit tests using Swift Testing validate model logic (persistence, selection, onboarding data types, view instantiation, safe subscripts, Codable round-trips). 25 UI tests using XCTest cover onboarding flow (welcome screen, practice prompts, feature highlights, permission screen), smoke scenarios (launch, initial state), and photo-dependent interactions (swipe gestures, long-press full-screen viewer, delete queue sheet, OCR notes sheet). Tests use XCTSkipUnless to skip gracefully when gated by onboarding or missing photos.

What does the screenshot filter actually filter on?

The toolbar menu switches FilterMode between .allPhotos and .screenshots, which only changes the PHFetchOptions predicate used by PhotoKit. The processedPhotoIds set is shared across both modes by design, so a photo you already swiped won't reappear when you toggle modes.

Why do pinch-to-zoom and pan work simultaneously in the full-screen viewer?

FullScreenPhotoView composes MagnifyGesture and DragGesture with .simultaneously(with:) so both update independently as the user moves their fingers, giving the same feel as the system Photos app. Pan offset is clamped against the current zoom scale so the image can't drift fully off-screen at any zoom level.

What gets prefetched on each swipe?

On every index change, ContentView asks PHCachingImageManager to start caching the next three PHAssets at the card's display size. That makes the next several swipes feel instant even on a library with several thousand photos, without holding the entire library in memory.
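A sketch of that prefetch step, assuming a look-ahead of three assets. `prefetchRange(after:total:count:)` and `prefetch(assets:after:targetSize:using:)` are hypothetical helpers; `startCachingImages(for:targetSize:contentMode:options:)` is the PhotoKit call.

```swift
import Foundation

/// Indices of the next `count` items after `current`, clipped to the array bounds.
func prefetchRange(after current: Int, total: Int, count: Int = 3) -> Range<Int> {
    let start = min(max(current + 1, 0), total)
    let end = min(start + count, total)
    return start..<end
}

#if canImport(Photos)
import Photos

/// Hypothetical helper: asks the caching manager to warm the upcoming assets.
func prefetch(assets: [PHAsset], after current: Int, targetSize: CGSize,
              using manager: PHCachingImageManager) {
    let upcoming = Array(assets[prefetchRange(after: current, total: assets.count)])
    guard !upcoming.isEmpty else { return }
    manager.startCachingImages(for: upcoming, targetSize: targetSize,
                               contentMode: .aspectFit, options: nil)
}
#endif
```

The cache only pays off when the later image request uses the same target size, content mode, and options as the caching call, so the card's display size should be passed consistently to both.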

Technology Stack

Core Technologies

Category	Technology	Version	Purpose
Language	Swift	5.9+	Primary development language
UI Framework	SwiftUI	iOS 17+	Declarative UI with gesture support
OCR Engine	Apple Vision	iOS 17+	On-device text recognition
Photo Access	PhotoKit (Photos)	iOS 17+	Photo library read/write/delete
Persistence	UserDefaults	Native	Session state across app launches

Frontend

Framework: SwiftUI for all views; UIKit is used only for the share sheet (UIActivityViewController) and haptic feedback (UIImpactFeedbackGenerator)
State Management: @StateObject / @EnvironmentObject with ObservableObject (MVVM)
Styling: Native SwiftUI modifiers, custom ButtonPress ViewModifier for press effects
Navigation: NavigationStack with .sheet() presentations
Gestures: DragGesture, TapGesture(count: 2), LongPressGesture, MagnifyGesture, Menu
Layout: GeometryReader for responsive sizing (replaces deprecated UIScreen.main)

Infrastructure

Hosting: Native iOS app (App Store / TestFlight)
CI/CD: Xcode build system
Monitoring: Console logging (print() with DEBUG flags)

Development Tools

IDE: Xcode 16.2+
Build System: Xcode native with PBXFileSystemSynchronizedRootGroup (auto-discovers new files)
Package Manager: None (zero external dependencies)
Unit Testing: Swift Testing framework (@Test, #expect) with @Suite(.serialized) — 32 tests
UI Testing: XCTest / XCUITest with conditional skips (XCTSkipUnless) — 25 tests

Key Dependencies

Package	Purpose
`SwiftUI`	Entire UI layer — views, gestures, navigation, animations
`Photos`	Fetch photo assets, request permissions, batch-delete photos
`Vision`	`VNRecognizeTextRequest` for accurate OCR with language correction
`UIKit`	`UIActivityViewController` for share sheet, `UIImpactFeedbackGenerator` for haptics
`Foundation`	`UserDefaults` for persistence, `DispatchQueue` for threading

Notable Implementation Choices

Zero External Dependencies

The project uses only Apple's native frameworks. This eliminates dependency management overhead, reduces app size, and ensures long-term stability without third-party breakage risk. The onboarding tutorial uses SF Symbol compositions on gradient backgrounds instead of bundled image assets, keeping the app binary minimal.

Background Threading for OCR

OCR requests run on a .userInitiated quality-of-service background queue to keep the UI responsive while processing high-resolution images.

Efficient Photo Loading with Prefetching

Photos are loaded on-demand with target size constraints. PHCachingImageManager prefetches the next 3 images ahead of the current index on every swipe, ensuring smooth transitions without loading the entire library into memory.

Modular Architecture

The codebase is split into 17 focused Swift files (down from a monolithic 1,561-line ContentView). Each component has a single responsibility, making the code more maintainable and testable. Xcode 16's PBXFileSystemSynchronizedRootGroup automatically discovers new files without manual project configuration.

Comprehensive Test Coverage

32 unit tests using Swift Testing (@Test, #expect) with @Suite(.serialized) validate model logic, persistence, onboarding data types, view instantiation, and safe subscript behavior. 25 UI tests with XCTSkipUnless conditional skips test smoke scenarios, onboarding flow elements, and photo-dependent interactions including instruction bar visibility and long-press full-screen viewer.

App Store Deployment Ready

The project includes a PrivacyInfo.xcprivacy privacy manifest that declares UserDefaults usage (reason code CA92.1), and it uses automatic code signing, which together cover the App Store submission requirements. The interactive onboarding replaces the static launch screen, providing a better first-run experience.

Architecture Overview

System Diagram

flowchart TD
    subgraph UI["UI Layer (SwiftUI)"]
        OB[OnboardingView]
        CV[ContentView]
        PC[PhotoCard]
        FSPV[FullScreenPhotoView]
        IB[InstructionBar]
        PPV[PhotoPermissionView]
        OCR_V[OCRNotesView]
        DQ_V[DeleteQueueView]
        PT[PhotoThumbnail]
        PP[PhotoPreview]
        PSV[ProgressStatsView]
    end

    subgraph Shared["Shared Utilities"]
        SD[SwipeDirection]
        BP[ButtonPress Modifier]
        CS[Collection+Safe]
        VCU[ViewControllerUtils]
    end

    subgraph VM["ViewModel Layer"]
        PVM[PhotoViewModel]
    end

    subgraph Services["Apple Frameworks"]
        PHLib[Photos / PhotoKit]
        PHCache[PHCachingImageManager]
        Vision[Vision OCR]
        UD[UserDefaults]
    end

    subgraph Device["On-Device"]
        PhotoLib[(Photo Library)]
        Storage[(Local Storage)]
    end

    OB -->|completes| CV
    CV -->|swipe gestures| PVM
    CV -->|renders| PC
    CV -->|renders| IB
    CV -->|long press| FSPV
    CV -->|renders| PPV
    CV -->|presents sheet| OCR_V
    CV -->|renders| PSV
    CV -->|presents sheet| DQ_V
    DQ_V -->|list rows| PT
    DQ_V -->|overlay| PP

    PVM -->|fetch & delete| PHLib
    PVM -->|prefetch nearby| PHCache
    PVM -->|text recognition| Vision
    PVM -->|persist state| UD

    PHLib --> PhotoLib
    PHCache --> PhotoLib
    UD --> Storage

Component Descriptions

Photo_HelperApp

Purpose: App entry point with onboarding gate
Location: PixScan/Photo_HelperApp.swift
Key responsibilities: Creates the WindowGroup and routes to either OnboardingView (first launch) or ContentView (returning user) via @AppStorage("com.pixscan.v2.hasCompletedOnboarding")
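
The gate can be sketched as below. The two views are empty placeholders so the example stands alone, and the struct name `PhotoHelperAppSketch` is invented; only the @AppStorage key comes from this document.

```swift
import SwiftUI

// Placeholder views so the sketch compiles on its own.
struct OnboardingView: View {
    var body: some View { Text("Onboarding") }
}

struct ContentView: View {
    var body: some View { Text("Main") }
}

@main
struct PhotoHelperAppSketch: App {
    // Same key the document cites; the flag is false until onboarding completes.
    @AppStorage("com.pixscan.v2.hasCompletedOnboarding")
    private var hasCompletedOnboarding = false

    var body: some Scene {
        WindowGroup {
            if hasCompletedOnboarding {
                ContentView()
            } else {
                OnboardingView()
            }
        }
    }
}
```

If the onboarding view declares an @AppStorage property with the same key, setting it to true there re-evaluates this body and swaps in the main view without any explicit navigation.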

OnboardingView

Purpose: Interactive first-launch tutorial
Location: PixScan/OnboardingView.swift (~815 lines)
Key responsibilities:
- Four-step flow: Welcome → Practice Swipes → Feature Highlights → Permission Request
- Practice cards with SF Symbol compositions on gradient backgrounds
- Post-swipe interactions: mock delete queue sheet, mock OCR notes sheet
- Pulsing toolbar overlays guiding users to tap trash/OCR icons
- PHPhotoLibrary permission request with double-tap guard
- Sets hasCompletedOnboarding = true to transition to ContentView

ContentView

Purpose: Main user interface and interaction handling
Location: PixScan/ContentView.swift
Key responsibilities:
- Orchestrates the swipe gesture detection and direction-based actions
- Manages navigation to OCRNotesView and DeleteQueueView via sheets
- Presents FullScreenPhotoView via long press and fullScreenCover
- Triggers image prefetching via PHCachingImageManager on index changes
- Shows permission request, completion state, and instruction bar

PhotoCard

Purpose: Renders the current photo as a swipeable card
Location: PixScan/PhotoCard.swift (34 lines)

PhotoPermissionView

Purpose: Photo library permission request and denied state UI
Location: PixScan/PhotoPermissionView.swift (45 lines)

DeleteQueueView

Purpose: Sheet for reviewing and batch-deleting queued photos
Location: PixScan/DeleteQueueView.swift (216 lines)
Key responsibilities: Displays thumbnails with metadata, actual file sizes via PHAssetResource, selective and full batch deletion, full-size photo preview overlay

OCRNotesView

Purpose: Sheet for viewing, searching, copying, sharing, and exporting OCR text
Location: PixScan/OCRNotesView.swift (135 lines)

FullScreenPhotoView

Purpose: Full-screen immersive photo viewer with pinch-to-zoom and pan
Location: PixScan/FullScreenPhotoView.swift
Key responsibilities: Loads full-resolution image via PHImageManager, supports MagnifyGesture (1x–5x zoom), DragGesture for panning with boundary clamping, double-tap to toggle zoom, tap/swipe-down to dismiss

InstructionBar

Purpose: Compact icon-based instruction panel showing swipe actions
Location: PixScan/InstructionBar.swift
Key responsibilities: Stateless presentational component displaying 5 color-coded icon items (Delete, Keep, OCR Delete, OCR Keep, Undo) with SF Symbols and labels

ProgressStatsView

Purpose: Compact progress display showing processed/total counter, animated progress bar, and stats row
Location: PixScan/ProgressStatsView.swift
Key responsibilities: Stateless presentational component displaying processed count, progress bar, and kept/deleted/OCR'd stats

PhotoThumbnail / PhotoPreview

Purpose: Thumbnail (60x60) and full-size image loading from PHAssets
Locations: PixScan/PhotoThumbnail.swift, PixScan/PhotoPreview.swift

PhotoViewModel

Purpose: Central state management and business logic
Location: PixScan/PhotoViewModel.swift (368 lines)
Key responsibilities:
- Fetches photos from the photo library via PhotoKit
- Manages the current photo index and navigation
- Runs OCR text recognition via the Vision framework
- Maintains the delete queue and selection state
- Persists processed photo IDs, recognized texts (JSON), and delete queue (localIdentifiers) to UserDefaults
- Prefetches nearby images using PHCachingImageManager
- Provides unmarkPhotoAsProcessed() for undo/go-back support
- Supports filter mode (all photos / screenshots) with persistence

Shared Utilities

SwipeDirection (SwipeDirection.swift): Enum for gesture direction
ButtonPress (ButtonPress.swift): Custom ViewModifier for press-effect buttons
Collection+Safe (Collection+Safe.swift): Safe subscript to avoid index-out-of-bounds crashes
ViewControllerUtils (ViewControllerUtils.swift): Shared findTopMostViewController function for presenting UIKit sheets
FilterMode (PhotoViewModel.swift): Enum for photo filter modes (allPhotos, screenshots)
OnboardingStep (OnboardingView.swift): Enum state machine for onboarding flow progression
PracticePhase (OnboardingView.swift): Enum for post-swipe interaction phases (swipe, tapTrash, tapOCR, confirmation)
SampleCard (OnboardingView.swift): Data model for practice swipe cards with SF Symbol compositions
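
For reference, a safe subscript of the kind Collection+Safe describes is usually written like this. It is the common idiom and an assumption about the file's contents, not a copy of the project's code.

```swift
extension Collection {
    /// Returns the element at `index`, or nil when the index is out of bounds.
    subscript(safe index: Index) -> Element? {
        indices.contains(index) ? self[index] : nil
    }
}
```

With it, `photos[safe: currentIndex]` yields nil at the end of the library instead of trapping.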

Data Flow

App Launch: Photo_HelperApp checks @AppStorage("com.pixscan.v2.hasCompletedOnboarding"). First-launch users see OnboardingView (interactive tutorial → permission request). Returning users go directly to ContentView, where PhotoViewModel loads persisted processedPhotoIds from UserDefaults, fetches all photos sorted by date (newest first), and skips to the first unprocessed photo. It also loads the persisted filter mode and applies the screenshot predicate if that mode is active.
User Swipe: ContentView detects a DragGesture, determines swipe direction based on offset, and calls the appropriate PhotoViewModel method (processCurrentPhoto, queuePhotoForDeletion, performOCR). On index change, PHCachingImageManager prefetches the next 3 images for smoother scrolling.
OCR Processing: The view model loads the full-resolution image on a background thread, runs VNRecognizeTextRequest, and appends a TextEntry (with extracted text and timestamp) to the recognizedTexts array.
Delete Queue: Photos swiped left or up are added to deleteQueue. The user can review, select/deselect, preview, and batch-delete via PHAssetChangeRequest.deleteAssets().
Persistence: After each photo is processed, its localIdentifier is added to a Set<String> and serialized to UserDefaults. Recognized OCR texts are persisted as JSON, and the delete queue is persisted via asset localIdentifiers. On next launch, already-processed photos are automatically skipped and the delete queue and OCR texts are restored.
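
The Delete Queue and Persistence steps above meet when stored identifiers are turned back into assets for deletion. A minimal sketch of that step follows; the `deleteQueuedAssets(withIdentifiers:completion:)` name and completion shape are assumptions.

```swift
import Foundation
import Photos

/// Resolves persisted identifiers back to assets and deletes them in one change request.
/// Hypothetical helper, not the app's actual method.
func deleteQueuedAssets(withIdentifiers ids: [String],
                        completion: @escaping (Bool, Error?) -> Void) {
    let result = PHAsset.fetchAssets(withLocalIdentifiers: ids, options: nil)
    // Identifiers whose assets no longer exist are simply absent from the result.
    let assets = (0..<result.count).map { result.object(at: $0) }

    guard !assets.isEmpty else {
        completion(true, nil)
        return
    }

    PHPhotoLibrary.shared().performChanges({
        PHAssetChangeRequest.deleteAssets(assets as NSArray)
    }, completionHandler: { success, error in
        // The handler runs on an arbitrary queue; hop to main before touching UI state.
        DispatchQueue.main.async { completion(success, error) }
    })
}
```

iOS shows its own confirmation dialog for the batch, so a declined prompt arrives here as `success == false` and the queue should be left intact in that case.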

External Integrations

Service	Purpose	Documentation
Apple Photos (PhotoKit)	Read/delete photos from user library	PhotoKit Docs
Apple Vision	On-device OCR text recognition	Vision Docs
UserDefaults	Lightweight key-value persistence	UserDefaults Docs

Key Architectural Decisions

MVVM with a Single ViewModel

Context: The app has one primary screen with multiple interaction modes (swiping, OCR results, delete queue).
Decision: A single PhotoViewModel manages all state, presented through sheet-based sub-views.
Rationale: The app's scope is focused enough that splitting into multiple view models would add unnecessary complexity. One ObservableObject keeps state synchronized across all views.

On-Device Processing Only

Context: The app handles personal photos and extracted text, which are sensitive data.
Decision: All OCR and image processing happens locally using Apple's native frameworks. No network calls, no cloud APIs.
Rationale: Maximizes user privacy, eliminates latency, and works offline. Apple's Vision framework provides sufficient OCR accuracy.

UserDefaults for Persistence

Context: The app needs to remember which photos have already been processed across sessions, along with OCR texts and the delete queue.
Decision: Store processed photo IDs as a serialized Set, recognized texts as JSON-encoded Codable structs, and delete queue as an array of asset localIdentifiers — all in UserDefaults.
Rationale: The data is simple enough for UserDefaults and doesn't warrant a full database like Core Data or SQLite. JSON encoding for TextEntry structs keeps the data structured and portable.

Modular File Architecture

Context: The original ContentView was 1,561 lines handling UI, gestures, navigation, sheets, and utility functions.
Decision: Extract focused files — view components (PhotoCard, FullScreenPhotoView, InstructionBar, PhotoPermissionView, DeleteQueueView, OCRNotesView, PhotoThumbnail, PhotoPreview) and utilities (SwipeDirection, ButtonPress, Collection+Safe, ViewControllerUtils).
Rationale: Xcode 16+ with PBXFileSystemSynchronizedRootGroup auto-discovers new .swift files, making extraction zero-friction. Each file has a single responsibility and is independently testable.

Swipe Gesture as Primary Interaction

Context: Users need to quickly triage large photo libraries.
Decision: Four-direction swipe gestures map to four distinct actions (keep, delete, OCR+keep, OCR+delete).
Rationale: Inspired by dating app UX (Tinder-style swiping), this provides fast, one-handed operation. Users can process hundreds of photos quickly without navigating menus.
