Toolkit-Shadows · report

IKEA MPP uplift — research notes

Reference for whoever picks this up. Captures the working pipeline, the findings that matter, and the questions still open. Brief: RFP/brief.md. API references: photoroom-shadows.md, photoroom-relight.md.


Pipelines

Two end-to-end workflows run today over the 22 IKEA test SKUs. Both share input reframing and the soft-light composite stage; they differ in how the shadow and lighting are produced.

Workflow A — Gemini-as-scene + composite. Pad inputs → Gemini generates lit/shadowed render → soft-light composite over a cutout to restore subject detail.

Workflow B — Photoroom shadow + Gemini relight + composite. Pad inputs → Photoroom generates a clean directional shadow → Gemini relights the subject only with the shadow preserved → soft-light composite over a cutout.

Workflow B trades one extra API call for two improvements: the cast shadow comes from Photoroom (deterministic, brief-compliant geometry, never clips edges), and the lighting strength is a controllable knob (subtle / standard / strong prompts) that is independent of the shadow.

1. Reframe inputs to brief spec

scripts/pad_inputs.py → IKEA_PHOTOS_padded/.

Per file: detect subject bbox via luminance threshold, scale longer dim to ≤3200 px, center in a 4000×4000 white canvas, preserve embedded ICC profile. 22/22 brief-compliant by construction (subject ≤ 3200×3200, margin ≥ 400 px on every side). Verified by scripts/test_brief.py.
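The scale-and-center arithmetic described above can be sketched as follows. This is a minimal illustration, not the contents of scripts/pad_inputs.py: the names plan_placement and min_margin are hypothetical, bbox detection is left out, and the sketch assumes small subjects are kept at native size rather than upscaled.

```python
from dataclasses import dataclass

CANVAS = 4000        # output canvas edge in px (from the notes)
MAX_SUBJECT = 3200   # longest allowed subject edge in px (from the notes)


@dataclass(frozen=True)
class Placement:
    width: int    # scaled subject width in px
    height: int   # scaled subject height in px
    left: int     # x offset of the subject on the canvas
    top: int      # y offset of the subject on the canvas


def plan_placement(bbox_w: int, bbox_h: int) -> Placement:
    """Scale the subject bbox so its longer side fits MAX_SUBJECT, then center it."""
    # Assumption: subjects already smaller than MAX_SUBJECT are not upscaled.
    scale = min(1.0, MAX_SUBJECT / max(bbox_w, bbox_h))
    w = round(bbox_w * scale)
    h = round(bbox_h * scale)
    return Placement(w, h, (CANVAS - w) // 2, (CANVAS - h) // 2)


def min_margin(p: Placement) -> int:
    """Smallest distance from the subject to any canvas edge."""
    return min(p.left, p.top, CANVAS - p.left - p.width, CANVAS - p.top - p.height)
```

Because the subject never exceeds 3200 px and is centered in a 4000 px canvas, min_margin cannot drop below 400 px, which is why compliance holds by construction.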

This step does most of the work for downstream framing. Gemini in particular stops over-filling once it sees brief-framed inputs.

2. Shadow generation — two parallel paths

Path A (Photoroom shadow, feeds Workflow B). src/photoroom.mjs + scripts/batch.mjs. Locked params: ai.auto-with-overrides, direction=behindRight, subjectPose=upright, padding=0.10. Four softness/intensity/spread variants (default, subtle, shorter, softest) run across all 22 SKUs at 4000×4000 JPEG. Each variant has its own config.json.

Path B (Gemini 3 Pro Image, feeds Workflow A). src/gemini.mjs + scripts/batch-gemini.mjs. gemini-3-pro-image-preview, aspectRatio=1:1, imageSize=4K (4096²), temperature 0.0. Three samples per SKU. Each run dir snapshots the prompt, reference image, and full config.

Photoroom is reliable and brief-compliant on margins. Gemini gives directional control Photoroom can't expose (front-left key light, soft shadow falling right-and-back as a single instruction), at the cost of resampling subject pixels — products lose pixel-exact fidelity.

3a. Workflow B — Gemini relight (subject only, shadow preserved)

src/gemini.mjs (RELIGHT_VARIANTS) + scripts/relight-gemini.mjs. Reads from outputs/photoroom/shadows/<variant>/, sends each Photoroom-shadowed image to Gemini with a relight prompt that explicitly tells the model to leave the cast shadow and white background untouched. Outputs to outputs/gemini-relit/<ts>_<relight-variant>/<sku>/<sku>.jpg.

Three lighting-intensity variants, all on the same 22 SKUs, same temperature (0.0), same imageSize=4K:

Variant  | Prompt characteristic
subtle   | Almost flat. Gentle hint of front-left direction, high ambient fill.
standard | Soft front-left key, mild ambient; the "default" relight.
strong   | Pronounced directional key, low ambient, deeper shading on right-facing surfaces.

Each run dir contains its own prompt.txt and config.json (model, temperature, aspect ratio, image size, source shadow variant, full prompt text, SDK + Node versions, literal CLI argv). Runs are reproducible by name.
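The snapshot idea can be illustrated with a short sketch. The real pipeline is Node (src/gemini.mjs); this Python version only mirrors the list of fields above. The key names and the functions build_run_config and write_run_snapshot are hypothetical, and the runtime field stands in for the SDK + Node versions.

```python
import json
import platform
import sys
from pathlib import Path


def build_run_config(model, temperature, aspect_ratio, image_size,
                     source_shadow_variant, prompt, argv=None):
    """Collect everything needed to repeat a run into one JSON-serialisable dict."""
    return {
        "model": model,
        "temperature": temperature,
        "aspectRatio": aspect_ratio,
        "imageSize": image_size,
        "sourceShadowVariant": source_shadow_variant,
        "prompt": prompt,
        # The real pipeline records SDK + Node versions; this sketch records
        # the Python runtime instead.
        "runtime": f"python {platform.python_version()}",
        # Literal CLI arguments, so the run can be repeated verbatim.
        "argv": list(sys.argv if argv is None else argv),
    }


def write_run_snapshot(run_dir, config):
    """Write prompt.txt and config.json into the run directory."""
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "prompt.txt").write_text(config["prompt"], encoding="utf-8")
    (run_dir / "config.json").write_text(json.dumps(config, indent=2), encoding="utf-8")
```

Writing the snapshot before the first API call means a failed or partial run still documents what was attempted.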

3b. Composite — fuse lighting with original detail

src/composite.mjs exports two composite functions:

Soft-light composite (softLightComposite + scripts/composite-run.mjs). Used for Workflow A. Blends a cutout over the Gemini scene render. Where the cutout is opaque, original detail comes through; where transparent, Gemini's pixels (and its cast shadow) pass through unchanged.

Preserve-shadow composite (preserveShadowComposite + scripts/composite-relight.mjs). Used for Workflow B. Necessary because Gemini doesn't actually do selective edits — it re-renders the full frame, so even with a "do not modify the cast shadow" instruction in the relight prompt, the model produces a slightly different shadow than the input. To preserve the Photoroom shadow exactly:

  1. Soft-light blend cutout (original detail) over relit (Gemini lighting) → relit-with-detail.
  2. Mask relit-with-detail to the cutout alpha → subject-only (transparent everywhere else).
  3. Composite subject-only over the Photoroom shadow output.

Result: subject region carries Gemini's relight + original detail; everything outside the subject (shadow, white background) is untouched Photoroom output. Both Workflow B composite modes are produced for comparison — softlight_gemini-relit_<run>... (drift-prone) and preserve-shadow_<run>... (shadow guaranteed intact).
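Both composite modes can be sketched per pixel as below. This is an illustration under stated assumptions, not the code in src/composite.mjs: it assumes the W3C soft-light formula (the notes do not say which soft-light variant is used), treats the Gemini render as the backdrop and the cutout as the blend layer, and works on channel values in [0, 1]. All function names are hypothetical.

```python
import math


def soft_light(backdrop: float, source: float) -> float:
    """W3C soft-light blend for one channel, both values in [0, 1]."""
    if source <= 0.5:
        return backdrop - (1 - 2 * source) * backdrop * (1 - backdrop)
    if backdrop <= 0.25:
        d = ((16 * backdrop - 12) * backdrop + 4) * backdrop
    else:
        d = math.sqrt(backdrop)
    return backdrop + (2 * source - 1) * (d - backdrop)


def lerp(a: float, b: float, t: float) -> float:
    """Linear mix that returns exactly a at t=0 and exactly b at t=1."""
    return a * (1 - t) + b * t


def soft_light_pixel(scene, cutout, alpha, opacity=1.0):
    """Soft-light composite: blend a cutout pixel over a scene pixel.

    Where alpha is 0 the scene pixel passes through unchanged.
    """
    weight = alpha * opacity
    return tuple(lerp(s, soft_light(s, c), weight) for s, c in zip(scene, cutout))


def preserve_shadow_pixel(shadow, relit, cutout, alpha, opacity=1.0):
    """Preserve-shadow composite: the three steps above, for one pixel."""
    # Step 1: soft-light the original detail over the relit render.
    relit_with_detail = soft_light_pixel(relit, cutout, alpha, opacity)
    # Steps 2 and 3: mask to the cutout alpha, then lay the subject over
    # the Photoroom shadow image.
    return tuple(lerp(sh, rd, alpha) for sh, rd in zip(shadow, relit_with_detail))
```

The property that matters for Workflow B is visible in the last line: wherever the cutout alpha is 0, the output is the Photoroom pixel exactly, regardless of what Gemini rendered there.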

Cutouts come from one of two sources, both stored at 4000×4000 PNG with alpha:

Source                          | Path               | Speed              | Cost at scale
BiRefNet (PyTorch + MPS)        | cutouts/           | ~1.7 s/image       | Free, fully local
Photoroom removeBackground=true | cutouts_photoroom/ | network round-trip | Paid per call

Photoroom cutouts are visually cleaner on this catalog: its segmenter is tuned for studio product photography on white, where BiRefNet's general-matting strengths don't add value. At the 10k–100k volume of the brief's handover model, the cost question matters; BiRefNet is good enough on most SKUs but loses on the hard cases.


Findings

Photoroom shadow parameter selection

Initial exploration before locking the shadow batch's parameters lives in:

  • outputs/photoroom/sweeps/sweep_PE1016720/ — 3×3×3 grid (softness × intensity × spread) on one SKU, 27 variants, sandbox key. Established the rough ranges where shadows look brief-aligned (soft and subtle, not crushing the underside of the product).
  • outputs/photoroom/sweeps/sweep_PE1003461/ — same grid, second SKU, used to check the picks generalize.
  • outputs/photoroom/sweeps/sweep2_PE{1016720,979531,995758}/ — tighter 3×3×2 grid across three SKUs spanning easy/medium/hard shadow-shape cases (chair, sofa, lamp). 18 variants each. Confirmed picks before committing.
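A grid sweep of this shape can be enumerated as in the sketch below. The grid values are placeholders (the notes give the grid dimensions, not the exact points, and "short" in particular is a guess), and sweep_grid and variant_slug are hypothetical names.

```python
from itertools import product

# Placeholder grid values: the notes give the grid shape, not the exact points.
SOFTNESS = (0.4, 0.6, 0.85)
INTENSITY = (0.4, 0.55, 0.7)
SPREAD = ("short", "medium", "long")


def sweep_grid(softness=SOFTNESS, intensity=INTENSITY, spread=SPREAD):
    """Return one parameter dict per grid point."""
    return [
        {"softness": s, "intensity": i, "spread": p}
        for s, i, p in product(softness, intensity, spread)
    ]


def variant_slug(params):
    """Build a filesystem-friendly name such as 's0.60_i0.55_long'."""
    return f"s{params['softness']:.2f}_i{params['intensity']:.2f}_{params['spread']}"
```

With the defaults this yields the 27 points of the 3×3×3 sweep; passing a two-value spread tuple gives the 18 points of the tighter 3×3×2 grid.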

Locked picks (in scripts/batch.mjs):

Param       | Value       | Why
direction   | behindRight | Brief mandates light from front-left, so shadow falls back-and-right.
subjectPose | upright     | Default for the test set (no flatlay-only SKUs in the 22). Per-SKU override would be needed for flat-lying products like KAJPLATS bulbs (brief flags this).
softness    | 0.6         | Mid-range; sharper than 0.85 (which goes diffuse-blob) but softer than 0.4 (which produces hard penumbras).
intensity   | 0.55        | Visible but not heavy. ≥0.7 starts crushing readability of the contact area.
spread      | long        | Brief explicitly says "shadow needs room"; long matches that. medium shortens too much for the seated/standing furniture cases.

Three production variants run alongside the locked default, in case category-specific tuning becomes necessary downstream:

Slug    | softness | intensity | spread | When to use
default | 0.6      | 0.55      | long   | Standard catalog look.
subtle  | 0.85     | 0.40      | long   | Lighter; products that read busy with a stronger shadow.
shorter | 0.6      | 0.55      | medium | Smaller / flat-lying products where a long shadow would dominate.
softest | 0.95     | 0.35      | long   | Almost-no-shadow look; useful for proposals where the brief asks for minimal shadow on certain SKUs.
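One way to keep the locked picks and the variants in a single place is a base dict plus per-variant overrides, sketched below. The keys are the names used in these notes, not necessarily the literal Photoroom request parameters, and shadow_params is a hypothetical helper rather than anything in scripts/batch.mjs.

```python
# Locked values and per-variant overrides as listed in the notes.
LOCKED = {
    "direction": "behindRight",
    "subjectPose": "upright",
    "padding": 0.10,
    "softness": 0.6,
    "intensity": 0.55,
    "spread": "long",
}

VARIANT_OVERRIDES = {
    "default": {},
    "subtle": {"softness": 0.85, "intensity": 0.40},
    "shorter": {"spread": "medium"},
    "softest": {"softness": 0.95, "intensity": 0.35},
}


def shadow_params(slug, sku_overrides=None):
    """Merge locked params, variant overrides, then optional per-SKU overrides."""
    if slug not in VARIANT_OVERRIDES:
        raise KeyError(f"unknown shadow variant: {slug}")
    # Later dicts win, so per-SKU overrides take precedence over the variant.
    return {**LOCKED, **VARIANT_OVERRIDES[slug], **(sku_overrides or {})}
```

The sku_overrides argument is where a per-SKU subjectPose for flat-lying products would go, should that become necessary.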

Photoroom relight is mislabeled

Verified by the pipeline_matrix/ runs (3 SKUs × 6 chained variants). The relight call (the lighting.mode parameter) performs subject segmentation, applies a tone curve to the cutout, and discards the rest of the scene, including any cast shadows. It is not a photographic relight.

Implications:

  • Chaining relight before shadow contributes nothing: the shadow stage re-segments and discards relight's tone change.
  • Chaining shadow before relight strips the shadow.
  • The mode toggle (ai.auto vs ai.preserve-hue-and-saturation) only varies the tone curve.
  • The one useful application is brief deliverable #2 (PNG, no shadow, on white): relight with background.color=FFFFFF produces it in a single call.

Real catalog-cohesion relighting needs a different model. Candidates: IC-Light (local, free, fits the brief's handover model), Freepik AI relight (RFP-preferred provider), or a custom diffusion pass.

Photoroom's padding parameter

padding = fraction of the output frame that becomes margin per side. For the brief's 400 px margin in a 4000 frame: padding = 0.10. Earlier runs used 0.15 (= 600 px margin) — over-margined but brief-compliant.

Photoroom does not preserve input framing. It auto-detects the subject and reframes to its own (subject + 2·margin = frame) target. To get a Photoroom cutout that aligns with our reframed input (IKEA_PHOTOS_padded/), call with padding=0.10 so its target subject size is the same 3200 px we used.

Caveat: for SKUs where pad_inputs.py left subjects below 3200 px (already small in input), Photoroom upscales them on the cutout call and the alignment with the padded input drifts. BiRefNet preserves position exactly, so the BiRefNet composite is more reliable on those cases.
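The padding arithmetic in this section can be written out as a small sketch. The function names are hypothetical; the relationships (padding is margin over frame, and subject plus two margins equals the frame) come from the notes above.

```python
def padding_for_margin(margin_px: int, frame_px: int) -> float:
    """Fraction of the frame that becomes margin on each side."""
    return margin_px / frame_px


def margin_for_padding(padding: float, frame_px: int) -> int:
    """Margin in px on each side for a given padding fraction."""
    return round(padding * frame_px)


def subject_target_px(padding: float, frame_px: int) -> int:
    """Space left for the subject once both margins are removed."""
    return frame_px - 2 * margin_for_padding(padding, frame_px)
```

For a 4000 px frame, padding 0.10 gives a 400 px margin and a 3200 px subject target, matching the reframed inputs. The earlier 0.15 gives a 600 px margin and a 2800 px subject target.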

Gemini overfills the frame without input pre-framing

Initial Gemini batch (raw IKEA inputs): subjects at ~84% of the 4096 frame, soft shadows clipping the right and bottom edges. The model matches the reference image's framing (~86%) more than the input's framing.

After pad_inputs.py reframes to brief spec, subject framing is in tolerance on every sample; the model now matches the input's framing. The soft shadow occasionally still clips an edge in the 4096 output. This is resolved by a Lanczos resize 4096→4000 plus (optionally) tightening the reframe target from 3200 to ~2800.

Gemini's imageSize is fixed-step

In @google/genai@1.50.1, imageSize accepts "512" | "1K" | "2K" | "4K" (uppercase K required). 4K = 4096 px on the longest side. Arbitrary pixel sizes are not available, so the brief's exact 4000 px must be reached by post-resizing Gemini outputs.
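A guard for the size label and the resulting post-resize factor might look like the sketch below. It is not part of the SDK or of src/gemini.mjs; check_image_size and post_resize_scale are hypothetical helpers, and the allowed values are simply those listed above.

```python
ALLOWED_IMAGE_SIZES = ("512", "1K", "2K", "4K")  # as listed in the notes
GEMINI_4K_EDGE = 4096   # longest side for "4K", per the notes
BRIEF_EDGE = 4000       # edge length the brief asks for


def check_image_size(value: str) -> str:
    """Return the value unchanged if it is an accepted size label."""
    if value not in ALLOWED_IMAGE_SIZES:
        # Catch the common lowercase-k mistake with a hint.
        upper = value.upper()
        hint = f" (did you mean {upper!r}?)" if upper in ALLOWED_IMAGE_SIZES else ""
        raise ValueError(f"unsupported imageSize {value!r}{hint}")
    return value


def post_resize_scale(source_edge: int = GEMINI_4K_EDGE,
                      target_edge: int = BRIEF_EDGE) -> float:
    """Scale factor for the post-resize step."""
    return target_edge / source_edge
```

Failing fast on "4k" before the request is sent avoids spending a call on a value the SDK would not accept as intended.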


Outputs

Stage | Path | Format
Originals | IKEA_PHOTOS/ | varies
Reframed inputs | IKEA_PHOTOS_padded/ | 4000×4000 JPEG, ICC preserved
BiRefNet cutouts | cutouts/ | 4000×4000 PNG with alpha
Photoroom cutouts | cutouts_photoroom/ | 4000×4000 PNG with alpha
Photoroom shadows (4 variants) | outputs/photoroom/shadows/{default,subtle,shorter,softest}/ | 4000×4000 JPEG
Photoroom relight (cleanup, 2 modes) | outputs/photoroom/relight/{preserve,auto}/ | 2000×2000 PNG
Gemini renders | outputs/gemini/<run-id>/<sku>/<sku>_<n>.jpg | 4096×4096 JPEG
Gemini relit (Workflow B, 3 variants) | outputs/gemini-relit/<ts>_{subtle,standard,strong}/<sku>/<sku>.jpg | 4096×4096 JPEG
Soft-light composites (both workflows) | outputs/composites/softlight_<base>_<run-id>_s<n>_op<x>_<cutouts-source>/ | 4096×4096 JPEG

Every output dir has a config.json capturing the run parameters.


Decisions made

  • Padding 0.10 on Photoroom calls matches the brief's 400 px margin exactly. Earlier 0.15 gave 600 px and was over-margined.
  • ai.preserve-hue-and-saturation over ai.auto for relight cleanup — texture/color fidelity is brief-critical and auto may shift hue.
  • Photoroom relight is not a pipeline relight stage. Used only for the no-shadow-on-white deliverable.
  • Gemini at temperature 0.0 for repeatability across the catalog. Higher temperatures produced visibly inconsistent shadows and framing across products in the same batch.
  • Reframe inputs before Gemini, not after. Trying to fight Gemini's framing tendencies in the prompt was unreliable; reframing the input is deterministic.
  • Workflow B exists alongside Workflow A. Photoroom shadow + Gemini relight gives deterministic shadow geometry plus controllable lighting intensity, which is useful when the brief's "shadow needs room" is non-negotiable and Workflow A's edge-clipping risk is unacceptable. Cost is one extra API call per SKU.