THD Internal Training Data

v3.0 - Parquet Analysis Dashboard
Updated: Apr 15, 2026
Overview
Total Training Rows
269K
across 5 parquets
T2I Data
224.7K
50% of training (64 GPUs)
IE Data
41.0K
40% of training (32 GPUs)
MultiRef Data
4,017
10% of training (16 GPUs)
Training GPUs
128
16 nodes x 8 H200
THD:Non-THD
80:20
per-task ratio
Text-to-Image (T2I) 224,707 images
HD video frames + stock photos, SCAP v2 captions, 50% GPU allocation
[Sample HD frames: 2 at 1280x720, 3 at 1920x1080]
Sampled from 40,530 Home Depot frames | Scenes: wallpaper, pavers, furniture, lawn care, measurement, DIY
Image Editing (IE) 40,964 pairs
Before/After frame pairs with edit instructions, 40% GPU allocation
Sources: Brightcove 49% | Missions 30% | YouTube 20% | Color Corrected 0.5%
Multi-Reference (MultiRef) 4,017 triplets BOTTLENECK
Scene + Tool references ➔ Composite target, 10% GPU allocation
[Sample triplets: Scene + Tool ➔ Composite, each with an edit instruction]
Vendor batches P2-P6 | 95.4% are 2-reference | avg caption 2,770 words
Training Data by Task (Total: 269,688 rows)
T2I Home Depot: 62,227 (23.1%) | T2I Stock: 162,480 (60.2%) | IE: 40,964 (15.2%) | MultiRef: 4,017 (1.5%)
GPU Allocation vs Data Volume Mismatch
GPU Allocation (128 total): T2I 50% (64) | IE 40% (32) | MR 10% (16) | non-THD (remainder)
Config: 20260330_lite_thd.yaml
Data Volume (269K rows): T2I 83.3% | IE 15.2% | MR 1.5%
MultiRef gets 10% of the GPUs but holds only 1.5% of the data: BOTTLENECK
Parquet Files Downloaded to foundry_parquets/
File | Task | Rows | Size | Key Columns | S3 Bucket | Caption Format
t2i_3_18_2026.parquet | T2I | 62,227 | 46.2 MB | frame_hash_id, caption, s3_frame_path, width, height, is_home_depot | foundry-thd-enterprise-adobe-assets | SCAP v2 JSON
stock_3_18_2026.parquet | T2I | 162,480 | 120.7 MB | strImagehash, caption, s3_frame_path, width, height, query, is_home_depot | mldp-image | SCAP v2 JSON
ie_3_18_2026.parquet | IE | 40,964 | 34.8 MB | frame_hash_id, caption, target_image, reference_images, edit_instruction | foundry-thd-enterprise-adobe-assets | SCAP + Edit Instruction
multiref_3_18_2026.parquet | MultiRef | 4,017 | 19.0 MB | frame_hash_id, caption, edit_instruction, reference_images, target_image, num_reference, source | foundry-thd-enterprise-adobe-assets | Rich Caption + Edit Instruction
multiref_subset_02262026.parquet | MultiRef-v0 | 2,543 | 3.7 MB | unique_id, reference_images, target_image, source, edit_instruction | foundry-thd-enterprise-adobe-assets | After Caption + Edit Instruction
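As a sanity check, the per-file row counts above should sum to the 269,688 total reported in the overview (the v0 subset parquet is an earlier snapshot of the MultiRef data and is not counted). A minimal sketch, with the counts copied from the table:

```python
# Row counts copied from the parquet table above; the v0 subset
# (multiref_subset_02262026.parquet) is excluded from the training total.
row_counts = {
    "t2i_3_18_2026.parquet": 62_227,
    "stock_3_18_2026.parquet": 162_480,
    "ie_3_18_2026.parquet": 40_964,
    "multiref_3_18_2026.parquet": 4_017,
}

total = sum(row_counts.values())
shares = {name: round(100 * n / total, 1) for name, n in row_counts.items()}

print(total)   # 269688
print(shares)  # per-file share of training rows, in percent
```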
THD vs Non-THD Split (T2I only)
T2I Home Depot Parquet (62,227)
Home Depot 40,530
Non-HD 21,697
65.1% Home Depot, 34.9% non-HD within this parquet
Stock Parquet (162,480)
All Non-Home Depot 162,480
100% non-HD (professional stock photos matched by query)
Training config note: the YAML enforces an 80:20 THD:non-THD ratio at training time via sampling weights. The effective THD pool for T2I is the 40,530 Home Depot frames (from the HD parquet), sampled at 80% weight; the 162K stock images supply the non-THD portion plus keyword-matched, THD-adjacent content.
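The 80:20 mix described above can be sketched as weighted sampling over the two T2I pools. A minimal illustration (pool sizes from this page; the sampler itself is a hypothetical stand-in, not the actual YAML machinery):

```python
import random

random.seed(0)

# Pool sizes from this dashboard; the sampler below is an illustrative
# stand-in for the loader's weighted sampling, not the real training code.
pools = {"thd": 40_530, "non_thd": 162_480}
weights = {"thd": 0.8, "non_thd": 0.2}  # 80:20 THD:non-THD ratio

def draw_batch(n):
    """Pick a pool per sample with the 80:20 weights, then a row index."""
    names = list(weights)
    picks = random.choices(names, weights=[weights[k] for k in names], k=n)
    return [(name, random.randrange(pools[name])) for name in picks]

batch = draw_batch(10_000)
thd_share = sum(1 for name, _ in batch if name == "thd") / len(batch)
print(round(thd_share, 2))  # close to 0.8
```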
Text-to-Image Training Data t2i_3_18_2026.parquet + stock_3_18_2026.parquet
Total T2I Rows
224.7K
62,227 HD + 162,480 stock
Home Depot
40,530
65.1% of HD parquet
Non-HD (in HD parquet)
21,697
34.9% of HD parquet
Stock Images
162.5K
all non-HD, query matched
GPU Allocation
50%
64 of 128 GPUs
Caption Format
SCAP
v2 JSON structure
Resolution Distribution
T2I HD Parquet - Top Resolutions
Orientation: 59,623 landscape | 2,531 portrait | 73 square
Width: min=324, max=2320, mean=1630 | Height: min=270, max=1080, mean=942
Stock Parquet - Resolution Stats
Much higher resolution than HD frames
Width
mean=5,141 | min=1,170 | max=17,718
Height
mean=3,755 | min=1,276 | max=12,239
Stock images are professional photography - significantly higher resolution than video-extracted HD frames. Training config uses DuckDB filter: height >= 1080
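The DuckDB filter noted above (height >= 1080) is easy to mirror in plain Python. A sketch over hypothetical (width, height) rows, not the actual loader:

```python
# Hypothetical (width, height) rows; the real filter runs as DuckDB SQL
# over the parquet (WHERE height >= 1080). This just mirrors the predicate.
rows = [(1280, 720), (1920, 1080), (5141, 3755), (324, 270), (2320, 1080)]

MIN_HEIGHT = 1080  # from the training config noted above

kept = [(w, h) for (w, h) in rows if h >= MIN_HEIGHT]
print(kept)  # [(1920, 1080), (5141, 3755), (2320, 1080)]
```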
Stock Image Categories (by search query) 162,480 images across ~40 categories
Top 20 Stock Queries (THD-related content)
Caption Analysis
SCAP v2 Caption Structure
scene Scene description - keep_prob: 1.0
background Background detail - keep_prob: 0.8
type_open_set Image type - keep_prob: 1.0
lighting_open_set Lighting - keep_prob: 0.8
composition Composition - keep_prob: 0.8
camera Camera angle - keep_prob: 0.8
entities Object entities - keep_prob: 0.8
Caption drop rate: 10% (random null injection for classifier-free guidance)
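Combining the per-field keep_prob values with the 10% null injection, caption construction can be sketched as follows (field list and probabilities from above; the assembly function itself is illustrative, not the production code):

```python
import random
from typing import Optional

random.seed(1)

# Per-field keep probabilities from the SCAP v2 structure above.
KEEP_PROB = {
    "scene": 1.0, "background": 0.8, "type_open_set": 1.0,
    "lighting_open_set": 0.8, "composition": 0.8, "camera": 0.8,
    "entities": 0.8,
}
CAPTION_DROP = 0.10  # random null injection for classifier-free guidance

def build_caption(scap: dict) -> Optional[str]:
    """Illustrative sketch: drop the whole caption 10% of the time,
    else keep each SCAP field independently with its keep_prob."""
    if random.random() < CAPTION_DROP:
        return None  # unconditional sample for CFG
    parts = [scap[k] for k, p in KEEP_PROB.items()
             if k in scap and random.random() < p]
    return " ".join(parts)

scap = {"scene": "A gray twin-gabled house.", "background": "Dense forest.",
        "type_open_set": "architectural photography"}
print(build_caption(scap))
```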
Caption Length Statistics
HD Parquet
mean=1,815 chars | range: 378 - 4,375
Stock Parquet
mean=1,526 chars | range: 323 - 3,949
HD captions average ~251 words (1,815 chars); stock captions are slightly shorter at ~212 words (1,526 chars).
Both averages sit well above the minimum length threshold for SCAP quality.
Sample SCAP Caption
T2I HD Parquet - Row #0
{"scene": "A light gray wood-clad twin-gabled residence with white roofs stands among trees, joined by a central glass entry, with a stepped concrete walkway leading from the foreground lawn to the front door and a covered recess on the right wing.", "type_open_set": "architectural photography, exterior, photorealistic", "type_closed_set": "photo", "lighting_open_set": "soft overcast daylight with even facade illumination", "lighting_closed_set": "soft_light", "background": "Dense green forest of mixed deciduous and coniferous trees", ...}
Sample T2I Training Images (Home Depot) 12 random samples from 40,530 HD frames
Image Editing (IE) Training Data ie_3_18_2026.parquet
IE Rows
40,964
single-reference pairs
GPU Allocation
40%
32 of 128 GPUs
Avg Doc Width
1,702
min=324, max=2,336
Avg Doc Height
978
min=270, max=1,080
Edit Instruction
72 ch
mean length per instruction
Caption Len
1,765
mean chars (SCAP)
IE Data Structure
Each IE row contains:
reference_images Source frame (before edit)
target_image Result frame (after edit)
edit_instruction Natural language instruction
caption SCAP v2 target description
frame_s3_path S3 key on foundry-thd bucket
Training config:
system_prompt: "image_to_image"
target SCAP drop prob: 0.5
edit instruction drop prob: 0.0
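The two drop probabilities above (target SCAP 0.5, edit instruction 0.0) determine which text conditioning each IE sample keeps. A sketch of that per-row logic (the row dict and function are illustrative, not the real dataloader):

```python
import random

random.seed(2)

TARGET_SCAP_DROP = 0.5  # from the IE training config above
EDIT_INSTR_DROP = 0.0   # edit instruction is always kept

def ie_conditioning(row: dict) -> dict:
    """Illustrative: build the text conditioning for one IE row."""
    cond = {}
    if random.random() >= EDIT_INSTR_DROP:   # never dropped at 0.0
        cond["edit_instruction"] = row["edit_instruction"]
    if random.random() >= TARGET_SCAP_DROP:  # kept roughly half the time
        cond["caption"] = row["caption"]
    return cond

row = {"edit_instruction": "Remove the hand from the thermostat.",
       "caption": "{...SCAP v2 JSON...}"}
samples = [ie_conditioning(row) for _ in range(1000)]
with_scap = sum("caption" in s for s in samples) / len(samples)
print(round(with_scap, 2))  # roughly 0.5
```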
Resolution Distribution
Width
mean=1,702 (324-2,336)
Height
mean=978 (270-1,080)
Predominantly landscape (16:9) video frames.
Sources: YouTube how-to, Brightcove tutorials, Missions.
Sample Edit Instructions
IE Example 1
Reposition the snowman cutouts so the adult holds two larger ones and the children hold two smaller ones, adjusting their poses.
IE Example 2
Refine the position of the hand holding the phone to be more centered and steady.
IE Example 3
Continue sanding the edge of the wooden slab with the orbital sander.
IE Example 4
Continue drilling the hole in the wooden plank, producing wood shavings.
IE Example 5
Remove the hand from the thermostat.
Sample IE Frame Pairs (Before → After) 8 random pairs from 40,964 IE entries
Training Config
i2i_1024px_singleref_THD_IE.yaml
SQL:
SELECT * FROM read_parquet('s3://adobe-xingtail/foundry/thd/ie_3_18_2026_split_train_val.parquet')
WHERE split = 'train'

Config:
system_prompt: "image_to_image"
target_scap_caption_drop_prob: 0.5
edit_instruction_drop_prob: 0.0
GPU slots: singleref-1024p(2) + singleref-2048p(2)
Multi-Reference Training Data multiref_3_18_2026.parquet + multiref_subset_02262026.parquet
MultiRef Rows
4,017
TRAINING BOTTLENECK
Subset (v0)
2,543
earlier version, all 2-ref
GPU Allocation
10%
16 of 128 GPUs
Avg Caption
2,770
words (very rich)
Edit Instruction
328 ch
mean length
S3 Access
OK
via foundry_aws_gateway
Source Breakdown (Vendor Batches)
MultiRef by Vendor Batch (4,017 total)
P5: 1,356
P6: 1,190
P4: 1,094
P3: 203
P2: 174
Reference Image Count Distribution
Main Parquet (4,017 rows) - num_reference
95.4% are 2-reference. Training config filters: num_reference >= 2 AND <= 4
Subset Parquet (2,543 rows) - Sources
All 2,543 rows are exactly 2-reference. Earlier captioning version.
Sample MultiRef Edit Instructions
MultiRef Example 1
Place the stone lion from Before_Image1 in the scene. Use the person's hands and tape measure from Before_Image2 to measure the lion's head. Set the background to the building and concrete ground from Before_Image1.
MultiRef Example 2
Use the background from Before_Image1 but add wooden planks on the ground. Place the tree from Before_Image1 in the scene. Take the hacksaw from Before_Image2 and position it as if cutting the tree.
MultiRef Example 3
Place the stool from Before_Image1 on the red tarp. Use the mallet from Before_Image2 to repair the stool. Combine the background from Before_Image1 (red tarp) with the tiled ground from Before_Image2.
Sample MultiRef Triplets (Scene + Tool → Composite) 8 random triplets from 4,017 entries
Training Config
i2i_1024px_varied_ref_THD.yaml
SQL:
SELECT * FROM read_parquet('s3://adobe-xingtail/foundry/thd/multiref_3_18_2026_split_train_val.parquet')
WHERE split = 'train' AND num_reference >= 2 AND num_reference <= 4
AND edit_instruction IS NOT NULL

Config:
system_prompt: "image_to_image_multiref"
uses: build_multiref_image_assets (dynamic asset construction)
GPU slots: multiref-1024p(2) + multiref-2048p(2)

Filtering effect:
After filter (num_ref 2-4 + edit_instruction not null): ~3,952 rows usable
This is by far the smallest dataset: roughly 56x smaller than the combined T2I set (224,707 / 4,017 ≈ 56)
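The filter's effect (num_reference between 2 and 4, non-null edit_instruction, keeping ~3,952 of 4,017 rows) can be mirrored in plain Python. A sketch over made-up rows, not the real parquet:

```python
# Made-up rows standing in for multiref_3_18_2026.parquet; the real
# filter is the DuckDB WHERE clause shown in the config above.
rows = [
    {"num_reference": 2, "edit_instruction": "Place the stone lion..."},
    {"num_reference": 3, "edit_instruction": "Use the background..."},
    {"num_reference": 5, "edit_instruction": "Too many references."},
    {"num_reference": 2, "edit_instruction": None},
]

usable = [r for r in rows
          if 2 <= r["num_reference"] <= 4
          and r["edit_instruction"] is not None]
print(len(usable))  # 2
```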
Training Data Pipeline Parquet -> Data Config YAML -> GPU Layout
Architecture: GenRender6 (GR6) DiT MoE model training on 16 nodes x 8 H200 GPUs = 128 GPUs.
Each task type has dedicated GPU slots with separate data configs. The training YAML (20260330_lite_thd.yaml) orchestrates the data flow.
80:20 THD:non-THD ratio is enforced per task via sampling weights.
S3 Parquets
t2i_3_18_2026.parquet
HD video frames + non-HD
Bucket: foundry-thd-enterprise-adobe-assets
62,227 rows
stock_3_18_2026.parquet
Professional stock photography
Bucket: mldp-image
162,480 rows
ie_3_18_2026.parquet
IE frame pairs (before/after)
Bucket: foundry-thd-enterprise-adobe-assets
40,964 rows
multiref_3_18_2026.parquet
Multi-ref triplets (2-8 refs)
Bucket: foundry-thd-enterprise-adobe-assets
4,017 rows
Data Config YAML
img_1024p.yaml (T2I)
DuckDB SQL on S3 parquet
Filter: width>height, height>=1080
SCAP keep_prob: [1.0,0.8,1.0,0.8,0.8,0.8,0.8]
Caption drop: 10%
stock_1024p.yaml (T2I)
Stock images as non-THD portion
Same SCAP structure
Mixed with HD data at 80:20
i2i_singleref_THD_IE.yaml
IE frame pairs
system_prompt: "image_to_image"
target SCAP drop: 0.5
edit drop: 0.0
i2i_varied_ref_THD.yaml
MultiRef triplets
system_prompt: "image_to_image_multiref"
Filter: num_ref 2-4
build_multiref_image_assets
GPU Layout (128 total)
T2I THD (8 GPU slots)
512p x 2 GPUs
1024p x 2 GPUs
2048p x 4 GPUs
50% of training
T2I Non-THD (8 GPU slots)
Mirrors THD layout
Stock data fills this portion
20% fill ratio
IE SingleRef (4 GPU slots)
1024p x 2 GPUs
2048p x 2 GPUs
40% of training
MultiRef (4 GPU slots)
1024p x 2 GPUs
2048p x 2 GPUs
10% - BOTTLENECK
Inference / Eval Datasets
Name | Type | Resolution | Eval N | Config Path
i2i_1024p_multiref_thd | MultiRef | 1024 | 12 | foundry_home_depot_eval_set.json
i2i_1024p_multiref_thd_sampled | MultiRef | 1024 | 125 | foundry_home_depot_inference_set.json
i2i_1024p_product_swap_thd | ProductSwap | 1024 | 10 | thd_multiref_x2x_ready_gen6.json
i2i_1024p_singleref_ie_thd | IE | 1024 | 10 | thd_ie_eval_conversational_gen6.json
t2i_thd_mixed_benchmark | T2I | 1024/2048 | 45 | thd_t2i_mixed_benchmark_conversational_gen6.json
*_rewrite variants (6 sets) | Rewrite | 1024/2048 | 10-45 | prompt-rewrite-03272026/gen6/*.json
Data Cycling Analysis
Epochs per 1K training steps (estimated)
Task | Data Size | GPUs | Est. Epochs/1K steps | Overfitting Risk
T2I | 224,707 | 64 | ~0.3 | Low
IE | 40,964 | 32 | ~0.8 | Medium
MultiRef | 4,017 | 16 | ~4.0 | HIGH
Key Insight
MultiRef data cycles ~13x faster than T2I data. At 1K training steps, MultiRef sees each sample ~4 times while T2I barely completes 0.3 epochs.

This severe imbalance means:
- MultiRef will overfit first
- Adding more MultiRef data has highest marginal value
- Adding 1,000 new MultiRef triplets (+25% data) would cut the cycling rate by ~20% (4,017 to 5,017 rows)
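The epochs-per-1K-steps figures in the table above follow from a simple back-of-the-envelope model, epochs ≈ steps × GPUs × per-GPU batch / rows. A sketch assuming a per-GPU batch size of 1 (an assumption; the actual batch sizes are not stated on this page):

```python
# Back-of-the-envelope check of the cycling table above.
# ASSUMPTION: per-GPU batch size of 1; actual batch sizes are not given
# here, so these are rough consistency checks only.
STEPS = 1_000
BATCH_PER_GPU = 1

tasks = {"T2I": (224_707, 64), "IE": (40_964, 32), "MultiRef": (4_017, 16)}

for name, (rows, gpus) in tasks.items():
    epochs = STEPS * gpus * BATCH_PER_GPU / rows
    print(f"{name}: ~{epochs:.1f} epochs / 1K steps")
# Prints ~0.3 for T2I, ~0.8 for IE, ~4.0 for MultiRef, matching the table.
```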
24 Target Categories - Progress Tracker
Covered
0
have product bank data
Missing
0
no product bank data
Partial
0
internal data only
Avg Readiness
0%
across all 24 categories
# | Category | Group | Products | Spin Frames | Lifestyle Triplets | Internal Data | Readiness | Status
Gap Analysis & Priority Actions
Critical Gaps
3
blocking training quality
High Priority
4
significant improvement
Resolved
1
IAM access via gateway
RESOLVED: IAM GetObject access for foundry-thd-enterprise-adobe-assets bucket now works via foundry_aws_gateway library + PLUTO_AUTH_TOKEN. No IAM ticket needed.

Critical (P0)

High Priority (P1)