THD Internal Training Data

v3.0 - Parquet Analysis Dashboard
Updated: Apr 15, 2026
Overview
Total Training Rows
269K
across 5 parquets
T2I Data
224.7K
50% of training (64 GPUs)
IE Data
41.0K
40% of training (32 GPUs)
MultiRef Data
4,017
10% of training (16 GPUs)
Training GPUs
128
16 nodes x 8 H200
THD:Non-THD
80:20
per-task ratio
Text-to-Image (T2I) 224,707 images
HD video frames + stock photos, SCAP v2 captions, 50% GPU allocation
[Sample HD frames: 2 at 1280x720, 3 at 1920x1080]
Sampled from 40,530 Home Depot frames | Scenes: wallpaper, pavers, furniture, lawn care, measurement, DIY
Image Editing (IE) 40,964 pairs
Before/After frame pairs with edit instructions, 40% GPU allocation
Sources: Brightcove 49% | Missions 30% | YouTube 20% | Color Corrected 0.5%
Multi-Reference (MultiRef) 4,017 triplets BOTTLENECK
Scene + Tool references ➔ Composite target, 10% GPU allocation
[Sample triplets: Scene + Tool ➔ Composite, each with an edit instruction]
Vendor batches P2-P6 | 95.4% are 2-reference | avg caption 2,770 words
Training Data by Task (Total: 269,688 rows)
T2I Home Depot: 62,227 (23.1%) | T2I Stock: 162,480 (60.2%) | IE: 40,964 (15.2%) | MultiRef: 4,017 (1.5%)
GPU Allocation vs Data Volume Mismatch
GPU Allocation (128 total): T2I 50% (64) | IE 40% (32) | MR 10% (16) | non-THD (remainder)
Config: 20260330_lite_thd.yaml
Data Volume (269K rows): T2I 83.3% | IE 15.2% | MR 1.5%
MultiRef gets 10% of the GPUs but holds only 1.5% of the data: BOTTLENECK
Parquet Files Downloaded to foundry_parquets/
File | Task | Rows | Size | Key Columns | S3 Bucket | Caption Format
t2i_3_18_2026.parquet | T2I | 62,227 | 46.2 MB | frame_hash_id, caption, s3_frame_path, width, height, is_home_depot | foundry-thd-enterprise-adobe-assets | SCAP v2 JSON
stock_3_18_2026.parquet | T2I | 162,480 | 120.7 MB | strImagehash, caption, s3_frame_path, width, height, query, is_home_depot | mldp-image | SCAP v2 JSON
ie_3_18_2026.parquet | IE | 40,964 | 34.8 MB | frame_hash_id, caption, target_image, reference_images, edit_instruction | foundry-thd-enterprise-adobe-assets | SCAP + Edit Instruction
multiref_3_18_2026.parquet | MultiRef | 4,017 | 19.0 MB | frame_hash_id, caption, edit_instruction, reference_images, target_image, num_reference, source | foundry-thd-enterprise-adobe-assets | Rich Caption + Edit Instruction
multiref_subset_02262026.parquet | MultiRef-v0 | 2,543 | 3.7 MB | unique_id, reference_images, target_image, source, edit_instruction | foundry-thd-enterprise-adobe-assets | After Caption + Edit Instruction
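As a sanity check, the per-file row counts above should sum to the 269,688 total reported in the overview (the v0 subset parquet is an earlier snapshot of the MultiRef data and is not counted). A minimal sketch, with the counts copied from the table:

```python
# Row counts copied from the parquet table above; the v0 subset
# (multiref_subset_02262026.parquet) is excluded from the training total.
row_counts = {
    "t2i_3_18_2026.parquet": 62_227,
    "stock_3_18_2026.parquet": 162_480,
    "ie_3_18_2026.parquet": 40_964,
    "multiref_3_18_2026.parquet": 4_017,
}

total = sum(row_counts.values())
shares = {name: round(100 * n / total, 1) for name, n in row_counts.items()}

print(total)   # 269688
print(shares)  # per-file share of training rows, in percent
```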
THD vs Non-THD Split (T2I only)
T2I Home Depot Parquet (62,227)
Home Depot 40,530
Non-HD 21,697
65.1% Home Depot, 34.9% non-HD within this parquet
Stock Parquet (162,480)
All Non-Home Depot 162,480
100% non-HD (professional stock photos matched by query)
Training config note: the YAML enforces an 80:20 THD:non-THD ratio at training time via sampling weights. The effective THD pool for T2I is the 40,530 Home Depot frames (from the HD parquet), sampled at 80% weight; the 162K stock images supply the non-THD portion plus keyword-matched, THD-adjacent content.
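The 80:20 mix described above can be sketched as weighted sampling over the two T2I pools. A minimal illustration (pool sizes from this page; the sampler itself is a hypothetical stand-in, not the actual YAML machinery):

```python
import random

random.seed(0)

# Pool sizes from this dashboard; the sampler below is an illustrative
# stand-in for the loader's weighted sampling, not the real training code.
pools = {"thd": 40_530, "non_thd": 162_480}
weights = {"thd": 0.8, "non_thd": 0.2}  # 80:20 THD:non-THD ratio

def draw_batch(n):
    """Pick a pool per sample with the 80:20 weights, then a row index."""
    names = list(weights)
    picks = random.choices(names, weights=[weights[k] for k in names], k=n)
    return [(name, random.randrange(pools[name])) for name in picks]

batch = draw_batch(10_000)
thd_share = sum(1 for name, _ in batch if name == "thd") / len(batch)
print(round(thd_share, 2))  # close to 0.8
```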
Text-to-Image Training Data t2i_3_18_2026.parquet + stock_3_18_2026.parquet
Total T2I Rows
224.7K
62,227 HD + 162,480 stock
Home Depot
40,530
65.1% of HD parquet
Non-HD (in HD parquet)
21,697
34.9% of HD parquet
Stock Images
162.5K
all non-HD, query matched
GPU Allocation
50%
64 of 128 GPUs
Caption Format
SCAP
v2 JSON structure
Resolution Distribution
T2I HD Parquet - Top Resolutions
Orientation: 59,623 landscape | 2,531 portrait | 73 square
Width: min=324, max=2320, mean=1630 | Height: min=270, max=1080, mean=942
Stock Parquet - Resolution Stats
Much higher resolution than HD frames
Width
mean=5,141 | min=1,170 | max=17,718
Height
mean=3,755 | min=1,276 | max=12,239
Stock images are professional photography - significantly higher resolution than video-extracted HD frames. Training config uses DuckDB filter: height >= 1080
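The DuckDB filter noted above (height >= 1080) is easy to mirror in plain Python. A sketch over hypothetical (width, height) rows, not the actual loader:

```python
# Hypothetical (width, height) rows; the real filter runs as DuckDB SQL
# over the parquet (WHERE height >= 1080). This just mirrors the predicate.
rows = [(1280, 720), (1920, 1080), (5141, 3755), (324, 270), (2320, 1080)]

MIN_HEIGHT = 1080  # from the training config noted above

kept = [(w, h) for (w, h) in rows if h >= MIN_HEIGHT]
print(kept)  # [(1920, 1080), (5141, 3755), (2320, 1080)]
```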
Stock Image Categories (by search query) 162,480 images across ~40 categories
Top 20 Stock Queries (THD-related content)
Caption Analysis
SCAP v2 Caption Structure
scene Scene description - keep_prob: 1.0
background Background detail - keep_prob: 0.8
type_open_set Image type - keep_prob: 1.0
lighting_open_set Lighting - keep_prob: 0.8
composition Composition - keep_prob: 0.8
camera Camera angle - keep_prob: 0.8
entities Object entities - keep_prob: 0.8
Caption drop rate: 10% (random null injection for classifier-free guidance)
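Combining the per-field keep_prob values with the 10% null injection, caption construction can be sketched as follows (field list and probabilities from above; the assembly function itself is illustrative, not the production code):

```python
import random
from typing import Optional

random.seed(1)

# Per-field keep probabilities from the SCAP v2 structure above.
KEEP_PROB = {
    "scene": 1.0, "background": 0.8, "type_open_set": 1.0,
    "lighting_open_set": 0.8, "composition": 0.8, "camera": 0.8,
    "entities": 0.8,
}
CAPTION_DROP = 0.10  # random null injection for classifier-free guidance

def build_caption(scap: dict) -> Optional[str]:
    """Illustrative sketch: drop the whole caption 10% of the time,
    else keep each SCAP field independently with its keep_prob."""
    if random.random() < CAPTION_DROP:
        return None  # unconditional sample for CFG
    parts = [scap[k] for k, p in KEEP_PROB.items()
             if k in scap and random.random() < p]
    return " ".join(parts)

scap = {"scene": "A gray twin-gabled house.", "background": "Dense forest.",
        "type_open_set": "architectural photography"}
print(build_caption(scap))
```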
Caption Length Statistics
HD Parquet
mean=1,815 chars | range: 378 - 4,375
Stock Parquet
mean=1,526 chars | range: 323 - 3,949
HD captions average ~251 words (1,815 chars); stock captions are slightly shorter at ~212 words (1,526 chars).
Both averages sit well above the minimum length threshold for SCAP quality.
Sample SCAP Caption
T2I HD Parquet - Row #0
{"scene": "A light gray wood-clad twin-gabled residence with white roofs stands among trees, joined by a central glass entry, with a stepped concrete walkway leading from the foreground lawn to the front door and a covered recess on the right wing.", "type_open_set": "architectural photography, exterior, photorealistic", "type_closed_set": "photo", "lighting_open_set": "soft overcast daylight with even facade illumination", "lighting_closed_set": "soft_light", "background": "Dense green forest of mixed deciduous and coniferous trees", ...}
Sample T2I Training Images (Home Depot) 12 random samples from 40,530 HD frames
Image Editing (IE) Training Data ie_3_18_2026.parquet
IE Rows
40,964
single-reference pairs
GPU Allocation
40%
32 of 128 GPUs
Avg Doc Width
1,702
min=324, max=2,336
Avg Doc Height
978
min=270, max=1,080
Edit Instruction
72 ch
mean length per instruction
Caption Len
1,765
mean chars (SCAP)
IE Data Structure
Each IE row contains:
reference_images Source frame (before edit)
target_image Result frame (after edit)
edit_instruction Natural language instruction
caption SCAP v2 target description
frame_s3_path S3 key on foundry-thd bucket
Training config:
system_prompt: "image_to_image"
target SCAP drop prob: 0.5
edit instruction drop prob: 0.0
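The two drop probabilities above (target SCAP 0.5, edit instruction 0.0) determine which text conditioning each IE sample keeps. A sketch of that per-row logic (the row dict and function are illustrative, not the real dataloader):

```python
import random

random.seed(2)

TARGET_SCAP_DROP = 0.5  # from the IE training config above
EDIT_INSTR_DROP = 0.0   # edit instruction is always kept

def ie_conditioning(row: dict) -> dict:
    """Illustrative: build the text conditioning for one IE row."""
    cond = {}
    if random.random() >= EDIT_INSTR_DROP:   # never dropped at 0.0
        cond["edit_instruction"] = row["edit_instruction"]
    if random.random() >= TARGET_SCAP_DROP:  # kept roughly half the time
        cond["caption"] = row["caption"]
    return cond

row = {"edit_instruction": "Remove the hand from the thermostat.",
       "caption": "{...SCAP v2 JSON...}"}
samples = [ie_conditioning(row) for _ in range(1000)]
with_scap = sum("caption" in s for s in samples) / len(samples)
print(round(with_scap, 2))  # roughly 0.5
```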
Resolution Distribution
Width
mean=1,702 (324-2,336)
Height
mean=978 (270-1,080)
Predominantly landscape (16:9) video frames.
Sources: YouTube how-to, Brightcove tutorials, Missions.
Sample Edit Instructions
IE Example 1
Reposition the snowman cutouts so the adult holds two larger ones and the children hold two smaller ones, adjusting their poses.
IE Example 2
Refine the position of the hand holding the phone to be more centered and steady.
IE Example 3
Continue sanding the edge of the wooden slab with the orbital sander.
IE Example 4
Continue drilling the hole in the wooden plank, producing wood shavings.
IE Example 5
Remove the hand from the thermostat.
Sample IE Frame Pairs (Before → After) 8 random pairs from 40,964 IE entries
Training Config
i2i_1024px_singleref_THD_IE.yaml
SQL:
SELECT * FROM read_parquet('s3://adobe-xingtail/foundry/thd/ie_3_18_2026_split_train_val.parquet')
WHERE split = 'train'

Config:
system_prompt: "image_to_image"
target_scap_caption_drop_prob: 0.5
edit_instruction_drop_prob: 0.0
GPU slots: singleref-1024p(2) + singleref-2048p(2)
Multi-Reference Training Data multiref_3_18_2026.parquet + multiref_subset_02262026.parquet
MultiRef Rows
4,017
TRAINING BOTTLENECK
Subset (v0)
2,543
earlier version, all 2-ref
GPU Allocation
10%
16 of 128 GPUs
Avg Caption
2,770
words (very rich)
Edit Instruction
328 ch
mean length
S3 Access
OK
via foundry_aws_gateway
Source Breakdown (Vendor Batches)
MultiRef by Vendor Batch (4,017 total)
P5: 1,356
P6: 1,190
P4: 1,094
P3: 203
P2: 174
Reference Image Count Distribution
Main Parquet (4,017 rows) - num_reference
95.4% are 2-reference. Training config filters: num_reference >= 2 AND <= 4
Subset Parquet (2,543 rows) - Sources
All 2,543 rows are exactly 2-reference. Earlier captioning version.
Sample MultiRef Edit Instructions
MultiRef Example 1
Place the stone lion from Before_Image1 in the scene. Use the person's hands and tape measure from Before_Image2 to measure the lion's head. Set the background to the building and concrete ground from Before_Image1.
MultiRef Example 2
Use the background from Before_Image1 but add wooden planks on the ground. Place the tree from Before_Image1 in the scene. Take the hacksaw from Before_Image2 and position it as if cutting the tree.
MultiRef Example 3
Place the stool from Before_Image1 on the red tarp. Use the mallet from Before_Image2 to repair the stool. Combine the background from Before_Image1 (red tarp) with the tiled ground from Before_Image2.
Sample MultiRef Triplets (Scene + Tool → Composite) 8 random triplets from 4,017 entries
Training Config
i2i_1024px_varied_ref_THD.yaml
SQL:
SELECT * FROM read_parquet('s3://adobe-xingtail/foundry/thd/multiref_3_18_2026_split_train_val.parquet')
WHERE split = 'train' AND num_reference >= 2 AND num_reference <= 4
AND edit_instruction IS NOT NULL

Config:
system_prompt: "image_to_image_multiref"
uses: build_multiref_image_assets (dynamic asset construction)
GPU slots: multiref-1024p(2) + multiref-2048p(2)

Filtering effect:
After filter (num_ref 2-4 + edit_instruction not null): ~3,952 rows usable
This is by far the smallest dataset: roughly 56x smaller than the combined T2I set (224,707 / 4,017 ≈ 56)
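The filter's effect (num_reference between 2 and 4, non-null edit_instruction, keeping ~3,952 of 4,017 rows) can be mirrored in plain Python. A sketch over made-up rows, not the real parquet:

```python
# Made-up rows standing in for multiref_3_18_2026.parquet; the real
# filter is the DuckDB WHERE clause shown in the config above.
rows = [
    {"num_reference": 2, "edit_instruction": "Place the stone lion..."},
    {"num_reference": 3, "edit_instruction": "Use the background..."},
    {"num_reference": 5, "edit_instruction": "Too many references."},
    {"num_reference": 2, "edit_instruction": None},
]

usable = [r for r in rows
          if 2 <= r["num_reference"] <= 4
          and r["edit_instruction"] is not None]
print(len(usable))  # 2
```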
Training Data Pipeline Parquet -> Data Config YAML -> GPU Layout
Architecture: GenRender6 (GR6) DiT MoE model training on 16 nodes x 8 H200 GPUs = 128 GPUs.
Each task type has dedicated GPU slots with separate data configs. The training YAML (20260330_lite_thd.yaml) orchestrates the data flow.
80:20 THD:non-THD ratio is enforced per task via sampling weights.
S3 Parquets
t2i_3_18_2026.parquet
HD video frames + non-HD
Bucket: foundry-thd-enterprise-adobe-assets
62,227 rows
stock_3_18_2026.parquet
Professional stock photography
Bucket: mldp-image
162,480 rows
ie_3_18_2026.parquet
IE frame pairs (before/after)
Bucket: foundry-thd-enterprise-adobe-assets
40,964 rows
multiref_3_18_2026.parquet
Multi-ref triplets (2-8 refs)
Bucket: foundry-thd-enterprise-adobe-assets
4,017 rows
Data Config YAML
img_1024p.yaml (T2I)
DuckDB SQL on S3 parquet
Filter: width>height, height>=1080
SCAP keep_prob: [1.0,0.8,1.0,0.8,0.8,0.8,0.8]
Caption drop: 10%
stock_1024p.yaml (T2I)
Stock images as non-THD portion
Same SCAP structure
Mixed with HD data at 80:20
i2i_singleref_THD_IE.yaml
IE frame pairs
system_prompt: "image_to_image"
target SCAP drop: 0.5
edit drop: 0.0
i2i_varied_ref_THD.yaml
MultiRef triplets
system_prompt: "image_to_image_multiref"
Filter: num_ref 2-4
build_multiref_image_assets
GPU Layout (128 total)
T2I THD (8 GPU slots)
512p x 2 GPUs
1024p x 2 GPUs
2048p x 4 GPUs
50% of training
T2I Non-THD (8 GPU slots)
Mirrors THD layout
Stock data fills this portion
20% fill ratio
IE SingleRef (4 GPU slots)
1024p x 2 GPUs
2048p x 2 GPUs
40% of training
MultiRef (4 GPU slots)
1024p x 2 GPUs
2048p x 2 GPUs
10% - BOTTLENECK
Inference / Eval Datasets
Name | Type | Resolution | Eval N | Config Path
i2i_1024p_multiref_thd | MultiRef | 1024 | 12 | foundry_home_depot_eval_set.json
i2i_1024p_multiref_thd_sampled | MultiRef | 1024 | 125 | foundry_home_depot_inference_set.json
i2i_1024p_product_swap_thd | ProductSwap | 1024 | 10 | thd_multiref_x2x_ready_gen6.json
i2i_1024p_singleref_ie_thd | IE | 1024 | 10 | thd_ie_eval_conversational_gen6.json
t2i_thd_mixed_benchmark | T2I | 1024/2048 | 45 | thd_t2i_mixed_benchmark_conversational_gen6.json
*_rewrite variants (6 sets) | Rewrite | 1024/2048 | 10-45 | prompt-rewrite-03272026/gen6/*.json
Data Cycling Analysis
Epochs per 1K training steps (estimated)
Task | Data Size | GPUs | Est. Epochs/1K steps | Overfitting Risk
T2I | 224,707 | 64 | ~0.3 | Low
IE | 40,964 | 32 | ~0.8 | Medium
MultiRef | 4,017 | 16 | ~4.0 | HIGH
Key Insight
MultiRef data cycles ~13x faster than T2I data. At 1K training steps, MultiRef sees each sample ~4 times while T2I barely completes 0.3 epochs.

This severe imbalance means:
- MultiRef will overfit first
- Adding more MultiRef data has highest marginal value
- Adding 1,000 new MultiRef triplets (+25% data) would cut the cycling rate by ~20% (4,017 to 5,017 rows)
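The epochs-per-1K-steps figures in the table above follow from a simple back-of-the-envelope model, epochs ≈ steps × GPUs × per-GPU batch / rows. A sketch assuming a per-GPU batch size of 1 (an assumption; the actual batch sizes are not stated on this page):

```python
# Back-of-the-envelope check of the cycling table above.
# ASSUMPTION: per-GPU batch size of 1; actual batch sizes are not given
# here, so these are rough consistency checks only.
STEPS = 1_000
BATCH_PER_GPU = 1

tasks = {"T2I": (224_707, 64), "IE": (40_964, 32), "MultiRef": (4_017, 16)}

for name, (rows, gpus) in tasks.items():
    epochs = STEPS * gpus * BATCH_PER_GPU / rows
    print(f"{name}: ~{epochs:.1f} epochs / 1K steps")
# Prints ~0.3 for T2I, ~0.8 for IE, ~4.0 for MultiRef, matching the table.
```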
24 Target Categories - Progress Tracker
Covered
0
have product bank data
Missing
0
no product bank data
Partial
0
internal data only
Avg Readiness
0%
across all 24 categories
# | Category | Group | Products | Spin Frames | Lifestyle Triplets | Internal Data | Readiness | Status
Gap Analysis & Priority Actions
Critical Gaps
3
blocking training quality
High Priority
4
significant improvement
Resolved
1
IAM access via gateway
RESOLVED: IAM GetObject access for foundry-thd-enterprise-adobe-assets bucket now works via foundry_aws_gateway library + PLUTO_AUTH_TOKEN. No IAM ticket needed.

Critical (P0)

High Priority (P1)