Experiments: Understand the impact of the number of label samples
| | Data pool: 0 label samples | Data pool: 1 label sample | Data pool: 2 label samples | Data pool: 5 label samples |
| --- | --- | --- | --- | --- |
| Contract | | | | |
| Receipt | | | | |
| 10-Q SEC | | | | |
| Acord Form | | | | |
| Mortgage Application | | | | |
Even more label samples:
| | Data pool: 0 label samples | Data pool: 1 label sample | Data pool: 5 label samples | Data pool: 40 label samples | Data pool: 100 label samples |
| --- | --- | --- | --- | --- | --- |
| Invoice | | | | | |
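One way to keep the data-pool columns comparable is to make each smaller pool a subset of the larger ones, so that a change in results reflects the pool size rather than which samples were drawn. The sketch below illustrates that idea only; the function name `build_data_pools`, the fixed seed, and the nesting strategy are assumptions, not part of the experiment plan.

```python
import random


def build_data_pools(labeled_samples, pool_sizes, seed=0):
    """Build nested data pools from one shuffled copy of the labeled samples.

    Hypothetical helper: every pool is a prefix of the same shuffled list,
    so the pool of size 1 is contained in the pool of size 5, and so on.
    """
    rng = random.Random(seed)          # fixed seed keeps the pools reproducible
    shuffled = list(labeled_samples)   # copy so the caller's list is untouched
    rng.shuffle(shuffled)

    pools = {}
    for size in pool_sizes:
        if size > len(shuffled):
            raise ValueError(f"pool size {size} exceeds {len(shuffled)} available samples")
        pools[size] = shuffled[:size]  # size 0 yields an empty pool
    return pools


# Illustrative document IDs only; real sample identifiers would go here.
invoice_samples = [f"invoice_{i}" for i in range(100)]
invoice_pools = build_data_pools(invoice_samples, [0, 1, 5, 40, 100])
```

The same call with `[0, 1, 2, 5]` would cover the pool sizes in the first table.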
Experiments: Understand labeling quality impact
| | Good Data Sample | Bad Data Sample |
| --- | --- | --- |
| Contract | | |
| Receipt | | |
| 10-Q SEC | | |
| Acord Form | | |
| Mortgage Application | | |
Experiments: Understand the embedding vector normalized distance impact
| | 1-Norm | 2-Norm |
| --- | --- | --- |
| Contract | | |
| Receipt | | |
| 10-Q SEC | | |
| Acord Form | | |
| Mortgage Application | | |
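The 1-Norm and 2-Norm columns can be read in more than one way. The sketch below takes one possible reading: each embedding is first scaled to unit length under the chosen norm, and the distance between two embeddings is then measured with that same norm. The function names are hypothetical and the experiment may define the distance differently.

```python
import math


def p_norm(vec, p):
    """Return the 1-norm (sum of absolute values) or 2-norm (Euclidean length)."""
    if p == 1:
        return sum(abs(x) for x in vec)
    if p == 2:
        return math.sqrt(sum(x * x for x in vec))
    raise ValueError("this sketch only covers p=1 and p=2")


def normalize(vec, p):
    """Scale a vector to unit p-norm; a zero vector is returned unchanged."""
    length = p_norm(vec, p)
    if length == 0:
        return list(vec)
    return [x / length for x in vec]


def normalized_distance(a, b, p):
    """Assumed definition: p-norm distance between the p-normalized vectors."""
    a_unit = normalize(a, p)
    b_unit = normalize(b, p)
    return p_norm([x - y for x, y in zip(a_unit, b_unit)], p)


# Toy 2-dimensional vectors; real embeddings would have many more dimensions.
d1 = normalized_distance([3.0, 4.0], [4.0, 3.0], p=1)
d2 = normalized_distance([3.0, 4.0], [4.0, 3.0], p=2)
```

Because both vectors are normalized first, the result does not depend on their original magnitudes under either norm, only on their direction.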
Experiments: Understand the LLM chunk size impact
* To support the 64k and 128k experiments, we need to collect or synthesize data.
| | 8k | 16k | 64k | 128k | no limits |
| --- | --- | --- | --- | --- | --- |
| Financial Report | | | | | |
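As a rough illustration of what the chunk-size columns vary, the sketch below splits a token sequence into consecutive chunks of at most a given size, with "no limits" treated as a single chunk. It assumes the sizes are token counts and that "8k" means 8 × 1024; the document states neither, and the names `CHUNK_SIZES` and `split_into_chunks` are hypothetical.

```python
# Assumption: sizes are token counts and "k" means 1024.
CHUNK_SIZES = {
    "8k": 8 * 1024,
    "16k": 16 * 1024,
    "64k": 64 * 1024,
    "128k": 128 * 1024,
    "no limits": None,  # the whole document is sent as one chunk
}


def split_into_chunks(tokens, max_tokens):
    """Split tokens into consecutive chunks of at most max_tokens each.

    max_tokens=None means no limit. An empty document yields no chunks.
    """
    tokens = list(tokens)
    if not tokens:
        return []
    if max_tokens is None:
        return [tokens]
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]


# Stand-in for a tokenized financial report; a real run would use the model's tokenizer.
report_tokens = [f"tok{i}" for i in range(20_000)]
chunks_by_setting = {
    name: split_into_chunks(report_tokens, size) for name, size in CHUNK_SIZES.items()
}
```

A document shorter than the 64k or 128k limit produces a single chunk under those settings, which is the practical reason longer documents must be collected or synthesized for those columns to differ from "no limits".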
Experiments: Understand the retrieved sample chunk impact
| | No retrieved samples | Retrieve chunk samples for whole document | Retrieve chunk samples with labels |
| --- | --- | --- | --- |
| Financial Report | | | |
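The three conditions above can be pictured as three ways of filtering the chunks of a retrieved sample document before they are added to the prompt. The sketch below is one interpretation only: it assumes "with labels" means keeping just the chunks that carry at least one label, and the `Chunk` class, the mode names, and `select_retrieved_chunks` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical container for one chunk of a retrieved sample document."""
    text: str
    labels: dict = field(default_factory=dict)  # field name -> labeled value


def select_retrieved_chunks(sample_chunks, mode):
    """Return the sample chunks to include under one experiment condition."""
    if mode == "none":
        return []                       # no retrieved samples
    if mode == "whole_document":
        return list(sample_chunks)      # every chunk of the sample document
    if mode == "with_labels":
        # Assumed reading: keep only chunks that contain at least one label.
        return [chunk for chunk in sample_chunks if chunk.labels]
    raise ValueError(f"unknown mode: {mode}")


# Made-up chunks for illustration; not real experiment data.
sample_chunks = [
    Chunk("Revenue summary", {"total_revenue": "example value"}),
    Chunk("Forward-looking statements"),
    Chunk("Balance sheet", {"total_assets": "example value"}),
]
labeled_only = select_retrieved_chunks(sample_chunks, "with_labels")
```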
Experiments: Understand the retrieved sample chunk size impact
| | 256 | 512 | 2k | 4k | 8k |
| --- | --- | --- | --- | --- | --- |
| Contract | | | | | |
| Financial Report | | | | | |