Reason for enumerated file names instead of actual file names #745
-
The file names are already attributes of the index. The chunk names are formed by simply appending a number to the end of the original name of the uploaded file. Many other attributes are also added to the search index. See the search index schema at https://github.com/microsoft/PubSec-Info-Assistant/blob/main/azure_search/create_vector_index.json
-
Hi @dayland, thanks for your replies! We have done additional testing and would like to follow up on this topic. We ran a test in which 2 PDF files were loaded into the accelerator, alongside other PDF files that had already been uploaded.
File 1:
File 2:
Prompt used:
File 1:
File 2:
Additional root cause analysis performed: the number of sources was increased from 20 to 50 (the maximum). In this case the dummy example file started to show up in Supporting content, but only at the very end of the sources list.
Root cause hypothesis: when the search looks for the best-matching chunks, the file name seems to play only a minor role. The file contents are given more priority than the file name.
Solution ideas:
We would appreciate your feedback on the following items:
Thanks in advance!
-
Hello Microsoft team,
A general question:
Issue observed now:
Currently, in our deployment of the accelerator, file names are supplied to the LLM as File0, File1, etc. However, a file name usually contains useful information / metadata that should be part of indexing, for example the year of the document.
I am wondering if you see any issues or risks with teams modifying the chunk to include the file name as well? There could be other considerations I'm overlooking.
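To make the idea concrete, here is a minimal sketch of what "modifying the chunk to include the file name" could look like. It is not taken from the accelerator's code: the `Chunk` class, its fields, and both helper functions are hypothetical names used only for illustration, and the sketch assumes the chunk text can be changed before it is embedded and indexed.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical chunk record; not the accelerator's actual schema."""
    file_name: str  # original name of the uploaded file
    index: int      # position of the chunk within the file
    content: str    # chunk text that would be embedded and indexed


def file_name_to_text(file_name: str) -> str:
    """Drop the extension and turn separators into spaces."""
    stem = file_name.rsplit(".", 1)[0] if "." in file_name else file_name
    return re.sub(r"[_\-\s]+", " ", stem).strip()


def with_file_name(chunk: Chunk) -> Chunk:
    """Return a copy of the chunk whose text starts with the file name."""
    header = f"Source file: {file_name_to_text(chunk.file_name)}"
    return Chunk(chunk.file_name, chunk.index, f"{header}\n{chunk.content}")


chunk = with_file_name(Chunk("Annual_Report-2021.pdf", 0, "Revenue grew."))
print(chunk.content)
```

With this approach, terms that appear only in the file name (such as the year) become part of the text that is searched. The likely trade-offs are that the header takes up a few tokens in every chunk and that existing documents would need to be re-chunked and re-indexed.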
Thanks in advance for your replies.