If you look inside an index directory (ex: `~/index/mozilla-central` on an AWS server), these are the files and sub-trees you may find, and who put them there. Simpler configurations with fewer platforms will contain fewer of these.
Note that many of the files referenced here are from the https://github.com/mozsearch/mozsearch-mozilla configuration repository, most specifically referencing the "mozilla-central" tree in its config1.json.
Directories:
`analysis`
: Directory hierarchy that directly corresponds to the paths exposed by the searchfox UI as a single unified namespace where objdir files are folded into `__GENERATED__`. Each file has the name of its corresponding source/generated file and contains JSON analysis data. It is populated by the indexing process. For mozilla-central and similar builds, some of the indexing (ex: C++) occurs on the taskcluster build machine and is inherently per-platform, with `process-gecko-analysis.sh` using the per-platform `process-tc-artifacts.sh` to process the per-platform data and then `collapse-generated-files.sh` and `merge-analyses.rs` to merge the per-platform data into merged analysis files.

`description`
: Directory hierarchy with per-file text files that contain summaries extracted from each file by `describe.rs`, using heuristics that usually involve extracting the contents of comments found in the file. These summaries are produced by `output-file.rs` as a byproduct of writing the HTML for the files to disk during the `output.sh` stage of `mkindex.sh`.

`dir`
: HTML files for the directory listings, one per directory. It forms a parallel hierarchy to the `file` directory. This is necessary because the directory HTML files are placed at `index.html` inside each directory, which would collide with any source file of that name. The nginx config uses a lookup sequence to make this work. Produced by `output-dir.js`.

`file`
: HTML files for each source/generated file. Produced by `output-file.rs` from the source/generated file itself, the corresponding analysis file found under `analysis/`, the `jumps` all-files aggregate file, the `derived-per-file-info.json` all-files aggregate file, and the corresponding per-file aggregate file found under `per-file-info/`.

`gecko-blame`
: Git repository containing pre-computed per-file blame/annotate info built by `build-blame.rs`. The directory name is specific per repository configuration.

`gecko-dev`
: Source git repository (using git-cinnabar). The directory name is specific per repository configuration. The indexed revision is whatever is currently checked out in the working directory.

`objdir`
: The conceptual source directory for things found under `__GENERATED__` in Searchfox's unified namespace. When `output-file.rs` is generating an HTML file for `__GENERATED__/foo`, `foo` will be found under this directory.

`objdir-*`
: Leftover per-platform files from `process-gecko-analysis.sh`, probably just Rust "save-analysis" files. Most files are destructively consumed or moved during the process. These leftovers are retained for debugging purposes.

`detailed-per-file-info`
: Directory hierarchy like `analysis` that stores detailed per-file information in JSON files, for data that is too large to put in the single aggregate `concise-per-file-info.json` file or is not useful for summary purposes.

`templates`
: Holds the result of `output-template.js`, which builds `search.html` using the hardcoded HTML generation logic in `output-lib.js` and `output.js` to build the searchfox UI scaffolding and save it so that `router.py` can inline the JSON search results from a query.
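
The split between source files and generated files described in the entries above can be sketched in a few lines of Python. This is only an illustration of the mapping, not code from the searchfox tree: the `locate` function, its parameters, and the `/index/mc` path are hypothetical, and `gecko-dev` is just the default because the source directory name varies per repository configuration.

```python
from pathlib import PurePosixPath

GENERATED_PREFIX = "__GENERATED__"


def locate(index_root, ui_path, repo_dir="gecko-dev"):
    """Map a searchfox UI path to on-disk paths inside an index directory.

    Hypothetical helper: `repo_dir` is configuration-specific.
    """
    root = PurePosixPath(index_root)
    rel = PurePosixPath(ui_path.lstrip("/"))
    parts = rel.parts
    if parts and parts[0] == GENERATED_PREFIX:
        # Generated files live under objdir/, without the UI prefix.
        source = root.joinpath("objdir", *parts[1:])
    else:
        # Everything else comes from the source checkout.
        source = root.joinpath(repo_dir, *parts)
    return {
        "source": str(source),
        # analysis/ mirrors the unified UI namespace, prefix included.
        "analysis": str(root / "analysis" / rel),
    }


print(locate("/index/mc", "/__GENERATED__/foo/bar.h"))
print(locate("/index/mc", "/dom/base/nsINode.h"))
```

The sketch covers only the source and analysis locations, because those are the two mappings the entries above state explicitly.
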
Files:
`all-dirs`
: `repo-dirs` and `objdir-dirs` concatenated together by `find-objdir-files.sh` after deriving `objdir-dirs`. Exists for the benefit of crossref for now.

`all-files`
: `repo-files` and `objdir-files` concatenated together by `find-objdir-files.sh` and shuffled after deriving `objdir-files`. This exists so that `output.sh` can use `--pipe-part`, which needs a real file on disk rather than a pipe from dynamically `cat`-ing the source files. The crossref script now also consumes this file.

`analysis-dirs-*.list`
: Output of `find -type d` for each per-platform analysis directory. Produced by the per-platform `process-tc-artifacts.sh` script and concatenated into the unified list by `process-gecko-analysis.sh`. The paths are all relative to the per-platform directory and so don't actually include the platform-specific path.

`analysis-dirs.list`
: A concatenated list of the above per-platform lists created by `process-gecko-analysis.sh` and unique-ified, used to ensure the appropriate directories are created in the `analysis/` tree for the script's invocation of `merge-analyses` with stdout redirection (which can't `mkdir -p` itself).

`analysis-files-*.list`
: Output of `find -type f` for each per-platform analysis directory. Produced by the per-platform `process-tc-artifacts.sh` script and concatenated into the unified list by `process-gecko-analysis.sh`. The paths are all relative to the per-platform directory and so don't actually include the platform-specific path.

`analysis-files.list`
: A concatenated list of the above per-platform lists created by `process-gecko-analysis.sh` and unique-ified and then passed to a `parallel` invocation of `merge-analyses` with stdout redirection. Note that generated files are handled separately and use `generated-files.list`.

`android-armv7.*`
: A bunch of per-platform files downloaded by `fetch-tc-artifacts.sh` that we retain for debugging `process-gecko-analysis.sh`.

`bugzilla-components.json`
: Downloaded by `fetch-tc-artifacts.sh` and integrated into per-file information by `derive-per-file-info.rs` when invoked by `crossref.sh`.

`crossref`
: The big database produced by `crossref.rs` that holds all the per-symbol information that `router.py` returns in (symbol) search results, after first mapping from pretty human names to machine symbol names using `identifiers`. See crossref.md for more info.

`concise-per-file-info.json`
: Produced by `derive-per-file-info.rs` when invoked by `crossref.sh`.

`downloads.lst`
: List of curl download commands accumulated by `fetch-tc-artifacts.sh` so that it can run them in parallel.

`generated-files-*.list`
: Output of `find -type f` for each per-platform generated-files directory. Produced by the per-platform `process-tc-artifacts.sh` script and concatenated into the unified list by `process-gecko-analysis.sh`. The paths are all relative to the per-platform directory and so don't actually include the platform-specific path.

`generated-files.list`
: A concatenated list of the above per-platform lists created by `process-gecko-analysis.sh` and unique-ified and then passed to a `parallel` invocation of `collapse-generated-files.sh`, which takes on responsibility for running `mkdir -p` directly and so doesn't need a `-dirs` variant of this list.

`help.html`
: The file you see at the root of the searchfox UI. It is basically the HTML contents of the config repo's `help.html` with all the searchfox UI scaffolding wrapped around it by `output-help.js`, which uses `output-lib.js` and `output.js` just as is done for `templates/search.html`.

`identifiers`
: A text file mapping pretty human-readable symbol names to machine-readable (AKA mangled C++) symbol names. Generated by `crossref.rs` and part of `router.py`'s search logic. See crossref.md for more info.

`idl-files`
: A list, produced by `find-repo-files.py`, of all the '.idl' files it found in the tree that the per-config `repo_files.py` didn't veto. Used by `idl-analyze.sh` to know what files to process when invoked by `mkindex.sh`.

`ipdl-files`
: A list, produced by `find-repo-files.py`, of all the '.ipdl' files it found in the tree that the per-config `repo_files.py` didn't veto. Used by `ipdl-analyze.sh` to know what files to process when invoked by `mkindex.sh`.

`ipdl-includes`
: A list, produced by `find-repo-files.py`, of all the '.ipdlh' files it found in the tree that the per-config `repo_files.py` didn't veto. Used by `ipdl-analyze.sh` to know what files to process when invoked by `mkindex.sh`.

`js-files`
: A list, produced by `find-repo-files.py`, of all the '.js' files it found in the tree that the per-config `repo_files.py` didn't veto. Used by `js-analyze.sh` to know what files to process when invoked by `mkindex.sh`.

`jumps`
: Lookup table that maps from machine symbol names to their canonical definition point. Produced by `crossref.rs` and consumed by `output-file.rs` so that the context menus in the HTML files can generate definition links without having to involve any server queries. See crossref.md for more info.

`linux64.*`
: A bunch of per-platform files downloaded by `fetch-tc-artifacts.sh` that we retain for debugging `process-gecko-analysis.sh`.

`livegrep.idx`
: The output file generated by the `codesearch` invocation in `build-codesearch.py`. It contains the full-text index that the `codesearch` tool uses to do full-text search. It gets loaded by the `codesearch` invocation on the web-server instance, in `router/codesearch.py`.

`macosx64.*`
: A bunch of per-platform files downloaded by `fetch-tc-artifacts.sh` that we retain for debugging `process-gecko-analysis.sh`.

`macosx64-aarch64.*`
: A bunch of per-platform files downloaded by `fetch-tc-artifacts.sh` that we retain for debugging `process-gecko-analysis.sh`.

`objdir-dirs`
: A list of the directories found under `objdir/`, used in a bunch of places for scripting and indexing purposes. This is necessary because source files are exposed via the UI at `/PATH` and come from `gecko-dev/PATH`, while generated files are exposed at `/__GENERATED__/PATH` and come from `objdir/PATH`. Produced by `find-objdir-files.sh`, which is invoked by `mkindex.sh` early in the indexing process.

`objdir-files`
: File variant of `objdir-dirs`, see above for more info.

`repo-dirs`
: A list of the directories that correspond to source files tracked by revision control. Produced by `find-repo-files.py`, which actually runs `git ls-files`, so if you don't check your files into git they won't show up. As with `objdir-dirs`, this needs to exist because of the split between source files and generated files.

`repo-files`
: File variant of `repo-dirs`, see above for more info.

`target.json`
: Downloaded by `resolve-gecko-revs.sh` as part of the process of identifying the most recent successful searchfox indexing jobs run on taskcluster for the given channel/tree. Taskcluster's routes mechanism means that the most recent job will be exposed via both its specific revision and "latest", so if we fetch the "latest" version, we'll get a real revision in this JSON file and can then use it to make sure all other fetched results come from the exact same revision.

`test-info-all-tests.json`
: Downloaded by `fetch-tc-artifacts.sh` and integrated into per-file information by `derive-per-file-info.rs` when invoked by `crossref.sh`.

`win64.*`
: A bunch of per-platform files downloaded by `fetch-tc-artifacts.sh` that we retain for debugging `process-gecko-analysis.sh`.

`wpt-metadata-summary.json`
: Downloaded by `fetch-tc-artifacts.sh` and integrated into per-file information by `derive-per-file-info.rs` when invoked by `crossref.sh`.
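
The handling of the per-platform `*-files-*.list` and `*-dirs-*.list` files described above (concatenate, unique-ify, then make sure the output directories exist before anything writes into them via stdout redirection) can be sketched roughly as follows. This is a Python illustration only: the real work is done by shell scripts, the `unify` and `ensure_dirs` names are hypothetical, and keeping first-seen order when dropping duplicates is an assumption, since the scripts may well sort instead.

```python
import os


def unify(per_platform_lists):
    """Concatenate per-platform path lists and drop duplicates.

    Assumption: first-seen order is kept; the real scripts may sort.
    """
    return list(dict.fromkeys(
        path for paths in per_platform_lists for path in paths
    ))


def ensure_dirs(analysis_root, rel_dirs):
    """Create every listed directory under analysis_root (like mkdir -p)."""
    for rel in rel_dirs:
        os.makedirs(os.path.join(analysis_root, rel), exist_ok=True)


# Paths are relative to each per-platform directory, so the same file
# reported by two platforms collapses into a single entry.
linux = ["dom/base/nsINode.cpp", "xpcom/base/nsISupports.h"]
win = ["xpcom/base/nsISupports.h", "widget/windows/nsWindow.cpp"]
print(unify([linux, win]))
```

Because the per-platform paths do not include the platform-specific prefix, de-duplicating the concatenated list is enough to get one entry per merged output file.
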