Crawling data

If we want to share a database of our own, whether as an official release or as a valuable product that will be used by the community, we have to use the --crawl_my_data command. This command updates and reindexes all the databases that are pending under a given path.

By default it loads the user’s whole projectdata directory, /path_to/projectdata/user-<username>/:

$ freva --crawl_my_data

If we already have a lot of data, it is more convenient (and much faster) to crawl just a sub-directory, for example:

$ freva --crawl_my_data --path=/path_to/projectdata/user-<username>/observations/
Please wait while the system is crawling your data
Sending last 3 entries and 3 entries to latest core

success
0
Finished.
Crawling took 2.5140349865 seconds

The added datasets are now searchable under the project=user-<username> facet, either with the --databrowser command on the command line or via the website’s databrowser.
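As a minimal sketch of the command-line search, the helper below only assembles the databrowser call for a given user. The function name databrowser_cmd and the user name jane are hypothetical, and the facet syntax (key=value after --databrowser) is an assumption based on the facet named above; check it against your installation before relying on it.

```shell
#!/bin/sh
# Hypothetical helper, not part of freva: print the databrowser command
# that searches the crawled data of one user.
# Assumption: facets are passed as key=value after --databrowser.
databrowser_cmd() {
    username=$1
    printf 'freva --databrowser project=user-%s\n' "$username"
}

# Example: show the command for the hypothetical user "jane".
databrowser_cmd jane
```

The command is printed rather than executed, so it can be inspected first and then run by hand.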

Note

Be aware of the correct path: in order to crawl your own dataset, you first have to set up your folder paths correctly. Consult your project’s structure to find out which layout is expected.
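Because the crawl depends on the path, a small guard can catch an obviously wrong one before the command is run. The sketch below assumes that the path must lie inside the user’s projectdata directory. The function name build_crawl_cmd and the user name jane are hypothetical, and the check does not validate the folder structure inside that directory.

```shell
#!/bin/sh
# Hypothetical helper, not part of freva.
# Usage: build_crawl_cmd ROOT PATH
# Prints the crawl command if PATH lies inside ROOT, otherwise fails.
build_crawl_cmd() {
    root=${1%/}   # drop one trailing slash from the root, if present
    path=$2
    case "$path" in
        "$root"/*)
            printf 'freva --crawl_my_data --path=%s\n' "$path"
            ;;
        *)
            printf 'error: %s is not under %s\n' "$path" "$root" >&2
            return 1
            ;;
    esac
}

# Example with the placeholder root used in this section.
build_crawl_cmd /path_to/projectdata/user-jane \
    /path_to/projectdata/user-jane/observations/
```

The helper only prints the command; once the path looks right, the printed line can be run as is.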