Allow getting many documents by id #99

ghost · 2021-04-17T17:45:22Z

After reading around, I think it's only possible to get one document at a time by its unique id. It would be nice if we could pass an array of ids and get the results back in a single request. Right now I am making one request per id.

Thanks.

paligiannis · 2021-04-17T18:35:17Z

I need to address this issue too. Any ideas on how to implement that?

gmourier · 2021-04-18T08:21:07Z

Maintainer

Hello @san-serif and @paligiannis! Thank you for using MeiliSearch and taking the time to contribute to this project. Currently, we are putting a lot of effort into the release of the new engine but it may be something that we want to address in the future. Can you please tell me more about your needs regarding this feature? Thanks!

paligiannis · 2021-04-18T14:10:25Z

@gmourier It seems like we need a way to fetch specific records from the database. Usually, the use case is something like getting ids from the database, e.g. products[ids]=100,300,400,500. So it's something like filtering down to specific results and then using them with facets or any other feature of Meilisearch.

ghost · 2021-04-18T17:53:51Z

Hey @gmourier! This is kind of low priority for me as MeiliSearch is already working great for me. I want this feature because I want to get some data without directly querying my main database (mostly for backoffice stuff).

paligiannis · 2021-04-18T19:47:00Z

@san-serif How have you implemented such a thing?

ghost · 2021-04-18T20:43:16Z

@paligiannis

I'm using javascript and I essentially did it like this:

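import { MeiliSearch } from "meilisearch";

(async () => {
  const client = new MeiliSearch({ host: "https://endpoint" });
  const index = client.index("products");

  const ids = [100, 300, 400, 500];

  // One getDocument request per id, all sent concurrently
  const promises = ids.map(id => index.getDocument(id));

  // Resolves to the documents in the same order as `ids`;
  // rejects as soon as any single request fails
  const documents = await Promise.all(promises);
})();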
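
A minimal sketch of one way to keep this per-id approach from sending every request at once or failing on a single missing id. `getDocumentsInBatches` is a hypothetical helper, not part of the MeiliSearch client; `getDocument` stands for any function that takes an id and returns a Promise, such as `id => index.getDocument(id)` from the snippet above. The batch size of 50 is an arbitrary assumption.

```javascript
// Hypothetical helper: fetch documents one id at a time, but in bounded batches.
async function getDocumentsInBatches(getDocument, ids, batchSize = 50) {
  const documents = [];
  for (let start = 0; start < ids.length; start += batchSize) {
    const batch = ids.slice(start, start + batchSize);
    // At most `batchSize` requests are in flight at the same time.
    const settled = await Promise.allSettled(batch.map(id => getDocument(id)));
    for (const result of settled) {
      // Skip ids whose request failed (for example, a document that does not exist).
      if (result.status === "fulfilled") documents.push(result.value);
    }
  }
  return documents;
}
```

This still costs one request per id, so it only softens the problem described in this thread rather than solving it.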

gmourier · 2021-04-19T08:20:24Z

Maintainer

Hi! Thank you for your answers.

I've marked this issue as a feature request so we can track it in our backlog. You can also propose this feature as an idea here. That lets people vote for it and makes its importance more visible.

ghost · 2021-04-19T11:04:17Z

@MarinPostma Just to clarify, I think this is low priority because @gmourier mentioned you guys are working on a new engine. I did not want to give the impression that I think this is super important.

I don't know how to explain my use case better. I want to query data directly from MeiliSearch because I think it would be faster than my database.

jiminy-billy-bob · 2021-04-27T16:29:44Z

We solved this by declaring the id field as an attribute for faceting. Then we simply do something like this:

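await meilisearch.getIndex('myIndex').search(undefined, {
  // The inner array is an OR group: match any of the listed ids
  facetFilters: [documents.map(d => `id:${d.id}`)],
  // High enough to return every matching document
  limit: 999999
});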

This should return the documents whose id matches any of the given ones, for example foo and bar.
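
A minimal sketch of how these search parameters could be built from a plain list of ids. `buildIdSearchParams` is a hypothetical helper, not part of the MeiliSearch client. It assumes `id` is the primary key (so at most one document per id) and that it is declared as an attribute for faceting, which is what lets `limit` be the number of distinct ids instead of 999999.

```javascript
// Hypothetical helper: turn a list of ids into facet-filter search parameters.
function buildIdSearchParams(ids) {
  // Drop duplicates so the filter and the limit stay as small as possible.
  const unique = [...new Set(ids)];
  return {
    // One inner array means the entries are OR-ed together.
    facetFilters: [unique.map(id => `id:${id}`)],
    // Assumes ids are unique per document, so this is enough to return them all.
    limit: unique.length,
  };
}

console.log(JSON.stringify(buildIdSearchParams(["foo", "bar", "foo"])));
// prints {"facetFilters":[["id:foo","id:bar"]],"limit":2}
```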

tty2 · 2021-09-29T13:21:09Z

I see that this discussion is outdated, but I'm curious whether there is a convenient way to get documents by an array of ids using the current engine.
I'm using the Go API and can't find a common way to do that.

1 reply

mikerogerz Sep 29, 2021

I was unable to find any optimal way via the API, so I did something like this (in PHP) to pull and update existing documents with new data:

$offset = 0;
$moreDocs = true;
do {
	$docs = collection($index->getDocuments([
		'limit' => 10000,
		'offset' => $offset,
		'attributesToRetrieve' => 'objectID,id,another_attr'
	]));
	if (!$docs->isEmpty()) {
		// match docs to update (based on id, etc), update attribute, and push updates to queue via API updateDocuments()
		
		$offset += 10000;
	} else {
		$moreDocs = false;
	}
} while ($moreDocs);

Not the most elegant solution, but it was the only reliable way I found to match all documents, especially when indexing long-form content (multiple docs sharing the same id, but with different chunks of data to match on), so pulling individual documents wouldn't work either. In my opinion, being able to pull by array would be extremely useful for grabbing a few hundred or a few thousand documents at a time.
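
For comparison with the other snippets in this thread, here is a rough JavaScript sketch of the same page-through-everything loop. `scanForIds` and `getPage` are hypothetical names: `getPage` stands for any function that accepts `{ limit, offset }` and returns a Promise of an array of documents, and the documents are assumed to carry an `id` field.

```javascript
// Hypothetical helper: page through the whole index and keep the documents whose id is wanted.
async function scanForIds(getPage, wantedIds, pageSize = 10000) {
  const wanted = new Set(wantedIds);
  const matches = [];
  for (let offset = 0; ; offset += pageSize) {
    const docs = await getPage({ limit: pageSize, offset });
    // An empty page means the end of the index has been reached.
    if (docs.length === 0) break;
    for (const doc of docs) {
      // Several documents may share the same id; all of them are kept.
      if (wanted.has(doc.id)) matches.push(doc);
    }
  }
  return matches;
}
```

The cost grows with the size of the index rather than with the number of ids, which is why a real batch lookup would be preferable.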

tacman · 2023-07-11T18:34:21Z

Although this is an older post, I'm surprised it hasn't been solved yet. Getting a single document by primary key is simple; it shouldn't be hard to get a set of documents by an array of primary keys. Am I missing something?

derekperkins · 2023-07-20T02:19:12Z

This has been solved now

Get Documents by Batch #644 (comment)

6 replies

irevoire Jul 25, 2023
Collaborator

Thanks, do you know what kind of API you want to see?

tacman Jul 25, 2023

I think expanding on /documents (GET) and /fetch (POST) would be seamless.

curl -X GET 'http://localhost:7700/indexes/dogs/documents?limit=1&filter=doggo=bernese'

curl -X GET 'http://localhost:7700/indexes/dogs/documents?ids=cody,finn,brandy,gambit'

curl -X POST 'http://localhost:7700/indexes/dogs/documents/fetch' \
  -H 'Content-Type: application/json' \
  --data-binary '{ "limit": 1, "ids": ["cody","finn"] }'

That way, it could be combined with filter and search, which would rock.

// pseudo-code

$ids = $this->getStudentIds(['class' => 'civics101']); // some database query
$index->getDocuments(['ids' => $ids, 'filter' => ['gender' => 'nonbinary']]);

irevoire Jul 25, 2023
Collaborator

Oh, interesting. I was wondering whether the filter and ids should be mutually exclusive or could be used together.
I’ll send your proposal to the team; thanks for your time 🔥

tacman Jul 25, 2023

Basically, everywhere you can send in a filter, you should be able to send in a list of ids. The primary key is really just a property; because it's unique, it is handled differently than a filter, but it's fundamentally just another way of accessing the data.

macraig Jan 8, 2025
Maintainer

@tacman, in your proposal are the ids and the filter working as an AND or an OR? That is, do you want the intersection of ids AND documents that match the filter, or any document that has one of the requested ids OR matches the filter? We're assuming it's an AND, but wanted to check with you to make sure we don't miss your use case.
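
To make the two readings concrete, here is a small in-memory sketch. It does not call Meilisearch; `selectAnd` and `selectOr` are hypothetical names, and the sample documents (including the breeds assigned to each dog) are made up for illustration.

```javascript
// Illustrative data only: the ids come from the examples above, the breeds are invented.
const dogs = [
  { id: "cody", doggo: "bernese" },
  { id: "finn", doggo: "poodle" },
  { id: "brandy", doggo: "bernese" },
];

// AND: the document must have a requested id and also match the filter.
function selectAnd(docs, ids, matchesFilter) {
  const wanted = new Set(ids);
  return docs.filter(doc => wanted.has(doc.id) && matchesFilter(doc));
}

// OR: the document must have a requested id or match the filter.
function selectOr(docs, ids, matchesFilter) {
  const wanted = new Set(ids);
  return docs.filter(doc => wanted.has(doc.id) || matchesFilter(doc));
}

const isBernese = doc => doc.doggo === "bernese";
const requested = ["cody", "finn"];

// AND keeps only cody; OR keeps cody, finn, and brandy.
console.log(selectAnd(dogs, requested, isBernese).map(doc => doc.id));
console.log(selectOr(dogs, requested, isBernese).map(doc => doc.id));
```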

tacman · 2023-12-05T12:16:55Z

Here's an example of why this is important. I have two indexes, movies and actors. Both have a million rows (which meili, because it's awesome, handles without breaking a sweat).

When I look up a movie, I want to get the actors and some data about them, but that data depends on the application. One application may want birthday, another nationality or languages. I don't want to store all the actor data in the movie database; I simply want an array of actors by code or imdb id or whatever.

So I get my movie results back and aggregate the actor lists, so that I have a single set of all the actors referenced from the movie results. I now want to fetch the documents in the actor index by id, get back all the data (or the data my application specifies), merge it with my result set and send it off to the next step. Obviously, fetching the actor data by id may need to be done in batches.

I don't want to iterate through the list of hundreds of actors one at a time (loading each document by id); I want to load all the actor data as efficiently as possible.

Facets are a terrible approach. They're not designed for millions of individual ids -- that's the role of the ID!

In a relational database, this would be a simple inner join between the movie and actor tables. In meili, it's a simple 2-index lookup, but it is impossible without the ability to look up a batch of documents by id.
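
A rough sketch of that 2-index lookup, assuming a batch lookup existed. Everything here is hypothetical: `fetchActorsByIds` stands for whatever call would return actor documents for an array of ids, and the `actorIds` and `id` field names are placeholders for the real schema.

```javascript
// Collect the distinct actor ids referenced by a set of movie results.
function collectActorIds(movies) {
  return [...new Set(movies.flatMap(movie => movie.actorIds))];
}

// Split a list into consecutive chunks of at most `size` items.
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) chunks.push(items.slice(i, i + size));
  return chunks;
}

// Fetch the actors in batches and attach them to the movies that reference them.
async function attachActors(movies, fetchActorsByIds, batchSize = 100) {
  const actorsById = new Map();
  for (const batch of chunk(collectActorIds(movies), batchSize)) {
    for (const actor of await fetchActorsByIds(batch)) actorsById.set(actor.id, actor);
  }
  return movies.map(movie => ({
    ...movie,
    // Ids with no matching actor document are dropped.
    actors: movie.actorIds.map(id => actorsById.get(id)).filter(Boolean),
  }));
}
```

With hundreds of referenced actors and a batch size of 100, this would take a handful of requests instead of one per actor.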

The meili developers have created a blazingly fast lookup for searching, so I imagine it would be very simple for them to bypass the "hard" searching and simply return a collection of documents based on the id.

My actual application is more complicated than this, but I hope this example is clear enough to warrant prioritizing batch lookup by id in the next release. Thank you for your consideration.

tacman · 2023-12-05T12:17:02Z