Often, different parties possess data of mutual interest. They might wish to share portions of this data for collaborative work, but consider the leak of unrelated portions to be a privacy issue for themselves or their clients. Thus, methods that provide a well-defined and secure sharing of the data between untrusting parties can be useful tools. One such method is the ability for a client to search the information residing on another server without revealing to the server his identity or the content of his query; at the same time, it is desirable to guarantee that query capability is only granted to appropriate clients and that they do not learn anything unrelated to the query. Such a tool is useful in deciding and agreeing upon information-sharing between parties who do not initially know if they have data worth sharing with each other, and do not want to share information until they do. Once the information of mutual interest is established, the next step is providing a protocol that allows the retrieval of this data without violating privacy guarantees that were achieved during the search.

We developed a system for Secure Anonymous Database Search (SADS), which provides the following privacy and security guarantees: only authorized users are allowed to submit search queries, we provide anonymity for each query within the pool of all authorized users. The query results contain only pointers to documents relevant to the query and no irrelevant information about the database is revealed to the querier. At the same time the owner of the database does not learn anything about the queries that have been submitted. A major requirement for our system from a practical point of view is that the search time for query response is sublinear in the number of searchable tokens in the database. We achieve this efficiency using combining techniques for deterministic encryption and Bloom filters. In addition we employ two intermediary parties query router and index server that facilitate the search protocol. We discuss the set of assumption and guarantees with respect to these parties.

We extended the SADS system with a protocol for document retrieval that allows the queries to obtain the relevant document to his search without revealing the identities of these documents to the database owner. While the primary functionality for our system is keyword search, we show how we can leverage it to handle Boolean queries and range queries, and we analyze the corresponding privacy guarantees that we achieve for these new types of queries.



We have performed a number of tests to evaluate the performance of SADS, both the original and the extended designs. The left figure above shows the trade-off between performance and security. Clearly, the more secure the search is, the more resources are needed, but, even in the worst case -- most secure -- the search times are acceptable for practical purposes. In addition, secure document retrieval adds negligible overhead on top of secure document transfer, as shown in the middle figure. Overall, the performance of the extended SADS system (with the best privacy guarantees and secure document retrieval enabled) is comparable to a simply configured MySQL server.


The source code and the evaluation data are available upon request.



© Vasilis Pappas, Columbia University