SharePoint Permissions for AI Search: A Civil Engineering Guide
AI DOCUMENT SEARCH • SHAREPOINT PERMISSIONS • May 2026
SalemWise builds permission-aware AI document search for civil engineering firms: SharePoint permissions are honored on every query, so a sealed or conflict-walled file is never returned to someone walled off from it. SalemWise hosts the whole system, evaluates access at query time across SharePoint, network drives, and ProjectWise, and charges a flat fee, not per seat. Pricing starts at $18,000 for setup plus $1,800/month.
At 4:47 on a Friday, the managing principal of a 40-engineer Pacific Northwest civil firm gets a call she does not want. A senior PE has just received a Microsoft 365 Copilot answer that cites a sealed expert-witness report from a 2019 litigation the PE was conflict-walled off from. Copilot did exactly what Microsoft built it to do: it returned what Microsoft Graph could see. The problem is that a partner had moved the folder six weeks earlier, nobody re-ran permission propagation, and the index was serving a stale access-control list. A conflicted engineer now holds a document they were obligated not to have, and the firm's professional-liability carrier has to be told.
This is not a Copilot bug, and it is not rare. It is the predictable result of treating document-level permissions as a thing you set once when files are indexed rather than a thing you check on every question. For the principal who carries partner-level liability on stamped work, a permission leak is not an IT inconvenience; it is a malpractice-shaped exposure with a dollar figure attached. For the engineer who actually has to trust the tool, it is the reason a generic AI search never gets used on the matters that count. This article covers how SharePoint permissions break in a 30-year engineering archive, what a single leak now costs in a hardened 2026 insurance market, and what an AI search has to do so it never happens. Pricing is published below.
What does an AI search tool actually see when you point it at SharePoint?
It sees what Microsoft Graph indexes, and that is not your archive. Microsoft Graph covers SharePoint Online, OneDrive, Exchange, and Teams. It does not index ProjectWise vaults, NTFS shares on the legacy file server, or the 1996 plan set scanned at 200 dpi sitting on a standalone drive. In the typical 30-to-250-person civil firm, the share of project content that lives in surfaces Graph can see is well under half. Everything else needs a separate ingestion path, and every separate path needs its own permission-propagation path. The Copilot connector library, which we covered in the Microsoft Copilot review, has more than 100 entries and no ProjectWise connector. The permission version of that same gap is harder, because a tool either propagates the correct ACL or it leaks the wrong document. There is no partial credit.
Three permission failures account for most of the risk, and each maps to a specific event that happens at a partnership firm in an ordinary week.
1. Inheritance the index never re-reads
SharePoint permissions inherit down a hierarchy: site to library to folder to file, unless someone breaks inheritance at a level. Microsoft's preview SharePoint indexer for Azure AI Search captures those access-control lists, but per Microsoft Learn the capture happens once, at first ingestion: "During preview, ACL ingestion applies to initial indexing only. ACLs are captured on the first ingestion of each file." When a partner re-organizes a project folder, the captured ACL and the live ACL diverge, and nothing re-reads the source until something forces it to.
2. Group types the indexer cannot resolve
The preview indexer resolves Entra users and groups. It does not resolve native SharePoint groups. Per Microsoft Learn, unsupported permission types include "SharePoint groups that can't be resolved to Microsoft Entra groups (such as Owners, Members, Visitors groups)." Firms that built their project access on the default Owners/Members/Visitors model, which is most of them, have a permission structure the preview indexer was never designed to honor.
3. The library that silently stops accepting permissions
SharePoint caps unique permissions per library. Per Microsoft's service description, "the supported limit of unique permissions for items in a list or library is 50,000. However, the recommended general limit is 5,000." A major project with per-discipline subgroups and per-sheet carve-outs can blow past 5,000 in a single lifecycle, and query performance degrades long before the hard cap. What actually breaks at the cap is covered later in this article.
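The first failure above, a captured ACL that diverges from the live one, can be illustrated with a minimal sketch. It assumes a simplified in-memory permission tree; the `Node` class, the group names, and the file names are hypothetical and are not SharePoint or Azure AI Search API objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A site, library, folder, or file in a simplified permission tree."""
    name: str
    parent: Optional["Node"] = None
    # None means "inherit from the parent"; a set means inheritance is broken here.
    unique_acl: Optional[frozenset] = None

    def effective_acl(self) -> frozenset:
        """Walk up the tree to the nearest node that carries its own ACL."""
        node = self
        while node is not None:
            if node.unique_acl is not None:
                return node.unique_acl
            node = node.parent
        return frozenset()


def is_stale(snapshot_acl: frozenset, live_acl: frozenset) -> bool:
    """A snapshot is stale as soon as it differs from the live ACL."""
    return snapshot_acl != live_acl


# Hypothetical hierarchy: one open archive folder, one conflict-walled folder.
site = Node("site", unique_acl=frozenset({"all-staff"}))
archive = Node("project-archive", parent=site)
walled = Node("litigation-2019", parent=site,
              unique_acl=frozenset({"litigation-team"}))
report = Node("expert-witness-report.pdf", parent=archive)

# The index captures the ACL once, at first ingestion.
snapshot_acl = report.effective_acl()

# Later the file is moved behind the conflict wall; the source is now correct.
report.parent = walled
live_acl = report.effective_acl()

print(is_stale(snapshot_acl, live_acl))
```

In this sketch the snapshot still grants the file to everyone in `all-staff` while the source grants it only to `litigation-team`, which is the divergence an index serves until something forces a re-read.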
The three queries below are the ones that expose permission failures rather than retrieval failures. That distinction is the whole point of this article.
Query 1: "Pull the 2019 expert-witness report on the [litigated project]." The report was sealed and walled off from any PE with a conflict on the matter. If the index captured the ACL before the conflict wall was applied, or if the wall lives in a SharePoint group the indexer cannot resolve, the report comes back to a conflicted engineer who should never see it. That is a disclosable event to the affected client, not a search-quality miss.
Query 2: "Find the geotech report for [acquired-firm project], restricted to the [office] partners." After an acquisition, two firms' tenants and security groups have to merge, and the cleanup lags the close by months. During that window a restricted report is permitted to the wrong group, and an engineer at the acquiring office retrieves a document the acquired firm's partners had deliberately fenced. The permission model was correct in the source the day before the migration and wrong the day after.
Query 3: "Show me the draft RFP response for the [competing pursuit]." BD walled this draft off from a principal who happens to sit as a referee on the client's selection panel. An AI assistant that returns it has just handed the principal a document whose contents he is obligated not to have. None of the three queries is a failure to find the right file. Each is a failure to respect who is allowed to see it.
Why do generic AI search tools mishandle engineering ACLs?
Because most of them evaluate permissions against a snapshot taken at ingestion, and an engineering archive's permissions change constantly: a junior joins a project team, a partner moves a folder, an acquisition closes, a litigation hold lands. Two named tools handle this honestly and differently, and neither is wrong for its own buyer.
Microsoft 365 Copilot honors Graph access controls natively at query time, which is genuinely strong: per Microsoft Learn, "Copilot can only summarize or reference content that the user is authorized to access." The limitation for civil engineering is that this guarantee only covers content Graph indexes, and Graph does not reach the network shares and ProjectWise vaults where most of the archive lives. Glean enforces permissions in real time across its connectors and markets it in these terms: "Glean enforces real-time indexing and respects existing permissions, so users only access what they're allowed to see." The limitation for a partnership firm is the per-seat enterprise price floor, which is hard to justify when eight people, not a hundred, actually run searches. SalemWise differs on three things: it evaluates ACLs at query time against the live Entra ID store across SharePoint, network drives, and ProjectWise alike; it detects the SharePoint permission-scope cliff that the others surface only as degraded performance; and it processes the archive with its own self-hosted model, so a firm's stamped documents are never routed to a public AI API.
| | M365 COPILOT | GLEAN | SALEMWISE |
|---|---|---|---|
| Permission evaluation | Query-time, against Microsoft Graph (source: Microsoft Learn) | Real-time, mirrored across connectors (source: glean.com) | Query-time, against live Entra ID (source: salemwise.com) |
| Surfaces covered | SharePoint Online, OneDrive, Exchange, Teams | 100+ SaaS connectors | SharePoint, network drives, ProjectWise |
| Network drive ACLs | Not native | Via connector if exposed | NTFS ACLs read over read-only VPN |
| SharePoint groups | Native | Supported | Resolved to Entra groups at ingest |
| 5,000-permission cliff | Surfaces as degraded performance | Surfaces as degraded performance | Detected and reported in audit |
| Where content is processed | Semantic index inside your M365 tenant | Central index, Glean-hosted cloud | SalemWise-hosted, tenant-isolated environment |
| AI inference | Microsoft-operated models in the service boundary | Glean-hosted models over the copied index | Our own self-hosted model; never a public AI API |
| Pricing model | $30/user/mo Enterprise | ~$60K enterprise floor | Flat, from $18K + $1,800/mo |
| Best for | Firms whose archive is fully inside M365 | Tech-company horizontal search at scale | Civil firms with hybrid SharePoint + drives + ProjectWise |
What does permission-safe AI search have to do for a civil firm?
Five things have to be true before a firm should trust AI search on stamped work. A platform that misses any of them is not a real option for an engineering archive, regardless of how good the demo looks.
1. Evaluate ACLs at query time, not ingest time
Permissions checked against the live Entra ID store on every retrieval, so a folder a partner moved an hour ago is reflected now. An index-time snapshot is acceptable for a static public corpus; it is not acceptable for active litigation, M&A diligence, or conflict-walled matters where freshness is the whole control.
2. Resolve every group type, including broken inheritance
SharePoint Owners/Members/Visitors groups, Entra security groups, and per-folder broken-inheritance carve-outs all map cleanly into the access decision. A folder that broke inheritance to lock a sealed report has to be honored, not flattened back to the parent's permissions.
3. Reach the surfaces Graph cannot see
NTFS ACLs on network shares and security-set membership in ProjectWise carry their own permission models. The system reads them over a read-only path and maps them into the same identity decision as SharePoint, so one query honors three permission stores at once.
4. Detect the SharePoint permission-scope cliff
Per Microsoft, a library at the unique-permission limit refuses further inheritance breaks with the error "You cannot break inheritance for this item because there are too many items with unique permissions in this list." The audit flags libraries approaching 5,000 scopes so the firm reorganizes before the failure, not after.
5. Maintain a do-not-index allowlist
Litigation holds, sealed expert-witness work, and conflict-walled matters get an explicit exclusion that no permission change can override. A document under hold should not enter the index at all, regardless of who later gains folder access.
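The last two requirements lend themselves to a short sketch: classifying a library by its count of unique permission scopes, and dropping excluded paths before anything is ingested. The 5,000 and 50,000 figures come from the Microsoft limits quoted earlier; the 80% warning threshold, the function names, and the path-prefix form of the exclusion list are assumptions for illustration, not SalemWise's implementation.

```python
from typing import Iterable, List

RECOMMENDED_LIMIT = 5_000   # Microsoft's recommended general limit
SUPPORTED_LIMIT = 50_000    # Microsoft's supported limit per list or library
WARN_FRACTION = 0.8         # hypothetical: warn at 80% of the recommended limit


def classify_library(unique_scopes: int) -> str:
    """Label a library by how close it is to the unique-permission limits."""
    if unique_scopes >= SUPPORTED_LIMIT:
        return "at-supported-limit"
    if unique_scopes >= RECOMMENDED_LIMIT:
        return "over-recommended"
    if unique_scopes >= WARN_FRACTION * RECOMMENDED_LIMIT:
        return "approaching"
    return "ok"


def select_for_ingestion(paths: Iterable[str],
                         excluded_prefixes: Iterable[str]) -> List[str]:
    """Drop every path under an excluded prefix before ACLs are even read."""
    prefixes = tuple(excluded_prefixes)
    return [p for p in paths if not p.startswith(prefixes)]


# Hypothetical inputs for illustration only.
libraries = {"Water-2024": 1_200, "Transit-Corridor": 4_300, "Bridge-Retrofit": 6_100}
report = {name: classify_library(count) for name, count in libraries.items()}

paths = [
    "/projects/water-2024/design-memo.pdf",
    "/litigation/2019/expert-witness-report.pdf",
]
to_index = select_for_ingestion(paths, ["/litigation/"])
```

Applying the exclusion before the permission check matters: a document that never enters the index cannot be leaked by a later ACL change.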
How does query-time ACL enforcement actually work under the hood?
Permission enforcement is a different shape of problem from indexing text. Indexing answers "what does this document say." Permission enforcement answers "may this specific person, right now, see it." The first can be cached; the second cannot, because the answer changes the moment a folder moves. The pipeline below is the SalemWise reference architecture. The cold-start problem, getting permissions right on day one, is solvable with care. The steady-state problem, staying correct after the archive changes, is where most deployments break, and it is the reason ACLs are evaluated against the live identity store rather than a snapshot.
1. Identity resolves from the Entra ID token
A query carries the engineer's Microsoft Entra principal. The retrieval layer expands that principal into its full group membership at request time, including nested groups, so a person added to a project team this morning is evaluated with this morning's membership, not last night's crawl.
2. Permission metadata is stored alongside each chunk
At ingestion, each document's resolved ACL (from the SharePoint inheritance chain, the NTFS ACL, or the ProjectWise security set) is stored as filter metadata on every retrievable chunk. This is the snapshot. It is necessary for performance but never trusted as the final word.
3. The live store re-validates before any chunk is returned
Candidate chunks from the semantic and keyword search are filtered against the stored ACL, then the surviving set is re-checked against the live Entra membership for sensitive scopes. If the live store and the snapshot disagree, the live store wins and the chunk is dropped. Microsoft's own preview indexer instead serves the snapshot until a resync runs, which is the stale-ACL gap this step closes.
4. Failures fail closed
If the identity store is unreachable, the query returns no results rather than an unfiltered set. A permission system that fails open is worse than no permission system, because it leaks silently. Fail-closed is the only acceptable default for stamped engineering work.
5. Continuous delta-sync keeps the snapshot honest
A scheduled delta-sync re-reads changed ACLs across SharePoint, network drives, and ProjectWise, so the stored snapshot the fast path relies on never drifts far from the source. Moved folders, new group members, and revoked access propagate on the sync cadence rather than waiting for a full re-crawl.
6. Every answer carries a page-level, permission-checked citation
The returned answer names the document, the page, and the section, and that source passed the same ACL check as the chunk it grounds. The PE who stamps the work gets the traceback for QA and litigation defense; the principal gets an audit trail the firm's professional-liability carrier can read.
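The first four steps of the pipeline above can be sketched in a few lines. This is a minimal illustration, not the SalemWise code: the directory is an in-memory dictionary, `fetch_memberships` and `fetch_live_acl` are hypothetical callables standing in for calls to an identity store, and every chunk is re-checked against the live ACL rather than only those in sensitive scopes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Set


class IdentityStoreUnavailable(Exception):
    """Raised by the hypothetical directory client when it cannot be reached."""


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    stored_acl: FrozenSet[str]  # snapshot written at ingestion


def expand_groups(principal: str, member_of: Dict[str, Set[str]]) -> Set[str]:
    """Return the principal plus every group reachable through nesting."""
    seen = {principal}
    stack = [principal]
    while stack:
        current = stack.pop()
        for group in member_of.get(current, set()):
            if group not in seen:
                seen.add(group)
                stack.append(group)
    return seen


def authorized_chunks(
    principal: str,
    candidates: List[Chunk],
    fetch_memberships: Callable[[], Dict[str, Set[str]]],
    fetch_live_acl: Callable[[str], FrozenSet[str]],
) -> List[Chunk]:
    """Filter on the snapshot, re-check against the live store, fail closed."""
    try:
        identities = expand_groups(principal, fetch_memberships())
        allowed = []
        for chunk in candidates:
            if not identities & chunk.stored_acl:
                continue  # fast path: the snapshot already denies access
            if identities & fetch_live_acl(chunk.doc_id):
                allowed.append(chunk)  # the live store has the final word
        return allowed
    except IdentityStoreUnavailable:
        return []  # fail closed: no results rather than an unfiltered set


# Hypothetical data: alice is in water-team, which is nested in all-staff.
member_of = {"pe.alice": {"water-team"}, "water-team": {"all-staff"}}
live_acls = {
    "design-memo": frozenset({"water-team"}),
    "sealed-report": frozenset({"litigation-team"}),  # wall applied after ingest
}
candidates = [
    Chunk("design-memo", "memo text", frozenset({"water-team"})),
    Chunk("sealed-report", "report text", frozenset({"all-staff"})),  # stale
]


def directory_down() -> Dict[str, Set[str]]:
    raise IdentityStoreUnavailable("directory unreachable")


visible = authorized_chunks("pe.alice", candidates, lambda: member_of,
                            live_acls.__getitem__)
blocked = authorized_chunks("pe.alice", candidates, directory_down,
                            live_acls.__getitem__)
print([c.doc_id for c in visible], blocked)
```

The stale snapshot on `sealed-report` would have admitted the user; the live re-check drops it. When the directory is unreachable the function returns an empty list, which is the fail-closed behavior the fourth step describes.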
There is a second half to this trust story, and it matters as much to the principal as the ACL layer. The permission layer decides which human may see a document. The inference architecture decides where the document goes to be read. In a SalemWise deployment, the archive is ingested into a SalemWise-hosted, tenant-isolated environment, and the model that generates the answer is SalemWise's own self-hosted model running inside that environment. A query and the documents it retrieves are processed only by that model. They are never sent to OpenAI, Anthropic, Google, or any third-party AI API, and they never become training data for anyone's model. The most sensitive material, sealed expert-witness work and conflict-walled matters, is excluded from ingestion entirely by the do-not-index allowlist, so it never reaches the environment at all. For a firm whose archive holds documents like these, that combination, no public AI API plus a hard ingestion exclusion for the untouchable material, is not a feature line; it is the difference between an architecture a professional-liability carrier can sign off on and one it cannot.
In the global mining consultancy version of this problem, the permission boundary is not a folder ACL but identity federation across SSO realms. A firm with nine offices on five continents typically runs separate Entra tenants per region for data-residency and acquisition reasons, with cross-tenant access settings governing guest and contractor access. The retrieval layer has to know which home tenant issued a token and apply that tenant's ACL space, so a document permitted to home-tenant employees is still blocked to a cross-tenant guest. That is precisely the case Microsoft's single-tenant preview indexer was not built for, and it is a separate engineering conversation from the folder-ACL story above.
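One way to picture the tenant-scoped decision is the sketch below. It is built on assumptions: `Principal`, `Document`, the `home_tenant` field, and the explicit `guest_grants` set are hypothetical types for illustration and do not mirror Entra ID token claims or cross-tenant access settings.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Principal:
    user: str
    home_tenant: str            # hypothetical: the tenant that issued the identity
    groups: FrozenSet[str]      # group names, meaningful only inside home_tenant


@dataclass(frozen=True)
class Document:
    doc_id: str
    owning_tenant: str
    acl: FrozenSet[str]                                    # groups in the owning tenant
    guest_grants: FrozenSet[Tuple[str, str]] = frozenset() # explicit (tenant, user) grants


def may_read(principal: Principal, doc: Document) -> bool:
    """Apply the owning tenant's ACL space; never match groups across tenants."""
    if principal.home_tenant == doc.owning_tenant:
        return bool(principal.groups & doc.acl)
    # A same-named group in another tenant is a different group, so a
    # cross-tenant principal gets access only through an explicit grant.
    return (principal.home_tenant, principal.user) in doc.guest_grants


doc = Document("pit-slope-study", "tenant-apac", frozenset({"employees"}))
employee = Principal("li.wei", "tenant-apac", frozenset({"employees"}))
guest = Principal("j.ortiz", "tenant-amer", frozenset({"employees"}))

print(may_read(employee, doc), may_read(guest, doc))
```

The guest belongs to a group with the same name as the one on the document, but because the group lives in a different tenant the check does not match, which is the behavior the paragraph above asks for.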
What does a real civil engineering deployment look like?
A 40-engineer civil firm in the Pacific Northwest, founded in the 1960s, partnership-structured. Water and wastewater is half the practice, transportation a third, structural and site civil the rest. The archive is roughly 7 TB on SharePoint Online plus a shared network drive, with about 2 TB of legacy scanned drawings on a separate file server. M365 E5 firmwide; Copilot Enterprise turned on for the partners and senior PEs in late 2025. Federated identity through Entra ID, SAML to two line-of-business applications.
The trigger was the Friday-afternoon call in the opening of this article. A folder holding a sealed 2019 expert-witness report had been moved during a routine project-archive cleanup, the conflict wall lived in a SharePoint Members group, and the firm's existing index served the pre-move ACL. A conflicted senior PE received the report in a Copilot answer. Nothing in the source was technically misconfigured the day the wall went up; the index simply never re-read it. The managing principal called her professional-liability broker, and the broker asked the one question the firm could not answer: which people had been able to retrieve that document, and on which dates.
SalemWise ingested the archive into a tenant-isolated environment it hosts, reached the SharePoint, network-drive, and ProjectWise sources over a read-only VPN connection, and delivered retrieval through a secure web interface inside the firm's network. The firm bought no hardware and stood up no infrastructure. The sealed 2019 report was added to the do-not-index allowlist and never entered the index at all. For the rest of the archive, the conflict-walled and restricted documents no longer return to a conflicted principal, because the access decision is made against live Entra membership on every query rather than against a stale snapshot, and no query or document is ever routed to a public AI API. The audit trail now answers the broker's question directly: who could see a document, and when. Time-to-permission-answer on that question dropped from a multi-day forensic reconstruction to a single query. Just as important to the engineers: because the citations are page-level and the answers are grounded in the actual reports, the senior PEs started using it on the matters that count, which is the test a generic tool had failed. The firm kept Copilot for email and meetings; the archive retrieval runs alongside it.
What does a permission leak actually cost a 40-engineer firm?
The first four SalemWise articles framed return on investment as hours saved per engineer per week. For the economic buyer weighing a security spend against partner equity, that is the wrong number. The number that matters is the cost of a single permission-leak incident, and in 2026 that cost moved in two directions at once.
AI used in stamped engineering work just acquired its own insurance exclusion. In January 2026 ISO introduced Form CG 40 47, which per construction-law analysis "presents language for claims arising from generative AI outputs" under commercial general liability policies. At the same time the professional-liability market hardened: the Ames & Gough 2026 survey of 15 leading A/E insurers found 73% planning rate increases, and per the survey 80% now view AI adoption by design firms as a potential market disruptor, with civil engineering ranked the second-highest-severity discipline at 73%. A firm that ships an AI-generated answer built on a document that was sealed for conflict reasons is exposed twice: under E&O for the consequences of the disclosure, and under a freshly excluded CGL gap for the AI provenance of the answer.
The math does not need to be precise to be defensible. It needs to be honest and sourced.
| Option | Year 1 cost | Permission-leak exposure |
|---|---|---|
| Generic AI search, index-time ACL snapshot | License cost only | One stale-ACL incident per 5-year window carries deductible + next-renewal rate multiplier + partner-hour breach response, against a hardening market where 73% of insurers are raising rates |
| SalemWise, query-time ACL evaluation | ~$39,600 ($18K+ setup plus $21,600 ongoing) | Same incident probability reduced by the live-identity permission layer and the do-not-index allowlist; documented audit trail for the carrier |
| Do nothing; keep archive un-searchable | $0 direct | No leak risk from AI, but the retrieval problem stays unsolved and senior-PE archive time stays leaked |
The recommended row is the second. For the principal, the argument is not "it saves hours," it is "it closes a partner-level liability the firm's own CGL now excludes." A single disclosable conflict-wall breach, between the E&O consequences and the partner hours burned on the breach response, exceeds a year of the contract before the rate-hike multiplier is even counted. And the contract is flat: hosted on SalemWise infrastructure, so the firm buys no GPU, and priced as one deployment rather than per seat, so a firm where eight people run searches is not billed like a hundred. The per-seat math that makes enterprise search painful for a partnership does not apply.
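The year-one figure in the table is simple arithmetic, sketched here under the assumption that the published floor prices ($18,000 setup, $1,800/month) apply unchanged; the function and constant names are illustrative only.

```python
SETUP_FEE = 18_000    # published starting setup price, USD
MONTHLY_FEE = 1_800   # published flat monthly price, USD


def year_one_cost(setup: int = SETUP_FEE, monthly: int = MONTHLY_FEE,
                  months: int = 12) -> int:
    """Flat first-year cost: setup plus monthly fees, independent of seat count."""
    return setup + monthly * months


ongoing = MONTHLY_FEE * 12
total = year_one_cost()
print(ongoing, total)
```

Because no term depends on the number of users, the total is the same whether eight people or a hundred run searches.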
When is SalemWise the wrong tool for the permission problem?
Several cases disqualify a firm, and an honest IT evaluation should rule them out fast.
Firms whose archive is entirely inside M365
A firm founded recently, on SharePoint Online from day one, with no network drives, no ProjectWise, and clean Entra-group hygiene, already has query-time permission enforcement through Copilot. The cross-surface permission layer SalemWise adds solves a problem that firm does not have.
Firms that have not cleaned up SharePoint oversharing first
If a tenant is riddled with "Everyone except external users" grants, no AI permission layer fixes that, because the source permission is wrong, not the propagation. Microsoft's own guidance is to remediate oversharing before enabling Copilot. The same applies here: clean the source ACLs first, then add the retrieval layer.
Firms under 30 engineers
Where the managing principal still does most of the searching personally and there are no conflict walls in play, the permission-leak exposure that justifies the spend is too small to carry the setup cost.
CMMC Level 3 CUI subcontractors with data-residency clauses
SalemWise hosts the index and the model on its own infrastructure, which is the right trade for most commercial firms and the wrong one for a subcontractor whose contract requires controlled unclassified information to stay inside a government-authorized boundary. If your work carries CUI residency obligations, this is a disqualifier, not a configuration option, and we will say so on the first call. As with the Deltek Dela review's treatment of federal scope, this is flagged, not glossed.
How does the deployment and security review actually run?
Yermek Ibrayev is an ex-Meta and ex-Google software engineer who builds RAG for civil and mining engineering firms. SalemWise hosts the entire system, the index and our own self-hosted model, in a tenant-isolated environment on infrastructure we run and secure, so the firm stands up no hardware and carries no GPU cost. Queries and documents are processed only by our internal model and never sent to a public AI API. SalemWise is building toward SOC 2 Type I attestation, with Type II to follow for federal-adjacent work. More at salemwise.com/about.
1. Discovery and threat model (week 1)
A free 30-minute call maps the archive surfaces, the identity topology in Entra ID, and the conflict-wall and litigation-hold requirements. The output is a threat model and a deployment-and-security reference architecture the firm can put in front of its partners and its professional-liability carrier.
2. Permission audit (weeks 1-2)
A read-only sweep classifies the SharePoint libraries, flags those approaching the 5,000-unique-permission cliff, surfaces "Everyone except external users" overshares, and inventories the broken-inheritance carve-outs and do-not-index candidates. This runs before any content is indexed.
3. Deployment and indexing (weeks 2-4)
The retrieval stack and SalemWise's self-hosted model stand up in a SalemWise-hosted, tenant-isolated environment. The archive is ingested over a read-only VPN connection to the firm's SharePoint, network drives, and ProjectWise, with identity federated to the firm's Entra ID. Inference stays inside the SalemWise environment; no query or document is sent to a third-party AI API, and the do-not-index allowlist is enforced before anything is indexed.
4. Pilot on one discipline (weeks 4-6)
A single discipline's recent project archive is indexed with full query-time ACL evaluation, and the conflict-wall and do-not-index rules are validated against deliberately constructed permission-leak test queries before any wider rollout.
5. Firm-wide rollout (weeks 6-8)
Indexing extends across disciplines, delivered through a secure web interface inside the firm's network over VPN, or Microsoft Teams where the firm prefers it, with continuous delta-sync keeping ACLs current and an audit trail recording every authorized retrieval by identity, source, page, and date.
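The audit trail in the rollout step records identity, source, page, and date for each authorized retrieval. A minimal sketch of how such a log could answer the broker's question (who retrieved a given document, and on which dates) follows; the `RetrievalEvent` type, the field names, and the sample events are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class RetrievalEvent:
    identity: str   # the principal the retrieval was authorized for
    source: str     # document path or ID
    page: int
    day: date


def who_retrieved(events: Iterable[RetrievalEvent],
                  source: str) -> Dict[str, List[date]]:
    """Map each identity to the sorted dates on which it retrieved the source."""
    days_by_identity: Dict[str, set] = {}
    for event in events:
        if event.source == source:
            days_by_identity.setdefault(event.identity, set()).add(event.day)
    return {who: sorted(days) for who, days in days_by_identity.items()}


# Hypothetical log entries for illustration only.
log = [
    RetrievalEvent("pe.alice", "geotech-report.pdf", 12, date(2026, 3, 2)),
    RetrievalEvent("pe.alice", "geotech-report.pdf", 14, date(2026, 3, 2)),
    RetrievalEvent("pe.bob", "geotech-report.pdf", 3, date(2026, 3, 9)),
    RetrievalEvent("pe.bob", "design-memo.pdf", 1, date(2026, 3, 9)),
]

answer = who_retrieved(log, "geotech-report.pdf")
print(answer)
```

Two page hits by the same person on the same day collapse into one date, so the result reads as a list of people and days rather than raw log lines.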
Common objections from the partner meeting
"We already have AI, Copilot respects our permissions."
Copilot respects the permissions on content Microsoft Graph indexes, and that guarantee is real. The gap is the half of the archive Graph never sees, and the stale-snapshot window on the half it does. Query-time evaluation against the live store across all three surfaces is what closes both.
"You'd be copying our documents into another environment, that's a new attack surface."
Yes, and we will not pretend otherwise: the archive is ingested into a SalemWise-hosted, tenant-isolated environment, because we run the infrastructure so the firm carries no GPU cost and stands up no hardware. Four things bound the risk. The environment is isolated per firm. The model is our own, self-hosted, so nothing is routed to a public AI API or used to train an external model. The most sensitive material, sealed expert-witness work and conflict-walled matters, is excluded from ingestion entirely by the do-not-index allowlist, so it never arrives. And every retrieval is permission-checked at query time and logged for the audit trail. That is the trade: you do not host the system, and in exchange the data-handling has to be airtight, which the free audit and the deployment reference architecture exist to prove before you commit.
"Does our archive get sent to an outside AI provider?"
No. Queries and the documents they retrieve are processed only by SalemWise's self-hosted model inside our tenant-isolated environment. Nothing is sent to OpenAI, Anthropic, Google, or any third-party AI API, and nothing is used to train an external model. This is the architectural difference that lets a firm handling sealed or conflict-walled work use AI retrieval at all.
"How do we prove to the E&O carrier that the permission layer works?"
The audit trail records which identity was authorized for which source on which date, and the pilot validates the conflict-wall rules against constructed leak tests. That artifact is what the broker asks for, and it is what a snapshot-based index cannot produce after the fact.
"What about a litigation hold that lands after we've indexed?"
The do-not-index allowlist removes held documents from the index, and the continuous delta-sync enforces the removal on the next cycle. The hold scope is documented so preservation obligations are auditable.
What to do about this before your next renewal
Three actions. The first two cost nothing and can happen this week.
1. Ask your professional-liability broker one question. Does our policy now carry an AI exclusion, and is our CGL endorsed with ISO Form CG 40 47? If the answer is yes, or "let me check," you have just learned that an AI-provenance claim may fall into a coverage gap, which reframes every AI tool the firm runs as a liability question, not an IT one.
2. Pull one litigated or conflicted project and ask who can see the sealed documents in it. Have whoever administers your archive confirm which people can currently retrieve the walled-off files, then have an engineer run the test directly: from an account that should not have access, ask your existing AI tool to summarize one restricted document, move the document to a different folder, wait an hour, and ask again. If the restricted content still comes back, your index is serving stale permissions, which is the exact failure that generates the Friday call.
3. Book the free 30-minute archive audit. We map your archive surfaces and identity setup, tell you whether your permissions are ready for AI search, flag the libraries near the SharePoint permission cliff, and quote what an 8-week deployment would cost. No slide deck, and the first two actions above tell you whether you even need the call.
Map your archive. Quote a deployment. 30 minutes.
A free discovery audit: we tell you whether your archive is ready for permission-aware AI search, where the permission risk sits, and what 8 weeks of deployment would look like. SalemWise hosts everything, so you buy no hardware and pay flat, not per seat.
BOOK THE FREE ARCHIVE AUDIT
Yermek Ibrayev, ex-Meta/Google, builds RAG for civil and mining engineering firms at SalemWise Solutions. salemwise.com/contact
Sources cited in this article: Microsoft Learn (SharePoint indexer ACL preview) for the 2025-11-01-preview API behavior, the initial-indexing-only ACL capture, the stale-permissions failure mode, and the unsupported SharePoint group types; Microsoft Learn (SharePoint limits) for the 5,000 recommended / 50,000 hard-cap unique-permission limits per library; Microsoft Learn (manage permission scopes) for the inheritance-break error language; Microsoft Learn (Copilot data protection) for Copilot's query-time authorization model; Glean for its real-time permission-enforcement claim; Ames & Gough 2026 A/E PL Survey for the 73% rate-increase, 80%-AI-disruptor, and civil-engineering-second-severity figures and the Jared Maxwell quote; Cohen Seglias for the ISO Form CG 40 47 January 2026 generative-AI CGL exclusion.
Related SalemWise reading: Microsoft 365 Copilot for Civil Engineering Firms and Deltek Dela for Civil Engineering Firms: Honest Review.
Disclosure: Microsoft®, Microsoft 365®, Copilot™, SharePoint®, OneDrive®, Teams®, Microsoft Entra ID™, Microsoft Graph™, and Azure® AI Search are trademarks of Microsoft Corporation; Glean™ is a trademark of Glean Technologies, Inc.; ProjectWise® and Bentley® are trademarks of Bentley Systems, Inc.; Deltek®, Vantagepoint®, and Dela™ are trademarks of Deltek, Inc. SalemWise Solutions is not affiliated with, endorsed by, or sponsored by any of these companies. Pricing, API version strings, and insurance form designations are current as of May 2026 and may change; verify before relying on them.