Update to new implementation of NPLD limitations #54
Comments
Updated following clarification of download limits.
To clarify, the default behavior is that a resource remains locked, unless it is explicitly unlocked by the same client, right? E.g. If client with cookie A locks OR Each Or perhaps I am missing something?
Regarding 4), one possible tricky edge case may be if a certain type of resource could be either downloaded or embedded in the page. I guess maybe the only example is PDF, unless there is a custom viewer for MS Word somewhere... We know that for certain that if And for PDFs, I think if you use the default PDF plugin the browser provides, it is still possible to download the PDF from there. I don't think there's a way to prevent that from the default PDF viewer.
On locks, it's the second case. The idea was:
So, the lock-till-midnight should rarely happen, as it is simply a fall-back in case for some reason the client-side locking protocol fails. On (4), I was imagining we'd sniff the
- When pinged at /_locks/ping, the lock for the referring url is set to expire after LOCK_PING_EXTEND_TIME seconds
- Ping is set to happen every LOCK_PING_INTERVAL seconds
- Copy and/or selection of text is limited to SELECT_LIMIT_WORD words (if any) for single-use-lock collections
- config 'add_headers' option can be used to specify extra cache-control and other headers
- config 'content_type_redirects' option can be used to specify content-types for which to redirect to a custom viewer or block page, and <any-download> also adds redirects for any response with 'content-disposition: attachment' set
- update docs/locks.md with new features
- update config.yaml with add_headers and content_type_redirects
- tests: update tests to test new features, add WARCs to test custom content-types and content-disposition headers
- selection limits via static/selection-limits.js
Yep, kept the initial lock mechanism, but ping shortens the lock for the referring url. Updated docs for the new features:
Generally, looks good. Unfortunately, I think the content-type block will need some way to act more like an allow list than a block list, e.g. unknown or unspecified formats should not be downloadable, so we'll need some way of saying 'web formats allowed' (html/jpg/css/js/png/etc.). Which is unpleasant but necessary.
content type redirects: support default block list, with specific allow rules, per #54
How about something like this?
content_type_redirects:
  # allows
  'text/': 'allow'
  'image/': 'allow'
  'video/': 'allow'
  'audio/': 'allow'
  'application/javascript': 'allow'
  'text/rtf': 'https://example.com/viewer?{query}'
  'application/pdf': 'https://example.com/viewer?{query}'
  'application/': 'https://example.com/blocked?{query}'
  # default redirects
  '<any-download>': 'https://example.com/blocked?{query}'
  '*': 'https://example.com/blocked?{query}'
The content-disposition is checked first so always takes precedence, then exact match, If this makes sense, can expand it with more mime types.
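A rough sketch of how the lookup order for the rules above might work. Only the first two steps (content-disposition first, then exact match) are stated in this thread; the remaining steps (longest matching prefix ending in '/', then the '*' default) are an assumption, as are the function name `resolve_rule` and the choice to return None when nothing matches. The {query} placeholder is left unexpanded.

```python
# Hypothetical sketch: not the actual pywb implementation.
RULES = {
    "text/": "allow",
    "image/": "allow",
    "video/": "allow",
    "audio/": "allow",
    "application/javascript": "allow",
    "text/rtf": "https://example.com/viewer?{query}",
    "application/pdf": "https://example.com/viewer?{query}",
    "application/": "https://example.com/blocked?{query}",
    "<any-download>": "https://example.com/blocked?{query}",
    "*": "https://example.com/blocked?{query}",
}


def resolve_rule(rules, content_type, content_disposition=None):
    """Return 'allow', a redirect URL template, or None if no rule applies."""
    # 1. A forced download always takes precedence.
    if content_disposition:
        if content_disposition.strip().lower().startswith("attachment"):
            if "<any-download>" in rules:
                return rules["<any-download>"]

    # Drop parameters such as '; charset=utf-8' and normalise case.
    mime = (content_type or "").split(";", 1)[0].strip().lower()

    # 2. Exact match on the full content type.
    if mime in rules:
        return rules[mime]

    # 3. Assumed: longest prefix rule ending in '/', e.g. 'text/'.
    prefixes = [k for k in rules if k.endswith("/") and mime.startswith(k)]
    if prefixes:
        return rules[max(prefixes, key=len)]

    # 4. Assumed: fall back to the '*' default, if present.
    return rules.get("*")
```

With these rules, 'text/html; charset=utf-8' resolves to 'allow', 'application/pdf' to the viewer URL, and an unlisted type such as 'font/woff2' falls through to the '*' block page, which gives the allow-list behaviour asked for above.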
We are moving to a new, simpler implementation of the NPLD limitations. Rather than going through a remote desktop, clients will access Wayback directly. This means we need to do a few things:
Single-Concurrent-Use
There will be no login/logout hooks, so a simple alternative locking mechanism is proposed.
The default behaviour is that all 'top-level' URLs will be locked to a user's cookie session, set to time out at midnight later that day. As before, transcluded items should not be locked. These locks are managed server-side.
To enable the lock to be released earlier, the lock can be polled and repeatedly updated from the Wayback JavaScript client, with a time-out set to a few minutes in the future. While a page is being viewed, it will still be locked to the current user, but once they move on it should time out in a few minutes as the lock is no longer being updated.
This means files that get downloaded will be locked for the whole day, but most pages should be released promptly.
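The locking scheme described above could be sketched roughly as follows. This is an in-memory illustration only, not the actual implementation: the class name `LockTable`, its method names, and the 300-second value for LOCK_PING_EXTEND_TIME are assumptions, and a real deployment would need shared storage and cookie handling.

```python
import time
from datetime import datetime, timedelta

LOCK_PING_EXTEND_TIME = 300  # seconds; assumed value for illustration


class LockTable:
    """Hypothetical in-memory lock table keyed by top-level URL."""

    def __init__(self):
        self._locks = {}  # url -> (session_id, expiry timestamp)

    @staticmethod
    def _next_midnight(now):
        # Local midnight at the end of the current day.
        day = datetime.fromtimestamp(now) + timedelta(days=1)
        return day.replace(hour=0, minute=0, second=0, microsecond=0).timestamp()

    def acquire(self, url, session_id, now=None):
        """Lock url for session_id until midnight; False if held by another session."""
        now = time.time() if now is None else now
        holder = self._locks.get(url)
        if holder and holder[1] > now:
            # Lock is still live: only the owning session may proceed,
            # and its existing expiry is left unchanged.
            return holder[0] == session_id
        self._locks[url] = (session_id, self._next_midnight(now))
        return True

    def ping(self, url, session_id, now=None):
        """Shorten the lock so it expires soon after the last ping."""
        now = time.time() if now is None else now
        holder = self._locks.get(url)
        if not holder or holder[0] != session_id or holder[1] <= now:
            return False
        self._locks[url] = (session_id, now + LOCK_PING_EXTEND_TIME)
        return True
```

A page that keeps pinging stays locked to its viewer; once pings stop, the lock lapses after LOCK_PING_EXTEND_TIME seconds. A downloaded file never pings, so its lock runs until midnight.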
Limit cut-and-paste
The client-side JavaScript should intervene during cut/copy events and limit the text to a configurable amount.
Limit local caching
The server should add headers to limit local caching, as per https://stackoverflow.com/questions/9884513/avoid-caching-of-the-http-responses -- this may be better done via NGINX?
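As an illustration, the header set commonly recommended for disabling caching (Cache-Control, Pragma, Expires) could be applied like this. The helper name `add_no_cache_headers` and the list-of-tuples header representation are assumptions for the sketch; the same values could equally be set in NGINX.

```python
# Widely used header values for preventing browser and proxy caching.
NO_CACHE_HEADERS = {
    "Cache-Control": "no-cache, no-store, must-revalidate",
    "Pragma": "no-cache",
    "Expires": "0",
}


def add_no_cache_headers(headers):
    """Return a new header list with any caching headers replaced.

    `headers` is a list of (name, value) tuples, as used by WSGI.
    """
    overridden = {name.lower() for name in NO_CACHE_HEADERS}
    # Drop any existing values for the headers we are about to set.
    kept = [(n, v) for n, v in headers if n.lower() not in overridden]
    return kept + list(NO_CACHE_HEADERS.items())
```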
Prevent downloads of non-web content
We need to try to prevent content being downloaded to local machines, and use a secondary service for rendering some formats to HTML.
First step is to intercept direct downloads of content other than HTML. These will then either be blocked (probably with a custom 451 error) or passed to an external service for rendering.
We will need some lookup table that maps Content Types to URL templates, e.g.
Or similar. When we hit a non-web type, we should open up the block page, and if there's a mapping, offer to redirect the user to that URL for access. For all types, we should ensure the Content-Disposition header is blocked so downloads can't be forced that way. This is similar to the old Interject idea (source code & tech docs here).