Synapse stuff

From Zaori

This document outlines tasks completed or attempted while working on Synapse. It is effectively a research journal, but kept as part of a production-intended project rather than a specific study.

I came on toward the beginning of the project, when it was largely only an idea - the general audience, proposed functionality, and possible directions were reasonably established, but the measurable direction of the project was uncertain. Initial research helped pin down what functionality was plausible while the others interviewed potential users, and from there we moved on to the MVP, which would be presented more broadly in order to assess user pull. Amidst more research, prototype designs were proposed, and Houman implemented the first app iteration. From there, more feasibility research was done for the chosen platform, Google Drive, and also for audio handling in general, though less was found on the latter than hoped. A final round of usability and performance testing wrapped up my involvement with the project, and while the results brought up issues that merit further investigation, they also indicate there is reasonable hope for Synapse as a viable product.

The journal was originally in a Google Doc, but was subsequently moved here for accessibility purposes.

Task 1 - usable existing products

Different ways to collaboratively edit a document - generally you do want a platform for that instead of just emailing a thing around or whatever...

Wikis

  • Shared document/page - open for editing, save changes, folks see changes
  • Issues with edit conflicting
  • Generally cannot easily have multiple copies of the same thing

Available potentially extendable products:

Real-time editors

Probably want a real-time editor. Going with Google Drive for now for ease of setup. Will need something that can be integrated into the main app.

  • Edit live - automatically saves as one goes
  • Folks can see each other type
  • Danger of messing things up - if you change your mind about changes, you need to manually reload a previous save, which is not possible if others are editing at the same time

Available potentially extendable products:

  • Etherpad
    • Open source, free to use, has APIs, kind of messy front-end, would probably work with hacking.
  • Google Drive
    • Simple solution: link to a drive document, go there and use that
    • Can it be integrated? Check terms of use and APIs; might work perfectly.
  • BeWeeVee
    • Specifically created for integrating into other things, uses .NET/JS, free for academic/non-commercial use - would probably work fairly well without too much trouble.

Task 2 - MVP demo

Work on script, record audio...

Fiddled with it, and came up with something which fortunately wasn't used.

Task 3 - questions

How can we get people signed up for our private beta?
How can we run A/B testing for our demo?

Ways to reach people to bring them to the landing launchy page

  • Search engine results - SEO is crap/nonexistent, so the only way is through ads on results.
  • Ads on sites through some campaign manager
  • Social networking - LinkedIn, Facebook, Twitter
  • Mail spam to potential users - cu mailing list, perhaps, others
  • Word of mouth/getting people to share it themselves - Reddit, clear share button on page, etc

Would probably want to do all of those, if possible. Emphasis on the last three because they cost less, and emphasis on the first one because it’s likely to reach people the same way they’d find the product later, once it has a proper web presence - so if it would work later, it should work now.

How will this relate to Google Hangouts? (commonly used by folks meeting with remote colleagues)

A/B testing:

Set up 1-2 more launch pages. Houman will make the actual landing page in JS, which should serve to scare off anyone like me - a good demographic to keep away from such things.
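
One way the split between launch pages might be handled is sketched below: hash a visitor ID so that the same visitor always lands on the same page, then compare sign-up rates per page. The variant names and both function names are made up for illustration, and where the visitor ID comes from (cookie, email, etc) is left open.

```python
import hashlib

# Hypothetical names for the launch page variants; not from the project.
VARIANTS = ["landing_a", "landing_b", "landing_c"]


def assign_variant(visitor_id, variants=VARIANTS):
    """Map a visitor ID to one variant, the same one on every visit."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def conversion_rates(visits, signups):
    """Return sign-ups per visit for each variant, given two dicts of counts."""
    return {
        variant: (signups.get(variant, 0) / count if count else 0.0)
        for variant, count in visits.items()
    }
```

Hashing instead of picking at random on each visit means no per-visitor state needs to be stored to keep the experience consistent.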

Prototypes

Prototype components:

  • Logo/wordmark and slogan
  • About
  • (Feature highlight - shared notes, shared discussion, easy access later to organise archives)
  • Video(s)
  • Sign up for private beta
  • (Toggle between business/school)

General flow of page:

  1. This thing!
  2. You want this thing!/You want this thing because!
  3. Now that you're interested, watch this for more info!
  4. Now sign up for this thing!


If that doesn't work it's hopeless. Or something is wrong, at any rate.

Task 4 - prototypes

Mockups/prototypes to give folks a feel for the product...

Wireframes

General site layouts for academic version.

Mockups

Task 5 - feasibility stuff

  • How to parse a Google Doc (API results, etc, what would be pulled out)?
  • Something about how to match up snippets of audio from different sources - algorithms, established methods, possible products, etc.

Parsing a Google Doc

From developers.google.com:

Download the content as described (probably as plain text since we're unlikely to care about images, text format, etc for this).

To parse, regex would probably be more than sufficient:

  • Find sentences ending with question marks
  • Find hashtags
  • Etc
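
A minimal sketch of the regex approach, assuming the document has already been downloaded as plain text. The patterns and the parse_notes name are illustrative only, and the question pattern is deliberately naive (it will trip on abbreviations like "e.g.").

```python
import re

# A run of text with no sentence-ending punctuation, followed by "?".
QUESTION_RE = re.compile(r"[^.!?\n]+\?")
# "#" not preceded by a word character, followed by word characters.
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")


def parse_notes(text):
    """Pull question sentences and hashtags out of plain-text notes."""
    questions = [q.strip() for q in QUESTION_RE.findall(text)]
    hashtags = HASHTAG_RE.findall(text)
    return {"questions": questions, "hashtags": hashtags}
```

For example, parse_notes("Covered queues. What is Little's law? See #queueing.") would return one question and one hashtag.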


Need a background process on the server (or similar) to keep this up to date: use the changes feed, check for new changes(?), and reparse as needed.
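
The "reparse as needed" part could look something like the sketch below, which only reparses when the downloaded text actually differs from what was seen last time. Fetching from the changes feed is not shown, since the exact API calls were not pinned down here; reparse_if_changed and its parameters are hypothetical.

```python
import hashlib


def reparse_if_changed(doc_id, text, seen_hashes, parse):
    """Call parse(text) only if the text differs from the last version seen.

    seen_hashes is a dict (doc ID -> content hash) that the background
    process keeps between polls. Returns None when nothing changed.
    """
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if seen_hashes.get(doc_id) == digest:
        return None
    seen_hashes[doc_id] = digest
    return parse(text)
```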

Matching audio clips

The aim is to find overlap between multiple different recordings in order to combine disjoint recordings of the same meeting into a single, higher-quality file.

We probably want to match the acoustic fingerprints of audio files as the first part of the process. Normally this sort of technology tends to be used for identifying music, but applications developed for that may work for this as well. In essence a fingerprint is a simplification of an audio file that focuses on the perceptual characteristics (what a person would hear), and from this a basic comparison allowing for a certain range of variance between file fingerprints should serve our needs.

An existing open source solution such as Echoprint or AcoustID would probably be ideal if they would work.

The problem is figuring out which such option, if any, would actually work.

  • Recordings from different devices and places will contain considerably more variance and differing noise than reencodings of a song.
  • Depending on how an implementation handles partial recordings, these may or may not even work to find overlap in the recordings.

Alternatively, generating our own 'fingerprints' for each snippet and then matching them up with a sloppy comparison (are they similar enough?), possibly using the same methods used to generate the fingerprints in the first place, might be another, potentially more robust, option.
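
To make the homegrown idea concrete, here is a rough sketch under heavy assumptions: the "fingerprint" is just one bit per frame saying whether loudness went up or down, and the sloppy comparison slides one fingerprint along the other looking for the best agreement. This is a crude stand-in, not how Echoprint or AcoustID work, and it assumes both recordings share a sample rate and roughly line up on frame boundaries. All names are hypothetical.

```python
def fingerprint(samples, frame_size):
    """Crude fingerprint: one bit per pair of adjacent frames,
    1 if the summed absolute amplitude rises, 0 otherwise."""
    energies = [
        sum(abs(s) for s in samples[i:i + frame_size])
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]
    return [1 if later > earlier else 0
            for earlier, later in zip(energies, energies[1:])]


def best_offset(fp_a, fp_b, min_overlap=8):
    """Slide fp_b along fp_a and return (offset, score) for the best fit.

    offset means fp_b[0] lines up with fp_a[offset] (negative if fp_b
    starts first). score is matching bits minus mismatching bits over the
    overlapping region, so long, mostly agreeing overlaps win.
    Returns (None, None) if no overlap of min_overlap bits is possible.
    """
    best = (None, None)
    for offset in range(min_overlap - len(fp_b), len(fp_a) - min_overlap + 1):
        start_a, start_b = max(0, offset), max(0, -offset)
        n = min(len(fp_a) - start_a, len(fp_b) - start_b)
        if n < min_overlap:
            continue
        matches = sum(fp_a[start_a + k] == fp_b[start_b + k] for k in range(n))
        score = 2 * matches - n
        if best[1] is None or score > best[1]:
            best = (offset, score)
    return best
```

Using rise/fall bits rather than raw loudness is meant to tolerate devices recording at different volumes; whether it survives real room noise is exactly the open question above.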

A paper giving an overview of some of the technology can be found here: Fingerprint-Cano.pdf

More questions

  • How can we add a sentence to a Google Doc using the API? (to allow students to ask anonymous questions.)
    Use a generic user to do the add (so that said user shows up in the history instead of the student by name): the student sends the sentence to the generic user, which then adds it using patch...
  • How can we replace a word in a Google Doc using the API? (why? we'll have templates for newly created Google Docs. Each template will have some placeholders (lecture title, instructor name) at the top of the page which need to be replaced with real data from the corresponding lecture.)
    Probably patch for this as well.
  • Does the Google API allow turning off the chat feature?
    The documentation does not seem to cover the chat feature, at least not anywhere I could find. Either it shows up when the object is rendered or it doesn't, and either it can be enabled/disabled or it can't; the fastest way to check would probably be to set the thing up and see. Even if chat is there with no option to turn it off, if all else fails we could probably make it go away on our end easily enough.
  • Do Echoprint or AcoustID allow finding the exact overlapping points of two audio files, or do they only tell whether the fingerprints match?
    From what I could find there really aren't any (effectively advertised) solutions to what we want to do. Echoprint and AcoustID are intended to check if files are the same, but because they do so by matching fingerprints, these fingerprints could also be adapted to our purposes to check for partial overlap and sameness as well, thus providing the overlap points.
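
For the template placeholder question above, the text substitution itself is simple enough to sketch independently of the API. The {{name}} placeholder syntax and the fill_template name are assumptions, not something the templates are known to use, and writing the result back to the document (patch or otherwise) is not shown.

```python
import re

# Hypothetical placeholder syntax: {{lecture_title}}, {{instructor_name}}, ...
PLACEHOLDER_RE = re.compile(r"\{\{(\w+)\}\}")


def fill_template(text, values):
    """Replace each {{key}} with values[key]; unknown keys are left as-is."""
    def substitute(match):
        return values.get(match.group(1), match.group(0))
    return PLACEHOLDER_RE.sub(substitute, text)
```

Leaving unknown placeholders untouched makes it obvious in the resulting document when lecture data was missing.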

Task 6 - usability and performance testing

Record three lectures

  • Planned to get volunteers to test the app while also recording ourselves. This worked for one lecture (performance modelling) and not for another (startups class; Arjun asked the class to record from laptops, but nobody actually did). For the third (chaotic dynamics) it was just us recording from phones, but in a room with a full technical setup specifically for recording, so we could compare results.
  • Rationale: Multiple recordings increase the pool of audio we have to work with and to test future functionality on. User testers can find problems with usability and also give insight into usage patterns and different use cases.

Things that came up

Taking pictures is kind of awkward.
  • Picking up idling phone in general and holding it up is strange behaviour in a class - feels strange to do, and potentially distracting to the lecturer as well, since a student is holding up a phone randomly.
  • App itself lagged while holding it up to take the picture, which made it more awkward (probably lagged due to changing orientation)
  • Phone likes to go to sleep; unlocking it takes time and can prevent actually getting the picture.
  • Folks tended not to take pictures.
There's no good way to record gestures, examples demonstrated with an object (like holding up a pen and showing how it can fall in different situations), etc, which may be relevant. Pictures better than nothing, maybe, but just not the same.
Class structures can present issues.
  • Makes sense for lectures.
  • Can make sense for presentations, but need a way to mark who is presenting, when presenters transition, etc.
    • (There was a comment from Zach about how recording things is also useful for the presenters to improve their presentation skills)

What if speakers are on all sides of the room, or just students asking questions? Some quieter than others, talking in different directions... (Something to pay particular attention to when examining test results? Definitely came up in the startups and chaos classes.)

Chaotic dynamics

This lecture takes place in a fancy room specifically set up for recording. While we were there recording from the app, the room's recording system failed. The speakers cut out and it was uncertain what was and wasn't working - we thought perhaps the recording was still going and just the speakers had gone out, but it turned out to be both, and five minutes of lecture were lost as a result. The folks running the system were somewhat frantic as well, with apparently no clear feedback on either end as to what was working.

So feedback should be a concern - something in the app itself to tell people it's working (maybe just a displayed wave of the recording, or show the levels, or whatever), and something they would notice when it quits working. This is a valid reason to sacrifice simplicity of the interface, because above all else users want the thing to work, and if it isn't working, they want to know. They don't want to lose stuff.
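
As a sketch of what such feedback could be driven by: compute a level for each incoming chunk of audio (to draw a meter) and raise a warning when the last few chunks are all silent or empty. The threshold values and function names are guesses for illustration and would need tuning against real recordings; samples are assumed to be floats in [-1.0, 1.0].

```python
import math


def rms_level(samples):
    """Root-mean-square level of a chunk of samples; 0.0 for an empty chunk."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def recording_status(chunks, silence_threshold=0.01, max_silent_chunks=3):
    """Return "warning" if the most recent chunks are all below the
    silence threshold, otherwise "ok"."""
    silent_run = 0
    for chunk in reversed(chunks):
        if rms_level(chunk) >= silence_threshold:
            break
        silent_run += 1
    return "warning" if silent_run >= max_silent_chunks else "ok"
```

A quiet room would also trigger the warning, so in practice this would probably be a "we are not hearing anything" notice rather than a hard error.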

Lecture itself is done digitally. Looked like a presentation on a tablet, professor drawing curves on and adding notes to slides.

  • Add screencapping software to supplement the app for the presenter machine, which tracks timestamps for later syncing with audio?

Note on audio/video syncing

Never mind the fancy audio analysis needed to find splice points; we should probably line up the separate audio recordings based on timestamps in general, and this would work for video/slides too. These apps should all be on internet-enabled devices, so they should be able to sync timestamps with a central server while recording and include those in the upload.
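
A rough sketch of the timestamp approach, assuming each upload carries the device's clock reading at the start of recording, the duration, and an estimated offset between the device clock and the server clock. The offset estimate uses the standard NTP-style calculation, which assumes network delay is about the same in both directions. The field and function names are hypothetical.

```python
def estimate_clock_offset(t0, t1, t2, t3):
    """Estimate server clock minus device clock, in seconds.

    t0: device time when the sync request was sent
    t1: server time when the request arrived
    t2: server time when the reply was sent
    t3: device time when the reply arrived
    """
    return ((t1 - t0) + (t2 - t3)) / 2


def server_start(rec):
    """Start of a recording expressed in server time."""
    return rec["device_start"] + rec["clock_offset"]


def align(recordings):
    """Return each recording's start offset (seconds) from the earliest one."""
    starts = {rec["name"]: server_start(rec) for rec in recordings}
    origin = min(starts.values())
    return {name: start - origin for name, start in starts.items()}


def common_window(recordings):
    """Return (start, end) in server time covered by every recording,
    or None if they do not all overlap."""
    start = max(server_start(rec) for rec in recordings)
    end = min(server_start(rec) + rec["duration"] for rec in recordings)
    return (start, end) if end > start else None
```

Timestamps will only get the recordings close (network jitter is likely tens of milliseconds), so fingerprint matching might still be useful for fine adjustment within a small window.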