okay are are about the non here one my name's is on you and uh

i'm from the national institute of informatics

in japan

so today this talk is on our recent work

on a temporal recurrence hashing algorithm

for mining commercials

from a video stream

so commercial mining is an important preprocessing task

for applications such as market research

it aims at detecting and localising

duplicate commercial sequences from a large-scale

video archive

for instance in a

one-month

video archive there could be

tens

of thousands of duplicate

commercial sequences

so manually

detecting

and localising so many commercial sequences is too time-consuming and

labour-intensive

so an automatic

commercial mining technique is needed

so one direction of commercial mining

is known as knowledge-based commercial mining it uses the

intrinsic

characteristics of commercials

as a priori knowledge for detecting them

for instance

in some countries

the TV stations may add some

monochrome

or silence frames to separate neighbouring commercial segments

so if we can detect the positions of these frames then we can use them to get the

duration i mean the location of the commercial sequence

so most

knowledge-based

techniques are efficient

but not generic enough because of the data-dependent

knowledge they use

because this a priori knowledge

may vary with countries and time

however there is one piece of a priori knowledge that can be used for detecting commercials but which

does not vary with countries or time

that is

commercials are repetitions

so this characteristic never changes

and it inspired

another direction known as repetition-based commercial mining

so most repetition-based techniques are

unsupervised

and more generic

but computationally expensive

so

in this study we propose a simple but very effective

algorithm

for fully unsupervised

generic and ultra-high-speed commercial mining

so in this study there is no

training data provided beforehand and what we have is only a very long

video stream

and the algorithm does not depend on any prior knowledge that may

vary with countries or time

and also the algorithm is

very fast

for a ten-hour stream

the

processing time was only four seconds

and for a one-month

stream

the time was less than forty-two minutes

and for a five-year video stream the processing time was less than twenty-one hours

it's very fast

so the proposed

algorithm is

a two-stage hashing algorithm

and i will explain the two stages one by one

and before explaining the first stage i'd like to discuss the

difference between commercials

and near-duplicate videos

and

as you may know by definition near-duplicates

are approximately

identical videos

these videos are normally derived from an original

video

by means of various transformations

such as encoding

and something like that

and on the other hand

commercials

are

exact duplicates derived from the original video without any transformation

so commercials can be considered as a special case of near-duplicate videos

so this is the main difference between commercials and near-duplicates

and in the case of commercials

the fragments

for example the frames shots or sub-shots

of the videos can be translated into

a very compact

fingerprint

so that identical fragments across the duplicates can be mapped to exactly the same fingerprint

so based on this assumption if we insert

the fragments

into a hash table by regarding the fingerprint as the bucket index

a hash collision

will occur in the corresponding hash bucket

so

the duplicate

fragments can be easily detected

based on a

collision detection process

so these assumptions

are

not reasonable in the case of near-duplicates

but only reasonable for exact duplicates like commercials
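The first-stage idea described above can be sketched as follows. This is only an illustration under assumptions: the function name `find_duplicate_fragments` and the string fingerprints are made up here, and a Python dictionary stands in for the hash table whose bucket index is the fingerprint.

```python
from collections import defaultdict

def find_duplicate_fragments(fingerprints):
    """Insert fragments into a table keyed by fingerprint and report collisions.

    `fingerprints` is a list with one fingerprint per fragment, in temporal order.
    Returns {fingerprint: [fragment indices]} for buckets holding 2+ fragments.
    """
    table = defaultdict(list)              # fingerprint -> fragment indices
    for index, fingerprint in enumerate(fingerprints):
        table[fingerprint].append(index)   # identical fragments collide here
    # A bucket with more than one entry is a hash collision, i.e. a duplicate set.
    return {fp: idxs for fp, idxs in table.items() if len(idxs) > 1}

# Hypothetical fingerprints for six fragments of a stream.
fingerprints = ["a3", "7f", "a3", "c1", "7f", "a3"]
duplicates = find_duplicate_fragments(fingerprints)
```

Because the lookup is exact, this only works when duplicates produce identical fingerprints, which is the assumption the talk makes for commercials and not for near-duplicates.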

so in this study we propose

applying a luminance-based fingerprinting strategy to the video stream and also applying

an existing audio hashing

technique to the audio stream

and we didn't

test all

the existing techniques but we believe that the

performance of the proposed algorithm is

irrelevant to the

fingerprinting strategy so any existing one can be used

so we apply these two fingerprinting strategies to

all frames of the stream

and

so here the same colour indicates

frames

with the same fingerprint

and the contiguous frames

with the same fingerprint

are grouped into a fragment

so

based on the

hash collision detection we can detect duplicate fragments
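The grouping of contiguous frames with the same fingerprint into a fragment might look like the sketch below. The tuple layout `(fingerprint, start_frame, length)` and the function name are assumptions made for illustration, not something stated in the talk.

```python
from itertools import groupby

def group_frames_into_fragments(frame_fingerprints):
    """Merge runs of contiguous frames sharing a fingerprint into fragments.

    Returns a list of (fingerprint, start_frame, length) tuples.
    """
    fragments = []
    position = 0
    # groupby yields one group per run of equal consecutive values.
    for fingerprint, run in groupby(frame_fingerprints):
        length = sum(1 for _ in run)
        fragments.append((fingerprint, position, length))
        position += length
    return fragments

# Hypothetical per-frame fingerprints (small integers for readability).
frames = [5, 5, 5, 9, 9, 5, 5]
fragments = group_frames_into_fragments(frames)
```

Note that the two runs of fingerprint 5 stay separate fragments; they would only meet again as a collision in the hash table.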

but please note that

the goal of commercial mining is not to detect

these duplicate fragments but to detect

duplicate

sequences

a commercial sequence is normally composed of

tens or a few hundred

fragments

so here we regard

the duplicate fragment pair as the basic unit

and project them onto the time axis

from this figure we can observe strong temporal consistency

among these pairs

for instance the

positions of the fragments are consecutive

and the temporal intervals

between each two fragments

are almost the same

so this kind of temporal consistency is very useful for distinguishing

duplicate sequences from non-duplicate ones

and therefore

the commercial mining task can be formulated as

searching for

duplicate fragment pairs with high temporal consistency

so one solution to this is to apply

pairwise matching based on temporal information to all

pairs of duplicate fragments

so given N_p denoting the number of these pairs

the computational cost

is linear to the square of N_p

and we can see N_p can be a very very large number

and a lower

computational cost can be obtained by applying pairwise

matching based on temporal information

to all

sets

of duplicate fragments

so given the number of these sets

which is actually

the number of

bins in the hash table

the computational cost

is linear to the square of that number

so it is

still not

efficient enough

and

besides these two solutions there is another interesting

study which applies fragment growing

to each duplicate fragment pair

and the

computational cost is

linear to N_p

which is very efficient

but because the single operation cost

of the

fragment growing strategy is

high the overall processing time in this case is

almost

the same as that in the previous studies

so

in this study we propose applying a second-stage hashing to the duplicate fragment pairs

so that the computational cost

can be linear to

N_p

with a lower single operation cost

here we regard each duplicate fragment pair as the basic unit

and we propose

two hash

functions

to translate the temporal information into

fingerprints

so the first fingerprint is the temporal position

of the former fragment

and in this case the resolution is set to minutes so that a set of

fragments

can be mapped to the same or neighbouring fingerprints

and the second fingerprint is the temporal interval

between the two fragments

and its resolution is set to seconds

and based on these

two fingerprints

all

duplicate fragment pairs

are inserted into a two-dimensional hash table

and

by doing so the duplicate fragment pairs

with high temporal

consistency can be automatically assigned to the same

bin

so that the time-consuming pairwise matching can be

avoided
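A minimal sketch of the second-stage hashing just described, under stated assumptions: timestamps are in seconds, the position resolution is taken as exactly 60 seconds and the interval resolution as 1 second (the talk only says "minutes" and "seconds"), and all names are hypothetical.

```python
from collections import defaultdict

POSITION_RESOLUTION = 60   # seconds per position bin (assumed value)
INTERVAL_RESOLUTION = 1    # seconds per interval bin (assumed value)

def second_stage_hash(pairs):
    """Insert duplicate fragment pairs into a two-dimensional hash table.

    Each pair is (former_time, latter_time) in seconds. The key combines the
    quantised position of the former fragment with the quantised interval
    between the two fragments.
    """
    table = defaultdict(list)
    for former, latter in pairs:
        position_key = int(former // POSITION_RESOLUTION)
        interval_key = int((latter - former) // INTERVAL_RESOLUTION)
        table[(position_key, interval_key)].append((former, latter))
    return table

# Three pairs from one hypothetical commercial repeated an hour later,
# plus one unrelated pair.
pairs = [(125.0, 3725.0), (127.0, 3727.0), (131.0, 3731.0), (400.0, 950.0)]
table = second_stage_hash(pairs)
```

Each pair costs one key computation and one insertion, so the total work grows linearly with the number of pairs, and temporally consistent pairs end up in the same bin without being compared with each other.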

and to detect the high temporal consistency

we

construct a

recurrence hashing histogram

from the hash table

and because the duplicate

fragment pairs with high temporal consistency

have been

assigned to the same bin they normally

form a local maximum

in this

histogram

and in this case the bin value

indicates the temporal duration of the duplicate

sequence

so duplicate

sequences can be easily detected by

searching for local maxima

in the

hashing histogram
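The search for local maxima can be illustrated on a simplified one-dimensional histogram. The talk's histogram comes from a two-dimensional table; taking a single row of it, the `min_height` threshold and the tie-breaking rule are all assumptions of this sketch.

```python
def local_maxima(histogram, min_height=1):
    """Return indices of bins that are local maxima of at least `min_height`.

    A bin qualifies if it is strictly higher than its left neighbour and at
    least as high as its right neighbour, so a plateau is reported once.
    """
    peaks = []
    for i, value in enumerate(histogram):
        left = histogram[i - 1] if i > 0 else 0
        right = histogram[i + 1] if i + 1 < len(histogram) else 0
        if value >= min_height and value > left and value >= right:
            peaks.append(i)
    return peaks

# Hypothetical bin values, e.g. accumulated fragment duration per bin.
histogram = [0, 1, 14, 15, 2, 0, 0, 9, 0]
peaks = local_maxima(histogram, min_height=5)
```

With these values the small bump at index 1 is rejected by the threshold and the two remaining peaks would each correspond to one candidate duplicate sequence.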

so this is all for the explanation of the proposed algorithm

and it's very simple but it's efficient

because we

don't need pairwise matching in this algorithm

each pair

is only inserted into the

hash table

so the computational cost is linear to N_p which is much

lower than that of related studies

and we

evaluated the accuracy

of

the proposed algorithm by using a ten-hour stream

and also a one-month

stream and a five-year stream were used for evaluating the efficiency

and

we evaluated

at both the sequence level

and the frame level

the sequence-level evaluation shows how precisely

our

algorithm

detects and identifies the commercial sequences

and the frame-level

evaluation shows

how precisely the

algorithm can localise

the commercial sequences

for example

from which frame the sequence starts

and at which frame the sequence ends

the results are summarised in

this table

we implemented two state-of-the-art studies for comparison

the first one for a

right

i five ring of light

pairwise matching to each duplicate fragment pair

and the second one proposed by green

applies

pairwise matching to

all sets of fragments

or we can say all

the non-empty bins of the

hash

table

uh

okay

and

in the table TRH here indicates our temporal

recurrence hashing algorithm

and V here indicates the case of video mining

and for the statistics

P R and F

are respectively precision recall and F1 score

the suffix

S

means the sequence level and F the frame level

finally T here indicates the processing time
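For reference, frame-level precision, recall and F1 score can be computed as in this sketch. Representing detections and ground truth as sets of frame indices is an assumption for illustration; the frame ranges are invented and are not results from the talk.

```python
def precision_recall_f1(detected, ground_truth):
    """Compute precision, recall and F1 from two collections of frame indices."""
    detected, ground_truth = set(detected), set(ground_truth)
    true_positives = len(detected & ground_truth)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Made-up example: the detection starts and ends 20 frames too early.
detected_frames = range(100, 200)
truth_frames = range(120, 220)
p, r, f = precision_recall_f1(detected_frames, truth_frames)
```

The sequence-level variant would count whole commercial sequences instead of frames, with some rule (not specified in the talk) for deciding when a detected sequence matches a true one.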

so from this table we can see that

our algorithm

outperformed

the baselines

and especially the much shorter processing time

demonstrated its efficiency

besides we also implemented an existing

audio hashing technique for

the audio stream

and again in this case

the

proposed algorithm outperformed the baseline

for all criteria

uh

well i forgot to mention that this baseline applies

the fragment

growing strategy to each duplicate

fragment pair

and note that

up to this point we

individually applied the algorithm to the video and audio streams

so one question here is

whether we can integrate

these two streams

so

the video and audio streams can be integrated

at the

frame level the fingerprint level or the decision level

and for efficiency and scalability

reasons we propose

an integration at the decision level

so from these tables we observe that the sequence-level recall

was almost

one hundred percent in both cases

so this motivates us to use an intersection operation to combine the detected commercial sequences

and this would lower the

rate of false alarms

without

much increase in the number of misses

and another observation is that in the case of frame-level evaluation

the precision is higher than the recall in both video and

audio

and this motivates us to use a

union

operation

to combine the detected

commercials

at the frame level
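One way to read the integration just described is sketched below: a sequence is kept only if both streams detected it (intersection at the sequence level), and its extent is the union of the two detections (union at the frame level). Matching sequences across streams by temporal overlap, the half-open `(start, end)` frame intervals and the function name are all assumptions of this sketch.

```python
def integrate(video_sequences, audio_sequences):
    """Combine detections from the video and audio streams.

    Sequences are (start, end) frame intervals, end exclusive. A sequence is
    kept only if it overlaps a detection in the other stream; the kept
    interval covers the frames of both detections.
    """
    merged = []
    for v_start, v_end in video_sequences:
        for a_start, a_end in audio_sequences:
            if v_start < a_end and a_start < v_end:        # intervals overlap
                merged.append((min(v_start, a_start), max(v_end, a_end)))
    return merged

# Made-up detections: the third video detection has no audio counterpart.
video = [(100, 400), (900, 1350), (2000, 2100)]
audio = [(110, 460), (880, 1340)]
result = integrate(video, audio)
```

The unmatched video detection is dropped as a likely false alarm, while the two confirmed sequences grow to cover the frames found by either stream.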

so

uh

the results are summarised in this table and we can see that

the

sequence-level F1 score improved to

ninety-eight point one percent

and the frame-level F1 score

improved to

ninety seven one four was

uh so

the improvement demonstrated the effectiveness

of the proposed integration strategy

also we applied

the algorithm to the one-month video stream

and the processing time was less than fifty

minutes for video and less than

forty-two minutes

for audio

so this again demonstrated the high efficiency

of

our algorithm

finally we applied

our algorithm to a five-year stream

this stream was divided

into sixty one-month substreams

and our algorithm was individually applied to each substream

and we

performed parallel computing

and uh

conducted a the recipes

for used fifteen months

for spread

and the final processing time was less than twenty-one

hours

if a person did this by watching the video

the commercial mining

commercial detection

would

take

at least five

five years

so

that is what i want to say

our our

yeah

so we also computed some

statistics

for each detected commercial

so

we hope these statistics could be helpful for

market research

and

commercial producers and companies

so

this example is a beer promotion

for this histogram the horizontal axis indicates the time of day

and the vertical one indicates the broadcast

frequency

so from this figure we can see that for this commercial there was

no broadcast from two em to yeah yeah

and actually this very

somehow somehow go to stand

so i have another one

in this case

the

horizontal axis

also indicates the time of day

but the vertical one indicates the day of the week from sunday monday

to saturday
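Statistics of this kind, broadcast counts by time of day and day of the week, could be tabulated as in the following sketch. The timestamps are invented for illustration and the function name is hypothetical; note that Python's `weekday()` numbers Monday as 0, whereas the figure described here starts from Sunday.

```python
from collections import Counter
from datetime import datetime

def broadcast_histogram(timestamps):
    """Count broadcasts per (day of week, hour of day); Monday is 0."""
    return Counter((t.weekday(), t.hour) for t in timestamps)

# Made-up broadcast times of one detected commercial.
broadcasts = [
    datetime(2010, 3, 1, 19, 5),    # a Monday evening
    datetime(2010, 3, 1, 19, 40),   # same day, same hour
    datetime(2010, 3, 6, 12, 15),   # a Saturday at noon
]
histogram = broadcast_histogram(broadcasts)
```

Summing the counts over the days gives the time-of-day histogram of the previous figure, and an empty band of hours would show up as keys that never occur.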

so we can observe a very regular

shape

actually when be for long as before

a might a

but actually know

this very regular

shape was also found for all other

alcohol-related commercials

and we googled this and finally found the reason

actually this is because in japan

there is a voluntary restriction

which prohibits

the broadcast

of

alcohol commercials from

five am

to

five pm

on weekdays

and from five am to eleven am

on weekends

so this is one of them

and a example is of she's actually

and we can see that this commercial was

mostly

broadcast

around

five pm

actually

so we believe this is because

uh

five P M is of fine or the white

i i with two

what in or

and the commercial have them

uh no don't need to just by or Z

or something

and the third example is a

car commercial

and we can observe the time zones with high

broadcast frequency

for instance from

six am to

from six am to

eight am when most fathers

are having breakfast

and the

a

scalable clock we that stop

and after six pm

when most fathers have finished their work and are watching TV

so

actually these observations can also be observed in this figure

and one explanation of this could be that the car

is

okay it towards uh

uh a rather than you know five

and another explanation could be

i know you most uh uh a japanese and the father Y

i mean you in the money so the card either

a a pretty for most

uh popular T V vol

for

mail than female what

uh

so due to the time limitation i'm going to skip

the can i like

the

thank you for your kind attention

we

i have a question are advertising companies going to use your

technique to do some marketing research

currently no but we are considering contacting those companies

to see whether they are interested in our research

and normally

i don't think companies are interested in this but

actually there are

some commercial producers

who

are interested in this work and contacted us and we are considering a cooperation or

collaboration

uh

true

maybe we can provide some

uh so is

uh for

so for so and i is or something like that

or most a search yeah

thank you the second question is as i understand you are doing commercial video pattern discovery right

so do you think you can extend it to discover some other types of videos

something like

sports video or some

news video

i understand commercial video is a very special type of video with specialised

patterns

you mean

for detecting

duplicate videos of other types maybe

well it depends on the type of duplicate videos that you want to detect

for instance

in the case of sports video

suppose in sports video there is some rebroadcast of the sports games

of course

on a lot of channels that means in this case the

videos are almost the same but in the news broadcast

the

TV news program producers may add some

text

captions into the videos so that

the fingerprinting-based

strategy is not suitable in this case

and a more robust

technique

is needed so this is why

i pointed out that this work is quite different from

near-duplicate

detection it's a kind of exact duplicate detection

okay thank you so much

thank you