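I can't really read the assembly in this patch, so all I did was some editing work.
comcat wrote the patch in 2007, and some of its instructions are spelled differently from the Loongson 2 support in the 2008 binutils.
I found an instruction listing written by comcat at http://forum.openrays.org/read-htm-tid-3808.html .
Comparing it against opcodes/mips-opc.c (the opcode table used by as) in binutils-2.20.1 shows where the mnemonics differ.
For example, the patch uses fxor, D,V,T, 0x47800002, 0xffe0003f
while binutils-2.20.1 has {"xor", "D,S,T", 0x47800002, 0xffe0003f, RD_S|RD_T|WR_D|FP_D, 0, IL2E }, where IL2E means loongson2e.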
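
To make the match/mask columns concrete, here is a rough sketch in C of how a table row identifies an instruction word. The struct and function names (opcode_row, encode_3reg, lookup) are made up for illustration and are not binutils code. The register field positions (fd at bit 6, fs at bit 11, ft at bit 16) are my assumption based on the usual MIPS COP1 layout, which is consistent with the mask 0xffe0003f leaving bits 6..20 free.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of the opcode listing: mnemonic, match value, mask. */
struct opcode_row {
    const char *name;
    uint32_t match;
    uint32_t mask;
};

/* A few rows copied from comcat's listing (old godson2 spellings). */
static const struct opcode_row rows[] = {
    { "paddsb",   0x47800000u, 0xffe0003fu },
    { "psubsb",   0x47800001u, 0xffe0003fu },
    { "fxor",     0x47800002u, 0xffe0003fu }, /* "xor" in binutils-2.20.1 */
    { "pinsrh_0", 0x47800003u, 0xffe0003fu },
};

/* Build an instruction word from a match value and three FPR numbers.
 * Assumed field positions: fd << 6, fs << 11, ft << 16. */
uint32_t encode_3reg(uint32_t match, unsigned fd, unsigned fs, unsigned ft)
{
    return match
         | ((uint32_t)(ft & 31u) << 16)
         | ((uint32_t)(fs & 31u) << 11)
         | ((uint32_t)(fd & 31u) << 6);
}

/* Return the mnemonic whose (match, mask) pair fits the word, or NULL. */
const char *lookup(uint32_t word)
{
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++) {
        /* The mask selects the fixed opcode bits; operand bits are ignored. */
        if ((word & rows[i].mask) == rows[i].match)
            return rows[i].name;
    }
    return NULL;
}
```

Since the old fxor and the new xor share the same match and mask, only the spelling changed; the encoded instruction word is the same either way.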
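As a result, a few dozen places have to be changed: fxor -> xor
There are also some formatting changes, which can be modeled on the MMI extension code in libavcodec/ps2.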
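
I made these renames by hand, but the job could be sketched as a small C helper that rewrites whole-word mnemonics in a line of assembly text. This is only an illustration: rename_mnemonics and the renames table are hypothetical names, and the only pair I can vouch for from the comparison above is fxor -> xor. Any other pairs would have to be checked against mips-opc.c before being added.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rename table; only fxor -> xor comes from this post. */
struct rename_pair {
    const char *old_name;
    const char *new_name;
};

static const struct rename_pair renames[] = {
    { "fxor", "xor" },
};

/* Characters that can be part of a mnemonic such as "c.eq.gps" or "pinsrh_0". */
static int is_word_char(int c)
{
    return isalnum(c) || c == '_' || c == '.';
}

/* Copy src to dst, replacing whole-word old mnemonics with the new ones.
 * Returns 0 on success, -1 if dst is too small. */
int rename_mnemonics(const char *src, char *dst, size_t dst_size)
{
    size_t in = 0, out = 0;

    while (src[in] != '\0') {
        if (is_word_char((unsigned char)src[in])) {
            /* Collect one whole word, then look it up in the table. */
            size_t start = in;
            while (is_word_char((unsigned char)src[in]))
                in++;
            const char *word = src + start;
            size_t len = in - start;
            for (size_t k = 0; k < sizeof renames / sizeof renames[0]; k++) {
                if (strlen(renames[k].old_name) == len &&
                    strncmp(word, renames[k].old_name, len) == 0) {
                    word = renames[k].new_name;
                    len = strlen(word);
                    break;
                }
            }
            if (out + len >= dst_size)
                return -1;
            memcpy(dst + out, word, len);
            out += len;
        } else {
            /* Registers' '$', commas, spaces and quotes are copied as-is. */
            if (out + 1 >= dst_size)
                return -1;
            dst[out++] = src[in++];
        }
    }
    if (out >= dst_size)
        return -1;
    dst[out] = '\0';
    return 0;
}
```

Matching whole words matters here, because a plain substring replace would also hit longer names that merely contain the old spelling.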
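At the end of mpegvideo_loongson2.c there is one line that I commented out. It should install the loongson2 replacement for some function, but the prototype or function name no longer matches, so it is disabled for now and will be fixed and tested later: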
// draw_edges = draw_edges_loongson2;
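In the original patch, assembling as mips3 was enough to use the loongson2 acceleration instructions. Now the loongson2f assembler mode is required, so every place that selected mips3 assembly has been removed, and the instructions are enabled by the gcc option -march=loongson2f instead.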
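
Relying on -march also means the C code can check at compile time whether the Loongson path is available. A minimal sketch, assuming GCC's usual MIPS convention of predefining _MIPS_ARCH_<NAME> for the selected -march value; HAVE_LOONGSON2_MMI and have_loongson2_mmi are hypothetical names, not something the patch defines:

```c
/* Assumption: GCC predefines _MIPS_ARCH_LOONGSON2F / _MIPS_ARCH_LOONGSON2E
 * when built with -march=loongson2f / -march=loongson2e. */
#if defined(_MIPS_ARCH_LOONGSON2F) || defined(_MIPS_ARCH_LOONGSON2E)
#define HAVE_LOONGSON2_MMI 1
#else
#define HAVE_LOONGSON2_MMI 0
#endif

/* Returns 1 when this file was compiled for a Loongson 2 target, else 0. */
int have_loongson2_mmi(void)
{
    return HAVE_LOONGSON2_MMI;
}
```

On any other target the macro is 0, so code guarded by it falls back to the plain C routines.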
All godson strings have been changed to loongson.
The attachments are the two patches, before and after modification.
===========================================================
What follows is the document written by comcat.
[quote]
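Release: analysis and test document for the Loongson 2E multimedia instructions
There is no public documentation for the multimedia instructions, which is frustrating for those of us who want to do multimedia optimization.
I happened to have an assembler (as) with the Loongson patch applied, so I extracted the Loongson patch, tidied it up a little, and obtained the list of Loongson extension instructions: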
/* godson2 extensions */
faddu, D,V,T, 0x45800000, 0xffe0003f
for, D,V,T, 0x45a00000, 0xffe0003f
fadd, D,V,T, 0x45c00000, 0xffe0003f
fdadd, D,V,T, 0x45e00000, 0xffe0003f
pavgh, D,V,T, 0x46400000, 0xffe0003f
pavgb, D,V,T, 0x46600000, 0xffe0003f
pmaxsh, D,V,T, 0x46800000, 0xffe0003f
pminsh, D,V,T, 0x46a00000, 0xffe0003f
pmaxub, D,V,T, 0x46c00000, 0xffe0003f
pminub, D,V,T, 0x46e00000, 0xffe0003f
paddsh, D,V,T, 0x47000000, 0xffe0003f
paddush, D,V,T, 0x47200000, 0xffe0003f
paddh, D,V,T, 0x47400000, 0xffe0003f
paddw, D,V,T, 0x47600000, 0xffe0003f
paddsb, D,V,T, 0x47800000, 0xffe0003f
paddusb, D,V,T, 0x47a00000, 0xffe0003f
paddb, D,V,T, 0x47c00000, 0xffe0003f
paddd, D,V,T, 0x47e00000, 0xffe0003f
fsubu, D,V,T, 0x45800001, 0xffe0003f
pasubub, D,V,T, 0x45a00001, 0xffe0003f
fsub, D,V,T, 0x45c00001, 0xffe0003f
fdsub, D,V,T, 0x45e00001, 0xffe0003f
pcmpeqw, D,V,T, 0x46400001, 0xffe0003f
pcmpgtw, D,V,T, 0x46600001, 0xffe0003f
pcmpeqh, D,V,T, 0x46800001, 0xffe0003f
pcmpgth, D,V,T, 0x46a00001, 0xffe0003f
pcmpeqb, D,V,T, 0x46c00001, 0xffe0003f
pcmpgtb, D,V,T, 0x46e00001, 0xffe0003f
psubsh, D,V,T, 0x47000001, 0xffe0003f
psubush, D,V,T, 0x47200001, 0xffe0003f
psubh, D,V,T, 0x47400001, 0xffe0003f
psubw, D,V,T, 0x47600001, 0xffe0003f
psubsb, D,V,T, 0x47800001, 0xffe0003f
psubusb, D,V,T, 0x47a00001, 0xffe0003f
psubb, D,V,T, 0x47c00001, 0xffe0003f
psubd, D,V,T, 0x47e00001, 0xffe0003f
fsll, D,V,T, 0x45800002, 0xffe0003f
fdsll, D,V,T, 0x45a00002, 0xffe0003f
pextrh, D,V,T, 0x45c00002, 0xffe0003f
pmaddhw, D,V,T, 0x45e00002, 0xffe0003f
psllw, D,V,T, 0x46400002, 0xffe0003f
psllh, D,V,T, 0x46600002, 0xffe0003f
pmullh, D,V,T, 0x46800002, 0xffe0003f
pmulhh, D,V,T, 0x46a00002, 0xffe0003f
pmuluw, D,V,T, 0x46c00002, 0xffe0003f
pmulhuh, D,V,T, 0x46e00002, 0xffe0003f
pshufh, D,V,T, 0x47000002, 0xffe0003f
packsswh, D,V,T, 0x47200002, 0xffe0003f
packsshb, D,V,T, 0x47400002, 0xffe0003f
packushb, D,V,T, 0x47600002, 0xffe0003f
fxor, D,V,T, 0x47800002, 0xffe0003f
fnor, D,V,T, 0x47a00002, 0xffe0003f
fand, D,V,T, 0x47c00002, 0xffe0003f
pandn, D,V,T, 0x47e00002, 0xffe0003f
fsrl, D,V,T, 0x45800003, 0xffe0003f
fdsrl, D,V,T, 0x45a00003, 0xffe0003f
fsra, D,V,T, 0x45c00003, 0xffe0003f
fdsra, D,V,T, 0x45e00003, 0xffe0003f
psrlw, D,V,T, 0x46400003, 0xffe0003f
psrlh, D,V,T, 0x46600003, 0xffe0003f
psraw, D,V,T, 0x46800003, 0xffe0003f
psrah, D,V,T, 0x46a00003, 0xffe0003f
punpcklwd, D,V,T, 0x46c00003, 0xffe0003f
punpckhwd, D,V,T, 0x46e00003, 0xffe0003f
punpcklhw, D,V,T, 0x47000003, 0xffe0003f
punpckhhw, D,V,T, 0x47200003, 0xffe0003f
punpcklbh, D,V,T, 0x47400003, 0xffe0003f
punpckhbh, D,V,T, 0x47600003, 0xffe0003f
pinsrh_0, D,V,T, 0x47800003, 0xffe0003f
pinsrh_1, D,V,T, 0x47a00003, 0xffe0003f
pinsrh_2, D,V,T, 0x47c00003, 0xffe0003f
pinsrh_3, D,V,T, 0x47e00003, 0xffe0003f
fseq, S,T, 0x46800032, 0xffe007ff
fseq1, S,T, 0x46a00032, 0xffe007ff
fsltu, S,T, 0x4680003c, 0xffe007ff
fslt, S,T, 0x46a0003c, 0xffe007ff
fsleu, S,T, 0x4680003e, 0xffe007ff
fsle, S,T, 0x46a0003e, 0xffe007ff
biadd, D,V, 0x46800005, 0xffff003f
pmovmskb, D,V, 0x46a00005, 0xffff003f
/* godson2 paired single */
add.gps, D,V,T, 0x45600000, 0xffe0003f
sub.gps, D,V,T, 0x45600001, 0xffe0003f
mul.gps, D,V,T, 0x45600002, 0xffe0003f
abs.gps, D,V, 0x45600005, 0xffff003f
mov.gps, D,S, 0x45600006, 0xffff003f
neg.gps, D,V, 0x45600007, 0xffff003f
c.f.gps, S,T, 0x45600030, 0xffe007ff
c.un.gps, S,T, 0x45600031, 0xffe007ff
c.eq.gps, S,T, 0x45600032, 0xffe007ff
c.ueq.gps, S,T, 0x45600033, 0xffe007ff
c.olt.gps, S,T, 0x45600034, 0xffe007ff
c.ult.gps, S,T, 0x45600035, 0xffe007ff
c.ole.gps, S,T, 0x45600036, 0xffe007ff
c.ule.gps, S,T, 0x45600037, 0xffe007ff
c.sf.gps, S,T, 0x45600038, 0xffe007ff
c.ngle.gps, S,T, 0x45600039, 0xffe007ff
c.seq.gps, S,T, 0x4560003a, 0xffe007ff
c.ngl.gps, S,T, 0x4560003b, 0xffe007ff
c.lt.gps, S,T, 0x4560003c, 0xffe007ff
c.nge.gps, S,T, 0x4560003d, 0xffe007ff
c.le.gps, S,T, 0x4560003e, 0xffe007ff
c.ngt.gps, S,T, 0x4560003f, 0xffe007ff
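So, making assumptions and testing them as I went, I ended up with this document. Anyone who has looked at MMX or SSE should find most of the first 80 instructions very familiar.
The document covers the 80 integer multimedia instructions. The remaining 22 paired-single instructions (Paired-Single, operating on two single-precision floats at once) have not been tested one by one, because
they are used less often in typical multimedia optimization. I will get familiar with them later.
The document includes some tips for programming with the multimedia instructions, listed as Tips entries, and briefly explains some important concepts behind multimedia instructions.
It is written from a programmer's point of view. I don't like piles of wordy text, so some instructions are illustrated with diagrams and the more complex ones with pseudocode :)
Because many instructions implement MMX ideas, I did not test the ones that are obvious to anyone who knows MMX. The great majority of the instructions were tested, and all the test programs used are here: http://people.openrays.org/~comcat/ Anyone in doubt can test again themselves.
It may serve as a reference for people doing multimedia optimization on the Loongson platform.
The document is 63 pages long.
Download link: http://people.openrays.org/~comcat/mydoc/godson2e.mmi.pdf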
[/quote]
Attachment | Size |
---|---|
ffmpeg-0.cvs20060823-godson2.mmi.patch.gz | 10.81 KB |
ffmpeg-0.5.1-loongson2mmi.patch.gz | 10.67 KB |